Complete monitoring infrastructure for the LiteLLM multi-provider gateway, built on Prometheus, Grafana, and AlertManager.
# Services Deployed
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001 (user: admin, pass: admin)
- AlertManager: http://localhost:9093
- Node Exporter: http://localhost:9100
- Postgres Exporter: http://localhost:9187
- Redis Exporter: http://localhost:9121
- Docker Exporter: http://localhost:9417

# Grafana Dashboard

Location: `/config/grafana/dashboards/litellm_dashboard.json`

Panels:
- LiteLLM Status (UP/DOWN indicator)
- Request Rate (total requests vs errors)
- P95 Latency (5-minute window)
- Requests by Model (stacked bar chart)
- Requests by Provider (time series)
- Total Tokens Used (cumulative counter)
- Total Cost USD (cumulative cost)
- Error Rate (percentage)
Access:

```
# Local access (port 3001 to avoid conflict with Lago)
http://localhost:3001

# Via Traefik (production)
https://grafana.kubeworkz.io
```

Default Credentials:
- Username: admin
- Password: admin (change on first login)

# Prometheus Configuration

Location: `/config/prometheus/prometheus.yml`
LiteLLM Scrape Jobs:

```yaml
# LiteLLM proxy metrics (native Prometheus export)
- job_name: 'litellm'
  scrape_interval: 10s
  static_configs:
    - targets: ['unicorn-litellm-wilmer:4000']
  metrics_path: /metrics

# LiteLLM usage metrics (custom backend endpoint)
- job_name: 'litellm-usage'
  scrape_interval: 30s
  static_configs:
    - targets: ['ops-center-direct:8084']
  metrics_path: /api/v1/llm/metrics
```

# Alert Rules

Location: `/config/prometheus/rules/litellm_alerts.yml`

Alert Groups (6 groups, 15 alerts):
- LiteLLMDown (critical): Proxy down for 2+ minutes
- LiteLLMUnhealthy (warning): Health check fails for 5+ minutes
- LiteLLMHighLatency (warning): P95 latency > 10s for 10 minutes
- LiteLLMHighErrorRate (warning): Error rate > 5% for 5 minutes
- LiteLLMCriticalErrorRate (critical): Error rate > 20% for 2 minutes
- LiteLLMHighRequestRate (warning): > 100 req/s for 10 minutes (abuse detection)
- LiteLLMTokenLimitApproaching (info): > 1M tokens used in 1 hour
- LiteLLMModelFailures (warning): Model error rate > 10% for 5 minutes
- LiteLLMHighDailyCost (warning): Daily cost > $100
- LiteLLMCostSpike (warning): Cost increases 2x vs previous hour for 15 minutes
- LiteLLMDatabasePoolExhausted (critical): All DB connections active for 5 minutes
- LiteLLMSlowDatabaseQueries (warning): P95 query time > 1s for 10 minutes
- LiteLLMGroqProviderDown (critical): Groq failure rate > 90% for 5 minutes
- LiteLLMProviderRateLimited (warning): > 10 rate limit errors in 5 minutes
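
As a concrete example, the LiteLLMHighErrorRate rule could be expressed along these lines; this is a sketch built from the thresholds above and the metric names documented later in this file, not necessarily the exact contents of `litellm_alerts.yml`:

```yaml
groups:
  - name: litellm_errors
    rules:
      - alert: LiteLLMHighErrorRate
        # Errors as a share of all requests over the last 5 minutes
        expr: |
          rate(litellm_request_errors_total[5m])
            / rate(litellm_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LiteLLM error rate above 5%"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes."
```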

# Monitoring Script

Location: `/scripts/monitor_litellm.sh`

Features:
- Container status check
- Health endpoint validation
- Model availability verification
- Database connectivity test
- Performance metrics (CPU, memory, network)
- Live inference test
- Active alerts summary
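
For illustration, the health-endpoint check in a script like this typically boils down to a probe such as the following (a hypothetical excerpt, not the actual script contents):

```bash
# Hypothetical excerpt: probe the LiteLLM proxy health endpoint
if curl -sf --max-time 5 http://localhost:4000/health > /dev/null; then
  echo "LiteLLM health endpoint: OK"
else
  echo "LiteLLM health endpoint: FAILED"
fi
```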

Usage:

```bash
# Run once
./scripts/monitor_litellm.sh

# Run continuously (every 30s)
watch -n 30 ./scripts/monitor_litellm.sh
```

# Metrics Reference

LiteLLM native metrics (served from `/metrics` once the exporter is enabled):

```
# Request metrics
litellm_requests_total # Total requests counter
litellm_request_errors_total # Total errors counter
litellm_request_duration_seconds # Latency histogram
# Model metrics
litellm_model_requests_total{model="..."} # Requests per model
litellm_model_errors_total{model="..."} # Errors per model
# Token metrics
litellm_tokens_used_total # Total tokens consumed
litellm_prompt_tokens_total # Prompt tokens
litellm_completion_tokens_total # Completion tokens
# Cost metrics
litellm_total_cost_usd # Total cost in USD
# Provider metrics
litellm_provider_requests_total{provider="groq"} # Requests per provider
litellm_provider_errors_total{provider="groq"} # Errors per provider
# Usage analytics from database
llm_credits_used_total{tenant_id="..."} # Credits consumed per tenant
llm_requests_by_model{model="..."} # Historical request counts
llm_cost_by_tenant{tenant_id="..."} # Cost tracking per tenant
llm_response_time_seconds{model="..."} # Response time histograms
```
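
For reference, the Error Rate and P95 Latency dashboard panels can be written as PromQL over these metrics; a sketch assuming the metric names above:

```promql
# Error rate (%) over the last 5 minutes
100 * rate(litellm_request_errors_total[5m]) / rate(litellm_requests_total[5m])

# P95 latency over a 5-minute window
histogram_quantile(0.95, sum by (le) (rate(litellm_request_duration_seconds_bucket[5m])))
```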

# Operations

Start the monitoring stack:

```bash
docker-compose -f docker-compose.monitoring.yml up -d
```

Stop the monitoring stack:

```bash
docker-compose -f docker-compose.monitoring.yml down
```

View logs:

```bash
# Prometheus
docker logs ops-center-prometheus -f
# Grafana
docker logs ops-center-grafana -f
# AlertManager
docker logs ops-center-alertmanager -f
```

Restart services:

```bash
# Restart all monitoring
docker-compose -f docker-compose.monitoring.yml restart

# Restart specific service
docker restart ops-center-prometheus
docker restart ops-center-grafana
```

Reload the Prometheus configuration without a restart:

```bash
curl -X POST http://localhost:9090/-/reload
```

Updating alert rules:
- Edit `/config/prometheus/rules/litellm_alerts.yml`
- Reload Prometheus: `curl -X POST http://localhost:9090/-/reload`
- Verify the rules loaded: `curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[] | .name'`

Adding a Grafana dashboard:
- Place the JSON file in `/config/grafana/dashboards/`
- Restart Grafana: `docker restart ops-center-grafana`

# Status

✅ Operational:
- Prometheus scraping 7 exporters
- Grafana dashboard created
- 15 alert rules configured and loaded
- Monitoring script functional
- All monitoring containers running

⚠️ Pending:

- LiteLLM native metrics not exposed: the `/metrics` endpoint returns 404
  - Need to enable the Prometheus exporter in the LiteLLM config
  - May require setting the `LITELLM_ENABLE_PROMETHEUS=true` environment variable
  - Alternative: use LiteLLM database logging and export metrics from the backend
- Backend metrics endpoint not implemented: `/api/v1/llm/metrics` doesn't exist yet
  - Need to create a FastAPI route in the backend (see Next Steps)
  - Query `litellm_db` for usage statistics
  - Export in Prometheus format (counter/gauge/histogram)
- AlertManager configuration needed: the container is currently restarting
  - Need to configure `/config/alertmanager/alertmanager.yml`
  - Set up notification channels (email, Slack, webhook)
  - Define routing rules by alert severity

# Troubleshooting

Check scrape targets:

```bash
# View all targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health, error: .lastError}'

# View only LiteLLM targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job | contains("litellm"))'
```

Test the metrics endpoints directly:

```bash
# Test LiteLLM native metrics (currently 404)
curl http://localhost:4000/metrics

# Test backend custom metrics (not implemented)
curl http://localhost:8084/api/v1/llm/metrics
```

Query Prometheus for metric data:

```bash
# Check if metrics exist
curl -s 'http://localhost:9090/api/v1/label/__name__/values' | jq '.data[] | select(contains("litellm"))'

# Query specific metric
curl -s 'http://localhost:9090/api/v1/query?query=up{job="litellm"}' | jq .
```

Inspect alerts:

```bash
# All active alerts
curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | {alert: .labels.alertname, state: .state, value: .value}'

# Only firing alerts
curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | select(.state=="firing")'
```

# Next Steps

Enable LiteLLM native metrics:
- Check the LiteLLM documentation for Prometheus export
- Update environment variables or config to enable metrics
- Verify the `/metrics` endpoint responds in Prometheus format
- Confirm Prometheus can scrape the endpoint
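
For the enablement step, the LiteLLM proxy documents Prometheus logging via callbacks in its config; a sketch (verify the exact keys against the docs for the deployed LiteLLM version):

```yaml
# LiteLLM proxy config.yaml (sketch)
litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
```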

Implement the backend metrics endpoint:
- Create the `/api/v1/llm/metrics` route in the backend
- Query `litellm_db` for:
  - Total requests (from `llm_transactions`)
  - Credits used (from `llm_credits`)
  - Response times (from `llm_transactions`)
  - Error counts (from `llm_transactions` WHERE status='error')
- Format as Prometheus metrics:

```python
from prometheus_client import generate_latest, Counter, Histogram, Gauge

# Example metrics
llm_requests = Counter('llm_requests_total', 'Total LLM requests', ['model', 'tenant'])
llm_credits = Counter('llm_credits_used_total', 'Total credits used', ['tenant'])
llm_latency = Histogram('llm_response_time_seconds', 'LLM response time', ['model'])
```
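
Serving these in the Prometheus text format can then be as small as the following sketch (assuming FastAPI; populating the metric objects from `litellm_db` is left out):

```python
from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

app = FastAPI()

@app.get("/api/v1/llm/metrics")
def llm_metrics() -> Response:
    # generate_latest() serializes every registered prometheus_client
    # metric (the counters/histogram above) in the text exposition format.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
```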

Configure AlertManager (a starting-point config is sketched after this list):
- Create `/config/alertmanager/alertmanager.yml`
- Configure notification receivers:
  - Email for critical alerts
  - Slack for warnings
  - Webhook for custom integrations
- Set up routing based on severity and alert group
- Test alert delivery
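
A minimal `alertmanager.yml` along those lines might look like this; every receiver endpoint below is a placeholder, and email delivery additionally needs SMTP settings under `global:`:

```yaml
route:
  receiver: default
  group_by: ['alertname', 'severity']
  routes:
    - match: { severity: critical }
      receiver: email-oncall
    - match: { severity: warning }
      receiver: slack-warnings

receivers:
  - name: default
    webhook_configs:
      - url: http://example.internal/alerts   # placeholder webhook
  - name: email-oncall
    email_configs:
      - to: oncall@example.com                # placeholder address
  - name: slack-warnings
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ  # placeholder
        channel: '#litellm-alerts'
```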

Expand dashboards:
- Create provider-specific dashboards (Groq, HuggingFace, local)
- Add cost analysis dashboard
- Create tenant usage dashboard
- Build capacity planning dashboard

# References

- Prometheus Docs: https://prometheus.io/docs/
- Grafana Docs: https://grafana.com/docs/
- LiteLLM Docs: https://docs.litellm.ai/
- Prometheus Query Examples: https://prometheus.io/docs/prometheus/latest/querying/examples/

# Security Notes

- Change the Grafana default password immediately
  - Access http://localhost:3001, log in with admin/admin, and you will be prompted to change the password
- Restrict Prometheus access: it is currently open on port 9090
  - Use Traefik auth middleware in production
  - Or restrict access to the internal network only
- Protect the metrics endpoints
  - Ensure LiteLLM metrics require authentication
  - Set a `LITELLM_MASTER_KEY` requirement for the metrics endpoint
  - Use network policies to restrict scraper access
- Secure AlertManager webhooks (a validation sketch follows this list)
  - Use signed payloads for webhooks
  - Configure HMAC signatures
  - Validate webhook sources
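
On the receiving side, signature validation could look like the sketch below. AlertManager does not sign webhook payloads natively, so this assumes a signing proxy or custom sender; the secret and the `X-Signature` header name are hypothetical:

```python
import hashlib
import hmac

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = b"change-me"  # hypothetical shared secret; load from env in practice

@app.post("/alerts/webhook")
async def alert_webhook(request: Request) -> dict:
    body = await request.body()
    # Recompute the HMAC-SHA256 of the raw body and compare it,
    # in constant time, against the signature the sender attached.
    expected = hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    provided = request.headers.get("X-Signature", "")
    if not hmac.compare_digest(expected, provided):
        raise HTTPException(status_code=401, detail="invalid signature")
    return {"status": "ok"}
```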

# Support

For monitoring issues:
- Check the monitoring script output: `./scripts/monitor_litellm.sh`
- Review Prometheus targets: http://localhost:9090/targets
- Check container logs: `docker logs ops-center-prometheus`
- Verify alert rules: http://localhost:9090/rules

Last Updated: 2026-02-13
Version: 1.0
Status: Monitoring stack deployed; metrics endpoints pending implementation