This exporter provides comprehensive Prometheus metrics for LiteLLM, exposing usage, spend, performance, and operational data from the LiteLLM database.
- litellm_total_spend: Total spend across all users by model
- litellm_user_spend: Spend by user and model (labels: user_id, user_alias, model)
- litellm_team_spend: Spend by team and model (labels: team_id, team_alias, model)
- litellm_org_spend: Spend by organization and model (labels: organization_id, organization_alias, model)
- litellm_tag_spend: Spend by request tag
- litellm_key_spend: Spend by API key and model (labels: key_name, key_alias, model)

- litellm_total_tokens: Total tokens used by model
- litellm_prompt_tokens: Prompt tokens used by model
- litellm_completion_tokens: Completion tokens used by model

- litellm_request_duration_seconds: Request duration histogram
- litellm_requests_total: Total number of requests by model and status
- litellm_parallel_requests: Current parallel requests by entity (labels: entity_type, entity_id, entity_alias)

- litellm_tpm_limit: Tokens per minute limit by entity (labels: entity_type, entity_id, entity_alias)
- litellm_rpm_limit: Requests per minute limit by entity (labels: entity_type, entity_id, entity_alias)
- litellm_current_tpm: Current tokens per minute usage (labels: entity_type, entity_id, entity_alias)
- litellm_current_rpm: Current requests per minute usage (labels: entity_type, entity_id, entity_alias)

- litellm_cache_hits_total: Total number of cache hits by model
- litellm_cache_misses_total: Total number of cache misses by model

- litellm_budget_utilization: Budget utilization percentage (labels: entity_type, entity_id, entity_alias)
- litellm_max_budget: Maximum budget (labels: entity_type, entity_id, entity_alias)
- litellm_soft_budget: Soft budget limit (labels: entity_type, entity_id, entity_alias)
- litellm_budget_reset_time: Time until budget reset in seconds (labels: entity_type, entity_id, entity_alias)

- litellm_errors_total: Total number of errors by model and error type
- litellm_error_rate: Rate of errors per minute by model

- litellm_blocked_status: Entity blocked status (labels: entity_type, entity_id, entity_alias)
- litellm_member_count: Number of members in a team (labels: team_id, team_alias)
- litellm_admin_count: Number of admins in a team (labels: team_id, team_alias)

- litellm_active_keys: Number of active API keys (labels: entity_type, entity_id, entity_alias)
- litellm_expired_keys: Number of expired API keys (labels: entity_type, entity_id, entity_alias)
- litellm_key_expiry: Time until key expiry in seconds (labels: key_name, key_alias)

- litellm_available_models: Number of available models (labels: entity_type, entity_id, entity_alias)
- litellm_model_info: Model information (name, configuration, etc.)
For a comprehensive guide to all environment variables, their impacts, and best practices, see ENV_VARS.md.
- LITELLM_DB_HOST: PostgreSQL host (default: localhost)
- LITELLM_DB_PORT: PostgreSQL port (default: 5432)
- LITELLM_DB_NAME: Database name (default: litellm)
- LITELLM_DB_USER: Database user (default: postgres)
- LITELLM_DB_PASSWORD: Database password (default: empty)
- DB_MIN_CONNECTIONS: Minimum number of database connections in the pool (default: 1)
- DB_MAX_CONNECTIONS: Maximum number of database connections in the pool (default: 10)
For security best practices, it's recommended to create a dedicated read-only PostgreSQL user for the exporter. See POSTGRES_SETUP.md for detailed instructions on setting up a read-only database user.
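As an illustration of how the database settings above fit together, here is a minimal Python sketch that reads them from the environment with the documented defaults. The `DbConfig` class and `load_db_config` function are hypothetical names used for this example; they are not taken from the exporter's source, which may load its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbConfig:
    """Hypothetical container for the documented database settings."""
    host: str
    port: int
    name: str
    user: str
    password: str
    min_connections: int
    max_connections: int


def load_db_config(env=None) -> DbConfig:
    """Build a DbConfig from a mapping (os.environ by default), using the documented defaults."""
    env = os.environ if env is None else env
    cfg = DbConfig(
        host=env.get("LITELLM_DB_HOST", "localhost"),
        port=int(env.get("LITELLM_DB_PORT", "5432")),
        name=env.get("LITELLM_DB_NAME", "litellm"),
        user=env.get("LITELLM_DB_USER", "postgres"),
        password=env.get("LITELLM_DB_PASSWORD", ""),
        min_connections=int(env.get("DB_MIN_CONNECTIONS", "1")),
        max_connections=int(env.get("DB_MAX_CONNECTIONS", "10")),
    )
    # Sanity check added for this sketch: a pool cannot have min > max.
    if cfg.min_connections > cfg.max_connections:
        raise ValueError("DB_MIN_CONNECTIONS must not exceed DB_MAX_CONNECTIONS")
    return cfg
```

Calling `load_db_config()` with no arguments reads from the process environment; passing a plain dict makes the function easy to exercise in tests.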
- METRICS_PORT: Port to expose metrics on (default: 9090)
- METRICS_UPDATE_INTERVAL: How frequently metrics are updated in seconds (default: 15)
- METRICS_SPEND_WINDOW: Time window for spend metrics (default: 30d)
- METRICS_REQUEST_WINDOW: Time window for request metrics (default: 24h)
- METRICS_ERROR_WINDOW: Time window for error metrics (default: 1h)
Time windows can be specified using:
- 'd' for days (e.g., '30d')
- 'h' for hours (e.g., '24h')
- 'm' for minutes (e.g., '30m')
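The window format above can be parsed with a few lines of Python. This is a sketch only: `parse_window` is a hypothetical helper, and the exporter's own parsing may accept or reject inputs differently (for example, around whitespace or uppercase suffixes).

```python
import re
from datetime import timedelta

# Map each documented suffix to the matching timedelta keyword argument.
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_window(value: str) -> timedelta:
    """Convert a window string such as '30d', '24h', or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhm])", value.strip())
    if match is None:
        raise ValueError(f"invalid time window: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})
```

For example, `parse_window("24h")` returns `timedelta(hours=24)`, while a value such as `"24x"` raises `ValueError`.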
Different time windows affect both metric accuracy and database performance:
- Spend Window (default: 30d)
  - Longer windows provide better historical spend analysis
  - Affects memory usage and query performance
  - Consider your retention needs when adjusting
- Request Window (default: 24h)
  - Shorter windows provide more accurate recent usage patterns
  - Useful for monitoring current system load
  - Affects request rate and throughput calculations
- Error Window (default: 1h)
  - Short window for immediate error detection
  - Helps identify current system issues
  - Minimal impact on database performance
- Update Interval (default: 15s)
  - Controls how frequently metrics are refreshed
  - Lower values provide more real-time data but increase database load
  - Higher values reduce database load but decrease metric freshness
  - Adjust based on your monitoring needs and database capacity
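To make the update-interval trade-off concrete, the following back-of-the-envelope sketch estimates how often the database is hit. It assumes every refresh issues a fixed number of queries; `queries_per_refresh` is a hypothetical parameter, since the actual number of queries per refresh is not documented here.

```python
def refreshes_per_hour(update_interval_seconds: float) -> float:
    """Number of metric refresh cycles in one hour for a given interval."""
    if update_interval_seconds <= 0:
        raise ValueError("update interval must be positive")
    return 3600 / update_interval_seconds


def queries_per_hour(update_interval_seconds: float, queries_per_refresh: int) -> float:
    """Estimated database queries per hour, assuming a fixed query count per refresh."""
    return refreshes_per_hour(update_interval_seconds) * queries_per_refresh
```

With the default interval of 15 seconds, `refreshes_per_hour(15)` gives 240 refresh cycles per hour; raising the interval to 60 seconds cuts that to 60.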
The easiest way to get started is using Docker Compose:
- Create a docker-compose.yml file:

      services:
        litellm-exporter:
          image: nicholascecere/exporter-litellm:latest
          platform: linux/amd64
          ports:
            - "9090:9090"
          environment:
            - LITELLM_DB_HOST=your-db-host
            - LITELLM_DB_PORT=5432
            - LITELLM_DB_NAME=your-db-name
            - LITELLM_DB_USER=your-db-user
            - LITELLM_DB_PASSWORD=your-db-password
            - DB_MIN_CONNECTIONS=1
            - DB_MAX_CONNECTIONS=10
            - METRICS_UPDATE_INTERVAL=15
            - METRICS_SPEND_WINDOW=30d
            - METRICS_REQUEST_WINDOW=24h
            - METRICS_ERROR_WINDOW=1h

- Start the exporter:

      docker-compose up -d

The exporter will start on port 9090 and connect to your existing LiteLLM database.
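Once the exporter is running, one way to confirm it is serving data is to fetch its metrics and list the `litellm_*` metric names it exposes. The sketch below uses only the Python standard library. The `/metrics` path is an assumption based on the Prometheus convention, and both function names are hypothetical.

```python
import re
import urllib.request

# Assumed URL: the documented port 9090 plus the conventional /metrics path.
METRICS_URL = "http://localhost:9090/metrics"

# Pattern for a Prometheus metric name at the start of a sample line.
_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def fetch_metrics_text(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw exposition text from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def litellm_metric_names(exposition_text: str) -> set:
    """Return the names of all litellm_* samples found in the text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match and match.group(0).startswith("litellm_"):
            names.add(match.group(0))
    return names
```

Running `sorted(litellm_metric_names(fetch_metrics_text()))` against a live exporter should list names from the metric reference above. Histogram metrics appear with `_bucket`, `_sum`, and `_count` suffixes.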
The exporter can be deployed to Kubernetes using the provided manifests in the k8s directory:
- First, encode your database credentials:

      echo -n "your-db-host" | base64
      echo -n "5432" | base64
      echo -n "your-db-name" | base64
      echo -n "your-db-user" | base64
      echo -n "your-db-password" | base64

- Update the Secret in k8s/exporter-litellm.yaml with your base64-encoded values:

      apiVersion: v1
      kind: Secret
      metadata:
        name: litellm-exporter-secrets
      type: Opaque
      data:
        LITELLM_DB_HOST: "base64-encoded-host"
        LITELLM_DB_PORT: "base64-encoded-port"
        LITELLM_DB_NAME: "base64-encoded-name"
        LITELLM_DB_USER: "base64-encoded-user"
        LITELLM_DB_PASSWORD: "base64-encoded-password"

- Apply the Kubernetes manifests:

      kubectl apply -f k8s/exporter-litellm.yaml

This will create:
- A ConfigMap with exporter configuration
- A Secret containing database credentials
- A Deployment running the exporter
- A Service exposing the metrics endpoint
The exporter will be available at http://litellm-exporter:9090 within your cluster. The deployment includes:
- Resource limits and requests
- Liveness and readiness probes
- Prometheus scrape annotations
- Configurable replicas (default: 1)
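The base64-encoding step in the Kubernetes instructions can also be scripted. The sketch below encodes all five Secret values at once and is equivalent to running `echo -n "<value>" | base64` for each one, because no trailing newline is included. `encode_secret_values` is a hypothetical helper, and the values shown in the usage note are the same placeholders used above.

```python
import base64
from typing import Dict


def encode_secret_values(values: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode each value, as required by the `data` field of a Kubernetes Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
```

For example, `encode_secret_values({"LITELLM_DB_PORT": "5432"})` returns `{"LITELLM_DB_PORT": "NTQzMg=="}`, which can be pasted into the Secret manifest.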
You can run the exporter directly with Docker:
    docker run -d \
      -p 9090:9090 \
      -e LITELLM_DB_HOST=your-db-host \
      -e LITELLM_DB_PORT=5432 \
      -e LITELLM_DB_NAME=your-db-name \
      -e LITELLM_DB_USER=your-db-user \
      -e LITELLM_DB_PASSWORD=your-db-password \
      -e DB_MIN_CONNECTIONS=1 \
      -e DB_MAX_CONNECTIONS=10 \
      -e METRICS_UPDATE_INTERVAL=15 \
      -e METRICS_SPEND_WINDOW=30d \
      -e METRICS_REQUEST_WINDOW=24h \
      -e METRICS_ERROR_WINDOW=1h \
      nicholascecere/exporter-litellm:latest

To run the exporter from source instead:

- Install dependencies:

      pip install -r requirements.txt

- Set environment variables:

      export LITELLM_DB_HOST=your-db-host
      export LITELLM_DB_PORT=5432
      export LITELLM_DB_NAME=your-db-name
      export LITELLM_DB_USER=your-db-user
      export LITELLM_DB_PASSWORD=your-db-password
      export DB_MIN_CONNECTIONS=1
      export DB_MAX_CONNECTIONS=10
      export METRICS_UPDATE_INTERVAL=15
      export METRICS_SPEND_WINDOW=30d
      export METRICS_REQUEST_WINDOW=24h
      export METRICS_ERROR_WINDOW=1h

- Run the exporter:

      python litellm_exporter.py

The provided Makefile builds and tests the Docker image locally, which speeds up development and troubleshooting. It requires:
- Docker installed and running
- A valid .env file in the project root (see Configuration for required variables)

    make build

This will build the image as litellm-exporter:local by default. You can override the image name if needed:

    make build IMAGE_NAME=my-custom-image:dev

    make run

This will start the container interactively, mapping port 9090 and loading environment variables from .env.

    make run-detached

This will start the container in the background. To view logs:

    make logs

    make stop

This stops and removes the running container. To remove the local image as well:

    make clean

You can override the following variables at runtime:
- IMAGE_NAME (default: litellm-exporter:local)
- CONTAINER_NAME (default: litellm-exporter)
- PORT (default: 9090)

Example:

    make run-detached IMAGE_NAME=my-image:dev CONTAINER_NAME=litellm-dev PORT=8080

Add the following to your prometheus.yml:

    scrape_configs:
      - job_name: 'litellm'
        static_configs:
          - targets: ['localhost:9090']

Here are some example Prometheus queries for creating Grafana dashboards:
- Total spend rate:
      rate(litellm_total_spend[1h])
- Spend by model:
      sum by (model) (litellm_total_spend)
- Team spend by alias:
      sum by (team_alias) (litellm_team_spend)
- Organization spend by alias:
      sum by (organization_alias) (litellm_org_spend)
- API key spend by alias:
      sum by (key_alias) (litellm_key_spend)
- API key spend by key name:
      sum by (key_name) (litellm_key_spend)

- Request latency:
      rate(litellm_request_duration_seconds_sum[5m]) / rate(litellm_request_duration_seconds_count[5m])
- Error rate:
      rate(litellm_errors_total[5m])
- Cache hit ratio:
      rate(litellm_cache_hits_total[5m]) / (rate(litellm_cache_hits_total[5m]) + rate(litellm_cache_misses_total[5m]))

- TPM utilization by alias:
      sum by (entity_alias) (litellm_current_tpm / litellm_tpm_limit * 100)
- RPM utilization by alias:
      sum by (entity_alias) (litellm_current_rpm / litellm_rpm_limit * 100)

- Active teams by alias:
      count by (team_alias) (litellm_member_count)
- Blocked users by alias:
      sum by (entity_alias) (litellm_blocked_status{entity_type="user"})

- Expiring keys alert (keys expiring within 24h):
      litellm_key_expiry{key_alias="important-service"} < 86400
- Active keys by entity:
      sum by (entity_alias) (litellm_active_keys)

- High budget utilization alert:
      litellm_budget_utilization > 80
- Budget utilization by alias:
      sum by (entity_alias) (litellm_budget_utilization)
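The queries above can also be run outside Grafana through the Prometheus HTTP API, whose instant-query endpoint is `/api/v1/query`. The following is a minimal sketch using only the Python standard library. The server address in `PROMETHEUS_URL` is a hypothetical placeholder, and the helper names are illustrative rather than part of this project.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus server address; replace with your own.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def extract_samples(response: dict) -> dict:
    """Map each series' sorted label pairs to its float value (instant-vector results)."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    samples = {}
    for series in response["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # value arrives as a string
        samples[labels] = float(value)
    return samples


def run_query(promql: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Execute an instant query against Prometheus and return the parsed samples."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_samples(json.load(resp))
```

For instance, `run_query("sum by (model) (litellm_total_spend)")` would return one entry per model, keyed by its label pairs.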
These metrics provide comprehensive monitoring of your LiteLLM deployment, enabling you to track usage, performance, costs, and potential issues. The addition of alias labels and configurable time windows makes it easier to create meaningful dashboards and manage database performance.
This project is licensed under the GLWT (Good Luck With That) Public License - see the LICENSE file for details.
See CHANGELOG.md for a list of changes and version history.