This repository demonstrates a production-ready LLM application with model fallback, monitoring, and testing.
## Architecture

- FastAPI Application: REST API for LLM interactions with cascade fallback
- LiteLLM Proxy: unified interface for multiple LLM providers (OpenAI, Gemini, OpenRouter)
- MLflow: experiment tracking and prompt tracing
## Prerequisites

- Docker and Docker Compose
- API keys for:
  - OpenAI (GPT-4o)
  - Gemini 2.0 Flash
  - OpenRouter (Mistral 7B fallback)
## Quick Start

1. Set up the environment:

   ```bash
   cp .env.example .env  # Edit .env with your API keys
   ```

2. Start the services:

   ```bash
   docker-compose up -d --build
   ```

3. Access the services:

   - API: http://localhost:8000
   - LiteLLM: http://localhost:8001
   - MLflow UI: http://localhost:5000
## API Endpoints

### `POST /generate`

```
POST /generate
Content-Type: application/json

{
  "prompt": "Your prompt here",
  "model": "smart-router",
  "temperature": 0.7
}
```

### `GET /models`

Lists the available models.

### `GET /health`

Health check endpoint.

## Model Cascade

- Primary: `gpt-4o-primary` (OpenAI GPT-4o)
- Secondary: `gemini-secondary` (Gemini 2.0 Flash)
- Fallback: `openrouter-fallback` (Mistral 7B via OpenRouter)

Use the `smart-router` model name to enable automatic fallback.
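A request to the `/generate` endpoint can be issued with curl, assuming the stack is running locally on port 8000 (the prompt and temperature below are just examples):

```shell
# POST a prompt to the smart-router; the API falls back through the
# cascade (GPT-4o -> Gemini -> Mistral 7B) if a provider fails.
curl -s http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Your prompt here", "model": "smart-router", "temperature": 0.7}'
```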
## MLflow Tracking

All LLM calls are tracked with:

- Input/output parameters
- Token usage and latency
- Success/failure status
- Full prompt/response history

Access the MLflow UI at http://localhost:5000.
## Project Structure

```
.
├── docker-compose.yml     # Service definitions
├── litellm-config.yaml    # LiteLLM model configuration
├── .env.example           # Template for environment variables
├── test-requirements.txt  # Testing dependencies
├── tests/                 # Integration tests
├── mlflow-data/           # MLflow experiment data
└── src/
    └── api/               # FastAPI application
        ├── main.py        # API endpoints
        └── Dockerfile     # API container setup
```
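As an illustration, the cascade could be wired up in `litellm-config.yaml` roughly like this. The upstream model identifiers, environment variable names, and fallback wiring below are assumptions based on LiteLLM's configuration conventions; check the actual file in this repo:

```yaml
# Sketch only -- see litellm-config.yaml for the real configuration.
model_list:
  - model_name: gpt-4o-primary
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gemini-secondary
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
  - model_name: openrouter-fallback
    litellm_params:
      model: openrouter/mistralai/mistral-7b-instruct
      api_key: os.environ/OPENROUTER_API_KEY

router_settings:
  fallbacks:
    - gpt-4o-primary: [gemini-secondary, openrouter-fallback]
```

The `os.environ/...` syntax tells LiteLLM to read each key from the environment, which is why the `.env` file from the Quick Start is required.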
## Testing

Tests run inside the container:

```bash
docker-compose exec api pytest /app/tests/
```

## Useful Commands

```bash
docker-compose down     # Stop the services
docker-compose logs -f  # Follow the logs
```

## Persistent Data

- MLflow data: `./mlflow-data`
- Test coverage reports: `./htmlcov`
## Makefile Commands

The Makefile provides a set of commands to manage the environment and run tests:

```bash
# Check API health
make api-test

# List available models
make api-models

# Generate text with the fallback model
make api-generate PROMPT="What is the capital of France?"

# Generate text specifically with Gemini
make api-generate-gemini PROMPT="Explain quantum computing in simple terms"
```

Note: `jq` is required to parse the API responses.
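Since the targets parse API responses with `jq`, they presumably wrap `curl` calls against the endpoints above. A minimal sketch of what such a Makefile might contain (target bodies are illustrative, not the repo's actual Makefile):

```makefile
# Sketch only -- the repository's real Makefile may differ.
# Recipe lines must be indented with tabs.
API_URL ?= http://localhost:8000

api-test:
	curl -s $(API_URL)/health | jq .

api-models:
	curl -s $(API_URL)/models | jq .

api-generate:
	curl -s $(API_URL)/generate \
		-H "Content-Type: application/json" \
		-d '{"prompt": "$(PROMPT)", "model": "smart-router", "temperature": 0.7}' | jq .
```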