A comprehensive, production-ready observability demonstration featuring a FastAPI application integrated with the Grafana observability stack (Loki, Grafana, Tempo, Prometheus) using OpenTelemetry.
This project showcases how to implement a complete observability lifecycle:
- Tracing: Distributed tracing with Tempo.
- Metrics: Infrastructure and application metrics with Prometheus.
- Logging: Centralized logging with Loki.
- Alerting: Proactive monitoring with Alertmanager.
- Visualization: Unified dashboards in Grafana.
- Instrumentation: Vendor-neutral telemetry via OpenTelemetry.
- Core: FastAPI, Python 3.12+
- Infrastructure: Docker Compose, Traefik (Reverse Proxy)
- Database & Cache: PostgreSQL, Redis
- Storage: MinIO (S3-compatible storage for Loki/Tempo)
- Observability:
- OpenTelemetry (SDKs & Collector)
- Prometheus (Metrics)
- Grafana Loki (Logs)
- Grafana Tempo (Traces)
- Grafana (Dashboards)
- Alertmanager (Alerting)
The system is composed of multiple microservices orchestrated by Docker Compose:
- Server: 5 replicas of the FastAPI application, load-balanced by Traefik.
- Telemetry Flow:
- Traces: Sent directly to Tempo via OTLP/gRPC.
- Metrics: Sent to the OpenTelemetry Collector, which exposes them for Prometheus to scrape.
- Logs: Sent directly to Loki via OTLP/HTTP.
- Exporters: Dedicated exporters for PostgreSQL and Redis to provide deep infrastructure insights.
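The metrics leg of this flow can be sketched as an OpenTelemetry Collector pipeline. The ports and structure below are a typical setup, not copied from this repo's config:

```yaml
# Collector config sketch (illustrative ports, not the repo's actual file):
# the app pushes metrics over OTLP/gRPC, and the Collector re-exposes them
# on a Prometheus-format endpoint for scraping.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # Prometheus scrapes this endpoint
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```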
- Docker and Docker Compose
- k6 (optional, for load testing)
- Clone the repository:

  ```shell
  git clone <repository-url>
  cd observability
  ```

- Configure Environment: Copy the example environment file and adjust values if necessary:

  ```shell
  cp .env.example .env
  ```

- Launch the Stack:

  ```shell
  docker compose up -d
  ```
The project uses Traefik to provide clean local hostnames (ensure your hosts file points these to 127.0.0.1 or use a tool like dnsmasq):
| Service | URL | Description |
|---|---|---|
| FastAPI App | http://server.localhost | The main application API |
| Grafana | http://grafana.localhost | Visualization & Dashboards |
| Prometheus | http://prometheus.localhost | Metrics exploration |
| Alertmanager | http://alertmanager.localhost | Alert management |
| MinIO Console | http://minio.localhost | Object storage UI |
| Traefik Dashboard | http://localhost:8080 | Proxy & Load Balancer status |
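On most systems `*.localhost` already resolves to the loopback address; if yours does not, the hostnames above can be mapped explicitly in `/etc/hosts` (or `C:\Windows\System32\drivers\etc\hosts` on Windows):

```
127.0.0.1 server.localhost grafana.localhost prometheus.localhost alertmanager.localhost minio.localhost
```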
To see the observability stack in action, generate some traffic using the provided k6 script:
```shell
# Using local k6 installation
k6 run scripts/load-test.js
```

The load test simulates concurrent users hitting various endpoints, which will populate Grafana dashboards with real-time metrics, traces, and logs.
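The repo ships its own `scripts/load-test.js`; a minimal k6 script of the same general shape looks roughly like this (the endpoint, user count, and duration are illustrative assumptions, not the repo's actual values):

```javascript
// Minimal k6 load-test sketch (illustrative, not the repo's script).
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10,          // 10 concurrent virtual users
  duration: '30s',  // run for 30 seconds
};

export default function () {
  const res = http.get('http://server.localhost/'); // assumed endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```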
- Correlation: Logs are automatically enriched with `trace_id` and `span_id`, allowing seamless navigation from a log entry to its corresponding trace in Grafana.
- Custom Metrics: The application tracks `app_requests_total` and `app_request_duration_seconds`, labeled with attributes such as HTTP status code and request path.
- Infrastructure Monitoring: Built-in dashboards for hardware (Node Exporter), database (Postgres Exporter), and cache (Redis Exporter).
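In the real application the trace/span IDs come from OpenTelemetry's logging instrumentation; the stdlib-only sketch below mimics the idea by injecting stand-in IDs into every log record via a logging filter (the IDs here are randomly generated, not real trace context):

```python
import logging
import secrets

class TraceContextFilter(logging.Filter):
    """Attach trace_id/span_id to each record.

    Stand-ins for the IDs OpenTelemetry would inject from the active span.
    """
    def filter(self, record):
        # 16-byte trace id / 8-byte span id, matching OTel's hex widths
        record.trace_id = getattr(record, "trace_id", secrets.token_hex(16))
        record.span_id = getattr(record, "span_id", secrets.token_hex(8))
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(message)s trace_id=%(trace_id)s span_id=%(span_id)s"
))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)

logger.info("order created")  # emitted line now carries trace/span IDs
```

With real OTel instrumentation the same fields let Loki's derived-fields feature link each log line straight to its trace in Tempo.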
The project includes pre-configured alerting rules in Prometheus to monitor the application's health:
- HighErrorRate: Triggered if the rate of `5xx` responses exceeds 1/sec for 30s.
- HighLatency: Triggered if the P95 latency is above 1 second for 1 minute.
- PostgresDown: Immediate alert if the database becomes unreachable.
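The first rule above could be written in Prometheus rule syntax roughly as follows. The metric name and `status` label are assumptions based on the custom metrics listed earlier, not copied from the repo's rule files:

```yaml
# Sketch of the HighErrorRate rule (names assumed, not the repo's file).
groups:
  - name: app-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(app_requests_total{status=~"5.."}[1m])) > 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "5xx rate above 1 req/s for 30 seconds"
```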
Alerts are routed via Alertmanager, which can be accessed at http://alertmanager.localhost.
This project is licensed under the MIT License - see the LICENSE file for details.