Production-ready observability platform for GatewayZ AI Backend — Centralized metrics, logs, traces, profiles, and alerting deployed on Railway.
GatewayZ is an AI inference gateway. It sits between your application and the LLM providers (OpenAI, Anthropic, OpenRouter, Groq, Cerebras, and 25+ more), handling routing, load balancing, rate limiting, circuit breaking, and token tracking across all of them in one FastAPI service.
When you route production AI traffic through 30+ providers, you quickly hit questions that standard logging can't answer:
- Which provider is degrading right now — and is it affecting only one model or all of them?
- My P99 latency spiked to 8 seconds. Is it the provider HTTP call, the token counter, or the Redis cache layer?
- A circuit breaker opened on openrouter/claude-3-5-sonnet. How long has it been open and when did it first fire?
- My token budget burned 40% faster this week. Which model and which user drove that?
- Is the spike in errors a transient provider blip or a real regression in my routing logic?
None of these questions have simple answers if all you have is basic application logging or a generic APM tool. You need four separate observability signals working together:
| Signal | What it tells you | Stored in |
|---|---|---|
| Metrics | Counts, rates, and percentiles over time | Prometheus → Mimir |
| Logs | Detailed text events per request | Loki |
| Traces | The exact path and timing of each request through each service | Tempo |
| Profiles | Which lines of Python code are burning the CPU | Pyroscope |
This repository is the complete observability stack for GatewayZ — 8 services deployed together, pre-wired, pre-configured, with 17 Grafana dashboards covering every aspect of AI inference operations:
- 430+ dashboard panels across Four Golden Signals, Model Performance, Provider Directory, Infrastructure Health, Log-Derived Metrics, and more
- Two-tier alerting: 40+ Grafana alert rules + 16 standalone Prometheus rules, both routing to ops/critical email
- 32 Prometheus recording rules + 35 Loki recording rules that pre-compute anomaly detection baselines and aggregate high-cardinality log data into bounded metrics
- Cross-signal navigation: click a slow metric → jump to the Tempo trace → jump to the Pyroscope flamegraph at that exact timestamp
- Provider/model tagged CPU profiles: every inference call in the backend is tagged with provider and model so you can filter flamegraphs to exactly the calls you care about
If you're new to this stack, here's why each tool was chosen over the alternatives:
| Tool | Why we use it | What we considered instead |
|---|---|---|
| Grafana 11.5.2 | Industry-standard visualization with native plugins for every service in this stack. Alerting, dashboards, and datasource management in one place. | Datadog / New Relic — expensive SaaS with per-host pricing that doesn't scale for inference volume |
| Prometheus | De-facto standard for metrics scraping. Native prometheus_client integration for FastAPI. Rich PromQL ecosystem. | OpenTelemetry Collector alone: doesn't provide the same scrape model or alerting rules |
| Mimir 2.11.0 | Long-term Prometheus-compatible storage (30-day retention). Prometheus alone only holds 15 days and loses data on restart. Mimir is drop-in compatible — same PromQL, same API. | Prometheus-only — too short retention, no HA; VictoriaMetrics — slightly different API surface |
| Loki 3.4 | Log aggregation built for Kubernetes/container labels. No full-text indexing cost — only indexes labels (app, level, service), which is enough for correlation. Native Grafana integration. | Elasticsearch / Splunk — full-text indexing is expensive and overkill when you have trace_id correlation; self-managed ELK is heavyweight |
| Tempo | Distributed tracing with native OTLP receiver (the OpenTelemetry standard). Generates span metrics (traces_spanmetrics_*) that feed directly into Mimir. No per-span billing. | Jaeger: doesn't generate span metrics, less Grafana integration; Zipkin: limited OpenTelemetry support |
| Pyroscope 1.7.1 | Always-on continuous profiling at 100 Hz. Catches every P99 CPU outlier, not just sampled transactions. Supports provider/model tags on flamegraphs. Links directly from Tempo spans. | py-spy / sampling profilers — miss tail latency; Sentry performance (5% sample rate, no flamegraphs) |
| Alertmanager v0.27.0 | Standalone alert routing service. Fires even if Grafana is down. Provides inhibition rules (suppress warning floods when a critical fires). Mirrors Grafana's notification policy tree for two-tier ops/critical routing. | Grafana alerting only — single point of failure; no inhibition rules |
| Railway | Zero-ops container hosting with internal private DNS (.railway.internal) for free inter-service networking. Services in the same Railway project can reach each other without public URLs or VPNs. | AWS ECS / GCP Cloud Run: more infrastructure overhead; Fly.io: no built-in internal networking for free |
If you just joined and have no background in any of these tools, start here:
Step 1: Understand the system conceptually
Read the Architecture Overview section below. Focus on the data flow diagram and the data type separation table. The key insight: each service stores a different data type and is optimized for it. Do not try to store logs in Prometheus or metrics in Loki.
Step 2: Read the full architectural wiki
Open MASTER.md. It is the single source of truth for every architectural decision, every datasource UID, every dashboard, every alert rule, and every known gap. Pay attention to §16 (Backend Telemetry Architecture) if you're integrating the backend.
Step 3: Understand what the backend must expose
Read docs/backend/BACKEND_METRICS_REQUIREMENTS.md. This tells you exactly which Prometheus metrics, Loki labels, and OTLP attributes the backend must emit for the dashboards to show data.
Step 4: Run it locally
export FASTAPI_TARGET="host.docker.internal:8000" # or your backend's address
docker compose up --build
open http://localhost:3000 # Grafana (admin / yourpassword123)
All 8 services will start. Check http://localhost:9090/targets to confirm Prometheus is scraping your backend.
Step 5: Explore the dashboards
- Four Golden Signals — Start here. Latency, Traffic, Errors, Pyroscope profiling row.
- Provider Directory — Per-provider health scores, circuit breaker states, availability matrix.
- Inference Call Profile — CPU cost anatomy broken down by provider and model.
- Infrastructure Health — Stack health, Mimir remote write, Loki ingestion rates.
For deployment to Railway: Follow docs/deployment/RAILWAY_DEPLOYMENT_QUICK_START.md.
Acceptance criteria for all 25 project tasks: See ACCEPTANCE_CRITERIA.md.
- Grafana Dashboard: https://logs.gatewayz.ai
# 1. Clone and navigate to the stack
git clone <repo-url>
cd railway-grafana-stack
# 2. Configure backend metrics scraping (REQUIRED for data to show)
export FASTAPI_TARGET="host.docker.internal:8000" # If backend runs on host
# OR
export FASTAPI_TARGET="gatewayz-backend:8000" # If backend is in Docker network
# 3. Start all services
docker compose up --build
# 4. Access services
open http://localhost:3000 # Grafana (admin/yourpassword123)
open http://localhost:9090 # Prometheus
open http://localhost:9009 # Mimir
open http://localhost:9093 # Alertmanager (alert routing UI)
Documentation:
- Complete Documentation Index — Start here for all guides and references
- Cheatsheet — Common commands and queries
- Troubleshooting — Fix common issues
- Architecture — System design and components
Before looking at the diagram, here's the conceptual model:
Metrics are numbers sampled over time — request counts, error rates, latency percentiles. Prometheus scrapes them from your backend's /metrics endpoint every 15 seconds and stores them locally (15-day retention). Prometheus then remote-writes everything to Mimir, which holds 30 days of history and survives Prometheus restarts.
Logs are text events — structured JSON lines your backend emits for every request, error, and state change. The backend pushes them directly to Loki over HTTP. Loki stores them for 30 days, indexed only by labels (app, level, service) rather than full-text, keeping storage costs low.
Traces are maps of a request's journey through your system — which functions ran, in what order, how long each took. The backend uses the OpenTelemetry SDK to push OTLP-format traces to Tempo on every request. Tempo also generates derived span metrics (traces_spanmetrics_*) and writes them to Mimir.
Profiles are CPU and memory flamegraphs — a continuous record of which functions are consuming CPU at 100Hz sampling. The backend uses the Pyroscope SDK to push profiles to Pyroscope every 15 seconds, tagged by provider and model. This tells you which line of code causes that 8-second P99 latency.
Provider health is computed data — health scores, circuit breaker states, and availability percentages calculated server-side by the backend and exposed via /prometheus/data/metrics. The JSON-API-Proxy (a small Flask service) polls this endpoint and translates it into the Simple JSON format that Grafana can query directly for real-time provider status panels.
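The translation step can be pictured with a small sketch. The proxy's real code and the backend's payload shape are not shown in this README, so the `providers` list, its `name` and `health_score` keys, and the function name `to_simplejson` are assumptions made for illustration. Only the output layout (a list of `target`/`datapoints` objects holding `[value, epoch_ms]` pairs) follows the Simple JSON datasource's time-series response format.

```python
import time
from typing import Any, Dict, List, Optional


def to_simplejson(payload: Dict[str, Any], now_ms: Optional[int] = None) -> List[Dict[str, Any]]:
    """Convert a hypothetical provider-health payload into Simple JSON series."""
    ts = now_ms if now_ms is not None else int(time.time() * 1000)
    series = []
    for provider in payload.get("providers", []):
        series.append({
            # One series per provider; the naming scheme is an assumption.
            "target": f"health_score.{provider['name']}",
            "datapoints": [[float(provider["health_score"]), ts]],
        })
    return series


# Made-up sample input, not real backend output.
sample = {"providers": [{"name": "openai", "health_score": 97.5},
                        {"name": "groq", "health_score": 88}]}
result = to_simplejson(sample, now_ms=1_700_000_000_000)
```

Each provider becomes one series with a single current-value datapoint, which is the kind of shape a real-time status panel needs.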
The key rule: each service stores only its own data type. Prometheus does not store traces. Grafana queries each service for its own data type. Exception: Loki's ruler evaluates LogQL recording rules and remote-writes derived metrics (not raw logs) to Mimir — the same pattern Tempo uses for span metrics.
┌─────────────────────────────────────────────────────────────────────────────┐
│ GatewayZ Backend API │
│ (FastAPI: api.gatewayz.ai) │
└───────────┬─────────────────────┬─────────────────────┬─────────────────────┘
│ │ │
Metrics (Pull) Logs (Push) Traces (Push)
/metrics :3100/loki/push :4317 (gRPC)
:4318 (HTTP)
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Prometheus │ │ Loki │ │ Tempo │
│ :9090 │ │ :3100 │ │ :3200 │
│ │ │ │ │ │
│ Scrapes every │ │ Log Storage │ │ Trace Storage │
│ 15-30s │ │ 30d Retain │ │ 48h Retain │
└───────┬───────┘ └───────────────┘ └───────┬───────┘
│ │ │
│ remote_write │ ruler remote_write │ metrics_generator
│ /api/v1/push │ /api/v1/push │ remote_write
│ │ │
│ firing alerts │ │
├────────────────►────┤ │
│ ┌──────────┴──────┐ │
│ │ Alertmanager │ │
│ │ :9093 │ │
│ │ │ │
│ │ Routes alerts → │ │
│ │ Email (SMTP) │ │
│ └─────────────────┘ │
▼ │ ▼
┌───────────────┐ │ ┌───────────────┐
│ │◄────────────┘ │ │
│ Mimir │ │ Mimir │
│ :9009 │◄───────────────────────── │ (span metrics│
│ │ Loki ruler sends │ from traces)│
│ Long-term │ log-derived metrics │ │
│ 30d Retain │ (recording rules) │ │
└───────┬───────┘ └───────┬───────┘
│ │
└─────────────────────┬─────────────────────┘
│
▼
┌───────────────┐
│ Grafana │
│ :3000 │
│ │
│ Queries each │
│ source for │
│ its data type │
└───────────────┘
│
┌─────────────┼─────────────┐─────────────┐
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│Prometheus│ │ Mimir │ │ Loki │ │ Tempo │
│(metrics) │ │(metrics) │ │ (logs) │ │ (traces) │
│short-term│ │long-term │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
| Component | Stores | Writes To | Notes |
|---|---|---|---|
| Prometheus | Metrics (time-series) | Mimir via remote_write | Short-term storage, scrapes every 15-30s |
| Mimir | Metrics (time-series) | Local filesystem | Long-term storage, 30-day retention |
| Loki | Logs (text lines) | Local filesystem + Mimir (recording rule metrics only) | Logs stored locally; ruler aggregates log data into metrics → Mimir |
| Tempo | Traces (spans) | Local filesystem + Mimir (span metrics only) | Traces stored locally, derived metrics to Mimir |
Loki → Mimir flow: Loki stores log lines (text data) locally. Its ruler component evaluates LogQL recording rules every 1 minute and remote-writes the resulting derived metrics (counts, ratios, aggregations) to Mimir — the same pattern Tempo uses for span metrics. Raw logs are never sent to Mimir; only pre-aggregated numeric time-series are.
| Service | Port | Purpose | Status |
|---|---|---|---|
| Grafana 11.5.2 | 3000 | Visualization & dashboards | ✅ 6 dashboard folders |
| Prometheus 3.2.1 | 9090 | Metrics collection + alerting rules | ✅ 6 scrape jobs |
| Alertmanager v0.27.0 | 9093 | Alert routing → email (ops + critical) | ✅ Mirrors Grafana notification policies |
| Mimir 2.11.0 | 9009, 9095 | Long-term metrics storage | ✅ 30-day retention |
| Loki 3.4 | 3100 | Log aggregation + ruler (35 recording rules → Mimir) | ✅ 30-day retention, ruler remote_write |
| Tempo | 3200, 4317, 4318 | Distributed tracing | ✅ OTLP endpoints |
| Pyroscope 1.7.1 | 4040 | Continuous CPU profiling | ✅ Provider/model/cache tagged flamegraphs |
| JSON-API-Proxy | 5050 | Provider health bridge (Flask → Grafana Simple JSON) | ✅ Circuit breaker states, health scores |
The gatewayz-backend (FastAPI) emits telemetry on four channels that feed this stack:
| Channel | Protocol | Destination | Grafana Datasource |
|---|---|---|---|
| Metrics | Prometheus scrape (/metrics) | Prometheus → Mimir | grafana_prometheus / grafana_mimir |
| Logs | Async Loki push (JSON) | Loki | grafana_loki |
| Traces | OTLP HTTP (/v1/traces) | Tempo | grafana_tempo |
| Provider health | HTTP scrape (/prometheus/data/metrics) | JSON-API-Proxy | grafana_json_api |
gatewayz-backend /metrics
│
└─► Prometheus scrapes every 15s
│
├─► Stored locally (short-term, 15d)
└─► Remote write → Mimir (long-term, 30d)
│
└─► Grafana queries Mimir for all panels
gatewayz-backend
├── FastAPI / HTTPX / Redis auto-instrumented by OpenTelemetry
├── LLM calls traced via OpenLLMetry (gen_ai.* semantic conventions)
└── ResilientSpanProcessor (circuit breaker) → OTLP HTTP → Tempo :4318
│
Grafana queries Tempo
(+ Pyroscope flamegraphs
linked via service.name)
gatewayz-backend (JSON structured logging via LokiLogHandler)
└── Async queue → Loki push :3100
Stream Labels: app, environment, service, level, logger,
trace_id, span_id, path, method, provider,
model, user_id, error_type
│
├─► Grafana queries Loki directly (query-time LogQL)
│ Log→Trace correlation via trace_id → Tempo
│
└─► Loki ruler evaluates 35 recording rules every 1m
│
└─► Remote write derived metrics → Mimir :9009
│
└─► Grafana queries Mimir (pre-aggregated)
provider×model matrix, error ratios,
anomaly baselines, streaming stats
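The backend's LokiLogHandler is not reproduced in this README. As a rough, stdlib-only sketch of the JSON structured logging half of that flow, the formatter below renders one JSON object per record and copies a subset of the stream labels listed above when they are present. The class name `JsonLineFormatter` and the choice of fields are assumptions, not the backend's actual implementation.

```python
import json
import logging

# Context fields copied from the record when present (a subset of the labels above).
CONTEXT_FIELDS = ("trace_id", "span_id", "provider", "model", "error_type")


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON line (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry)


# Build a record by hand so the example runs without configuring handlers.
record = logging.LogRecord(
    name="gatewayz", level=logging.ERROR, pathname="example.py", lineno=0,
    msg="provider call failed", args=(), exc_info=None,
)
record.provider = "openrouter"
record.trace_id = "abc123"
line = JsonLineFormatter().format(record)
```

Carrying `trace_id` in every line is what makes the log-to-trace jump possible; fields that are absent on a record are simply left out.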
| Identifier | Value | Used In |
|---|---|---|
| OTEL service name | gatewayz-api | Tempo trace search, Pyroscope linking |
| Loki app label | app="gatewayz" | All Loki dashboard queries |
| Prometheus job | gatewayz_production | Prometheus target filter |
| Scrape target | $FASTAPI_TARGET | Must be set in Railway env vars |
Full backend telemetry reference: See MASTER.md — Section 16 for all metrics, OTEL config, Loki log format, health monitoring tiers, and dashboard-to-metric mapping.
GatewayZ routes AI inference requests across 30+ providers and 100+ models, generating log data with high-cardinality fields (request IDs, model variants, provider×model combinations, per-user sessions). Tracking every combination as a native Prometheus metric would cause cardinality explosion (~3,000+ time series from provider×model alone). Traditional metrics can't answer questions like "which provider×model combination has the highest error rate this week?"
Loki's ruler component evaluates LogQL recording rules every minute and remote-writes the resulting aggregated metrics to Mimir. This collapses high-cardinality log data into bounded, queryable metrics — the same pattern Tempo uses for span metrics.
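To make "collapsing" concrete, the Python sketch below mimics the effect of a count-by-(provider, model) recording rule on a handful of log entries. It is not LogQL and says nothing about how Loki's ruler is implemented; the sample entries, the model name `llama-3`, and the function name are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_provider_model(entries: Iterable[Dict[str, str]]) -> Counter:
    """Count entries per (provider, model), discarding per-request fields."""
    return Counter((e["provider"], e["model"]) for e in entries)


logs = [
    {"provider": "openai", "model": "gpt-4", "request_id": "r1", "user_id": "u1"},
    {"provider": "openai", "model": "gpt-4", "request_id": "r2", "user_id": "u2"},
    {"provider": "groq", "model": "llama-3", "request_id": "r3", "user_id": "u1"},
]
matrix = count_by_provider_model(logs)

# The estimate quoted above: 30+ providers x 100+ models.
estimated_series = 30 * 100
```

Three entries with unique request IDs reduce to two series, because only the provider and model labels survive the aggregation. The number of output series is bounded by the label combinations, not by request volume.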
Backend logs (JSON, high-cardinality stream labels)
│
▼
Loki (stores raw logs for 30 days)
│
└─► Ruler evaluates 35 LogQL recording rules every 1m
│
└─► Aggregated metrics remote-written to Mimir
│
└─► Grafana queries via grafana_mimir datasource
File: loki/rules/gatewayz_log_recording_rules.yml
| Group | Rules | What It Aggregates |
|---|---|---|
| loki_error_metrics | 7 | Error counts by category: total, timeout, rate-limit, database, memory, auth, exceptions |
| loki_provider_metrics | 3 | Per-provider error count, timeout count, request volume |
| loki_log_health | 4 | Total log volume, level distribution, error-to-total ratio %, circuit breaker events |
| loki_request_metrics | 3 | HTTP volume by method, slow request count, token usage events |
| loki_high_cardinality_aggregations | 7 | Provider×Model request matrix, per-endpoint volume, error types, per-provider error ratio, streaming completions, slow TTFC |
| loki_baselines | 4 | 1h/24h averages for error rate, log volume, error ratio (anomaly detection) |
| loki_high_cardinality_baselines | 3 | 1h averages for provider×model volume/errors, per-provider error ratio |
All recording rule metric names use the loki: prefix (e.g., loki:errors:count_per_minute, loki:requests:by_provider_model:count_per_5m) to distinguish them from native Prometheus metrics.
| File | Purpose |
|---|---|
| loki/loki.yml | Ruler block with remote_write to Mimir (X-Scope-OrgID: anonymous) |
| loki/entrypoint.sh | Runtime Mimir URL substitution (Railway vs Docker Compose) |
| loki/rules/gatewayz_log_recording_rules.yml | 35 LogQL recording rules in 7 groups |
| loki/Dockerfile | Copies rules + entrypoint into container |
- Cost Optimization — Aggregate thousands of per-request log entries into bounded provider/model metrics without overwhelming Mimir
- Root Cause Analysis: Drill from a metric spike in Grafana to the exact log line using the provider, model, trace_id, and error_type stream labels
- Retroactive Business Intelligence: Answer historical questions about provider costs, model usage trends, and token consumption from existing logs, with no pre-planned instrumentation needed
- Anomaly Detection — Recording rule baselines (1h/24h averages) enable alerts when current error rates exceed 2× the historical average
- Streaming Observability — Track TTFC (time to first chunk), stream completions, and prompt routing across the full request lifecycle
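The 2× baseline comparison from the Anomaly Detection item can be written as a one-line check. This is a sketch of the comparison only, not the actual alert rule: the function name is hypothetical, and treating a zero or missing baseline as "do not alert" is an assumption of this example.

```python
def exceeds_baseline(current: float, baseline: float, factor: float = 2.0) -> bool:
    """Return True when the current rate is above factor times the historical average."""
    if baseline <= 0:
        # No usable history yet: this sketch chooses not to alert.
        return False
    return current > factor * baseline
```

For example, with a 1h average of 2 errors per minute, a current rate of 5 trips the check while a rate of exactly 4 does not.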
All dashboards use real API endpoints with live data from Prometheus/Mimir — no mock data.
| Folder | Dashboard(s) | Purpose | Status |
|---|---|---|---|
| Four Golden Signals | Four-Golden-Signals | Latency · Traffic · Errors · Pyroscope Profiling (Pillar IV) | ✅ Ready |
| Model Performance | Inference-Call-Profile, Model-Usage, Cache-Layer-Profile, Inference-Profiling, Provider-Directory | AI inference anatomy, token usage, Redis cache CPU, provider metrics | ✅ Ready |
| Loki | Live-GatewayZ-Logs, Error-Analysis, Security-RateLimit, Log-Derived Metrics | Log search, streaming, error patterns, high-cardinality log-to-metrics analytics | ✅ Ready |
| Prometheus | Prometheus self-monitoring | Scrape targets, query stats, remote_write health | ✅ Ready |
| Tempo | Tempo dashboards | Service graph, span metrics, trace search | ✅ Ready |
| Mimir | Mimir dashboards | Historical queries, retention stats | ✅ Ready |
- AI Provider Metrics: Request rates by provider/model
- Latency Analysis: P50/P95/P99 percentiles
- Token Usage: Input/output token tracking
- Error Rates: By provider, model, and error type
- Live-GatewayZ-Logs: Real-time log stream with filters by app, level, environment, free-text search
- Error-Analysis: Error anomaly detection, error-type distribution, reliability scoring
- Security-RateLimit: Auth failures, rate-limit violations, security anomaly thresholds
- Log-Derived Metrics (NEW): 38-panel high-cardinality analytics dashboard — extracts metrics from logs at query time using LogQL and pre-aggregated Mimir recording rules:
  - Request performance by endpoint/method, slow request tracking, TTFC (time to first chunk) monitoring
  - Provider analytics: per-provider request volume, error rates, top-10 ranking, error distribution
  - Model usage: selection frequency, top-10 models, error hotspots, usage distribution pie charts
  - Provider×Model cardinality aggregation: recording rules collapse 30+ providers × 100+ models into bounded metrics
  - Anomaly detection: error rate vs 2× baseline threshold, log volume trends vs 1h averages
  - Streaming & inference: stream completion rate, prompt router selection frequency
- Pure trace data from Tempo datasource
- Service Graph: Distributed tracing visualization
- Span Metrics: Request duration, error rates by service
- Historical Queries: 30-day retention for trend analysis
- Consistent Results: No data loss on Prometheus restarts
File: prometheus/prometheus.yml
| Job Name | Target | Interval | Purpose |
|---|---|---|---|
| prometheus | localhost:9090 | 15s | Self-monitoring |
| gatewayz_production | ${FASTAPI_TARGET} | 15s | Production API metrics |
| gatewayz_data_metrics_production | ${FASTAPI_TARGET}/prometheus/data/metrics | 30s | Provider health, circuit breakers |
| health_service_exporter | :8002 | 30s | Health service exporter |
| mimir | mimir:9009 | 30s | Mimir self-monitoring |
| tempo | ${TEMPO_TARGET} | 15s | Tempo self-monitoring |
The ${FASTAPI_TARGET} placeholder MUST be set for local development:
# For backend on host machine (most common)
export FASTAPI_TARGET="host.docker.internal:8000"
# For backend in same Docker network
export FASTAPI_TARGET="gatewayz-backend:8000"
# Verify Prometheus is scraping
curl http://localhost:9090/api/v1/targets
Without this variable, the Backend Services dashboard will show no API data.
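The targets response is JSON, so the "is my backend being scraped?" check can be automated. The sketch below filters a response for targets that are not healthy. The helper name `down_targets` is hypothetical and the sample values are made up; the `data.activeTargets[].health` and `scrapeUrl` fields follow the Prometheus HTTP API.

```python
import json
from typing import Any, Dict, List


def down_targets(targets_response: Dict[str, Any]) -> List[str]:
    """List scrape URLs of active targets whose health is not "up"."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "?") for t in active if t.get("health") != "up"]


# Trimmed sample in the shape returned by GET /api/v1/targets.
sample = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"labels": {"job": "prometheus"},
   "scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
  {"labels": {"job": "gatewayz_production"},
   "scrapeUrl": "http://host.docker.internal:8000/metrics", "health": "down"}
]}}
""")
failing = down_targets(sample)
```

An empty list means every active target is up. A backend URL in the list usually points at an unset or wrong FASTAPI_TARGET.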
File: grafana/datasources/datasources.yml
| Datasource | UID | Type | URL | Purpose |
|---|---|---|---|---|
| Prometheus | grafana_prometheus | prometheus | ${PROMETHEUS_INTERNAL_URL} | Short-term app metrics |
| Mimir | grafana_mimir | prometheus | ${MIMIR_INTERNAL_URL}/prometheus | Long-term metrics + span metrics from Tempo |
| Loki | grafana_loki | loki | ${LOKI_INTERNAL_URL} | Logs |
| Tempo | grafana_tempo | tempo | ${TEMPO_INTERNAL_URL} | Traces |
| Pyroscope | grafana_pyroscope | grafana-pyroscope-datasource | ${PYROSCOPE_INTERNAL_URL} | Continuous profiling / flamegraphs |
| Alertmanager | alertmanager | alertmanager | ${ALERTMANAGER_INTERNAL_URL} | Alert state visibility |
| JSON API | grafana_json_api | simplejson | ${JSON_API_URL} | Provider health scores, circuit breaker states |
Datasource rule: grafana_prometheus = standard app metrics. grafana_mimir = ONLY Tempo-generated traces_spanmetrics_* / traces_service_graph_* metrics. Never mix them in dashboards.
| Service | Retention | Compaction | Notes |
|---|---|---|---|
| Prometheus | 15 days | N/A | Short-term, remote writes to Mimir |
| Mimir | 30 days | Every 2h | Horizontal scaling, HA-ready |
| Loki | 30 days | Every 10m | TSDB index, filesystem storage |
| Tempo | 7 days | 5m blocks | Local filesystem |
Python (FastAPI):
from prometheus_client import Counter, Histogram, make_asgi_app
from fastapi import FastAPI
app = FastAPI()
# Mount Prometheus metrics endpoint
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
# Define custom metrics
request_counter = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code', 'env']
)
latency_histogram = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['method', 'endpoint', 'env']
)
# Use in your routes
@app.get("/v1/models")
async def list_models():
    request_counter.labels(
        method="GET",
        endpoint="/v1/models",
        status_code="200",
        env="production"
    ).inc()
    return {"models": [...]}
Python with python-logging-loki:
import logging
from logging_loki import LokiHandler
loki_handler = LokiHandler(
    url="http://loki:3100/loki/api/v1/push",
    tags={"app": "gatewayz", "env": "production"},
    version="1"
)
logger = logging.getLogger("gatewayz")
logger.addHandler(loki_handler)
logger.setLevel(logging.INFO)
# Use structured logging
logger.info("Request processed", extra={
    "user_id": "123",
    "duration_ms": 45,
    "model": "gpt-4"
})
Python with OpenTelemetry:
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Configure Tempo exporter
otlp_exporter = OTLPSpanExporter(
    endpoint="http://tempo:4318/v1/traces"
)
# Setup tracing
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
# Instrument your code (await is only valid inside an async function)
async def run_inference():
    with tracer.start_as_current_span("model_inference") as span:
        span.set_attribute("model", "gpt-4")
        span.set_attribute("provider", "openai")
        # call_model_api is a placeholder for your own provider call
        return await call_model_api()
Complete Integration Guide: docs/backend/BACKEND_METRICS_REQUIREMENTS.md
Symptom: All Redis and API panels show "No data"
Root Cause: FASTAPI_TARGET environment variable not set for local builds
Solution:
# Set the environment variable
export FASTAPI_TARGET="host.docker.internal:8000"
# Restart Prometheus
docker compose restart prometheus
# Verify targets are UP
open http://localhost:9090/targets
Cause: Query syntax issues or no logs ingested
Fix:
- Verify Loki is receiving logs:
  curl http://localhost:3100/metrics | grep loki_distributor_lines_received_total
- Test log ingestion:
  curl -X POST http://localhost:3100/loki/api/v1/push \
    -H "Content-Type: application/json" \
    -d '{"streams":[{"stream":{"app":"test"},"values":[["'$(date +%s%N)'","test message"]]}]}'
- Check query syntax in dashboard (LogQL, not PromQL)
More Solutions: docs/troubleshooting/
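For scripted ingestion tests, the same push body can be built in Python rather than assembled with shell quoting. This sketch only constructs the JSON and does not send it; `build_push_payload` is a hypothetical helper name, while the `streams` / `stream` / `values` layout with a nanosecond timestamp string mirrors the curl example above.

```python
import json
import time
from typing import Dict, Optional


def build_push_payload(labels: Dict[str, str], line: str, ts_ns: Optional[int] = None) -> str:
    """Build the JSON body for POST /loki/api/v1/push with a single log line."""
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    timestamp = str(ts_ns if ts_ns is not None else time.time_ns())
    body = {"streams": [{"stream": labels, "values": [[timestamp, line]]}]}
    return json.dumps(body)


payload = build_push_payload({"app": "test"}, "test message", ts_ns=1700000000000000000)
```

The resulting string can be posted with any HTTP client using the `Content-Type: application/json` header.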
Cause: Datasource UID mismatch or incorrect URLs
Fix:
- Check datasource health:
  curl http://localhost:9090/-/healthy  # Prometheus
  curl http://localhost:9009/ready  # Mimir
  curl http://localhost:3100/ready  # Loki
  curl http://localhost:3200/ready  # Tempo
- Verify datasource UIDs in Grafana:
  - Prometheus: grafana_prometheus
  - Mimir: grafana_mimir
  - Loki: grafana_loki
  - Tempo: grafana_tempo
- Check environment variables in docker-compose.yml
Symptom: Prometheus is scraping metrics but Mimir shows no data, or Grafana's Mimir datasource returns empty results.
Root Cause: Missing X-Scope-OrgID header in Prometheus remote_write or Grafana datasource configuration.
Why This Happens: Even with multitenancy_enabled: false in Mimir, both writes (from Prometheus) and reads (from Grafana) require the X-Scope-OrgID header. When multi-tenancy is disabled, Mimir uses anonymous as the default tenant.
Solution:
- Verify Prometheus remote_write has the header (prometheus/prometheus.yml):
  remote_write:
    - url: http://mimir:9009/api/v1/push
      headers:
        X-Scope-OrgID: anonymous
- Verify Grafana Mimir datasource has the header (grafana/datasources/datasources.yml):
  - name: Mimir
    type: prometheus
    url: ${MIMIR_INTERNAL_URL}/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: anonymous
- Restart services after configuration changes:
  docker compose restart prometheus grafana
- Verify Mimir is receiving data:
  curl http://localhost:9009/ingester/ring  # Check Mimir ingester status
  curl "http://localhost:9090/api/v1/query?query=prometheus_remote_storage_samples_total"  # Check remote write metrics in Prometheus
  curl -H "X-Scope-OrgID: anonymous" "http://localhost:9009/prometheus/api/v1/query?query=up"  # Test Mimir query directly
More Solutions: docs/troubleshooting/REMOTE_WRITE_DEBUG.md
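The direct Mimir query from the steps above can also be scripted. The standard-library sketch below builds the request with the tenant header attached, without sending it. The helper name `mimir_query_request` is hypothetical, and the base URL assumes the local Docker Compose port from this README.

```python
import urllib.parse
import urllib.request


def mimir_query_request(base_url: str, promql: str, tenant: str = "anonymous") -> urllib.request.Request:
    """Build (but do not send) an instant-query request carrying X-Scope-OrgID."""
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query}"
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant})


req = mimir_query_request("http://localhost:9009", "up")
# To send it against a running stack:
# urllib.request.urlopen(req, timeout=5).read()
```

Keeping construction separate from sending makes it easy to confirm the header is present before blaming Mimir for an empty result.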
- Long-term metrics storage with 30-day retention.
- Horizontally scalable architecture.
- Remote write from Prometheus, Loki (recording rules), and Tempo (span metrics).
- 35 Loki recording rules aggregate high-cardinality log data (30+ providers × 100+ models) into bounded Mimir metrics.
- Provider×Model matrix, per-endpoint volumes, error categorization, streaming completion rates — all derived from logs without application code changes.
- Anomaly detection baselines: 1h/24h averages enable alerts when current rates exceed 2× historical average.
- Retroactive BI: Answer historical questions from existing logs without pre-planned instrumentation.
- Latency: P50/P95/P99 percentiles + trends.
- Traffic: Request volume and rates by provider and model.
- Errors: Error rate gauge + trends per provider.
- Profiling (Pillar IV): Continuous Pyroscope flamegraphs replace traditional saturation metrics — tells you which line of code is the bottleneck.
- Inference Call Profile: Per-request CPU anatomy by provider/model.
- Cache Layer Profile: Redis CPU cost by cache layer (auth, rate_limit, model_catalog, response_cache, trial_analytics) via Pyroscope tags.
- Loki Logs: Deep log search, error analysis, security monitoring, and log-derived metrics (38-panel high-cardinality analytics dashboard).
- Tempo Traces: Distributed tracing and service graphs.
- Two-layer alerting: Grafana Alerting (dashboard rules) + standalone Alertmanager (Prometheus rules) — both route to ops/critical email via the same severity/category label convention.
- Testing: Comprehensive integration test suite (90+ tests).
- Security: No hardcoded credentials; fully environment-variable driven.
Mimir provides 30-day metric retention with horizontal scaling. This is critical for:
- Historical trend analysis
- Consistent query results across page refreshes
- No data loss on Prometheus restarts
┌────────────┐ remote_write ┌─────────────┐
│ Prometheus │ ────────────────────▶│ Mimir │
│ :9090 │ /api/v1/push │ :9009 │
│ (15d) │ X-Scope-OrgID: │ │
└────────────┘ anonymous │ (30d) │
│ │ │
┌────────────┐ ruler │ │
│ Loki │ remote_write │ │
│ :3100 │ ────────────────────▶ │
│ (35 rules) │ log-derived │ │
└────────────┘ metrics │ │
│ │
┌────────────┐ metrics_generator│ │
│ Tempo │ remote_write │ │
│ :3200 │ ────────────────────▶ │
│ (spans) │ span metrics └─────────────┘
└────────────┘ │
│ stores
▼
┌──────────────┐
│ /data/mimir/ │
│ blocks/ │
│ tsdb/ │
└──────────────┘
# 1. Check Mimir is ready
curl http://localhost:9009/ready
# Expected: "ready"
# 2. Check Mimir ingester ring (must show ACTIVE)
curl http://localhost:9009/ingester/ring | jq '.shards[].state'
# Expected: "ACTIVE"
# 3. Check Prometheus remote_write metrics
curl -s http://localhost:9090/api/v1/query?query=prometheus_remote_storage_samples_total | jq '.data.result[].value[1]'
# Expected: increasing number (samples sent)
# 4. Check for remote_write failures
curl -s http://localhost:9090/api/v1/query?query=prometheus_remote_storage_samples_failed_total | jq '.data.result[].value[1]'
# Expected: 0 or very low
# 5. Query Mimir directly
curl -H "X-Scope-OrgID: anonymous" \
"http://localhost:9009/prometheus/api/v1/query?query=up"
# Expected: JSON with metric data
# 6. Check Mimir logs for write activity
docker compose logs mimir 2>&1 | grep -i "push" | tail -5
| File | Purpose |
|---|---|
| prometheus/prometheus.yml | remote_write config with X-Scope-OrgID: anonymous header |
| mimir/mimir.yml | Local development config (30d retention) |
| mimir/mimir-railway.yml | Railway production config |
| grafana/datasources/datasources.yml | Mimir datasource with X-Scope-OrgID header |
# Key settings for 30-day retention
multitenancy_enabled: false # Uses "anonymous" tenant
limits:
  compactor_blocks_retention_period: 720h # 30 days
  max_query_lookback: 720h # 30 days
  ingestion_rate: 50000 # samples/sec
  max_global_series_per_user: 500000 # total series
GatewayZ uses a two-layer alerting architecture. Both layers send email using the same routing logic, so no alert falls through whether it originates from a Grafana rule or a raw Prometheus rule.
┌─────────────────────────────────────────────────────────────────┐
│ Layer 1: Grafana Alerting (dashboard-based) │
│ Rules in grafana/provisioning/alerting/rules/ │
│ → contact_points.yml → notification_policies.yml → Email │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Layer 2: Prometheus → Alertmanager (rule-based) │
│ Rules in prometheus/alert.rules.yml │
│ → alertmanager:9093 → alertmanager/alertmanager.yml → Email │
└─────────────────────────────────────────────────────────────────┘
- Every alert MUST be actionable — if you can't fix it, don't alert on it
- Fewer high-quality alerts — 14 essential alerts instead of 25+ noisy ones
- Warning vs Critical — warning = investigate soon, critical = wake someone up
- No duplicates — consolidated overlapping alerts into single actionable items
Grafana evaluates alert rules against Prometheus/Mimir queries and routes firing alerts to email contact points using a notification policy tree.
Directory: grafana/provisioning/alerting/rules/
| File | Alert Category |
|---|---|
| `traffic_anomalies.yml` | Traffic spikes, traffic drops |
| `error_rate_anomalies.yml` | Error rate spikes by provider |
| `latency_anomalies.yml` | P99 latency spikes and anomalies |
| `availability_anomalies.yml` | Provider availability drops, circuit breakers |
| `slo_burn_rate_alerts.yml` | SLO violation burn-rate alerts |
| `backend_alerts.yml` | Backend service health |
| `model_alerts.yml` | Model-specific issues |
File: grafana/provisioning/alerting/contact_points.yml
Email addresses are provisioned here. Grafana reads this file at startup — update it to change recipients without touching the Railway dashboard.
File: grafana/provisioning/alerting/notification_policies.yml
| Matcher | Receiver | Repeat |
|---|---|---|
| `severity=critical` | critical-email | 15m |
| `category=traffic_spike, severity=critical` | critical-email | 15m |
| `category=error_rate_spike` | ops-email | 30m |
| `component=slo, severity=critical` | critical-email | 15m |
| `category=latency_anomaly` | ops-email | 1h |
| `category=availability_drop, severity=critical` | critical-email | 15m |
| `category=circuit_breaker` | critical-email | 15m |
| `category=model` | ops-email | 30m |
| `category=logs` | ops-email | 30m |
| (default) | ops-email | 1h |
The alertmanager/ directory is a standalone Docker/Railway service (port 9093). Prometheus forwards its firing rules to Alertmanager via the alerting: block in prometheus/prometheus.yml. Alertmanager then applies its own routing tree and delivers email.
This mirrors Layer 1 exactly — the same severity + category labels drive the same ops/critical split — so teams get alerts from both paths without reconfiguring anything.
| Category / Severity | Receiver | group_wait | repeat_interval |
|---|---|---|---|
| `severity=critical` | critical-email | 0s | 15m |
| `category=slo, severity=critical` | critical-email | 0s | 15m |
| `category=traffic_spike, severity=critical` | critical-email | 10s | 15m |
| `category=error_rate_spike, severity=critical` | critical-email | 0s | 15m |
| `category=server_errors` | critical-email | 0s | 15m |
| `category=latency_spike` | critical-email | 0s | 15m |
| `category=latency_anomaly, severity=critical` | critical-email | 10s | 15m |
| `category=availability_drop, severity=critical` | critical-email | 0s | 15m |
| `category=circuit_breaker` | critical-email | 0s | 15m |
| `category=multi_provider_degradation` | critical-email | 0s | 15m |
| (traffic/latency/availability non-critical) | ops-email | 30s | 1h |
| `category=model` | ops-email | 5m | 30m |
| `category=backend` | ops-email | 5m | 30m |
| `category=logs` | ops-email | 30s | 30m |
| (default) | ops-email | 30s | 1h |
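To make the label-driven split concrete, here is a minimal Python sketch of how a label set could resolve to a receiver. It is a simplified model, not the Alertmanager implementation: it uses only a subset of the rows above, assumes the first matching route wins, and the row order shown is an assumption (the real order lives in `alertmanager/alertmanager.yml`). The name `resolve_receiver` is hypothetical.

```python
from typing import Mapping

# Illustrative subset of the routing table. Order matters in this simplified
# model because the first route whose matchers all apply is selected.
ROUTES = [
    ({"category": "circuit_breaker"}, "critical-email", "15m"),
    ({"category": "slo", "severity": "critical"}, "critical-email", "15m"),
    ({"severity": "critical"}, "critical-email", "15m"),
    ({"category": "model"}, "ops-email", "30m"),
    ({"category": "logs"}, "ops-email", "30m"),
]
DEFAULT_ROUTE = ("ops-email", "1h")


def resolve_receiver(labels: Mapping[str, str]) -> tuple[str, str]:
    """Return (receiver, repeat_interval) for an alert's label set."""
    for matchers, receiver, repeat in ROUTES:
        # A route applies only if every one of its matchers equals the label value.
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver, repeat
    return DEFAULT_ROUTE
```

A warning-level alert with `category=model` lands on ops-email with a 30m repeat, while anything labelled `severity=critical` goes to critical-email.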
- Suppress `warning` alerts when a `critical` fires for the same `alertname` + `instance`
- Suppress `availability_drop` alerts when `multi_provider_degradation` fires (prevents a flood from individual providers)
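The two inhibition rules can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the Alertmanager code path; alerts are reduced to plain label dictionaries and `apply_inhibition` is a hypothetical name.

```python
from typing import Iterable

Alert = dict[str, str]  # an alert reduced to its label set


def apply_inhibition(alerts: Iterable[Alert]) -> list[Alert]:
    """Drop alerts that the two inhibition rules would suppress."""
    alerts = list(alerts)

    # Rule 1 source set: (alertname, instance) pairs with a firing critical alert.
    critical_keys = {
        (a.get("alertname"), a.get("instance"))
        for a in alerts
        if a.get("severity") == "critical"
    }
    # Rule 2 source: is a multi-provider degradation alert firing at all?
    multi_provider = any(
        a.get("category") == "multi_provider_degradation" for a in alerts
    )

    kept = []
    for a in alerts:
        key = (a.get("alertname"), a.get("instance"))
        if a.get("severity") == "warning" and key in critical_keys:
            continue  # a critical for the same alertname + instance is firing
        if multi_provider and a.get("category") == "availability_drop":
            continue  # the aggregate alert already covers individual providers
        kept.append(a)
    return kept
```

Suppressed alerts are hidden from notification only; in Alertmanager they remain visible in the UI and API as inhibited.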
| File | Purpose |
|---|---|
| `alertmanager/Dockerfile` | prom/alertmanager:v0.27.0, exposes 9093 |
| `alertmanager/alertmanager.yml` | Routing tree + email receivers (placeholders substituted at startup) |
| `alertmanager/entrypoint.sh` | Substitutes env var placeholders → hands off to alertmanager binary |
| `alertmanager/railway.toml` | Railway build + healthcheck config |
Both alerting layers share the same SMTP configuration. Set these once via environment variables.
| Variable | Example | Notes |
|---|---|---|
| `SMTP_FROM` | `alerts@gatewayz.ai` | From address for Alertmanager emails |
| `SMTP_USER` | `alerts@gatewayz.ai` | SMTP auth username |
| `SMTP_PASSWORD` | `app-password-here` | SMTP app-password (not your main password) |
| `SMTP_HOST` | `smtp.gmail.com:465` | Default: smtp.gmail.com:465 |
| `ALERT_EMAIL_OPS` | `team@company.com` | Operational alerts recipient(s) |
| `ALERT_EMAIL_CRIT` | `oncall@company.com` | Critical/pager alerts recipient(s) |
For Grafana email (Layer 1), also set `GF_SMTP_ENABLED=true`, `GF_SMTP_HOST`, `GF_SMTP_USER`, `GF_SMTP_PASSWORD`, `GF_SMTP_FROM_ADDRESS` on the Grafana service, or use the shared `GF_SMTP_*` vars, which `alertmanager/entrypoint.sh` will fall back to automatically.
The alertmanager service reads SMTP_FROM, SMTP_USER, SMTP_PASSWORD from the environment or docker-compose environment: block. No additional configuration needed — entrypoint.sh substitutes placeholders at startup.
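The substitution step can be pictured with a short Python sketch. The real work happens in `alertmanager/entrypoint.sh`; the `${NAME}` placeholder syntax, the exact fallback mapping to `GF_SMTP_*`, and the function names here are assumptions made for illustration.

```python
from string import Template
from typing import Mapping

# Assumed fallback mapping: Alertmanager variable -> shared Grafana variable.
FALLBACKS = {
    "SMTP_FROM": "GF_SMTP_FROM_ADDRESS",
    "SMTP_USER": "GF_SMTP_USER",
    "SMTP_PASSWORD": "GF_SMTP_PASSWORD",
    "SMTP_HOST": "GF_SMTP_HOST",
}
DEFAULTS = {"SMTP_HOST": "smtp.gmail.com:465"}
REQUIRED = (
    "SMTP_FROM", "SMTP_USER", "SMTP_PASSWORD",
    "SMTP_HOST", "ALERT_EMAIL_OPS", "ALERT_EMAIL_CRIT",
)


def resolve_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Pick each value from its own variable, then the GF_SMTP_* fallback, then a default."""
    settings = {}
    for name in REQUIRED:
        value = env.get(name) or env.get(FALLBACKS.get(name, "")) or DEFAULTS.get(name)
        if not value:
            raise ValueError(f"missing required variable: {name}")
        settings[name] = value
    return settings


def render_config(template_text: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders in the config template with resolved values."""
    return Template(template_text).substitute(resolve_settings(env))
```

Failing fast on a missing variable is a deliberate choice in this sketch: a config rendered with empty SMTP credentials would start cleanly and then drop every email.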
# Test alertmanager is up
curl http://localhost:9093/-/healthy
# Check active alerts
curl http://localhost:9093/api/v2/alerts
# Send a test alert directly to Alertmanager (Prometheus's /api/v1/alerts endpoint is read-only).
# Without an endsAt field, the alert resolves after Alertmanager's resolve_timeout.
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"}}]'

Deploy Alertmanager as a separate Railway service in the same project as Prometheus and Grafana so it shares the private railway.internal network.
- In the Railway dashboard, add a new service sourced from this repository
- Set the root directory to `alertmanager/` (or use the `alertmanager/railway.toml` config)
- Set required env vars on the Alertmanager service:

      SMTP_FROM=alerts@gatewayz.ai
      SMTP_USER=alerts@gatewayz.ai
      SMTP_PASSWORD=<app-password>
      ALERT_EMAIL_OPS=team@company.com
      ALERT_EMAIL_CRIT=oncall@company.com

- Set the following env var on the Prometheus service so it routes alerts to Alertmanager:

      ALERTMANAGER_INTERNAL_URL=http://alertmanager.railway.internal:9093

  (If not set, `prometheus/entrypoint.sh` auto-detects the Railway environment and uses `alertmanager.railway.internal:9093` as the default.)
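The fallback order in the last step can be sketched in Python as follows. How `prometheus/entrypoint.sh` actually detects Railway is not shown in this README, so checking a `RAILWAY_ENVIRONMENT` variable is an assumption, as is the `alertmanager:9093` local default (taken from the architecture diagram) and the function name.

```python
from typing import Mapping

RAILWAY_DEFAULT = "http://alertmanager.railway.internal:9093"
LOCAL_DEFAULT = "http://alertmanager:9093"  # assumed docker compose service name


def resolve_alertmanager_url(env: Mapping[str, str]) -> str:
    """Explicit setting wins, then the Railway default, then the local default."""
    explicit = env.get("ALERTMANAGER_INTERNAL_URL", "").strip()
    if explicit:
        return explicit
    # Assumption: presence of RAILWAY_ENVIRONMENT signals a Railway deployment.
    if env.get("RAILWAY_ENVIRONMENT"):
        return RAILWAY_DEFAULT
    return LOCAL_DEFAULT
```

Setting the variable explicitly is still the safer option if the Alertmanager service is renamed, because the Railway default depends on the service being called `alertmanager`.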
File: prometheus/alert.rules.yml
| Group | Alerts | Purpose |
|---|---|---|
| service_health | 3 | Is the service up and responding? |
| api_performance | 3 | Are API responses healthy? |
| provider_health | 3 | Are upstream AI providers working? |
| infrastructure | 5 | Is the monitoring stack itself healthy? |
| Alert | Severity | Trigger | Action |
|---|---|---|---|
| `GatewayZAPIDown` | critical | Prometheus can't scrape API for 2m | Check Railway deployment |
| `HighErrorRate` | critical | >10% error rate for 5m | Check Loki logs, recent deployments |
| `AvailabilitySLOBreach` | critical | <99.5% success rate over 1h | Initiate incident response |
| `HighAPILatency` | warning | P95 > 3s for 5m | Check slow endpoints, providers |
| `LatencyDegradation` | warning | 50% latency increase vs 1h ago | Check recent changes, resources |
| `TrafficSpike` | warning | 3x traffic increase for 10m | Analyze traffic, check for abuse |
| `ProviderHighErrorRate` | critical | >20% errors per provider for 5m | Check provider status, failover |
| `SlowProviderResponse` | warning | P95 > 5s per provider for 10m | Monitor provider, adjust timeouts |
| `LowModelHealthScore` | warning | <80% success rate for 5m | Review errors across providers |
| `ScrapeTargetDown` | warning | Any scrape target down for 5m | Check target health, network |
| `MimirRemoteWriteFailures` | warning | Failed samples to Mimir | Check Mimir health, storage |
| `MimirDown` | critical | Mimir unreachable for 2m | Check container, storage volume |
| `TempoNoTraces` | warning | No traces for 15m | Check OTLP endpoint, backend config |
| `LokiNoLogs` | warning | No logs for 15m | Check Loki health, log shipping |
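To get a feel for what the `AvailabilitySLOBreach` and `LatencyDegradation` thresholds mean in request counts, the arithmetic can be sketched in Python. The real conditions are PromQL expressions in `prometheus/alert.rules.yml`; the function names and the sample traffic figures below are illustrative assumptions.

```python
def availability(success: int, total: int) -> float:
    """Fraction of successful requests; an idle window counts as fully available."""
    return 1.0 if total == 0 else success / total


def slo_breached(success: int, total: int, target: float = 0.995) -> bool:
    """True when the success rate falls below the 99.5% target."""
    return availability(success, total) < target


def allowed_failures(total: int, target: float = 0.995) -> int:
    """Error budget for a window, in whole requests."""
    # round() guards against float noise before truncating to an integer.
    return int(round(total * (1 - target), 6))


def latency_degraded(current_s: float, baseline_s: float, increase: float = 0.5) -> bool:
    """True when latency is more than 50% above the value from one hour ago."""
    return current_s > baseline_s * (1 + increase)
```

At a hypothetical 100 requests per second, a one-hour window holds 360,000 requests, so the 99.5% target leaves a budget of 1,800 failed requests before the alert condition is met.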
- Grafana Alerting UI: Alerting → Alert rules → firing/pending
- Prometheus UI: http://localhost:9090/alerts
- Alertmanager UI: http://localhost:9093 (silence management, active alerts)
- API:

      # Firing alerts from Alertmanager
      curl http://localhost:9093/api/v2/alerts | jq '.[].labels'
      # Firing alerts from Grafana
      curl http://localhost:3000/api/alertmanager/grafana/api/v2/alerts \
        -H "Authorization: Bearer $GRAFANA_API_KEY"
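When the alert list gets long, a per-severity count is often easier to read than raw labels. The following Python sketch summarises the JSON returned by the `/api/v2/alerts` endpoint; it assumes each alert carries a `severity` label, and `summarize_alerts` is a hypothetical helper, not part of this repository.

```python
import json
from collections import Counter


def summarize_alerts(payload: str) -> Counter:
    """Count active alerts per severity label from an /api/v2/alerts response body."""
    alerts = json.loads(payload)
    return Counter(
        alert.get("labels", {}).get("severity", "unknown")
        for alert in alerts
        # Skip alerts that are silenced or inhibited (state "suppressed").
        if alert.get("status", {}).get("state") == "active"
    )
```

Pass it the response body saved from either curl command above, for example `summarize_alerts(open("alerts.json").read())`.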
- 25+ Real API Endpoints — Integration tests with performance validation
- 7 Production Dashboards — Schema validation and configuration checks
- 90+ Test Methods — Comprehensive coverage across all components
- GitHub Actions Workflows — Automated validation on every deployment
# Setup (required once)
cp .env.example .env
# Edit .env and add your API keys
# Run all tests
pytest tests/ -v
# Dashboard validation
./scripts/validate_dashboards.sh strict
# Endpoint testing
export API_KEY="your_api_key"
./scripts/test_all_endpoints.sh "$API_KEY" https://api.gatewayz.ai
# Specific test categories
pytest tests/test_dashboards.py -v -m dashboard
pytest tests/test_api_endpoints.py -v -m endpoint

# 1. Push to Railway
git push railway main
# 2. Or use Railway CLI
railway up

Environment variables (set per service in the Railway dashboard):
Grafana service:
PROMETHEUS_INTERNAL_URL=http://prometheus.railway.internal:9090
MIMIR_INTERNAL_URL=http://mimir.railway.internal:9009
LOKI_INTERNAL_URL=http://loki.railway.internal:3100
TEMPO_INTERNAL_URL=http://tempo.railway.internal:3200
PYROSCOPE_INTERNAL_URL=http://pyroscope.railway.internal:4040
JSON_API_URL=http://json-api-proxy.railway.internal:5050
GF_SMTP_ENABLED=true
GF_SMTP_HOST=smtp.gmail.com:465
GF_SMTP_USER=alerts@gatewayz.ai
GF_SMTP_PASSWORD=<app-password>
GF_SMTP_FROM_ADDRESS=alerts@gatewayz.ai
Prometheus service:
FASTAPI_TARGET=api.gatewayz.ai:443
MIMIR_INTERNAL_URL=http://mimir.railway.internal:9009
ALERTMANAGER_INTERNAL_URL=http://alertmanager.railway.internal:9093
Alertmanager service (deploy alertmanager/ as a separate Railway service):
SMTP_FROM=alerts@gatewayz.ai
SMTP_USER=alerts@gatewayz.ai
SMTP_PASSWORD=<app-password>
SMTP_HOST=smtp.gmail.com:465
ALERT_EMAIL_OPS=team@company.com
ALERT_EMAIL_CRIT=oncall@company.com
JSON-API-Proxy service:
GATEWAYZ_API_URL=https://api.gatewayz.ai
# Start all services
docker compose up --build
# Start specific services
docker compose up grafana prometheus mimir
# View logs
docker compose logs -f grafana
# Stop all services
docker compose down
# Clean volumes (WARNING: deletes all data)
docker compose down -v

Deployment Guide: docs/deployment/RAILWAY_DEPLOYMENT_QUICK_START.md
- Documentation Index — Start here for all docs
- MASTER.md — Full architectural wiki (16 sections)
- ACCEPTANCE_CRITERIA.md — Acceptance criteria for all 25 project tasks
- Backend Integration — Required metrics and instrumentation
- Railway Deployment — Deploy to Railway
- Troubleshooting — Service-specific fix guides
- Mimir Architecture — Long-term metrics storage
- Pyroscope Architecture — Continuous profiling setup
- JSON-API-Proxy Architecture — Provider health bridge
railway-grafana-stack/
├── alertmanager/
│ ├── Dockerfile # prom/alertmanager:v0.27.0, port 9093
│ ├── alertmanager.yml # Routing tree + email receivers (placeholders)
│ ├── entrypoint.sh # Substitutes env vars → execs alertmanager
│ └── railway.toml # Railway build + healthcheck config
├── grafana/
│ ├── Dockerfile
│ ├── dashboards/
│ │ ├── golden-signals/ # Four Golden Signals (Latency/Traffic/Errors/Profiling)
│ │ ├── model_performance/ # Inference-Call-Profile, Model-Usage, Cache-Layer-Profile
│ │ ├── loki/ # Loki logs (pure logs)
│ │ ├── prometheus/ # Prometheus self-monitoring
│ │ ├── tempo/ # Tempo traces (pure traces)
│ │ └── mimir/ # Mimir long-term metrics
│ ├── datasources/
│ │ └── datasources.yml # Prometheus, Mimir, Loki, Tempo, Pyroscope, JSON API
│ └── provisioning/
│ ├── dashboards/
│ │ └── dashboards.yml
│ └── alerting/
│ ├── rules/ # Grafana alert rule YAML files (Layer 1)
│ ├── contact_points.yml
│ └── notification_policies.yml
├── json-api-proxy/
│ ├── Dockerfile
│ ├── app.py # Flask service translating /prometheus/data/metrics → Simple JSON
│ └── railway.toml
├── prometheus/
│ ├── Dockerfile
│ ├── entrypoint.sh # Environment-based target resolution
│ ├── prometheus.yml # Scrape jobs + remote_write to Mimir + alerting block
│ ├── alert.rules.yml # Prometheus alert rules (Layer 2 — sent to Alertmanager)
│ └── recording_rules_baselines.yml # 32 recording rules for anomaly detection
├── pyroscope/
│ ├── Dockerfile
│ └── pyroscope.yml # Self-hosted Pyroscope configuration
├── mimir/
│ ├── Dockerfile
│ └── mimir.yml # Mimir configuration (30d retention)
├── loki/
│ ├── Dockerfile
│ └── loki.yml # Loki configuration
├── tempo/
│ ├── Dockerfile
│ ├── entrypoint.sh # Environment-based configuration
│ └── tempo.yml # Tempo configuration
├── scripts/
│ ├── pre-build-cleanup.sh # Railway pre-deploy cleanup
│ └── ... # Other validation scripts
├── tests/ # Pytest test suite
├── docs/ # Documentation
├── MASTER.md # Full architectural wiki
├── ACCEPTANCE_CRITERIA.md # Acceptance criteria for all 25 Kanban tasks
├── railway.toml # Railway deployment configuration
├── docker-compose.yml # Local development
└── README.md # This file
- Create feature branch: `git checkout -b feature/my-feature`
- Make changes and test locally with `docker compose up`
- Run tests: `pytest tests/ -v`
- Validate dashboards: `./scripts/validate_dashboards.sh strict`
- Create pull request to `main`
The biggest mistake in any observability setup is treating each panel as a silo. The stack is configured for end-to-end click-through navigation so you always move toward the root cause rather than copy-pasting IDs between tabs.
| From | To | Mechanism | How to trigger |
|---|---|---|---|
| Metric graph | Tempo trace | Exemplar (blue ◆ dot on latency graphs) | Click the blue dot on any histogram panel |
| Loki log line | Tempo trace | Derived Field on `trace_id` JSON field | Click "View Trace" button next to any log entry |
| Loki log label | Tempo trace | Derived Field on `trace_id` Loki label | Click "View Trace" button in label sidebar |
| Tempo span | Mimir metric | `tracesToMetrics` → `grafana_mimir` | In Tempo, click a span → "Related metrics" |
| Tempo span | Loki logs | `tracesToLogs` → `grafana_loki` | In Tempo, click a span → "Related logs" |
| Tempo span | Pyroscope flamegraph | `tracesToProfiles` → `grafana_pyroscope` | In Tempo, click a span → "View Profile" |
| Tempo service graph | Node topology | `serviceMap` → `grafana_mimir` | Service Graph & Topology section in dashboard |
Exemplars (Mimir + Prometheus → Tempo): Both datasources have exemplarTraceIdDestinations set to trace_id (underscore, matching the OpenTelemetry field name the backend emits). The field names are now consistent — previously Mimir was using traceId (camelCase), which would have silently failed.
Derived Fields (Loki → Tempo): Two matchers are configured — one for JSON-structured log lines ("trace_id": "...") and one for Loki labels. Both resolve to the Tempo datasource so the button appears regardless of how the backend emits the ID.
Service Graph (Tempo → Mimir): Tempo's metrics_generator with the service-graphs processor generates traces_service_graph_* metrics and remote-writes them directly to Mimir — they are never in Prometheus. The serviceMap datasource is correctly set to grafana_mimir. Similarly, tracesToMetrics points to Mimir because span metrics (traces_spanmetrics_*) are also Mimir-only.
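For the JSON-line matcher described under Derived Fields, the backend has to write `trace_id` into each structured log line. The sketch below shows one way to do that with the standard `logging` module, plus a regex of the kind a derived field could use. The formatter class, the regex, and the field layout are assumptions; the actual matcher is defined in `grafana/datasources/datasources.yml`.

```python
import json
import logging
import re
from typing import Optional


class JsonTraceFormatter(logging.Formatter):
    """Render each record as a JSON line that always carries a trace_id field."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            # trace_id is expected to arrive via logger.info(..., extra={"trace_id": ...}).
            "trace_id": getattr(record, "trace_id", ""),
        }
        return json.dumps(entry)


# Assumed shape of the derived-field regex for JSON log lines.
TRACE_ID_PATTERN = re.compile(r'"trace_id":\s*"(\w+)"')


def extract_trace_id(line: str) -> Optional[str]:
    """Return the trace ID a derived field would capture, or None if absent."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None
```

If the key were emitted as `traceId` instead, the pattern would not match and the "View Trace" button would not appear, which is the same class of silent failure noted for exemplars above.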
For the $service template variable and cross-signal filtering to work reliably, the backend must emit consistent resource attributes across all four signals:
service.name = "gatewayz-api" # must match in spans, logs, and metrics
instance.id = "<pod-or-host-id>" # must match for per-instance filtering
Configure this once in the OpenTelemetry SDK resource at process startup — it propagates to Tempo (spans), Loki (log labels via the OTEL log handler), and Prometheus (target labels via relabeling).
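One low-friction way to keep the attributes identical everywhere is the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable, which OpenTelemetry SDKs read at startup. The Python sketch below builds that value from the two attributes above; the helper name and the idea of generating the string in code (rather than setting it by hand in Railway) are assumptions.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def build_resource_attributes(
    service_name: str,
    instance_id: str,
    extra: Optional[Mapping[str, str]] = None,
) -> str:
    """Build an OTEL_RESOURCE_ATTRIBUTES value: comma-separated key=value pairs."""
    attrs = {"service.name": service_name, "instance.id": instance_id}
    attrs.update(extra or {})
    # Percent-encode values so commas or equals signs cannot break the pair syntax.
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in attrs.items())
```

For example, `build_resource_attributes("gatewayz-api", "pod-1")` yields `service.name=gatewayz-api,instance.id=pod-1`, which can be exported before the process starts.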
The stack is LGTMP: Loki · Grafana · Tempo · Mimir · Pyroscope.
The gatewayz-backend and railway-grafana-stack live in two separate Railway projects. Pyroscope runs as a service inside this (grafana-stack) project, so Grafana reaches it via Railway's internal DNS at zero cost. The backend, being in a different project, pushes profiles over Pyroscope's public Railway domain.
gatewayz-backend (Railway project A)
└─ pyroscope-io SDK → PUSH (HTTP) every 15 s
└──────────────────────────────────────────────────────────────►
https://<pyroscope-public-domain>.up.railway.app
│
railway-grafana-stack (Railway project B)
├─ Pyroscope service (:4040, internal)
│ stores profiles on /data/pyroscope
└─ Grafana READ via http://pyroscope.railway.internal:4040
| Service | Port | Notes |
|---|---|---|
| Pyroscope | 4040 | Grafana reads via `.railway.internal` inside this project. The backend, in a separate project, pushes via the public Railway domain. |
The Four Golden Signals tell you what is wrong. Profiling tells you which line of code is causing it.
| Four Golden Signals say | Profiling adds |
|---|---|
| CPU saturation at 90% | Which function burns the cycles? Token counter? JSON streaming serializer? |
| Memory growing 50 MB/hour | Which object accumulates? A model-catalog cache with no TTL? An open SSE connection? |
| P99 latency spiked to 8 s | What was the thread actually doing? Waiting on a Redis lock? A slow provider HTTP call? |
Every inference call in chat_handler.py is wrapped with pyroscope.tag_wrapper() so flamegraphs can be filtered by the upstream provider and model:
Inference Profiling dashboard → filter $provider=openrouter, $model=claude-3-5-sonnet
→ see exactly which Python functions consumed CPU during those calls
Tags applied at the _call_provider / _call_provider_stream boundaries:
- `provider`: e.g. `openrouter`, `cerebras`, `groq`
- `model`: e.g. `claude-3-5-sonnet-20241022`
- `service_name`: always `gatewayz-backend`
- `environment`: Railway environment (`production` / `staging` / `local`)
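A minimal sketch of the tagging pattern, assuming the `pyroscope-io` Python SDK and its `pyroscope.tag_wrapper()` context manager. The helper `call_with_profile_tags` is hypothetical and is not the code in `chat_handler.py`; the no-op fallback exists only so the sketch runs where the SDK is not installed.

```python
from contextlib import contextmanager

try:
    import pyroscope  # pyroscope-io SDK

    tag_wrapper = pyroscope.tag_wrapper
except ImportError:
    @contextmanager
    def tag_wrapper(tags):
        # No-op stand-in so the sketch runs without the SDK installed.
        yield


def call_with_profile_tags(provider, model, call, *args, **kwargs):
    """Run `call` with provider/model tags attached to the samples it produces."""
    tags = {"provider": provider, "model": model}
    with tag_wrapper(tags):
        # CPU samples collected inside this block carry the tags above,
        # which is what makes the flamegraph filterable per provider and model.
        return call(*args, **kwargs)
```

Tags only apply to samples taken while the `with` block is active, so the wrapper should surround the provider call itself rather than the whole request handler.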
Step 1 — Generate a public domain for Pyroscope in Railway:
Railway dashboard → grafana-stack project → Pyroscope service → Settings → Generate Domain
Copy the generated URL (e.g. https://pyroscope-production-xxxx.up.railway.app).
Step 2 — Set env vars on the backend service (Project A):
PYROSCOPE_ENABLED=true
PYROSCOPE_SERVER_ADDRESS=https://pyroscope-production-xxxx.up.railway.app
No auth variables needed — self-hosted Pyroscope has no authentication by default.
Step 3 — Set env vars on the Grafana service (Project B):
PYROSCOPE_INTERNAL_URL=http://pyroscope.railway.internal:4040
That's it. Grafana reaches Pyroscope via the internal network; the backend pushes over the public domain.
With docker compose up, Pyroscope starts automatically. Grafana uses the http://pyroscope:4040 default (set via the PYROSCOPE_INTERNAL_URL docker-compose env var). To push profiles locally, set PYROSCOPE_SERVER_ADDRESS=http://localhost:4040 on the backend.
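On the backend side, the two variables can be turned into SDK settings roughly as follows. This is a sketch assuming the `pyroscope-io` SDK's `pyroscope.configure()`; the function names, the `application_name` value, and reading the environment tag from `RAILWAY_ENVIRONMENT` are assumptions, and the 100 Hz sample rate mirrors the always-on sampling figure quoted in this README.

```python
from typing import Mapping, Optional


def build_pyroscope_config(env: Mapping[str, str]) -> Optional[dict]:
    """Translate environment variables into pyroscope.configure() keyword arguments."""
    if env.get("PYROSCOPE_ENABLED", "").lower() != "true":
        return None  # profiling stays off unless explicitly enabled
    return {
        "application_name": "gatewayz-backend",
        "server_address": env.get("PYROSCOPE_SERVER_ADDRESS", "http://localhost:4040"),
        "sample_rate": 100,
        "tags": {"environment": env.get("RAILWAY_ENVIRONMENT", "local")},
    }


def start_profiling(env: Mapping[str, str]) -> bool:
    """Configure the profiler if enabled and installed; report whether it started."""
    config = build_pyroscope_config(env)
    if config is None:
        return False
    try:
        import pyroscope  # pyroscope-io SDK, imported lazily
    except ImportError:
        return False
    pyroscope.configure(**config)
    return True
```

Calling `start_profiling(os.environ)` once at process startup is enough; returning `False` instead of raising keeps a missing SDK from blocking the API from booting.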
- Inference Profiling dashboard (`model_performance/Inference-Profiling`): flamegraph + CPU bar gauges broken down by provider and model, sample rate over time
- Trace → Profile drill-down: click any slow span in Tempo → "View Profile" jumps to the Pyroscope flamegraph at that exact timestamp, no manual matching needed
- Always-on sampling: 100 Hz regardless of traffic, catches every P99 outlier (unlike Sentry's 5% transaction sample rate)
- Grafana Documentation
- Prometheus Documentation
- Grafana Mimir Documentation
- Loki Documentation
- Tempo Documentation
- OpenTelemetry Documentation
- Pyroscope Documentation
- Railway Documentation
- Documentation Issues: Check docs/troubleshooting/
- Backend Integration: See docs/backend/BACKEND_METRICS_REQUIREMENTS.md
- Deployment Help: Review docs/deployment/RAILWAY_DEPLOYMENT_QUICK_START.md
- Architecture Questions: See MASTER.md
Proprietary — GatewayZ Network
GatewayZ Observability Stack · Enterprise-grade monitoring for AI infrastructure · Powered by Prometheus, Alertmanager, Mimir, Loki, Tempo, Pyroscope, and Grafana