Monitoring System
The Gatewayz backend features a comprehensive, multi-layered monitoring system that tracks the health, performance, and usage of 10,000+ models across 16+ providers. The monitoring system includes real-time metrics, long-term analytics, anomaly detection, circuit breakers, and public status pages.
- Architecture
- Monitoring Endpoints
- Data Collection
- Authentication & Authorization
- Integration Guide
- Configuration
- Best Practices
┌─────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Admin Panel, Status Page, Dashboards) │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ API Layer │
│ - Monitoring Routes (/api/monitoring/*) │
│ - Health Routes (/health/*) │
│ - Model Health Routes (/v1/model-health/*) │
│ - Status Page Routes (/v1/status/*) │
│ - Analytics Routes (/v1/analytics/*) │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ Service Layer │
│ - Model Health Monitor (Background Service) │
│ - Metrics Aggregator (Hourly) │
│ - Prometheus Metrics Service │
│ - Redis Metrics Service │
│ - Analytics Service (Anomaly Detection) │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ Data Storage Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Redis │ │ PostgreSQL │ │ Prometheus │ │
│ │ (Real-time) │ │ (Long-term) │ │ (Metrics) │ │
│ └──────────────┘ └──────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ External Services │
│ - Sentry (Error Monitoring) │
│ - Statsig (Feature Flags & Analytics) │
│ - PostHog (Product Analytics) │
│ - Braintrust (AI Observability) │
└─────────────────────────────────────────────────────────┘
- Request Processing: Every API request is tracked via middleware
- Real-time Metrics: Metrics are stored in Redis for fast access (1-24 hours)
- Hourly Aggregation: Redis metrics are aggregated into PostgreSQL every hour
- Health Monitoring: Background service checks model health based on tiered schedules
- Anomaly Detection: System analyzes metrics for cost/latency spikes and high error rates
- Status Pages: Public and private dashboards pull from materialized views
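The hourly aggregation step can be pictured with a small sketch. It is not the backend's aggregator: the `RequestMetric` shape and the `aggregate_hourly` helper are hypothetical, and only the output field names are borrowed from the hourly aggregate columns described in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RequestMetric:
    """One tracked request (hypothetical shape, for illustration only)."""
    timestamp: datetime
    provider: str
    latency_ms: float
    success: bool
    cost_credits: float


def aggregate_hourly(metrics: list[RequestMetric]) -> dict[tuple[datetime, str], dict]:
    """Roll individual request metrics up into (hour, provider) rows."""
    buckets: dict[tuple[datetime, str], list[RequestMetric]] = defaultdict(list)
    for m in metrics:
        # Truncate the timestamp to the start of its hour.
        hour = m.timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, m.provider)].append(m)

    rows: dict[tuple[datetime, str], dict] = {}
    for key, items in buckets.items():
        total = len(items)
        failed = sum(1 for m in items if not m.success)
        rows[key] = {
            "total_requests": total,
            "successful_requests": total - failed,
            "failed_requests": failed,
            "total_cost_credits": sum(m.cost_credits for m in items),
            "avg_latency_ms": sum(m.latency_ms for m in items) / total,
            "error_rate": failed / total,
        }
    return rows
```

In a real deployment the input would come from Redis and the rows would be upserted into PostgreSQL; both ends are left out here to keep the sketch self-contained.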
Authentication: API Key Required
| Endpoint | Method | Description | Query Parameters |
|---|---|---|---|
| `/api/monitoring/health` | GET | All provider health scores (0-100) | None |
| `/api/monitoring/health/{provider}` | GET | Specific provider health score | None |
| `/api/monitoring/errors/{provider}` | GET | Recent errors for a provider | `limit` (1-1000, default: 100) |
| `/api/monitoring/stats/realtime` | GET | Real-time statistics from Redis | `hours` (1-24, default: 1) |
| `/api/monitoring/stats/hourly/{provider}` | GET | Hourly statistics for a provider | `hours` (1-168, default: 24) |
| `/api/monitoring/circuit-breakers` | GET | All circuit breaker states | None |
| `/api/monitoring/circuit-breakers/{provider}` | GET | Provider-specific circuit breaker states | None |
| `/api/monitoring/providers/comparison` | GET | Compare all providers across key metrics | None |
| `/api/monitoring/latency/{provider}/{model}` | GET | Latency percentiles (p50, p95, p99) | None |
| `/api/monitoring/anomalies` | GET | Detected anomalies | None |
| `/api/monitoring/trial-analytics` | GET | Trial funnel metrics | None |
| `/api/monitoring/cost-analysis` | GET | Cost breakdown by provider | `days` (1-90, default: 7) |
| `/api/monitoring/latency-trends/{provider}` | GET | Latency trends over time | `hours` (1-168, default: 24) |
| `/api/monitoring/error-rates` | GET | Error rates by model | `hours` (1-168, default: 24) |
| `/api/monitoring/token-efficiency/{provider}/{model}` | GET | Token efficiency metrics | None |
Example Request:
curl -X GET "https://api.gatewayz.io/api/monitoring/stats/realtime?hours=24" \
  -H "Authorization: Bearer YOUR_API_KEY"
Example Response (/api/monitoring/stats/realtime):
{
  "timeframe": "Last 24 hours",
  "providers": [
    {
      "provider": "openai",
      "total_requests": 15420,
      "successful_requests": 15234,
      "failed_requests": 186,
      "total_cost_credits": 1842.56,
      "total_tokens": 2456789,
      "avg_latency_ms": 342.5,
      "error_rate": 0.012
    }
  ]
}
Authentication: Mixed (see table)
| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| `/health` | GET | Simple health check (always returns 200) | No |
| `/health/system` | GET | Overall system health metrics | Yes |
| `/health/providers` | GET | Health metrics for all providers | Yes |
| `/health/models` | GET | Health metrics for all models | Yes |
| `/health/model/{model_id}` | GET | Health metrics for specific model | Yes |
| `/health/provider/{provider}` | GET | Health metrics for specific provider | Yes |
| `/health/summary` | GET | Comprehensive health summary | Yes |
| `/health/check` | POST | Perform immediate health check (background) | Yes |
| `/health/check/now` | POST | Perform immediate health check (sync) | Yes |
| `/health/uptime` | GET | Uptime metrics for frontend integration | Yes |
| `/health/dashboard` | GET | Complete health dashboard data | Yes |
| `/health/status` | GET | Simple health status | Yes |
| `/health/monitoring/status` | GET | Monitoring service status | Yes |
| `/health/monitoring/start` | POST | Start health monitoring service | Yes |
| `/health/monitoring/stop` | POST | Stop health monitoring service | Yes |
| `/health/google-vertex` | GET | Google Vertex AI provider health | No |
| `/health/database` | GET | Database connectivity | No |
Example Request:
curl -X GET "https://api.gatewayz.io/health/dashboard" \
  -H "Authorization: Bearer YOUR_API_KEY"
Example Response (/health/dashboard):
{
  "system": {
    "status": "healthy",
    "uptime_seconds": 864532,
    "total_providers": 16,
    "healthy_providers": 15,
    "total_models": 10234,
    "healthy_models": 9856
  },
  "providers": [
    {
      "provider": "openai",
      "status": "healthy",
      "models_count": 45,
      "healthy_models": 44,
      "avg_response_time_ms": 342.5,
      "uptime_24h": 99.8,
      "circuit_breaker_state": "closed"
    }
  ],
  "recent_incidents": []
}
Authentication: Optional (Enhanced features with API key)
| Endpoint | Method | Description | Query Parameters |
|---|---|---|---|
| `/v1/model-health` | GET | All model health metrics | `provider`, `min_calls`, `status`, `limit`, `offset` |
| `/v1/model-health/{provider}/{model}` | GET | Specific model health metrics | None |
| `/v1/model-health/unhealthy` | GET | Models with high error rates | `threshold` (default: 0.1) |
| `/v1/model-health/stats` | GET | Aggregate model health statistics | None |
| `/v1/model-health/provider/{provider}/summary` | GET | Provider health summary | None |
| `/v1/model-health/providers` | GET | List all providers with health data | None |
Example Request:
curl -X GET "https://api.gatewayz.io/v1/model-health/openai/gpt-4"
Example Response:
{
  "provider": "openai",
  "model": "gpt-4",
  "status": "healthy",
  "last_response_time_ms": 356,
  "average_response_time_ms": 342,
  "call_count": 45678,
  "success_count": 45234,
  "error_count": 444,
  "error_rate": 0.0097,
  "uptime_24h": 99.8,
  "uptime_7d": 99.6,
  "uptime_30d": 99.5,
  "circuit_breaker_state": "closed",
  "last_called_at": "2025-12-02T08:45:23Z",
  "last_success_at": "2025-12-02T08:45:23Z"
}
Authentication: None (Public)
| Endpoint | Method | Description | Query Parameters |
|---|---|---|---|
| `/v1/status/` | GET | Overall system status | None |
| `/v1/status/providers` | GET | Status for all providers | None |
| `/v1/status/models` | GET | Status for models | `provider`, `status`, `limit`, `offset` |
| `/v1/status/models/{provider}/{model_id}` | GET | Status for specific model | None |
| `/v1/status/incidents` | GET | Recent incidents | `provider`, `severity`, `limit` |
| `/v1/status/uptime/{provider}/{model_id}` | GET | Uptime history (24h, 7d, 30d) | None |
| `/v1/status/search` | GET | Search models | `q` (query string) |
| `/v1/status/stats` | GET | Overall statistics | None |
Example Request:
curl -X GET "https://api.gatewayz.io/v1/status/providers"
Example Response:
{
  "providers": [
    {
      "provider": "openai",
      "status": "operational",
      "total_models": 45,
      "healthy_models": 44,
      "degraded_models": 1,
      "offline_models": 0,
      "uptime_24h": 99.8,
      "avg_response_time_ms": 342.5
    }
  ],
  "last_updated": "2025-12-02T08:45:23Z"
}
Authentication: User Authentication Required
| Endpoint | Method | Description | Body |
|---|---|---|---|
| `/v1/analytics/events` | POST | Log analytics event | `event_name`, `properties` |
| `/v1/analytics/batch` | POST | Log multiple analytics events | Array of events |
Authentication: None
The `/metrics` endpoint exposes Prometheus-compatible metrics for scraping.
Metrics Categories:
- HTTP request metrics (count, duration, size)
- Model inference metrics (requests, duration, tokens, credits)
- Database metrics (queries, duration)
- Cache metrics (hits, misses, size)
- Provider health metrics (availability, error rate, response time)
- Performance stage metrics (TTFB, streaming, frontend processing)
- Business metrics (credit balance, trials, subscriptions, rate limits)
Example Prometheus Query:
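# Average latency per provider (the metric is a histogram, so divide _sum by _count)
sum by (provider) (rate(model_inference_duration_seconds_sum[5m]))
  / sum by (provider) (rate(model_inference_duration_seconds_count[5m]))
# Errors per second by model over the last 5 minutes
sum by (model) (rate(model_inference_requests_total{status="error"}[5m]))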
TTL: 24 hours
Service: src/services/redis_metrics.py
- Request Metrics: Individual request tracking
  - Provider, model, latency, success/failure, cost, tokens
  - Key pattern: `metrics:request:{timestamp}`
- Hourly Aggregates: Rolling hourly statistics
  - Total requests, costs, tokens per provider/hour
  - Key pattern: `metrics:hourly:{provider}:{hour}`
- Latency Tracking: Sorted sets for percentile calculations
  - Last hour of latency measurements
  - Key pattern: `metrics:latency:{provider}:{model}`
- Error Tracking: Recent error messages
  - Last 100 errors per provider with timestamps
  - Key pattern: `metrics:errors:{provider}`
- Provider Health Scores: 0-100 scale
  - Adjusted by success/failure rates
  - Key pattern: `metrics:health:{provider}`
- Circuit Breaker States: CLOSED, OPEN, HALF_OPEN
  - Fault tolerance state per provider/model
  - Key pattern: `circuit_breaker:{provider}:{model}`
- Total requests (per provider/hour)
- Successful/failed requests
- Input/output tokens
- Total cost (credits/USD)
- Latency percentiles (p50, p95, p99)
- Error messages and counts
- Health scores (0-100)
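To make the latency percentiles concrete, here is a minimal sketch that derives p50, p95, and p99 from a list of latency samples using the nearest-rank method. The method choice and the helper names are assumptions for illustration; the backend may compute percentiles differently (for example, directly from the Redis sorted set).

```python
import math


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list of samples."""
    if not sorted_values:
        raise ValueError("no latency samples")
    # Rank is 1-based: the smallest sample whose rank covers pct% of the data.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Return p50, p95, and p99 for a batch of latency measurements."""
    ordered = sorted(latencies_ms)
    return {f"p{p}": percentile(ordered, p) for p in (50, 95, 99)}
```

Nearest-rank always returns an observed sample, which keeps the result easy to explain; interpolating methods give smoother values on small sample sets.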
Tables:
Primary table for model health metrics.
Columns:
- `provider`, `model` (composite primary key)
- `last_response_time_ms`, `average_response_time_ms`
- `last_status`, `call_count`, `success_count`, `error_count`
- `last_called_at`, `last_success_at`, `last_failure_at`
- `last_error_message`
- `gateway`, `monitoring_tier` (critical/popular/standard/on_demand)
- `uptime_percentage_24h`, `uptime_percentage_7d`, `uptime_percentage_30d`
- `consecutive_failures`, `consecutive_successes`
- `circuit_breaker_state` (closed/open/half_open)
- `priority_score`, `usage_count_24h`, `usage_count_7d`, `usage_count_30d`
- `next_check_at`, `check_interval_seconds`
- `is_enabled`, `metadata` (JSONB)
Indexes:
- `idx_model_health_provider` on (provider)
- `idx_model_health_status` on (last_status)
- `idx_model_health_tier` on (monitoring_tier)
- `idx_model_health_next_check` on (next_check_at)
Hourly aggregated metrics for historical analysis.
Columns:
- `hour`, `provider`, `model` (unique constraint)
- `total_requests`, `successful_requests`, `failed_requests`
- `total_tokens_input`, `total_tokens_output`
- `total_cost_credits`
- `avg_latency_ms`, `p50_latency_ms`, `p95_latency_ms`, `p99_latency_ms`
- `min_latency_ms`, `max_latency_ms`
- `error_rate`
Indexes:
- `idx_hourly_aggregates_hour` on (hour)
- `idx_hourly_aggregates_provider` on (provider)
Incident tracking and resolution.
Columns:
- `id` (UUID, primary key)
- `provider`, `model`, `gateway`
- `incident_type` (outage/degradation/timeout/rate_limit)
- `severity` (critical/high/medium/low)
- `started_at`, `resolved_at`, `duration_seconds`
- `error_message`, `error_count`, `affected_requests`
- `status` (active/resolved/acknowledged)
- `resolution_notes`, `metadata` (JSONB)
Indexes:
- `idx_incidents_provider_model` on (provider, model)
- `idx_incidents_status` on (status)
- `idx_incidents_started_at` on (started_at)
Time-series data for trend analysis.
Columns:
- `id` (UUID, primary key)
- `provider`, `model`, `gateway`
- `checked_at`, `status`, `response_time_ms`
- `error_message`, `http_status_code`
- `circuit_breaker_state`, `metadata` (JSONB)
Indexes:
- `idx_history_provider_model` on (provider, model)
- `idx_history_checked_at` on (checked_at)
Partitioning: Partitioned by month for performance
Pre-computed statistics for fast queries.
Columns:
- `provider`, `model`, `gateway`
- `aggregation_period` (hour/day/week/month)
- `period_start`, `period_end`
- `total_checks`, `successful_checks`, `failed_checks`
- `avg_response_time_ms`, `min_response_time_ms`, `max_response_time_ms`
- `p50_response_time_ms`, `p95_response_time_ms`, `p99_response_time_ms`
- `uptime_percentage`, `incident_count`
Indexes:
- `idx_aggregates_provider_model_period` on (provider, model, aggregation_period)
- `idx_aggregates_period_start` on (period_start)
Materialized views are refreshed every 5 minutes for fast queries.
Fast provider comparison (last 24 hours).
Columns:
- `provider`, `total_requests`, `successful_requests`, `failed_requests`
- `avg_latency_ms`, `total_cost_credits`, `total_tokens`
- `avg_error_rate`, `unique_models_count`
Current model status for public status page.
Columns:
- `provider`, `model`, `gateway`
- `status` (operational/degraded/partial_outage/major_outage)
- `uptime_24h`, `uptime_7d`, `uptime_30d`
- `last_response_time_ms`, `circuit_breaker_state`
- `active_incidents_count`, `last_checked_at`
Provider-level health aggregation.
Columns:
- `provider`, `total_models`, `healthy_models`, `degraded_models`, `offline_models`
- `avg_uptime_24h`, `avg_response_time_ms`
- `status` (operational/degraded/major_outage)
- `last_updated_at`
Scrape Interval: 10 seconds
Service: src/services/prometheus_metrics.py
- Counters (monotonically increasing):
  - `fastapi_requests_total`
  - `model_inference_requests_total`
  - `tokens_used_total`
  - `credits_used_total`
  - `database_queries_total`
  - `cache_hits_total`, `cache_misses_total`
  - `fastapi_exceptions_total`
  - `rate_limited_requests_total`
- Histograms (distributions):
  - `fastapi_requests_duration_seconds`
  - `model_inference_duration_seconds`
  - `database_query_duration_seconds`
  - `provider_response_time_seconds`
  - `backend_ttfb_seconds`
  - `streaming_duration_seconds`
  - `frontend_processing_seconds`
  - `request_stage_duration_seconds`
- Gauges (point-in-time values):
  - `fastapi_requests_in_progress`
  - `fastapi_request_size_bytes`
  - `fastapi_response_size_bytes`
  - `cache_size_bytes`
  - `provider_availability`
  - `provider_error_rate`
  - `user_credit_balance`
  - `trial_active`
  - `subscription_count`
  - `stage_percentage`
Dependency: get_api_key from src/security/deps.py
Used by:
- All `/api/monitoring/*` endpoints
- Most `/health/*` endpoints
Validation Checks:
- API key exists in database
- API key is active (not revoked)
- API key is not expired
- Request limits not exceeded (rate limiting)
- IP address in allowlist (if configured)
- Domain restrictions (if configured)
Header Format:
Authorization: Bearer YOUR_API_KEY
Error Codes:
- `401 Unauthorized`: Invalid or missing API key
- `403 Forbidden`: API key revoked or access denied
- `429 Too Many Requests`: Rate limit exceeded
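A client can use these codes to decide whether a failed call is worth retrying. The sketch below is one possible mapping, not part of the Gatewayz API: `auth_headers` and `classify_response` are hypothetical helper names, and treating only 429 as retryable is an assumption.

```python
from typing import Optional

# Reasons taken from the error-code list above.
AUTH_ERRORS = {
    401: "Invalid or missing API key",
    403: "API key revoked or access denied",
    429: "Rate limit exceeded",
}


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header in the documented Bearer format."""
    return {"Authorization": f"Bearer {api_key}"}


def classify_response(status_code: int) -> tuple[str, Optional[str]]:
    """Map an HTTP status to ('ok' | 'retry' | 'fail', reason)."""
    if 200 <= status_code < 300:
        return "ok", None
    if status_code == 429:
        # Rate limiting is transient, so the caller may back off and retry.
        return "retry", AUTH_ERRORS[429]
    if status_code in (401, 403):
        # Retrying with the same key will not help.
        return "fail", AUTH_ERRORS[status_code]
    return "fail", f"Unexpected status {status_code}"
```

Keeping the decision in one function makes it easy to adjust later, for example to also retry on transient 5xx responses.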
Dependency: get_current_user from src/security/deps.py
Used by:
- `/v1/analytics/events`
- `/v1/analytics/batch`
Authentication Methods:
- JWT token (from login)
- Session cookie
Dependency: get_optional_user from src/security/deps.py
Used by:
- `/v1/model-health/*` endpoints
Behavior:
- Allows anonymous access
- Enhanced features for authenticated users
- User tracking if authenticated
No authentication required:
- `/health` - Simple health check
- `/metrics` - Prometheus metrics
- `/v1/status/*` - Public status page endpoints
- `/health/google-vertex` - Provider diagnostics
- `/health/database` - Database health
All monitoring tables have row-level security (RLS) enabled.
Policies:
- Service Role: Full access to all tables (for background services)
- Authenticated Users: Read access to metrics, incidents, history, aggregates
- Anonymous Users: Read access to `model_status_current` and `provider_health_current` views only
// src/lib/monitoring-client.ts
export class MonitoringClient {
  private baseUrl: string;
  private apiKey: string;

  constructor(baseUrl: string, apiKey: string) {
    this.baseUrl = baseUrl;
    this.apiKey = apiKey;
  }

  private async request<T>(endpoint: string, options?: RequestInit): Promise<T> {
    const response = await fetch(`${this.baseUrl}${endpoint}`, {
      ...options,
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
        ...options?.headers,
      },
    });
    if (!response.ok) {
      throw new Error(`API request failed: ${response.statusText}`);
    }
    return response.json();
  }

  // Real-time Stats
  async getRealtimeStats(hours: number = 1) {
    return this.request(`/api/monitoring/stats/realtime?hours=${hours}`);
  }

  // Provider Health
  async getProviderHealth(provider?: string) {
    const endpoint = provider
      ? `/api/monitoring/health/${provider}`
      : `/api/monitoring/health`;
    return this.request(endpoint);
  }

  // Circuit Breakers
  async getCircuitBreakers(provider?: string) {
    const endpoint = provider
      ? `/api/monitoring/circuit-breakers/${provider}`
      : `/api/monitoring/circuit-breakers`;
    return this.request(endpoint);
  }

  // Anomalies
  async getAnomalies() {
    return this.request('/api/monitoring/anomalies');
  }

  // Provider Comparison
  async getProviderComparison() {
    return this.request('/api/monitoring/providers/comparison');
  }

  // Cost Analysis
  async getCostAnalysis(days: number = 7) {
    return this.request(`/api/monitoring/cost-analysis?days=${days}`);
  }

  // Error Rates
  async getErrorRates(hours: number = 24) {
    return this.request(`/api/monitoring/error-rates?hours=${hours}`);
  }

  // Latency Trends
  async getLatencyTrends(provider: string, hours: number = 24) {
    return this.request(`/api/monitoring/latency-trends/${provider}?hours=${hours}`);
  }

  // Model Health
  async getModelHealth(provider: string, model: string) {
    return this.request(`/v1/model-health/${provider}/${model}`);
  }

  // Recent Errors
  async getRecentErrors(provider: string, limit: number = 100) {
    return this.request(`/api/monitoring/errors/${provider}?limit=${limit}`);
  }
}
// src/hooks/useMonitoring.ts
import { useQuery } from '@tanstack/react-query';
import { MonitoringClient } from '@/lib/monitoring-client';

// NOTE: NEXT_PUBLIC_* variables are inlined into the browser bundle, so a key
// passed this way is visible to end users. For production, call the API from a
// server-side route and keep the key out of client code.
// One shared client instead of a new instance on every render.
const client = new MonitoringClient(
  process.env.NEXT_PUBLIC_API_URL ?? '',
  process.env.NEXT_PUBLIC_API_KEY ?? ''
);

export function useRealtimeStats(hours: number = 1) {
  return useQuery({
    queryKey: ['monitoring', 'realtime', hours],
    queryFn: () => client.getRealtimeStats(hours),
    refetchInterval: 30000, // Refresh every 30 seconds
  });
}

export function useProviderHealth(provider?: string) {
  return useQuery({
    queryKey: ['monitoring', 'health', provider],
    queryFn: () => client.getProviderHealth(provider),
    refetchInterval: 60000, // Refresh every minute
  });
}

export function useAnomalies() {
  return useQuery({
    queryKey: ['monitoring', 'anomalies'],
    queryFn: () => client.getAnomalies(),
    refetchInterval: 120000, // Refresh every 2 minutes
  });
}
// src/components/monitoring-dashboard.tsx
'use client';
import { useRealtimeStats, useProviderHealth, useAnomalies } from '@/hooks/useMonitoring';
import { Card, CardHeader, CardTitle, CardContent } from '@/components/ui/card';

export function MonitoringDashboard() {
  const { data: realtimeStats, isLoading: statsLoading } = useRealtimeStats(24);
  const { data: providerHealth, isLoading: healthLoading } = useProviderHealth();
  const { data: anomalies, isLoading: anomaliesLoading } = useAnomalies();

  if (statsLoading || healthLoading || anomaliesLoading) {
    return <div>Loading...</div>;
  }

  return (
    <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
      {/* Real-time Stats */}
      <Card>
        <CardHeader>
          <CardTitle>Real-time Stats (24h)</CardTitle>
        </CardHeader>
        <CardContent>
          {realtimeStats?.providers.map((provider) => (
            <div key={provider.provider} className="mb-4">
              <h3 className="font-semibold">{provider.provider}</h3>
              <p>Requests: {provider.total_requests.toLocaleString()}</p>
              <p>Success Rate: {((provider.successful_requests / provider.total_requests) * 100).toFixed(2)}%</p>
              <p>Avg Latency: {provider.avg_latency_ms}ms</p>
              <p>Cost: ${provider.total_cost_credits.toFixed(2)}</p>
            </div>
          ))}
        </CardContent>
      </Card>
      {/* Provider Health */}
      <Card>
        <CardHeader>
          <CardTitle>Provider Health</CardTitle>
        </CardHeader>
        <CardContent>
          {providerHealth?.providers.map((provider) => (
            <div key={provider.provider} className="mb-4">
              <div className="flex justify-between items-center">
                <h3 className="font-semibold">{provider.provider}</h3>
                <span className={`px-2 py-1 rounded ${
                  provider.health_score >= 90 ? 'bg-green-500' :
                  provider.health_score >= 70 ? 'bg-yellow-500' :
                  'bg-red-500'
                } text-white text-sm`}>
                  {provider.health_score}
                </span>
              </div>
              <p className="text-sm text-gray-600">{provider.status}</p>
            </div>
          ))}
        </CardContent>
      </Card>
      {/* Anomalies */}
      <Card>
        <CardHeader>
          <CardTitle>Detected Anomalies</CardTitle>
        </CardHeader>
        <CardContent>
          {anomalies?.anomalies.length === 0 ? (
            <p className="text-green-600">No anomalies detected</p>
          ) : (
            anomalies?.anomalies.map((anomaly, idx) => (
              <div key={idx} className={`mb-4 p-3 rounded ${
                anomaly.severity === 'critical' ? 'bg-red-100' :
                anomaly.severity === 'warning' ? 'bg-yellow-100' :
                'bg-blue-100'
              }`}>
                <h4 className="font-semibold">{anomaly.type}</h4>
                <p className="text-sm">{anomaly.provider}</p>
                <p className="text-sm">{anomaly.message}</p>
              </div>
            ))
          )}
        </CardContent>
      </Card>
    </div>
  );
}

import time

from src.services.redis_metrics import RedisMetricsService
from src.db.model_health import record_model_call


async def make_api_call(provider: str, model: str):
    # `api_client` and `calculate_cost` stand in for your own provider client
    # and pricing helper; they are not defined in this snippet.
    start_time = time.perf_counter()  # monotonic clock, safe for measuring durations
    try:
        # Make your API call
        response = await api_client.complete(...)

        # Calculate metrics
        latency_ms = (time.perf_counter() - start_time) * 1000
        tokens_used = response.usage.total_tokens
        cost = calculate_cost(tokens_used, model)

        # Record success in Redis (real-time)
        await RedisMetricsService.record_request(
            provider=provider,
            model=model,
            latency_ms=latency_ms,
            success=True,
            cost_credits=cost,
            tokens_input=response.usage.prompt_tokens,
            tokens_output=response.usage.completion_tokens,
        )

        # Update model health (database)
        await record_model_call(
            provider=provider,
            model=model,
            response_time_ms=latency_ms,
            success=True,
            error_message=None,
        )
        return response
    except Exception as e:
        # Record the failure in both stores, then re-raise for the caller
        latency_ms = (time.perf_counter() - start_time) * 1000
        await RedisMetricsService.record_request(
            provider=provider,
            model=model,
            latency_ms=latency_ms,
            success=False,
            error_message=str(e),
        )
        await record_model_call(
            provider=provider,
            model=model,
            response_time_ms=latency_ms,
            success=False,
            error_message=str(e),
        )
        raise

Add to .env:
# Monitoring Services
PROMETHEUS_ENABLED=true
PROMETHEUS_SCRAPE_ENABLED=true
# Error Monitoring
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_ENABLED=true
SENTRY_ENVIRONMENT=production
SENTRY_TRACES_SAMPLE_RATE=1.0
# Analytics
STATSIG_SERVER_SECRET_KEY=your-statsig-key
POSTHOG_API_KEY=your-posthog-key
BRAINTRUST_API_KEY=your-braintrust-key
# Observability
TEMPO_ENABLED=true # Distributed tracing
LOKI_ENABLED=true # Structured logging
# Redis (optional, graceful degradation)
REDIS_URL=redis://localhost:6379
File: prometheus.yml
global:
  scrape_interval: 10s
  evaluation_interval: 10s
  external_labels:
    monitor: 'gatewayz-monitor'
    environment: 'production'
scrape_configs:
  - job_name: 'gatewayz-api'
    static_configs:
      - targets: ['api.railway.internal:8000']
    metrics_path: '/metrics'
The background health monitoring service runs continuously with these settings:
File: src/services/model_health_monitor.py
# Monitoring Tiers (check intervals)
CRITICAL_TIER = 300 # 5 minutes
POPULAR_TIER = 1800 # 30 minutes
STANDARD_TIER = 7200 # 2 hours
ON_DEMAND_TIER = 14400 # 4 hours
# Circuit Breaker Thresholds
FAILURE_THRESHOLD = 5 # Consecutive failures to open circuit
HALF_OPEN_SUCCESS_THRESHOLD = 3 # Consecutive successes to close circuit
CIRCUIT_OPEN_DURATION = 300 # 5 minutes before attempting half-open
# Batch Processing
BATCH_SIZE = 10 # Models to check per batch
BATCH_INTERVAL = 1 # Seconds between batches
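The circuit breaker thresholds above imply a three-state machine. The following is a minimal sketch of how such a breaker could behave, assuming a failed probe in the half-open state reopens the circuit immediately; the `CircuitBreaker` class is hypothetical and is not the code in `model_health_monitor.py`, whose timing may differ.

```python
from dataclasses import dataclass

# Values copied from the settings listed above.
FAILURE_THRESHOLD = 5
HALF_OPEN_SUCCESS_THRESHOLD = 3
CIRCUIT_OPEN_DURATION = 300  # seconds


@dataclass
class CircuitBreaker:
    """Illustrative breaker: closed -> open -> half_open -> closed."""
    state: str = "closed"
    consecutive_failures: int = 0
    consecutive_successes: int = 0
    opened_at: float = 0.0

    def allow_request(self, now: float) -> bool:
        # After the open duration has elapsed, let probe requests through.
        if self.state == "open" and now - self.opened_at >= CIRCUIT_OPEN_DURATION:
            self.state = "half_open"
            self.consecutive_successes = 0
        return self.state != "open"

    def record(self, success: bool, now: float) -> None:
        if success:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if (self.state == "half_open"
                    and self.consecutive_successes >= HALF_OPEN_SUCCESS_THRESHOLD):
                self.state = "closed"
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            # Assumption: any failure while half-open reopens the circuit.
            if self.state == "half_open" or self.consecutive_failures >= FAILURE_THRESHOLD:
                self.state = "open"
                self.opened_at = now
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the class deterministic and easy to test.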
- Refresh Intervals:
  - Real-time metrics: 30-60 seconds
  - Health checks: 1-2 minutes
  - Anomalies: 2-5 minutes
  - Historical data: 5-10 minutes
- Error Handling:
  - Always handle API errors gracefully
  - Show cached data during outages
  - Display loading states clearly
- Performance:
  - Use pagination for large datasets
  - Implement virtual scrolling for long lists
  - Cache responses when appropriate
Recommended Alerts:
- Critical:
  - Provider down (uptime < 95%)
  - High error rate (> 25%)
  - Circuit breaker opened
  - Database connection lost
- Warning:
  - Degraded performance (uptime < 98%)
  - Elevated error rate (> 10%)
  - Anomaly detected (cost/latency spike)
- Info:
  - New incident created
  - Incident resolved
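The numeric thresholds in the critical and warning tiers can be expressed as a small classifier. This is a sketch under assumptions: `alert_severity` is a hypothetical helper, the error rate is taken as a fraction (as in the API responses, for example `0.012`), and the event-based alerts such as database loss or incident changes are left out.

```python
from typing import Optional


def alert_severity(uptime_pct: float, error_rate: float,
                   circuit_open: bool = False) -> Optional[str]:
    """Return 'critical', 'warning', or None using the recommended thresholds."""
    # Critical: uptime below 95%, error rate above 25%, or an open circuit breaker.
    if uptime_pct < 95 or error_rate > 0.25 or circuit_open:
        return "critical"
    # Warning: uptime below 98% or error rate above 10%.
    if uptime_pct < 98 or error_rate > 0.10:
        return "warning"
    return None
```

Checking the critical conditions first ensures that a provider meeting both tiers is reported at the higher severity.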
Monitor these metrics for capacity planning:
- Request rates (per provider/model)
- Token consumption trends
- Cost trends
- Latency percentiles (p95, p99)
- Error rates by time of day
- Never expose API keys in frontend code
- Use environment variables for sensitive data
- Implement rate limiting on monitoring endpoints
- Enable CORS only for trusted domains
- Use HTTPS for all API calls
- Regularly rotate API keys
Database:
- Use materialized views for frequently accessed data
- Partition large tables (e.g., `model_health_history`)
- Create appropriate indexes
- Archive old data regularly
Redis:
- Set appropriate TTLs (24 hours for metrics)
- Use pipelining for bulk operations
- Monitor memory usage
API:
- Implement response caching
- Use pagination for large results
- Enable compression (gzip)
- Monitor query performance
Possible causes:
- Redis connection lost
- Background monitoring service stopped
- Database connection issues
- Rate limiting
Solutions:
# Check database connectivity
curl https://api.gatewayz.io/health/database
# Check monitoring service status
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.gatewayz.io/health/monitoring/status
# Restart monitoring service
curl -X POST -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.gatewayz.io/health/monitoring/start
Possible causes:
- Large result sets
- Missing indexes
- Materialized views not refreshed
- Redis cache miss
Solutions:
- Use pagination (`limit` and `offset`)
- Reduce time ranges (e.g., 24h instead of 7d)
- Check database indexes
- Refresh materialized views manually
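For the pagination advice, a client can walk a large result set page by page instead of requesting everything at once. The sketch below assumes a caller-supplied `fetch_page(limit, offset)` function, which is hypothetical; in practice it would wrap a request to an endpoint that accepts `limit` and `offset`, such as `/v1/model-health`.

```python
from typing import Any, Callable, Iterator


def iter_pages(fetch_page: Callable[[int, int], list],
               limit: int = 100) -> Iterator[Any]:
    """Yield items one page at a time until a short page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit
```

Because this is a generator, callers can stop early (for example after finding the model they need) without fetching the remaining pages.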
Possible causes:
- Provider not enabled
- No recent API calls
- Monitoring tier too low
- Circuit breaker open
Solutions:
# Enable provider monitoring
await record_model_call(
    provider="new-provider",
    model="new-model",
    response_time_ms=0,
    success=True,
    monitoring_tier="critical"  # Set appropriate tier
)
| Category | Endpoint Prefix | Authentication | Use Case |
|---|---|---|---|
| Monitoring | `/api/monitoring/*` | API Key | Admin dashboards, internal tools |
| Health | `/health/*` | Mixed | System health, diagnostics |
| Model Health | `/v1/model-health/*` | Optional | Public model status, provider health |
| Status Page | `/v1/status/*` | None | Public status pages |
| Analytics | `/v1/analytics/*` | User Auth | Event tracking |
| Metrics | `/metrics` | None | Prometheus scraping |
- API Reference - Complete API documentation
- Architecture - System architecture overview
- Setup Guide - Installation and configuration
- Database Migrations - Database schema
For questions or issues:
- GitHub Issues: gatewayz-backend/issues
- Documentation: Wiki Home
- Performance-Monitoring — TTFB, streaming duration, latency histograms
- Prometheus-Setup — Metric collection and PromQL
- Error-Monitoring — Sentry integration and alerts
- Delta Report — P1-7: Circuit breaker timing discrepancy