Features
Reading path: Conceptual Model | Stability Definition | Conceptual Model Features | Features (you are here) | Delta Report | Features-Acceptance-Criteria
Read after: Conceptual Model Features (the spec — what we should build) Next: Delta Report (gap analysis — spec vs reality)
TL;DR — This is the reality check. 450+ endpoints across 17 systems that are actually built and running today. Each system section includes a high-level overview, what it does NOT do (boundaries), and low-level technical details (auth mechanisms, middleware, database calls, caching). Compare this against Conceptual Model Features to understand the gap, or go straight to Delta Report for the pre-built comparison.
Version: 2.0.4 | Last Updated: 2026-03-04
- Admin
- Analytics
- Authentication
- Chat & Messaging
- Circuit Breakers
- Code Router
- Coupons
- Credits
- Diagnostics
- Error Monitoring
- General Router
- Health & Monitoring
- Metrics & Observability
- Models & Catalog
- Other
- Status
- Users
The Admin system is an internal operations layer providing 80+ endpoints for managing all aspects of the Gatewayz platform. It covers user and account management, financial operations (credits, billing), system monitoring and diagnostics, cache management, rate limit configuration, model/provider catalog synchronization, coupon lifecycle management, downtime incident tracking, IP whitelisting, role-based access control (RBAC), trial analytics, and notification processing.
Admin endpoints are designed for internal tooling, admin dashboards, and support workflows -- not for end-user consumption.
Boundaries -- what Admin does NOT do:
- Does not handle end-user inference requests (chat completions, provider failover are separate systems)
- Does not manage Stripe/payment processing directly (admin can manually grant credits, but payment webhooks and subscriptions are separate)
- Does not provide self-service admin creation (admin role assignment is done via the roles endpoints or directly in the database)
- Does not enforce cross-process cache consistency (most caches are in-process Python dicts, not shared across instances)
- Does not provide real-time streaming or WebSocket feeds (all endpoints are request/response)
- Does not perform automated remediation (downtime endpoints let admins view and resolve incidents, but there is no auto-healing)
- Does not have pagination on all read endpoints (e.g., GET /admin/balance and coupon stats fetch all rows)
Authentication: Two mechanisms exist:
- Primary (require_admin dependency): HTTPBearer -> get_api_key() -> validate_api_key_security() (checks active, expiration, IP allowlist, domain restrictions) -> get_user() (5-min in-memory cache) -> validate_trial_expiration() -> check user.role == "admin" OR user.is_admin == True. Unauthorized attempts trigger audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS") and return 403.
- Secondary (monitoring endpoints only): get_admin_key() compares Bearer token against ADMIN_API_KEY env var using secrets.compare_digest() (constant-time comparison).
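As a minimal sketch of the secondary mechanism, the constant-time check could look like the following. The function name is_valid_admin_key and the choice to reject everything when no key is configured are assumptions for illustration, not the actual get_admin_key() implementation.

```python
import secrets


def is_valid_admin_key(presented_token: str, expected_key: str) -> bool:
    """Return True only when the presented Bearer token equals the configured key."""
    if not expected_key:
        # Assumption: with no key configured, reject instead of matching an empty token.
        return False
    # compare_digest runs in time independent of where the two inputs differ,
    # which avoids leaking key prefixes through response timing.
    return secrets.compare_digest(
        presented_token.encode("utf-8"), expected_key.encode("utf-8")
    )
```

In a real handler the expected key would come from the ADMIN_API_KEY environment variable and a failed check would map to an HTTP error response.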
Middleware pipeline (all admin requests): SecurityMiddleware (50% Sentry sampling for /admin/ paths) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware (Prometheus http_requests_total) -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware.
Database: All operations go through get_supabase_client() (PostgREST). Most use synchronous Supabase calls wrapped in asyncio.to_thread(). Some use execute_with_retry(fn, max_retries=2, retry_delay=0.2) for resilience. No admin endpoint uses atomic transaction wrapping.
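The retry-plus-thread pattern described above can be sketched as follows. This is a stand-in written for illustration: only the call shape execute_with_retry(fn, max_retries=2, retry_delay=0.2) comes from this page, and the assumption that max_retries counts attempts after the first one, as well as the run_query helper, are hypothetical.

```python
import asyncio
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_retry(fn: Callable[[], T], max_retries: int = 2, retry_delay: float = 0.2) -> T:
    """Call fn, retrying up to max_retries times after the first failure."""
    attempts = max_retries + 1
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            time.sleep(retry_delay)
    raise RuntimeError("unreachable")


async def run_query(fn: Callable[[], T], **retry_options) -> T:
    """Run a blocking (synchronous) query off the event loop, with retries."""
    return await asyncio.to_thread(execute_with_retry, fn, **retry_options)
```

Because the blocking sleep happens inside the worker thread, the event loop stays free while a query is being retried.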
Caching: Three tiers: (1) in-process Python dicts for models/providers/users (not shared across processes), (2) Redis for trial analytics and chat summary caching, (3) functools.lru_cache for the rate limit manager singleton.
| Endpoint | Description |
|---|---|
| GET /admin/users | Paginated, filterable user list with email/API key/active status filters. Fast path via RPC search_users_by_email for email-only searches. |
| GET /admin/users/{user_id} | Comprehensive user profile: all fields, all API keys, 10 recent usage records, 10 recent activity logs. Returns plaintext API key strings. |
| GET /admin/users/by-api-key | Exact-match user lookup by API key string. Uses indexed RPC for O(log n) lookup (~10-20ms). |
| GET /admin/users/count | Total registered user count via server-side COUNT(*). Ultra-fast (5-20ms). |
| GET /admin/users/growth | Daily cumulative user registration counts over configurable time period (1-365 days). Includes growth rate calculation. |
| GET /admin/users/stats | Aggregated user statistics (counts by role, active status, credits totals, subscription breakdown). Up to 5 sequential queries. |
| DELETE /admin/users/by-domain/{domain} | Bulk delete users by email domain. dry_run=true by default. Protected domains (gmail, yahoo, outlook, etc.) blocked. Non-atomic (per-user DELETE in loop). |
| GET /admin/api-keys/{api_key_id} | Full API key details by numeric ID, including plaintext key string and owner profile. |
| GET /admin/balance | Credit balances for ALL users. No pagination -- fetches all records. Intended for small datasets. |

| Endpoint | Description |
|---|---|
| POST /admin/add_credits | Add credits to a user account. Per-transaction cap (ADMIN_MAX_CREDIT_GRANT) and 24h rolling daily limit (ADMIN_DAILY_GRANT_LIMIT). Logs transaction with full audit trail. Invalidates user cache. |
| GET /admin/credit-transactions | Paginated credit transaction history with filtering (user, type, date range, amount, direction). Warning: min_amount/max_amount filters fetch ALL rows into Python memory. |
| POST /admin/limit | Set per-user daily credit spending limits. Upserts rate_limit_configs. Does NOT clear rate limit cache (use /admin/clear-rate-limit-cache separately). |
| POST /admin/assign-plan | Assign a subscription plan to a user. |

| Endpoint | Description |
|---|---|
| GET /admin/monitor | Comprehensive system snapshot: user counts, credit totals, API usage (today + 30 days). 8 sequential Supabase queries, 500ms-2s typical. Merges activity_log + legacy usage_records with deduplication. |
| GET /admin/monitoring/chat-requests | Paginated chat completion request records. Filters: model, provider, date range. JOINs models + providers. Limit up to 100,000. |
| GET /admin/monitoring/chat-requests/summary | Aggregate chat request stats (tokens, latency, cost, success rate). Redis cached (60s TTL, key based on filter hash). |
| GET /admin/monitoring/chat-requests/plot-data | Chart data: last 10 full records + parallel arrays (tokens, latency, timestamps) for all matching requests. No LIMIT on plot query. |
| GET /admin/monitoring/chat-requests/by-api-key | Chat requests for a specific API key. Resolves key string to ID. |
| GET /admin/monitoring/chat-requests/providers | Providers with chat request activity. Primary: RPC path. Fallback: per-provider COUNT queries. |
| GET /admin/monitoring/chat-requests/counts | Request count leaderboard by model. In-memory grouping. |
| GET /admin/monitoring/chat-requests/models | Models with comprehensive request statistics (token totals, averages, latency). |
| GET /admin/model-usage-analytics | Pre-aggregated model usage from database view. Paginated, searchable, sortable by 11 fields. |
| GET /admin/monitoring/api-key-tracking-quality | API key tracking quality analysis. Uses get_admin_key auth (env var). Alert: ok (>=90%), warning (70-90%), critical (<70%). |
| GET /admin/monitoring/api-key-tracking-trend | Daily time-series of API key tracking quality over configurable period. |

| Endpoint | Description |
|---|---|
| GET /admin/cache-status | In-process provider catalog cache metadata (age, TTL 1800s, validity, provider count). Sub-millisecond. |
| GET /admin/huggingface-cache-status | HuggingFace cache state with all cached model IDs (can be 1000+ items). |
| GET /admin/debug-models | Diagnostic: sample models/providers from caches, tests provider-slug matching. May trigger cache refresh if stale. |
| POST /admin/refresh-providers | Force refresh provider cache (1s debounce, then immediate fetch). |
| POST /admin/refresh-huggingface-cache | Invalidate HuggingFace cache (lazy -- next request triggers cold fetch, 500ms-2s). |
| POST /admin/clear-rate-limit-cache | Clear in-memory rate limit config cache. Does NOT clear Redis counters. |
| GET /admin/cache/status | Overall cache status. |
| GET /admin/cache/debouncer/stats | Cache debouncer statistics. |
| GET /admin/cache/warmer/stats | Cache warmer statistics. |
| GET /admin/cache/modelz/status | Modelz cache status. |
| POST /admin/cache/refresh/{gateway} | Refresh cache for a specific gateway. |
| POST /admin/cache/clear | Clear all caches. |
| POST /admin/cache/modelz/refresh | Refresh modelz cache. |
| POST /admin/cache/pricing/refresh | Refresh pricing cache. |
| POST /admin/api/cache/invalidate | Invalidate specific cache entry. |
| DELETE /admin/cache/modelz/clear | Clear modelz cache. |

| Endpoint | Description |
|---|---|
| GET /admin/model-sync/status | Current model sync job status. |
| GET /admin/model-sync/health | Model sync service health. |
| GET /admin/model-sync/providers | List of 33 syncable providers. No auth enforced. |
| POST /admin/model-sync/trigger | Trigger incremental model sync. |
| POST /admin/model-sync/all | Sync all models from all providers. |
| POST /admin/model-sync/full | Full catalog resync (delete + reimport). |
| POST /admin/model-sync/incremental | Incremental delta sync. |
| POST /admin/model-sync/providers-only | Sync provider metadata only. |
| POST /admin/model-sync/provider/{provider_slug} | Sync a single provider. |
| POST /admin/model-sync/reset-and-resync | Flush and fully resync the catalog. |
| DELETE /admin/model-sync/flush-models | Flush all models from catalog DB. |
| DELETE /admin/model-sync/flush-providers | Flush all providers from catalog DB. |

| Endpoint | Description |
|---|---|
| GET /admin/rate-limits/config | Current rate limit configuration. |
| GET /admin/rate-limits/system | System-wide rate limit stats. |
| GET /admin/rate-limits/users | Per-user rate limit stats. |
| PUT /admin/rate-limits/update | Update rate limit rules. |
| POST /admin/rate-limits/config/reset | Reset rate limit config to defaults. |
| DELETE /admin/rate-limits/delete | Delete rate limit entries for an API key. |

| Endpoint | Description |
|---|---|
| GET /admin/roles/{user_id} | Get user's role info. |
| POST /admin/roles/update | Update a user's role with reason logging. |
| GET /admin/roles/list/{role} | List all users with a specific role. |
| GET /admin/roles/permissions/{role} | Get permissions for a role. |
| GET /admin/roles/audit/log | Role change audit log. |

| Endpoint | Description |
|---|---|
| GET /admin/trial/analytics | Trial usage analytics. Redis cached (5-min TTL). Paginated fetch of ALL api_keys_new records on cache miss. Computes conversion rate, usage stats, status breakdown. |
| GET /admin/trial/users | List all trial users with status. |
| GET /admin/trial/domain-analysis | Trial sign-up analysis by email domain (abuse detection). |
| GET /admin/trial/conversion-funnel | Trial-to-paid conversion funnel. |
| GET /admin/trial/ip-analysis | Trial sign-ups by IP (fraud detection). |
| GET /admin/trial/cohort-analysis | Trial user cohort retention analysis. |
| POST /admin/trial/save-conversion-metrics | Save conversion metrics snapshot. |

| Endpoint | Description |
|---|---|
| GET /admin/downtime/incidents | List downtime incidents with filters (status, severity, environment). Uses execute_with_retry (2 retries). |
| GET /admin/downtime/incidents/ongoing | Currently active incidents only. |
| GET /admin/downtime/statistics | Downtime statistics over time period (MTTR, frequency, by severity/status). Returns zeroed stats on failure. |
| GET /admin/downtime/incidents/{incident_id} | Full incident details including logs, server info, metrics snapshot. |
| GET /admin/downtime/incidents/{incident_id}/logs | Filtered incident logs (by level, logger, search term). In-memory filtering. |
| GET /admin/downtime/incidents/{incident_id}/analysis | Error pattern analysis: error/warning counts, type distribution, top 10 messages. |
| POST /admin/downtime/incidents/{incident_id}/capture-logs | Trigger Loki log capture for ongoing incident. External HTTP to Grafana Loki (30s timeout). Stores up to 10,000 entries. |
| POST /admin/downtime/incidents/{incident_id}/resolve | Resolve incident. Records ended_at, resolved_by, optional notes. Rejects if already resolved. |

| Endpoint | Description |
|---|---|
| GET /admin/coupons | List coupons with filters (scope, type, is_active). Paginated. |
| GET /admin/coupons/{coupon_id} | Single coupon details. |
| GET /admin/coupons/{coupon_id}/analytics | Per-coupon analytics: redemption rate, remaining uses, unique users, total value. |
| GET /admin/coupons/stats/overview | Coupon system overview. Fetches ALL coupons and ALL redemptions (no pagination). |

| Endpoint | Description |
|---|---|
| POST /admin/notifications/process | Process pending notification queue. |
| GET /admin/notifications/stats | Notification delivery statistics. |

| Endpoint | Description |
|---|---|
| GET /admin/test-huggingface/{hugging_face_id} | Test HuggingFace API connectivity. Synchronous HTTP call (blocks event loop, 10s timeout). Redis cached (1h TTL). |
| GET /admin/health/optimizations/cache | Health optimization cache status. |
| POST /admin/health/optimizations/cache/clear | Clear health optimization cache. |

The Analytics system provides server-side event forwarding to Statsig and PostHog, allowing the frontend to bypass client-side ad-blockers that would otherwise prevent analytics collection. Events are routed through the backend as a proxy, ensuring reliable tracking of user behavior, feature usage, and session activity.
Boundaries -- what Analytics does NOT do:
- Does not store events in the Gatewayz database (events are forwarded to external services only)
- Does not provide its own analytics dashboard or query interface (use Statsig/PostHog dashboards)
- Does not guarantee event delivery (both services fail silently if unavailable)
- Does not process events concurrently within a batch (sequential processing -- one slow call delays the rest)
- Does not track inference/chat usage (that's handled by the activity logging system)
Authentication: Optional. Uses get_current_user which allows None. User ID resolution priority: authenticated user ID > event-supplied user_id > "anonymous".
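The user ID resolution priority above amounts to a short fallback chain. The sketch below is illustrative only; the function name and the assumption that IDs are handled as strings are not taken from the codebase.

```python
from typing import Optional


def resolve_user_id(authenticated_user_id: Optional[str], event_user_id: Optional[str]) -> str:
    """Pick the ID to attach to an event: authenticated user, then event-supplied, then anonymous."""
    # Empty strings and None both fall through to the next candidate.
    return authenticated_user_id or event_user_id or "anonymous"
```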
External services: Statsig (batched flush: 10s interval or 50-event queue) and PostHog (async capture via SDK). Configured via STATSIG_SERVER_SECRET_KEY, POSTHOG_API_KEY, POSTHOG_HOST env vars.
No database writes, no Redis, no Prometheus metrics, no audit logging. Both services fail silently if not configured; in that case the endpoints still return 200.
Batch processing caveat: Events in a batch are processed sequentially. If any event fails, remaining events in that batch are skipped (HTTP 500).
| Endpoint | Description |
|---|---|
| POST /v1/analytics/events | Send a single analytics event. Routes to both Statsig and PostHog. Always returns 200. |
| POST /v1/analytics/batch | Batch send multiple analytics events. Sequential processing per event. |
| POST /v1/analytics/session/start | Start an analytics session. Creates session-level tracking context. |
| GET /v1/analytics/cache | Analytics cache status and data. |
| GET /v1/analytics/cache/summary | Analytics cache summary. |

The Authentication system handles user identity verification through Privy (primary identity provider), supporting multiple auth methods: Google OAuth, GitHub OAuth, email/password, crypto wallet, and phone number. The primary POST /auth endpoint serves as both login and registration -- it validates the user's Privy token and either returns an existing account or creates a new one automatically. New users start with $5 in credits, a 3-day trial, and a "basic" tier.
Boundaries -- what Authentication does NOT do:
- Does not validate Privy access tokens (the field exists but is not currently checked)
- Does not handle OAuth flows directly (Privy handles the OAuth redirect and token exchange)
- Does not support traditional username/password login (only Privy-mediated authentication)
- Does not provide MFA/2FA management (delegated to Privy)
- Does not manage user sessions server-side (stateless API key-based authentication post-login)
- Does not rate-limit at the API key level for auth endpoints (uses IP-based limits only)
Rate limiting: Login: 10 attempts per 15 minutes per IP (in-memory sliding window). Registration: 3 attempts per hour per IP.
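A per-IP in-memory sliding window of this kind can be sketched as below. The class name and structure are hypothetical; only the limits (10 per 15 minutes for login, 3 per hour for registration) come from this page, and applying the same sliding-window approach to registration is an assumption.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the trailing window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._attempts[ip]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_attempts:
            return False  # rejected attempts are not recorded
        window.append(now)
        return True


login_limiter = SlidingWindowLimiter(max_attempts=10, window_seconds=15 * 60)
registration_limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60 * 60)
```

Because the state lives in a process-local dict, each worker process enforces its own counts; limits are not shared across instances.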
User lookup chain: In-memory Redis-backed cache by Privy ID -> Supabase SELECT * FROM users WHERE privy_user_id = ? -> Secondary fallback by username.
Email verification pipeline: Local blocklist check -> temp email service list -> Emailable API verification (external service).
New user provisioning: $5 initial credits, 3-day trial, "basic" tier. Partner codes (e.g., REDBEARD) trigger PartnerTrialService for extended trials. Referral codes update users.referred_by_code.
API key creation: Existing users get their primary key from api_keys_new table. New users get an auto-created primary key with Fernet AES-128 encryption + SHA-256 HMAC hashing.
Background tasks on login/register: Welcome email via Resend, activity log insert.
Auth info priority: request.email > linked "email" account > Google OAuth > phone > GitHub.
| Endpoint | Description |
|---|---|
| POST /auth | Primary auth endpoint. Validates Privy token, returns API key + credits + subscription status. Creates new user if none exists. Rate limited: 10/15min per IP. |
| POST /auth/register | Direct registration (alternative to Privy flow). Rate limited: 3/hour per IP. |
| POST /auth/password-reset | Request a password reset email via Resend. |
| POST /auth/reset-password | Reset password using a token from the reset email. |
| GET /auth/health | Auth service health check. 4 sequential checks: DB connectivity (3s timeout), Redis ping, auth cache stats, timeout config. Always returns 200. |

The Chat & Messaging system is the core of Gatewayz -- it provides unified AI inference across 30+ providers through OpenAI-compatible and Anthropic-compatible APIs. The primary endpoint POST /v1/chat/completions is a full drop-in replacement for OpenAI's Chat Completions API, supporting streaming SSE, tool/function calling, JSON mode, logprobs, and all standard parameters. It automatically routes requests across providers with failover, applies credit billing, enforces rate limits, and records comprehensive observability data.
Beyond inference, the system includes chat session management (create, list, search, delete sessions), message history persistence, user feedback collection, chat sharing via public links, and Vercel AI SDK compatibility endpoints.
Boundaries -- what Chat & Messaging does NOT do:
- Does not train or fine-tune models (inference only)
- Does not store raw model responses long-term (activity logs capture metadata, not full response text unless chat history is enabled)
- Does not provide its own models (routes to external providers: OpenRouter, Featherless, Chutes, DeepInfra, Fireworks, Together, Groq, Cerebras, etc.)
- Does not support file uploads in chat (images in messages are supported via URL references, not file upload)
- Does not handle Assistants API or threads (OpenAI Assistants API is not implemented)
- Does not provide batch/async inference (all requests are synchronous or streaming)
- Does not guarantee message ordering across concurrent requests to the same session
- Does not implement conversation memory or context management (the client is responsible for sending conversation history)
Authentication: Optional for the main inference endpoint. Supports both anonymous (no key, IP rate-limited, model whitelist restricted) and authenticated (API key) users.
Authenticated request pipeline (10+ steps):
- Parallel auth via asyncio.gather (user lookup + api_key_id resolution + trial status)
- Trial validation (402 if expired)
- Plan check (entitlement enforcement)
- Redis rate limiting (INCR rate_limit:{api_key_id}:{minute}, EXPIRE TTL 60s)
- Credit balance check (pre-flight)
- Auto web search injection (if enabled, prepends search results to prompt)
- Router detection (router:general:* or router:code:* model IDs)
- Provider detection + model transformation (maps model IDs to provider-specific formats)
- Failover chain construction (checks circuit breaker states via Redis per provider)
- Health-based provider selection
- Streaming or non-streaming dispatch
Anonymous path: IP-based Redis rate limit (INCR anon_rate:{ip_hash}:{minute}, TTL 60s) + model whitelist check.
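Both the authenticated and anonymous paths use the same fixed-window pattern: INCR a per-minute key and give it a 60s TTL. The sketch below illustrates it with a tiny in-memory stand-in so it runs without a Redis server. FakeRedis, check_rate_limit, and the limit_per_minute parameter are hypothetical (the page does not state the actual limits); a real redis-py client exposes incr and expire calls with the same shape.

```python
from typing import Dict, Optional, Tuple


class FakeRedis:
    """In-memory stand-in exposing only incr() and expire(), with a manual clock."""

    def __init__(self) -> None:
        self.now = 0.0  # advanced by the caller to simulate time passing
        self._data: Dict[str, Tuple[int, Optional[float]]] = {}

    def incr(self, key: str) -> int:
        value, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and self.now >= expires_at:
            value, expires_at = 0, None  # key expired: start over
        self._data[key] = (value + 1, expires_at)
        return value + 1

    def expire(self, key: str, seconds: int) -> None:
        if key in self._data:
            value, _ = self._data[key]
            self._data[key] = (value, self.now + seconds)


def check_rate_limit(client, api_key_id: int, limit_per_minute: int, now: float) -> bool:
    """Count this request in the current minute bucket and report whether it is allowed."""
    minute = int(now // 60)
    key = f"rate_limit:{api_key_id}:{minute}"
    count = client.incr(key)
    if count == 1:
        client.expire(key, 60)  # the first hit in a bucket sets the TTL
    return count <= limit_per_minute
```

The anonymous variant would differ only in the key, anon_rate:{ip_hash}:{minute}.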
Provider failover: build_provider_failover_chain() reads circuit breaker state per provider via Redis GET circuit_breaker:{provider}. Skips providers with OPEN circuit breakers.
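A rough sketch of that filtering step follows. It is not the actual build_provider_failover_chain(); the function name, the injected get_state callable (standing in for a Redis GET), and the choice to treat a missing key as CLOSED are assumptions.

```python
from typing import Callable, List, Optional


def sketch_failover_chain(
    candidates: List[str],
    get_state: Callable[[str], Optional[str]],
) -> List[str]:
    """Keep candidate providers in priority order, skipping those whose breaker is OPEN."""
    chain: List[str] = []
    for provider in candidates:
        state = get_state(f"circuit_breaker:{provider}")
        if state == "OPEN":
            continue  # breaker is open: do not send traffic to this provider
        # Assumption: a missing key (None) means no breaker yet, treated as CLOSED.
        chain.append(provider)
    return chain
```

Passing the lookup as a callable keeps the sketch testable with a plain dict while leaving room for a Redis-backed lookup.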
Streaming: SSE with StreamNormalizer.normalize() across all providers. TTFC (Time to First Chunk) tracked via Prometheus. Background post-processing after stream completes: credit deduction, activity log, chat history save, health capture.
Prometheus metrics: model_inference_requests, model_inference_duration, tokens_used, credits_used, api_cost_usd_total, api_cost_per_request, TTFC.
Token estimation fallback: When providers don't return usage data, estimates at ~1 token per 4 characters.
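The fallback is simple arithmetic. As an illustration, assuming the estimate rounds up and that empty text counts as zero tokens (neither detail is stated above):

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate token count at roughly 1 token per 4 characters."""
    if not text:
        return 0
    # Rounding up is an assumption; it avoids reporting 0 tokens for short text.
    return math.ceil(len(text) / 4)
```

For example, a 400-character response would be billed as about 100 tokens under this heuristic.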
| Endpoint | Description |
|---|---|
| POST /v1/chat/completions | Primary inference endpoint. Full OpenAI Chat Completions API compatibility (streaming SSE, tool/function calling, JSON mode, logprobs). Routes to 30+ providers with automatic failover. Credit billing. |
| POST /v1/messages | Anthropic Messages API. Drop-in Claude compatibility. Same routing/billing pipeline. |
| POST /v1/responses | OpenAI Responses API. Unified response format. |
| POST /api/chat/ai-sdk | Vercel AI SDK compatible chat endpoint. |
| POST /api/chat/ai-sdk-completions | Vercel AI SDK completions endpoint. |

| Endpoint | Description |
|---|---|
| GET /v1/chat/sessions | List all chat sessions for the authenticated user. |
| GET /v1/chat/sessions/{session_id} | Get session with full message history. |
| POST /v1/chat/sessions/{session_id}/messages | Save a message to a session. |
| POST /v1/chat/sessions/{session_id}/messages/batch | Batch save messages to a session. |
| POST /v1/chat/search | Full-text search across chat sessions. |
| PUT /v1/chat/sessions/{session_id} | Update session metadata (title, etc.). |
| DELETE /v1/chat/sessions/{session_id} | Delete a chat session and its messages. |
| GET /v1/chat/stats | Chat usage statistics for the user. |

| Endpoint | Description |
|---|---|
| POST /v1/chat/feedback | Submit feedback (thumbs up/down, text) on a response. |
| GET /v1/chat/feedback | Get user's feedback entries. |
| GET /v1/chat/feedback/stats | Feedback statistics. |
| GET /v1/chat/sessions/{session_id}/feedback | Feedback for a specific session. |
| PUT /v1/chat/feedback/{feedback_id} | Update existing feedback. |
| DELETE /v1/chat/feedback/{feedback_id} | Delete feedback. |

| Endpoint | Description |
|---|---|
| POST /v1/chat/share | Create a shareable public link for a chat session. |
| GET /v1/chat/share | List user's share links. |
| GET /v1/chat/share/{token} | Access a shared chat (public, no auth). |
| DELETE /v1/chat/share/{token} | Delete a share link. |

| Endpoint | Description |
|---|---|
| GET /v1/chat/completions/metrics/tokens-per-second | Token throughput per model (Prometheus format). |
| GET /v1/chat/completions/metrics/tokens-per-second/all | Aggregate tokens/second across all models. |

The Circuit Breaker system implements the circuit breaker pattern for provider reliability. Each AI provider has an independent circuit breaker that tracks success/failure counts and automatically transitions between states: CLOSED (normal operation), OPEN (blocking requests after too many failures), and HALF_OPEN (testing recovery). This prevents cascading failures by stopping requests to unhealthy providers and automatically recovering when they come back online.
These endpoints are read/control interfaces for the circuit breaker states -- the actual circuit breaking happens inside the inference pipeline.
Boundaries -- what Circuit Breakers does NOT do:
- Does not make inference requests (only monitors/controls state; the inference pipeline uses circuit breaker state)
- Does not persist state to Supabase (Redis + in-memory only; state is lost if both Redis and the process restart)
- Does not provide historical circuit breaker data (only current state)
- Does not auto-configure thresholds per provider (all providers use the same default config)
- Does not trigger alerts or notifications on state transitions (only Prometheus metrics are emitted)
State storage: Redis keys circuit_breaker:{provider}:{state|failure_count|success_count|opened_at|consecutive_opens} with 3600s TTL. In-memory fallback if Redis unavailable.
State transitions: CLOSED -> OPEN (after 5 consecutive failures), OPEN -> HALF_OPEN (after 300s recovery timeout), HALF_OPEN -> CLOSED (after 3 consecutive successes), HALF_OPEN -> OPEN (on any failure).
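The transitions can be expressed as a small state machine. This is a minimal in-memory sketch under the thresholds listed above; the class and method names are hypothetical, time is passed in explicitly to keep it deterministic, and the Redis persistence and Prometheus metrics are left out.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 5     # consecutive failures before CLOSED -> OPEN
RECOVERY_TIMEOUT = 300.0  # seconds before OPEN -> HALF_OPEN
SUCCESS_THRESHOLD = 3     # consecutive successes before HALF_OPEN -> CLOSED


@dataclass
class Breaker:
    state: str = "CLOSED"
    failure_count: int = 0
    success_count: int = 0
    opened_at: Optional[float] = None

    def allow_request(self, now: float) -> bool:
        """Return whether a request may be sent, moving OPEN -> HALF_OPEN after the timeout."""
        if self.state == "OPEN":
            if self.opened_at is not None and now - self.opened_at >= RECOVERY_TIMEOUT:
                self.state = "HALF_OPEN"
                self.success_count = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.success_count += 1
            if self.success_count >= SUCCESS_THRESHOLD:
                self.state, self.opened_at = "CLOSED", None
                self.failure_count = self.success_count = 0
        elif self.state == "CLOSED":
            self.failure_count = 0  # a success breaks the run of consecutive failures

    def record_failure(self, now: float) -> None:
        if self.state == "OPEN":
            return  # already open; nothing to count
        if self.state == "HALF_OPEN":
            self._open(now)  # any failure during recovery re-opens the breaker
            return
        self.failure_count += 1
        if self.failure_count >= FAILURE_THRESHOLD:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state, self.opened_at = "OPEN", now
        self.success_count = 0
```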
Auto-creation: Querying a nonexistent provider creates a new breaker with default config (CLOSED, zero counts). There is no 404 case.
Prometheus metrics (emitted on state changes, not by these endpoints): circuit_breaker_state_transitions_total, circuit_breaker_failures_total, circuit_breaker_successes_total, circuit_breaker_rejected_requests_total, circuit_breaker_current_state.
No Supabase queries. Entirely Redis + in-memory.
| Endpoint | Description |
|---|---|
| GET /circuit-breakers | All circuit breaker states with failure/success counts per provider. No auth required. |
| GET /circuit-breakers/{provider} | Circuit breaker state for a specific provider. Auto-creates if not found. |
| POST /circuit-breakers/{provider}/reset | Reset a provider's circuit breaker to CLOSED state (clears failure counts). |
| POST /circuit-breakers/reset-all | Reset all circuit breakers to CLOSED. |

The Code Router provides intelligent, benchmark-driven model selection specifically for coding tasks. It classifies task complexity and matches requests to tiered models scored by SWE-bench and HumanEval benchmarks. Four routing modes are available: auto (complexity-based), price (cheapest capable model), quality (highest benchmark score), and agentic (optimized for multi-step tool-using agents).
Boundaries -- what Code Router does NOT do:
- Does not execute code or run benchmarks (uses pre-computed static benchmark data from code_quality_priors.json)
- Does not make inference requests (only selects models; the inference pipeline dispatches to the selected model)
- Does not learn or adapt from user feedback (static benchmark data, never reloaded after startup)
- Does not support custom model pools or user-defined tiers
- Does not interact with any database, Redis, or external service (entirely in-memory/static)
Data source: src/services/code_quality_priors.json -- lazy-loaded at startup, cached at module level (never reloaded). On file read error: Sentry capture + minimal fallback config.
Tier system: 4 tiers of models ranked by SWE-bench/HumanEval benchmark scores and pricing. Fallback model: zai/glm-4.7.
Routing modes: router:code:auto (complexity detection), router:code:price, router:code:quality, router:code:agentic.
No Supabase, no Redis, no Prometheus, no external calls. All endpoints serve static/in-memory data.
| Endpoint | Description |
|---|---|
| GET /code-router/settings/options | Available code router configuration options (modes, parameters). |
| GET /code-router/tiers | Code model tiers with benchmark scores and pricing per tier. |
| GET /code-router/stats | Code router usage statistics. |
| POST /code-router/test | Test code routing with a sample prompt. Returns selected model and routing rationale. |
| POST /code-router/settings/validate | Validate code router settings before applying. |

The Coupon system allows users to redeem coupon codes for credit bonuses. Coupons can be global (available to all users) or user-specific (assigned to a particular user). Each coupon has a USD value, usage limits, and validity dates. Admin coupon management (creation, analytics) is under the Admin section.
Boundaries -- what Coupons does NOT do:
- Does not create or manage coupons (that's the Admin coupon endpoints)
- Does not provide percentage-based discounts (all coupons are fixed USD credit values)
- Does not apply coupons to specific purchases or subscriptions (credits are added to the general balance)
- Does not support coupon stacking or combining multiple codes in one redemption
- Does not send notifications when coupons are about to expire
Redemption flow: Validates coupon code -> checks expiration/active/max_uses -> checks scope (global vs user-specific) -> checks if user already redeemed -> adds credits via add_credits_to_user() -> records redemption in coupon_redemptions table.
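The eligibility checks in this flow can be sketched as a pure function that returns a rejection reason or None. The field names follow the coupons table listed in this section, but the scope values ("global", "user_specific"), the assigned_to_user_id field, and the inclusive validity window are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional, Set


def check_coupon(coupon: dict, user_id: int, redeemed_coupon_ids: Set[int], now: datetime) -> Optional[str]:
    """Return None when the coupon can be redeemed, otherwise a short rejection reason."""
    if not coupon["is_active"]:
        return "inactive"
    if now < coupon["valid_from"] or now > coupon["valid_until"]:
        return "outside validity window"
    if coupon["times_used"] >= coupon["max_uses"]:
        return "usage limit reached"
    # Hypothetical scope handling: user-specific coupons carry an assigned user ID.
    if coupon["scope"] == "user_specific" and coupon.get("assigned_to_user_id") != user_id:
        return "not assigned to this user"
    if coupon["id"] in redeemed_coupon_ids:
        return "already redeemed"
    return None
```

Only after all checks pass would credits be added and a row written to coupon_redemptions; since those are two separate writes, the order and failure handling matter in a real implementation.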
Tables: coupons (code, value_usd, scope, type, max_uses, times_used, valid_from, valid_until, is_active) and coupon_redemptions (coupon_id, user_id, value_applied, redeemed_at).
| Endpoint | Description |
|---|---|
| GET /coupons/available | List available coupons for the authenticated user (global + user-specific). |
| GET /coupons/history | User's coupon redemption history with values and timestamps. |
| POST /coupons/redeem | Redeem a coupon code. Validates eligibility, adds credits, records redemption. |

The Credits system manages the credit-based billing model that powers all inference usage. Users have a credit balance that decreases with each API call based on token usage and model pricing. Credits can be added through purchases (Stripe), admin grants, coupon redemptions, referral rewards, and trial allocations. The system provides balance checking, transaction history, and bulk credit operations.
Boundaries -- what Credits does NOT do:
- Does not handle real-time pricing calculations during inference (that's the pricing service in the inference pipeline)
- Does not enforce credit limits during streaming (credits are deducted after the response completes)
- Does not provide credit expiration or time-limited credits
- Does not support credit transfers between users
- Does not integrate directly with Stripe (credit purchases go through the Payments system, which then calls credits)
- Admin-only: all credit endpoints require admin authentication
Tables: users (purchased_credits, subscription_allowance), credit_transactions (user_id, amount, transaction_type, description, balance_before, balance_after, metadata, created_by).
Transaction types: trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer.
Summary endpoint: When system-wide (no user_id), fetches ALL users and ALL transactions and aggregates them in application code (Python). Performance warning -- no database-side aggregation.
Amount filtering caveat: When min_amount/max_amount are provided, the endpoint fetches ALL rows into Python memory and filters them there rather than in the database query.
No Redis, no Prometheus.
| Endpoint | Description |
|---|---|
| GET /credits/balance | Current credit balance breakdown (purchased credits + subscription allowance). |
| GET /credits/summary | Credit summary with usage breakdown. System-wide mode fetches all records. |
| GET /credits/transactions | Paginated credit transaction history. Filters: user, type, date range, amount, direction. |
| POST /credits/add | Add credits to an account. |
| POST /credits/adjust | Adjust credits (positive or negative). |
| POST /credits/bulk-add | Bulk add credits to multiple users. |
| POST /credits/refund | Process a credit refund. |

The Diagnostics system provides real-time visibility into the gateway's operational state -- specifically concurrency pressure and provider response times. These endpoints are designed for debugging 503 errors (concurrency saturation) and identifying slow providers.
Boundaries -- what Diagnostics does NOT do:
- Does not diagnose individual request failures (use error monitoring or activity logs for that)
- Does not provide historical data or trends (only current/recent state)
- Does not trigger alerts or remediation actions
- Does not bypass ConcurrencyMiddleware (unlike `/health` and `/metrics`, diagnostics requests count against the concurrency limit)
- No authentication required (public endpoints)
Concurrency endpoint: Reads Prometheus gauges (concurrency_active_requests, concurrency_queued_requests) via internal ._value._value access. Status thresholds: >=90% utilization OR >=80% queue fill = "critical"; >=70% utilization or >=60% queue fill = "warning"; otherwise "normal". Default config: CONCURRENCY_LIMIT=20, CONCURRENCY_QUEUE_SIZE=50, CONCURRENCY_QUEUE_TIMEOUT=10.0s.
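The status thresholds can be sketched as a small pure function. This is a minimal illustration, not the gateway's code: the function name `concurrency_status` and its parameters are hypothetical, and it assumes the warning tier combines utilization and queue fill with OR, the same way the critical tier does.

```python
def concurrency_status(active: int, queued: int,
                       limit: int = 20, queue_size: int = 50) -> str:
    """Map current load to "normal", "warning", or "critical".

    Defaults mirror CONCURRENCY_LIMIT=20 and CONCURRENCY_QUEUE_SIZE=50.
    """
    # Guard against a zero limit so the sketch never divides by zero.
    utilization = active / limit if limit else 0.0
    queue_fill = queued / queue_size if queue_size else 0.0

    if utilization >= 0.90 or queue_fill >= 0.80:
        return "critical"
    # Assumption: the warning tier uses OR, like the critical tier.
    if utilization >= 0.70 or queue_fill >= 0.60:
        return "warning"
    return "normal"
```

With the default limits, 19 in-flight requests (95% utilization) would report "critical" even with an empty queue.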
Provider timing: Reads provider_slow_requests_total Prometheus counter (labels: provider, model, severity). Slow threshold: 30-45s. Very slow: >45s.
No Supabase, no Redis. Reads only Prometheus in-memory metrics. Errors return 200 with degraded payload (no HTTPException).
| Endpoint | Description |
|---|---|
| `GET /api/diagnostics/concurrency` | Active concurrency stats: in-flight requests, queue depth, shed count, utilization percentage, status (normal/warning/critical). |
| `GET /api/diagnostics/provider-timing` | Provider response timing summary: counts of slow (30-45s) and very slow (>45s) requests by provider and model. |
The Error Monitoring system provides comprehensive error detection, classification, pattern recognition, and AI-generated fix suggestions. It integrates with Grafana Loki for log ingestion, classifies errors by type and severity, identifies recurring patterns, assesses which errors are automatically fixable, and can generate code fix suggestions using the Anthropic API (Claude).
Boundaries -- what Error Monitoring does NOT do:
- Does not automatically apply fixes (generates suggestions only; human review required)
- Does not persist error data to a database (all error patterns stored in-memory, lost on restart)
- Does not integrate with Sentry (Sentry is a separate error tracking system; this monitors Loki logs)
- Does not monitor provider-level errors in real-time (scans logs on-demand or via autonomous background task)
- Does not send alerts or notifications on critical errors
- No authentication required on any endpoint (all public)
Data source: Grafana Loki via httpx.AsyncClient (10s timeout). Query: {level="ERROR"} on /loki/api/v1/query_range.
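As a rough illustration of the Loki request, the sketch below only builds the query_range URL. The helper name `build_loki_query_url` and the `limit` default are hypothetical; `query`, `start`, `end`, and `limit` are standard query_range parameters, with timestamps given as Unix epoch nanoseconds. The real service would presumably issue this request through `httpx.AsyncClient(timeout=10.0)`.

```python
from urllib.parse import urlencode

ERROR_QUERY = '{level="ERROR"}'  # LogQL selector quoted in the text above


def build_loki_query_url(base_url: str, start_ns: int, end_ns: int,
                         limit: int = 100) -> str:
    """Return the full query_range URL for the ERROR-level log selector."""
    params = {
        "query": ERROR_QUERY,
        "start": start_ns,   # Unix epoch, nanoseconds
        "end": end_ns,       # Unix epoch, nanoseconds
        "limit": limit,      # hypothetical default, not from the source
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```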
Error classification pipeline: extract_error_details() -> classify_error() (pattern matching for provider/DB/auth/rate limit/timeout/cache/external errors) -> determine_fixability() -> group_similar_errors() (by {category}:{message[:50]}).
Severity mapping: Database = CRITICAL; provider timeout/5xx + auth + external = HIGH; rate limit/timeout/cache = MEDIUM; validation = LOW.
Fixability: Rate limit, timeout, cache, database, auth errors are assessed as fixable. Provider, validation, external service, internal errors are not.
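The grouping key, severity mapping, and fixability rules above can be sketched as follows. The category strings and the dict-based error shape are assumptions made for illustration; only the `{category}:{message[:50]}` key and the severity and fixability assignments come from the description.

```python
from collections import defaultdict

# Hypothetical category names; the mapping itself follows the text above.
SEVERITY = {
    "database": "CRITICAL",
    "provider": "HIGH", "auth": "HIGH", "external": "HIGH",
    "rate_limit": "MEDIUM", "timeout": "MEDIUM", "cache": "MEDIUM",
    "validation": "LOW",
}
FIXABLE = {"rate_limit", "timeout", "cache", "database", "auth"}


def group_key(category: str, message: str) -> str:
    """Build the {category}:{message[:50]} key used to group similar errors."""
    return f"{category}:{message[:50]}"


def group_similar_errors(errors: list[dict]) -> dict[str, list[dict]]:
    """Bucket errors whose category and first 50 message characters match."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for err in errors:
        groups[group_key(err["category"], err["message"])].append(err)
    return dict(groups)


def is_fixable(category: str) -> bool:
    """Return True for categories the text lists as assessed fixable."""
    return category in FIXABLE
```

Because only the first 50 characters of the message are used, two errors that differ only in a trailing request ID or timestamp fall into the same pattern.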
Bug fix generator: Uses Anthropic API (claude-3-5-sonnet-20241022), requires ANTHROPIC_API_KEY and optional GITHUB_TOKEN.
Autonomous monitor: Background scanning singleton. In-memory state only.
No Supabase, no Redis, no Prometheus.
| Endpoint | Description |
|---|---|
| `GET /error-monitor/autonomous/status` | Autonomous error monitor status (running, last scan time, error count). |
| `GET /error-monitor/health` | Error monitor health check. Checks ErrorMonitor, AutonomousMonitor, and BugFixGenerator singletons. |
| `GET /error-monitor/dashboard` | Dashboard data: error charts, stats, trends, severity breakdown. |
| `GET /error-monitor/errors/recent` | Recent errors with severity, category, trace info. |
| `GET /error-monitor/errors/critical` | Critical errors only (database-level failures). |
| `GET /error-monitor/errors/fixable` | Errors assessed as automatically fixable. |
| `GET /error-monitor/errors/patterns` | Detected error patterns (recurring issues grouped by category + message prefix). |
| `GET /error-monitor/fixes/generated` | All AI-generated fix suggestions. |
| `GET /error-monitor/fixes/{fix_id}` | Details of a specific fix suggestion. |
| `POST /error-monitor/fixes/generate-for-error` | Generate an AI fix suggestion for a specific error using Claude. |
| `POST /error-monitor/fixes/generate-batch` | Batch generate fix suggestions for multiple errors. |
| `POST /error-monitor/monitor/start` | Start continuous background error monitoring. |
| `POST /error-monitor/monitor/scan` | Trigger a one-time error scan against Loki. |
The General Router provides ML-powered model selection for general (non-coding) tasks using NotDiamond integration. It analyzes prompt content and selects the best model optimized for the chosen strategy: quality, cost, latency, or balanced. Model IDs use the router:general:<mode> syntax which is intercepted by the inference pipeline.
Boundaries -- what General Router does NOT do:
- Does not make inference requests (selects models; the inference pipeline dispatches)
- Does not learn from user feedback or usage patterns (static mode-to-model mapping with NotDiamond fallback)
- Does not support custom model pools or user-defined routing rules
- Does not guarantee NotDiamond availability (falls back to mode-specific default models)
- No database, Redis, or external calls from these info endpoints (entirely static/in-memory)
Routing modes: Balanced (anthropic/claude-sonnet-4), Quality (openai/gpt-4o), Cost (openai/gpt-4o-mini), Latency (groq/llama-3.3-70b-versatile). System default fallback: anthropic/claude-sonnet-4.
NotDiamond integration: When notdiamond_enabled is true, prompts are analyzed by NotDiamond for optimal model selection. When unavailable, the system falls back to mode-specific fallback models.
Syntax normalization: normalize_model_string() handles hyphenated aliases (e.g., router:general:balanced = router:general-balanced).
| Endpoint | Description |
|---|---|
| `GET /general-router/settings/options` | Available routing strategies and model pools per mode. |
| `GET /general-router/models` | Models available for general routing with benchmark data. |
| `GET /general-router/fallback-models` | Fallback model chain per routing mode. |
| `GET /general-router/stats` | General router usage statistics. |
| `POST /general-router/test` | Test general routing with a sample prompt. Returns selected model and reasoning. |
The Health & Monitoring system provides tiered health checks across all platform subsystems: the gateway itself, individual providers, models, databases, Redis, and gateways. It supports multiple check types from lightweight pings (for load balancers) to comprehensive deep checks (for dashboards). The system includes uptime tracking, health scoring (0-100), AI-generated health insights, and auto-fix capabilities for unhealthy gateways.
Boundaries -- what Health & Monitoring does NOT do:
- Does not perform load testing or synthetic transactions (health checks verify connectivity, not throughput)
- Does not send alerts or notifications on health degradation (only exposes data; alerting is in Grafana/PagerDuty)
- Does not automatically reroute traffic away from unhealthy providers (that's the circuit breaker system)
- Does not provide historical health data beyond what's cached (point-in-time checks)
- Not all health endpoints are exempt from ConcurrencyMiddleware (only `/health`, `/metrics`, `/ready` are exempt; subsystem health endpoints are NOT)
- Always returns HTTP 200 (degraded info in response body) to allow load balancers to distinguish healthy vs degraded routing
Health check tiers: Quick (sub-millisecond, static response), Standard (DB + Redis connectivity), Railway (comprehensive: DB, Redis, providers), System (memory, CPU, connections), All (combined).
Provider health: Health scores 0-100 based on success rate, latency, and error patterns. Tracked per-provider with uptime history.
Model health: Per-model health status with history. Stored in model_health table.
Gateway health: Per-gateway health checks with auto-fix capability. Dashboard available in both HTML and JSON formats.
Health monitoring control: Start/stop background monitoring tasks, trigger on-demand checks.
| Endpoint | Description |
|---|---|
| `GET /health` | Primary health check. Returns version, status, timestamp. Used by load balancers. |
| `GET /health/quick` | Lightweight health check (minimal overhead, static response). |
| `GET /health/railway` | Railway deployment health check (comprehensive: DB, Redis, providers). |
| `GET /health/system` | System health: memory, CPU, connections. |
| `GET /health/database` | Database connectivity and performance. |
| `GET /health/all` | All health checks combined. |
| `GET /health/status` | Current system status. |
| `GET /health/summary` | Health summary with scores. |
| `GET /health/uptime` | System uptime metrics. |
| `GET /health/insights` | AI-generated health insights and recommendations. |
| `GET /health/dashboard` | Health dashboard data. |
| `GET /health/optimizations` | Current optimization status. |
| `GET /health/optimizations/connection-pools` | Connection pool health. |
| `GET /health/optimizations/prioritization` | Request prioritization stats. |
| Endpoint | Description |
|---|---|
| `GET /health/providers` | All provider health scores (0-100) and statuses. |
| `GET /health/provider/{provider}` | Single provider health details. |
| `GET /health/providers/stats` | Provider health statistics. |
| `GET /health/providers/uptime` | Provider uptime history. |
| `GET /health/providers/import-status` | Provider data import status. |
| `GET /health/google-vertex` | Google Vertex AI specific health check. |
| Endpoint | Description |
|---|---|
| `GET /health/models` | All model health scores. |
| `GET /health/model/{model_id}` | Single model health details. |
| `GET /health/models/stats` | Model health statistics. |
| `GET /health/models/uptime` | Model uptime history. |
| Endpoint | Description |
|---|---|
| `GET /health/gateways` | All gateway health checks. |
| `GET /health/gateways/dashboard` | Gateway health dashboard (HTML). |
| `GET /health/gateways/dashboard/data` | Gateway dashboard data (JSON). |
| `GET /health/{gateway}` | Single gateway health check. |
| `POST /health/gateways/{gateway}/fix` | Trigger auto-fix for an unhealthy gateway. |
| Endpoint | Description |
|---|---|
| `GET /health/catalog/models` | Catalog model data health. |
| `GET /health/catalog/providers` | Catalog provider data health. |
| Endpoint | Description |
|---|---|
| `GET /health/monitoring/status` | Active monitoring status. |
| `POST /health/monitoring/start` | Start active health monitoring. |
| `POST /health/monitoring/stop` | Stop active health monitoring. |
| `POST /health/check` | Trigger a health check. |
| `POST /health/check/now` | Trigger an immediate health check. |
The Metrics & Observability system provides the telemetry backbone for Gatewayz. It exposes Prometheus-compatible metrics for scraping, structured JSON metrics for dashboards, Grafana integration (SimpleJSON datasource protocol), OpenTelemetry tracing (Tempo), structured logging (Loki), and instrumentation health checks. The /metrics endpoint is the primary Prometheus scrape target, supporting both standard Prometheus text format and OpenMetrics format with exemplar support for trace-to-metric linking.
Boundaries -- what Metrics & Observability does NOT do:
- Does not store metric history (Prometheus server handles retention; the gateway only exposes current values)
- Does not provide alerting (alerting rules are configured in Grafana/Prometheus, not in the gateway)
- Does not process or query traces (traces are sent to Tempo; querying is done via Grafana)
- Does not aggregate metrics across multiple gateway instances (each instance exposes its own metrics)
- Does not provide custom dashboards (Grafana dashboards are configured separately)
GET /metrics: Core Prometheus scrape endpoint. Refreshes Redis INFO gauges via asyncio.to_thread(collect_redis_info) with 5s cap per scrape. Content negotiation: Accept: application/openmetrics-text enables OpenMetrics format with exemplar support for trace-to-metric linking via Grafana/Tempo. Hidden from OpenAPI docs.
GET /api/metrics/parsed: Self-calls http://localhost:8000/metrics via httpx.AsyncClient (10s timeout), parses Prometheus text format, computes latency percentiles (p50/p95/p99) via linear interpolation on histogram buckets, extracts request counts and error counts by endpoint.
Metrics status: Reports live vs synthetic mode based on Supabase availability (cached for 60s).
Redis: INFO command issued on each /metrics scrape. Timeout silently swallowed (stale values served).
Grafana SimpleJSON datasource: Implements the full protocol (test, search, query, annotations, tag-keys, tag-values) for direct Grafana integration.
| Endpoint | Description |
|---|---|
| `GET /metrics` | Raw Prometheus metrics. Supports OpenMetrics format with exemplars for trace-to-metric linking. Hidden from OpenAPI. |
| `GET /api/metrics/parsed` | Structured JSON: latency percentiles (p50/p95/p99/avg), request counts, error counts by endpoint. Self-calls /metrics. |
| `GET /api/metrics/status` | Metrics collection status (live vs synthetic mode). |
| `GET /api/metrics/summary` | Metrics summary. |
| `GET /api/metrics/health` | Metrics system health. |
| `GET /api/metrics/grafana-queries` | Grafana-compatible query results. |
| `POST /api/metrics/test` | Test metrics collection. |
| Endpoint | Description |
|---|---|
| `GET /api/monitoring/health` | Provider health scores (0-100) with status per provider. |
| `GET /api/monitoring/health/{provider}` | Single provider health. |
| `GET /api/monitoring/stats/realtime` | Real-time stats: requests, cost, health, error rates, latency with hourly breakdown. |
| `GET /api/monitoring/stats/hourly/{provider}` | Hourly stats for a specific provider. |
| `GET /api/monitoring/error-rates` | Error rates by provider and model with trend detection. |
| `GET /api/monitoring/errors/{provider}` | Recent error logs per provider. |
| `GET /api/monitoring/cost-analysis` | Cost breakdown by provider with cost-per-request. |
| `GET /api/monitoring/latency-trends/{provider}` | Latency percentiles (p50/p95/p99) over time. |
| `GET /api/monitoring/latency/{provider}/{model}` | Latency stats for a specific model. |
| `GET /api/monitoring/anomalies` | Anomaly detection: cost spikes, latency spikes, high error rates. |
| `GET /api/monitoring/circuit-breakers` | Circuit breaker states per provider. |
| `GET /api/monitoring/circuit-breakers/{provider}` | Circuit breaker for a specific provider. |
| `GET /api/monitoring/providers/comparison` | Multi-provider comparison matrix. |
| `GET /api/monitoring/token-efficiency/{provider}/{model}` | Token efficiency analysis. |
| `GET /api/monitoring/trial-analytics` | Trial system analytics. |
| `GET /api/monitoring/chat-requests` | Chat request monitoring. |
| `GET /api/monitoring/chat-requests/counts` | Chat request counts. |
| `GET /api/monitoring/chat-requests/models` | Chat requests by model. |
| `GET /api/monitoring/chat-requests/providers` | Chat requests by provider. |
| `GET /api/monitoring/chat-requests/plot-data` | Chat request time-series data. |
| `POST /monitoring` | Sentry tunnel (proxies Sentry events from frontend). |
| Endpoint | Description |
|---|---|
| `GET /api/instrumentation/health` | Instrumentation health. |
| `GET /api/instrumentation/config` | Current instrumentation configuration. |
| `GET /api/instrumentation/environment-variables` | Instrumentation env vars. |
| `GET /api/instrumentation/loki/status` | Loki log aggregation status. |
| `GET /api/instrumentation/tempo/status` | Tempo distributed tracing status. |
| `GET /api/instrumentation/otel/status` | OpenTelemetry status. |
| `GET /api/instrumentation/trace-context` | Current trace context. |
| `POST /api/instrumentation/test-log` | Send test log to Loki. |
| `POST /api/instrumentation/test-trace` | Send test trace to Tempo. |
| Endpoint | Description |
|---|---|
| `GET /prometheus/datasource` | Grafana SimpleJSON datasource test endpoint. |
| `POST /prometheus/datasource/search` | Metric name search. |
| `POST /prometheus/datasource/query` | Metric query. |
| `POST /prometheus/datasource/annotations` | Annotation query. |
| `POST /prometheus/datasource/tag-keys` | Tag key query. |
| `POST /prometheus/datasource/tag-values` | Tag value query. |
| Endpoint | Description |
|---|---|
| `GET /prometheus/data/metrics` | Prometheus telemetry data. |
| `GET /prometheus/data/admin/cache/status` | Cache status via Prometheus API. |
| `DELETE /prometheus/data/admin/cache/invalidate` | Invalidate cache via Prometheus API. |
| `GET /prometheus/data/instrumentation/health` | Instrumentation health via Prometheus. |
| `GET /prometheus/data/instrumentation/loki/status` | Loki status via Prometheus. |
| `GET /prometheus/data/instrumentation/tempo/status` | Tempo status via Prometheus. |
| `POST /prometheus/data/instrumentation/test-log` | Test log via Prometheus. |
| `POST /prometheus/data/instrumentation/test-trace` | Test trace via Prometheus. |
The Models & Catalog system is the model discovery and management layer. It provides model search, comparison, trending analytics, provider information, and HuggingFace integration. Users can discover models across all 30+ providers, compare them across dimensions (price, speed, context length), view trending models, and access detailed HuggingFace metadata (downloads, likes, model cards). The system also includes a canonical model registry (Modelz), model availability monitoring with circuit breaker integration, model health tracking, and a ranking leaderboard.
Boundaries -- what Models & Catalog does NOT do:
- Does not host or serve models (only catalogs and routes to external providers)
- Does not benchmark models itself (uses external benchmark data: SWE-bench, HumanEval, MMLU)
- Does not provide model fine-tuning or customization
- Does not guarantee real-time model availability (availability is tracked via background monitoring with in-memory state)
- Does not cache model responses (only caches model metadata/catalog data)
- Availability data is in-memory only (lost on restart, not backed by Redis or Supabase)
Model detail enrichment: Searches across all cached gateway model lists. Enriches with provider info (logos, site URLs) and HuggingFace data (downloads, likes). Redis-cached per gateway (5-15 min TTL).
Availability monitoring: ModelAvailabilityService singleton, entirely in-memory. Circuit breaker config: failure_threshold=5, recovery_timeout=300s, success_threshold=3, response_timeout=30s. Three consecutive slow responses (>30s) trigger degradation.
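To make the circuit breaker parameters concrete, here is a minimal in-memory sketch using the conventional closed / open / half-open states. The class, its method names, and the state names are assumptions, as is the reading that `failure_threshold` counts consecutive failures; only the three default values come from the config above.

```python
import time


class CircuitBreaker:
    """In-memory breaker: opens after repeated failures, then probes for recovery."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 300.0,
                 success_threshold: int = 3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Reject while open; move to half-open once recovery_timeout has elapsed."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"
                self.successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # assumption: only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # any failure during probing reopens the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

Since the state lives in a plain object, it is lost on restart, which matches the in-memory boundary noted above.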
Supported gateways (19): featherless, deepinfra, chutes, groq, fireworks, together, cerebras, nebius, xai, novita, hug, aimo, near, fal, anannas, aihubmix, vercel-ai-gateway, onerouter, helicone. OpenRouter always fetched as baseline.
Gateway registry: Frontend auto-discovers from GET /v1/gateways. Each gateway has a name, color, priority, and site URL.
| Endpoint | Description |
|---|---|
| `GET /v1/models` | List all models. Filter by provider, gateway, private, HuggingFace. Pagination supported. |
| `GET /v1/models/unique` | Deduplicated model list (one entry per model across providers). |
| `GET /v1/models/search` | Full-text model search. |
| `GET /v1/models/trending` | Top models ranked by requests, tokens, users, cost, speed. |
| `GET /v1/models/low-latency` | Low-latency optimized models. |
| `GET /v1/models/{provider}/{model}` | Specific model details with provider info. |
| `GET /v1/models/{provider}/{model}/compare` | Compare a model across all available providers. |
| `GET /v1/models/{developer}` | Models by developer/organization. |
| `POST /v1/models/batch-compare` | Batch compare multiple models at once. |
| `GET /api/models/detail` | Detailed model info enriched with HuggingFace data. Public, no auth. |
| Endpoint | Description |
|---|---|
| `GET /v1/modelz/models` | Full canonical model registry. |
| `GET /v1/modelz/ids` | All model IDs in the registry. |
| `GET /v1/modelz/check/{model_id}` | Check if a model exists and get its registry data. |
| Endpoint | Description |
|---|---|
| `GET /v1/provider` | List all providers with stats. |
| `GET /v1/provider/{provider_name}/stats` | Provider statistics (model count, request count, cost). |
| `GET /v1/provider/{provider_name}/top-models` | Top models for a provider by usage. |
| `GET /v1/routers` | Available intelligent routing options (code, general). |
| Endpoint | Description |
|---|---|
| `GET /v1/gateways` | List all registered gateways. Frontend auto-discovers from this. |
| `GET /v1/gateways/status` | Gateway statuses. |
| `GET /v1/gateways/summary` | Aggregated gateway statistics. |
| `GET /v1/gateway/{gateway}/stats` | Stats for a specific gateway. |
| Endpoint | Description |
|---|---|
| `GET /v1/huggingface/discovery` | Discover HuggingFace models. |
| `GET /v1/huggingface/search` | Search HuggingFace models. |
| `GET /v1/huggingface/author/{author}/models` | Models by a HuggingFace author. |
| `GET /v1/huggingface/models/{model_id}/details` | HuggingFace model details (downloads, likes, parameters). |
| `GET /v1/huggingface/models/{model_id}/card` | Model card (README). |
| `GET /v1/huggingface/models/{model_id}/files` | Model file listing. |
| Endpoint | Description |
|---|---|
| `GET /v1/model-health` | All model health data. |
| `GET /v1/model-health/stats` | Model health statistics. |
| `GET /v1/model-health/providers` | Provider-level health summary. |
| `GET /v1/model-health/unhealthy` | Currently unhealthy models. |
| `GET /v1/model-health/{provider}/{model}` | Health for a specific model. |
| `GET /v1/model-health/provider/{provider}/summary` | Provider health summary. |
| Endpoint | Description |
|---|---|
| `GET /ranking/models` | Model leaderboard with trend data (direction, percentage, logos). |
| Endpoint | Description |
|---|---|
| `GET /availability/models` | All model availability statuses with circuit breaker state. In-memory only. |
| `GET /availability/model/{model_id}` | Availability for a specific model. |
| `GET /availability/summary` | Availability summary (percentage, gateway breakdown). |
| `GET /availability/status` | Overall availability status (operational/degraded). |
| `GET /availability/check/{model_id}` | Quick availability check. Unknown models assumed available (optimistic). |
| `GET /availability/fallback/{model_id}` | Fallback providers for a model. |
| `GET /availability/best/{model_id}` | Best available provider for a model. |
| `POST /availability/maintenance/{model_id}` | Put a model in maintenance mode. |
| `DELETE /availability/maintenance/{model_id}` | Remove maintenance mode. |
| `POST /availability/monitoring/start` | Start availability monitoring background task. |
| `POST /availability/monitoring/stop` | Stop availability monitoring. |
| Endpoint | Description |
|---|---|
| `GET /catalog/models-db/` | List all models in the database catalog. |
| `GET /catalog/models-db/{model_id}` | Get a model from the DB. |
| `GET /catalog/models-db/search` | Search the model catalog DB. |
| `GET /catalog/models-db/stats` | Catalog statistics. |
| `GET /catalog/models-db/provider/{provider_slug}` | Models by provider in DB. |
| `GET /catalog/models-db/health/{health_status}` | Models by health status in DB. |
| `GET /catalog/models-db/{model_id}/health/history` | Model health history. |
| `POST /catalog/models-db/bulk` | Bulk create models. |
| `POST /catalog/models-db/bulk-upsert` | Bulk upsert models. |
| `POST /catalog/models-db/upsert` | Upsert a single model. |
| `POST /catalog/models-db/{model_id}/activate` | Activate a model. |
| `POST /catalog/models-db/{model_id}/deactivate` | Deactivate a model. |
| `PATCH /catalog/models-db/{model_id}/health` | Update model health status. |
| Endpoint | Description |
|---|---|
| `GET /providers/` | List all providers in DB. |
| `PATCH /providers/{provider_id}` | Update provider metadata. |
| `GET /providers/{provider_id}/models/stats` | Provider model statistics. |
This section covers features that don't fit neatly into the other categories: user registration, image generation, audio transcription, server-side tools, payments (Stripe), IP allowlists, Nosana GPU computing, partner trials, provider credit monitoring, user notifications, and system utilities.
Boundaries -- what these features do NOT do (by sub-feature):
- Image Generation: Does not host image models (proxies to Stability AI, DALL-E, etc.). Does not support image editing or inpainting. Credit-billed per generation.
- Audio Transcription: Does not support real-time streaming transcription. Proxies to Whisper. Credit-billed per minute.
- Tools: Only 2 server-side tools available (web_search via Tavily API, text_to_speech). Does not execute arbitrary code. Does not support user-defined tools.
- Payments: Does not handle cryptocurrency payments. Stripe only. Does not support invoicing or payment plans.
- Nosana GPU: Proxies all requests to the Nosana external API. Does not manage GPU hardware directly. Requires `NOSANA_API_KEY`.
- Partner Trials: Does not support custom trial configurations per partner at runtime (configured in DB). Does not auto-extend trials.
- IP Allowlist: Does not support IPv6 range matching. CIDR notation supported for IPv4.
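For the IP Allowlist boundary, IPv4 CIDR matching can be sketched with the standard-library `ipaddress` module. The function name `is_allowlisted` and the plain list of string entries are hypothetical; as a simplifying assumption, this sketch skips IPv6 addresses and entries entirely rather than modelling whatever exact-match behavior the real service has.

```python
import ipaddress


def is_allowlisted(ip: str, entries: list[str]) -> bool:
    """Return True if an IPv4 address matches any IPv4 address or CIDR entry."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP address
    if addr.version != 4:
        return False  # assumption: IPv6 is ignored in this sketch

    for entry in entries:
        try:
            # strict=False accepts entries with host bits set, e.g. "10.1.2.3/8".
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # skip malformed entries instead of failing the check
        if network.version == 4 and addr in network:
            return True
    return False
```

A bare address such as `192.168.1.5` is treated as a /32 network, so single-host entries and ranges go through the same code path.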
User Registration (POST /create): Creates user with $5 initial credits, 3-day trial, welcome email via Resend. Partner codes trigger PartnerTrialService. Referral codes update users.referred_by_code.
Image Generation (POST /v1/images/generations): Routes to image providers with credit billing based on model and resolution.
Audio Transcription (POST /v1/audio/transcriptions): Supports all major audio formats. Credit billing per minute of audio. Base64-encoded audio is accepted via POST /v1/audio/transcriptions/base64.
Tools: Static tool registry with WebSearchTool (Tavily API, 30s timeout) and TextToSpeechTool. OpenAI-compatible function calling format. POST /v1/tools/search/augment provides web search context formatting for prompt injection.
Stripe Payments: Full checkout flow (create session -> webhook -> credit delivery). Webhook handles payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created. Always returns 200 to Stripe (even on errors) so that Stripe does not retry the delivery. Subscription management: upgrade, downgrade, cancel.
Nosana GPU: Pure proxy to Nosana external API (https://dashboard.k8s.prd.nos.ci/api/). Supports LLM, image generation, and Whisper deployments. Deployment lifecycle: create -> start -> stop -> archive. Job management: create, extend, stop.
Partner Trials: Configurable per partner (e.g., Redbeard: 14-day Pro trial, $100 credits, $50/day limit). 5-min in-memory cache per partner code. Status tracking in partner_trial_analytics table.
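The 5-minute per-partner cache can be pictured as a small TTL cache keyed by partner code. This is a sketch under assumptions: the class name `TTLCache`, the `loader` callback (standing in for the DB lookup), and the injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now, value)
        return value
```

One consequence of this design is that a config change in the database can take up to five minutes to become visible, and each process instance keeps its own copy.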
Velocity Mode: Auto-protection during high error rates. Activates when error rate exceeds 25% over 3-min window. Reduces rate limits to 50% of normal.
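Velocity Mode's trigger can be sketched as a sliding-window error-rate monitor. The class and method names are hypothetical, and the sketch assumes the error rate is computed over individual request outcomes inside the 3-minute window; the 25% threshold and the 50% limit reduction come from the description above.

```python
import time
from collections import deque


class VelocityMonitor:
    """Track request outcomes in a sliding window and halve limits under stress."""

    def __init__(self, threshold: float = 0.25, window_seconds: float = 180.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._events: deque[tuple[float, bool]] = deque()

    def record(self, is_error: bool) -> None:
        self._events.append((self._clock(), is_error))

    def error_rate(self) -> float:
        # Drop outcomes that have aged out of the window before computing the rate.
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def is_active(self) -> bool:
        """Velocity mode is on only when the rate strictly exceeds the threshold."""
        return self.error_rate() > self.threshold

    def effective_limit(self, normal_limit: int) -> int:
        """Return 50% of the normal limit (at least 1) while velocity mode is active."""
        return max(1, normal_limit // 2) if self.is_active() else normal_limit
```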
| Endpoint | Description |
|---|---|
| `POST /create` | Create new user account ($5 initial credits, 3-day trial, welcome email). |
| Endpoint | Description |
|---|---|
| `POST /v1/images/generations` | Generate images using AI models (Stability AI, DALL-E, etc.) with credit billing. |
| Endpoint | Description |
|---|---|
| `POST /v1/audio/transcriptions` | Transcribe audio files via Whisper. All major formats supported. Credit billing per minute. |
| `POST /v1/audio/transcriptions/base64` | Transcribe base64-encoded audio. |
| Endpoint | Description |
|---|---|
| `GET /v1/tools` | List available server-side tools with OpenAI-compatible definitions. Public, no auth. |
| `GET /v1/tools/definitions` | Tool definitions formatted for function calling. Returns raw list. |
| `GET /v1/tools/{tool_name}` | Get a specific tool's details. 404 if not found. |
| `POST /v1/tools/execute` | Execute a server-side tool (web_search via Tavily, text_to_speech). Requires API key auth. |
| `POST /v1/tools/search/augment` | Web search + formatted context for prompt augmentation. Optional auth. Never raises HTTPException. |
| Endpoint | Description |
|---|---|
| `POST /api/stripe/checkout-session` | Create Stripe checkout session for credit purchase. |
| `GET /api/stripe/checkout-session/{session_id}` | Get checkout session status. Proxies to Stripe API. |
| `GET /api/stripe/credit-packages` | Available credit packages and pricing. Public, no auth. |
| `POST /api/stripe/payment-intent` | Create payment intent. |
| `GET /api/stripe/payment-intent/{payment_intent_id}` | Get payment intent status. Proxies to Stripe API. |
| `GET /api/stripe/payments` | List user's payment history. Paginated. |
| `GET /api/stripe/payments/{payment_id}` | Get payment details. |
| `POST /api/stripe/refund` | Process a refund. |
| `GET /api/stripe/subscription` | Get current subscription. |
| `POST /api/stripe/subscription-checkout` | Create subscription checkout. |
| `POST /api/stripe/subscription/upgrade` | Upgrade subscription plan. |
| `POST /api/stripe/subscription/downgrade` | Downgrade subscription plan. |
| `POST /api/stripe/subscription/cancel` | Cancel subscription. |
| `POST /api/stripe/webhook` | Stripe webhook handler. Always returns 200. Handles: payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created. |
| Endpoint | Description |
|---|---|
| `POST /api/admin/ip-whitelist` | Create IP allowlist entry. Supports CIDR ranges. Admin only. |
| `GET /api/admin/ip-whitelist/{entry_id}` | Get allowlist entry. |
| `PUT /api/admin/ip-whitelist/{entry_id}` | Update allowlist entry. |
| `DELETE /api/admin/ip-whitelist/{entry_id}` | Delete allowlist entry. |
| `GET /api/admin/ip-whitelist` | List all allowlist entries. |
| `POST /api/admin/ip-whitelist/check` | Check if an IP is allowlisted. |
| Endpoint | Description |
|---|---|
| `GET /nosana/config` | Nosana platform configuration (deployment strategies, supported frameworks). Static/hardcoded. |
| `GET /nosana/credits/balance` | Nosana credit balance. Proxies to Nosana API (120s read timeout). |
| `GET /nosana/deployments` | List all Nosana deployments. |
| `GET /nosana/deployments/{deployment_id}` | Deployment details. |
| `POST /nosana/deployments/llm` | Deploy LLM inference on GPU (vllm, ollama, lmdeploy). |
| `POST /nosana/deployments/image-generation` | Deploy image generation on GPU (stable-diffusion-webui). |
| `POST /nosana/deployments/whisper` | Deploy Whisper transcription on GPU. |
| `POST /nosana/deployments/{deployment_id}/start` | Start a deployment. |
| `POST /nosana/deployments/{deployment_id}/stop` | Stop a deployment. |
| `POST /nosana/deployments/{deployment_id}/archive` | Archive a deployment. |
| `POST /nosana/deployments/{deployment_id}/revisions` | Create a deployment revision. |
| `PATCH /nosana/deployments/{deployment_id}/replicas` | Update replica count. |
| `GET /nosana/markets` | List GPU markets. |
| `GET /nosana/markets/{market_id}` | Market details. |
| `GET /nosana/markets/{market_id}/resources` | Market resource requirements. |
| `POST /nosana/jobs` | Create a new GPU job. |
| `GET /nosana/jobs/{job_address}` | Job details. |
| `POST /nosana/jobs/{job_address}/extend` | Extend job duration. |
| `POST /nosana/jobs/{job_address}/stop` | Stop a job. |
| Endpoint | Description |
|---|---|
| `GET /partner-trials/config/{partner_code}` | Partner trial configuration. 5-min in-memory cache. Public. |
| `GET /partner-trials/check/{code}` | Check if code is a partner code (vs user referral). Always returns 200. |
| `GET /partner-trials/status` | Current user's partner trial status (active/expired/converted, days remaining, usage). |
| `GET /partner-trials/daily-limit` | Partner trial daily usage limit info. |
| `GET /partner-trials/analytics/{partner_code}` | Partner trial analytics. |
| `POST /partner-trials/start` | Start a partner trial (e.g., Redbeard 14-day Pro). |
| `POST /partner-trials/expire/{target_user_id}` | Force-expire a partner trial. |
| Endpoint | Description |
|---|---|
| `GET /api/provider-credits/balance` | All upstream provider credit balances. |
| `GET /api/provider-credits/balance/{provider}` | Specific provider credit balance. |
| Endpoint | Description |
|---|---|
| `GET /user/notifications/preferences` | Get notification preferences. |
| `POST /user/notifications/send-usage-report` | Send usage report email. |
| `POST /user/notifications/test` | Send test notification. |
| Endpoint | Description |
|---|---|
| `GET /ping` | System ping (pong response with uptime). |
| `GET /ping/stats` | Ping statistics. |
| `GET /sentry-debug` | Test Sentry error tracking integration (intentionally raises an error). |
| `GET /velocity-mode-status` | Security velocity mode status. Returns current error rate, threshold (25%), limits (normal vs velocity). In-memory only. |
| `GET /` | Root endpoint (API info, version). |
| Endpoint | Description |
|---|---|
| GET /health/gateways/optimized | Optimized gateway health. |
| GET /health/models/optimized | Optimized model health. |
| GET /health/providers/optimized | Optimized provider health. |
| GET /health/dashboard/optimized | Optimized dashboard data. |
The Status system provides public-facing status information about model and provider availability. It includes both a lightweight operational check (operational/degraded) and a detailed infrastructure status view with concurrency metrics, circuit breaker states, and database/cache connectivity. These endpoints are designed for status pages and external monitoring.
Boundaries -- what Status does NOT do:
- Does not provide incident management or incident history (that's the Admin downtime tracking system)
- Does not send status notifications or updates (only exposes current state)
- Does not provide per-user status (only system-wide)
- Does not signal degradation through HTTP status codes: always returns HTTP 200 (even in error states) and reports degradation in the response body
- Does not track status history or provide uptime percentages over time
GET /availability/status: Simple "operational" (>90% availability) or "degraded" (<=90%) based on ModelAvailabilityService in-memory data. Always returns 200.
GET /v1/status/detailed: Combines Prometheus concurrency gauges, per-provider circuit breaker states from Redis (circuit_breaker:{provider}:*, 5 keys per provider, 3600s TTL), and infrastructure connectivity (the global Supabase client, plus the Redis health cache read via GET health:system, 360s TTL). Status is "normal" if active requests < 15, otherwise "high_load". Errors in any of these lookups degrade gracefully instead of failing the request.
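The two thresholds described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the service's code: the function names are hypothetical, and computing availability as a ratio of available to tracked models is an assumption about how ModelAvailabilityService data is summarized.

```python
def availability_status(available_models: int, total_models: int) -> str:
    """Return "operational" above 90% availability, otherwise "degraded"."""
    if total_models <= 0:
        # Assumption: with no tracked models, report "degraded" instead of raising.
        return "degraded"
    ratio = available_models / total_models
    # Exactly 90% counts as degraded, matching the "<=90%" rule.
    return "operational" if ratio > 0.90 else "degraded"


def load_status(active_requests: int) -> str:
    """Return "normal" below 15 active requests, otherwise "high_load"."""
    return "normal" if active_requests < 15 else "high_load"
```

Both helpers return a label instead of raising, which mirrors the endpoints' behavior of always answering with HTTP 200 and carrying the state in the body.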
| Endpoint | Description |
|---|---|
| GET /v1/status/ | Overall system status (operational/degraded). |
| GET /v1/status/detailed | Detailed status: concurrency, circuit breakers, infrastructure (database, cache). |
| GET /v1/status/providers | Provider availability statuses. |
| GET /v1/status/models | Model availability statuses. |
| GET /v1/status/models/{provider}/{model_id} | Specific model status. |
| GET /v1/status/incidents | Recent incidents. |
| GET /v1/status/uptime/{provider}/{model_id} | Model uptime history. |
| GET /v1/status/search | Search models on status page. |
| GET /v1/status/stats | Status page statistics. |
The Users system provides user-facing account management: profile viewing/editing, credit balance checking, plan/subscription info, rate limit visibility, activity logs, API key management, referral codes, and account deletion. This is the user's self-service interface for managing their Gatewayz account.
Boundaries -- what Users does NOT do:
- Does not provide admin-level user management (that's the Admin system)
- Does not handle authentication (that's the Auth system; Users requires an already-authenticated API key)
- Does not process payments (that's Stripe; Users only views balance and plan info)
- Does not mask API keys in list responses (full plaintext keys are returned)
- Does not provide cross-user visibility (each user can only see their own data)
- Does not support team/organization accounts (single-user only)
Auth chain: get_current_user -> get_api_key (Bearer token + validate_api_key_security) -> get_user (5-min in-memory TTLCache, 512 entries max) -> validate_trial_expiration (HTTP 402 if expired).
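As a rough illustration of that chain, the sketch below parses a Bearer token, resolves the user through a bounded TTL cache (512 entries, 300 seconds), and returns 402 for an expired trial. Everything here is hypothetical: SimpleTTLCache, resolve_user, the trial_expired field, keying the cache by the raw token, and the 401 responses are assumptions made for the example, and the real code may use a library cache instead.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class SimpleTTLCache:
    """Minimal stand-in for a TTL cache: bounded size, per-entry expiry."""

    def __init__(self, maxsize: int = 512, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            del self._data[key]
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the oldest inserted entry
        self._data[key] = (self.clock() + self.ttl, value)


def parse_bearer(authorization: str) -> Optional[str]:
    """Extract the token from an "Authorization: Bearer <token>" value."""
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def resolve_user(authorization: str, cache: SimpleTTLCache,
                 lookup: Callable[[str], Optional[Dict[str, Any]]]
                 ) -> Tuple[int, Optional[Dict[str, Any]]]:
    """Return (http_status, user); lookup stands in for the database call."""
    token = parse_bearer(authorization)
    if token is None:
        return 401, None  # assumption: a malformed header is rejected with 401
    user = cache.get(token)
    if user is None:
        user = lookup(token)
        if user is None:
            return 401, None
        cache.set(token, user)
    if user.get("trial_expired"):
        return 402, None  # expired trial, as in validate_trial_expiration
    return 200, user
```

One consequence of a cache like this is that changes to a user record can take up to five minutes to become visible to the auth chain.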
API key creation (POST /user/api-keys): Rate limited to 10 creations/hour per user_id (sliding window). Key format: gw_{env}_{random43chars} (e.g., gw_live_abc123...). Keys are stored Fernet-encrypted (AES-128), alongside an HMAC-SHA-256 hash and the last four characters (last4). Plan entitlements cap max_requests. On a PostgREST schema cache error (PGRST204), the handler calls the refresh_postgrest_schema_cache RPC and retries.
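A minimal sketch of the key format, hash, and rate limit, using only the Python standard library. The function and class names are hypothetical, as is the use of secrets.token_urlsafe to produce the 43 random characters. Fernet encryption is left out because it needs the third-party cryptography package.

```python
import hashlib
import hmac
import secrets
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def generate_api_key(env: str = "live") -> str:
    """Build a key shaped like gw_{env}_{random43chars}."""
    # 32 random bytes encode to 43 URL-safe base64 characters (padding stripped).
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def key_fingerprint(api_key: str, secret: bytes) -> Dict[str, str]:
    """HMAC-SHA-256 hash plus last4, the parts that can be stored for lookup."""
    digest = hmac.new(secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"key_hash": digest, "last4": api_key[-4:]}


class SlidingWindowLimiter:
    """Allow at most `limit` events per user within the trailing window."""

    def __init__(self, limit: int = 10, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        events = self._events[user_id]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

This limiter keeps its state in process memory, so in a multi-instance deployment it would need a shared store; whether the real limiter uses one is not stated here.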
Activity stats: Queries activity_log table with user_id + date range (default 30 days, no pagination limit). Client-side aggregation by date/model/provider.
Activity log caveat: The total field in the activity log response is the count of returned records, NOT the total in the database.
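The aggregation and the total caveat can be illustrated as follows. The row fields (timestamp, model, provider, tokens, cost) and the response keys are hypothetical names chosen for the example, not the actual activity_log schema.

```python
from collections import defaultdict
from typing import Any, Dict, List


def _bucket() -> Dict[str, Any]:
    return {"requests": 0, "tokens": 0, "spend": 0.0}


def aggregate_activity(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate already-fetched rows by date, model, and provider."""
    summary: Dict[str, Any] = {
        "total": _bucket(),
        "by_date": defaultdict(_bucket),
        "by_model": defaultdict(_bucket),
        "by_provider": defaultdict(_bucket),
    }
    for row in rows:
        day = row["timestamp"][:10]  # assumes ISO 8601 timestamps (YYYY-MM-DD...)
        targets = (
            summary["total"],
            summary["by_date"][day],
            summary["by_model"][row["model"]],
            summary["by_provider"][row["provider"]],
        )
        for target in targets:
            target["requests"] += 1
            target["tokens"] += row["tokens"]
            target["spend"] += row["cost"]
    return summary


def activity_log_page(rows: List[Dict[str, Any]], limit: int,
                      offset: int = 0) -> Dict[str, Any]:
    """Mimic the caveat: total counts the returned page, not all matching rows."""
    page = rows[offset:offset + limit]
    return {"logs": page, "total": len(page)}
```

A client that needs the full count therefore has to keep paging until a page comes back with fewer records than the requested limit.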
Audit logging: audit_logger.log_api_key_usage() fires on every authenticated call. ObservabilityMiddleware records http_requests_total per endpoint.
| Endpoint | Description |
|---|---|
| GET /user/profile | User profile (email, username, credits, trial status, plan). |
| PUT /user/profile | Update user profile (email, username, preferences). Email uniqueness enforced. |
| GET /user/balance | Current credit balance and status. |
| GET /user/monitor | User's own usage monitoring data. |
| GET /user/plan | Current subscription plan. |
| GET /user/plan/entitlements | Plan entitlements (what the plan includes). |
| GET /user/plan/usage | Plan usage vs limits. |
| GET /user/limit | Daily spending limit. |
| GET /user/credit-transactions | Credit transaction history. |
| GET /user/environment-usage | Usage by environment (live/test/staging/dev). |
| GET /user/cache-settings | User's cache settings. |
| DELETE /user/account | Delete user account (irreversible). |
| Endpoint | Description |
|---|---|
| GET /user/activity/stats | Activity statistics: total requests/tokens/spend, daily breakdown, by model/provider. Default: last 30 days. No pagination limit. |
| GET /user/activity/log | Paginated activity log (limit 1-1000). Filters: date range, model, provider (exact match). Note: total field is current page count only. |
| Endpoint | Description |
|---|---|
| POST /user/api-keys | Create a new API key. Rate limited: 10/hour. Fernet encrypted. Key shown once. |
| GET /user/api-keys | List all API keys (full plaintext key strings, not masked). |
| GET /user/api-keys/usage | API key usage statistics. |
| GET /user/api-keys/audit-logs | API key audit logs. |
| PUT /user/api-keys/{key_id} | Update API key (name, active status). |
| DELETE /user/api-keys/{key_id} | Delete an API key. |
| Endpoint | Description |
|---|---|
| GET /user/rate-limits | User's rate limit configuration and current usage. |
| GET /user/rate-limits/usage/{key_id} | Rate limit usage for a specific key. |
| PUT /user/rate-limits/{key_id} | Update rate limit for a key. |
| POST /user/rate-limits/bulk-update | Bulk update rate limits. |
| Endpoint | Description |
|---|---|
| GET /plans | List all available subscription plans. |
| GET /plans/{plan_id} | Plan details. |
| GET /subscription/plans | Subscription plans (alternate path). |
| GET /trial/status | Current trial status (active/expired, days remaining). |
| Endpoint | Description |
|---|---|
| GET /referral/code | Get user's referral code. |
| POST /referral/generate | Generate a new referral code. |
| GET /referral/stats | Referral statistics (total referred, conversion rate, rewards earned). Redis cached (5-min). |
| POST /referral/validate | Validate and apply a referral code. |
| Endpoint | Description |
|---|---|
| GET /analytics/transactions | Transaction analytics. Proxies to OpenRouter frontend API using OPENROUTER_COOKIE env var. 503 if cookie not set. |
| GET /analytics/transactions/summary | Transaction summary. |
| Category | Endpoints | Key Capabilities |
|---|---|---|
| Admin | 80+ | User management, credits, caches, model sync, rate limits, RBAC, trials, downtime, coupons, notifications |
| Analytics | 5 | Server-side event forwarding (Statsig + PostHog), ad-blocker bypass |
| Authentication | 5 | Multi-method auth (Privy, Google, GitHub, email, wallet), auto-registration |
| Chat & Messaging | 20+ | OpenAI/Anthropic inference, 30+ provider failover, streaming, sessions, history, feedback, sharing |
| Circuit Breakers | 4 | Provider circuit breaker monitoring (CLOSED/OPEN/HALF_OPEN), reset controls |
| Code Router | 5 | Benchmark-driven code model selection (SWE-bench, HumanEval), 4 modes |
| Coupons | 3 | User coupon redemption, credit bonuses |
| Credits | 7 | Balance, transactions, add/adjust/refund/bulk operations |
| Diagnostics | 2 | Real-time concurrency and provider timing diagnostics |
| Error Monitoring | 13 | Autonomous error detection, Loki integration, AI fix generation (Claude) |
| General Router | 5 | ML-powered model selection (NotDiamond), 4 optimization modes |
| Health & Monitoring | 30+ | Multi-tier health: system, providers, models, gateways, auto-fix |
| Metrics & Observability | 40+ | Prometheus, Grafana, OpenTelemetry, Loki, Tempo, anomaly detection |
| Models & Catalog | 50+ | Discovery, search, compare, trending, HuggingFace, availability, CRUD |
| Other | 50+ | Images, audio, tools, Stripe, IP allowlists, Nosana GPU, partner trials, notifications |
| Status | 9 | Public status page, provider/model availability, incident history |
| Users | 25+ | Profile, balance, plan, API keys, rate limits, activity, referrals |
| Total | 450+ | |
Source: API Mappings Wiki | Conceptual Model