arminrad edited this page Mar 16, 2026 · 4 revisions

Gatewayz Backend — Complete Feature Reference

Reading path: Conceptual Model | Stability Definition | Conceptual Model Features | Features (you are here) | Delta Report | Features-Acceptance-Criteria

Read after: Conceptual Model Features (the spec -- what we should build) | Next: Delta Report (gap analysis -- spec vs reality)


TL;DR — This is the reality check. 450+ endpoints across 17 systems that are actually built and running today. Each system section includes a high-level overview, what it does NOT do (boundaries), and low-level technical details (auth mechanisms, middleware, database calls, caching). Compare this against Conceptual Model Features to understand the gap, or go straight to Delta Report for the pre-built comparison.

Version: 2.0.4 | Last Updated: 2026-03-04


Table of Contents

  1. Admin
  2. Analytics
  3. Authentication
  4. Chat & Messaging
  5. Circuit Breakers
  6. Code Router
  7. Coupons
  8. Credits
  9. Diagnostics
  10. Error Monitoring
  11. General Router
  12. Health & Monitoring
  13. Metrics & Observability
  14. Models & Catalog
  15. Other
  16. Status
  17. Users

1. Admin

High-Level Overview

The Admin system is an internal operations layer providing 80+ endpoints for managing all aspects of the Gatewayz platform. It covers user and account management, financial operations (credits, billing), system monitoring and diagnostics, cache management, rate limit configuration, model/provider catalog synchronization, coupon lifecycle management, downtime incident tracking, IP whitelisting, role-based access control (RBAC), trial analytics, and notification processing.

Admin endpoints are designed for internal tooling, admin dashboards, and support workflows -- not for end-user consumption.

Boundaries -- what Admin does NOT do:

  • Does not handle end-user inference requests (chat completions and provider failover are separate systems)
  • Does not manage Stripe/payment processing directly (admin can manually grant credits, but payment webhooks and subscriptions are separate)
  • Does not provide self-service admin creation (admin role assignment is done via the roles endpoints or directly in the database)
  • Does not enforce cross-process cache consistency (most caches are in-process Python dicts, not shared across instances)
  • Does not provide real-time streaming or WebSocket feeds (all endpoints are request/response)
  • Does not perform automated remediation (downtime endpoints let admins view and resolve incidents, but there is no auto-healing)
  • Does not have pagination on all read endpoints (e.g., GET /admin/balance and coupon stats fetch all rows)

Low-Level Technical Details

Authentication: Two mechanisms exist:

  1. Primary (require_admin dependency): HTTPBearer -> get_api_key() -> validate_api_key_security() (checks active, expiration, IP allowlist, domain restrictions) -> get_user() (5-min in-memory cache) -> validate_trial_expiration() -> check user.role == "admin" OR user.is_admin == True. Unauthorized attempts trigger audit_logger.log_security_violation("UNAUTHORIZED_ADMIN_ACCESS") and return 403.
  2. Secondary (monitoring endpoints only): get_admin_key() compares Bearer token against ADMIN_API_KEY env var using secrets.compare_digest() (constant-time comparison).
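
The secondary mechanism can be sketched as follows. This is a minimal illustration, not the project's actual get_admin_key(): the function name is_admin_key and the rule that an unset key rejects everyone are assumptions. Only secrets.compare_digest() comes from the description above.

```python
import secrets


def is_admin_key(bearer_token: str, admin_api_key: str) -> bool:
    """Return True only if the presented token equals the configured admin key."""
    if not admin_api_key:
        # Assumed policy: an unset ADMIN_API_KEY never authorizes anyone.
        return False
    # compare_digest runs in time independent of where the inputs differ,
    # so response timing does not leak how many leading characters matched.
    # Encoding to bytes lets the comparison accept non-ASCII input as well.
    return secrets.compare_digest(bearer_token.encode(), admin_api_key.encode())
```

A plain == comparison would return the same boolean, but it can exit at the first mismatching character, which is the timing side channel the constant-time comparison avoids.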

Middleware pipeline (all admin requests): SecurityMiddleware (50% Sentry sampling for /admin/ paths) -> RequestTimeoutMiddleware (55s) -> ConcurrencyMiddleware -> RequestIDMiddleware -> TraceContextMiddleware -> AutoSentryMiddleware -> CORSMiddleware -> ObservabilityMiddleware (Prometheus http_requests_total) -> SelectiveGZipMiddleware -> StagingSecurityMiddleware -> DeprecationMiddleware.

Database: All operations go through get_supabase_client() (PostgREST). Most use synchronous Supabase calls wrapped in asyncio.to_thread(). Some use execute_with_retry(fn, max_retries=2, retry_delay=0.2) for resilience. No admin endpoint uses atomic transaction wrapping.
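
A rough sketch of how a blocking call can be retried and moved off the event loop is shown below. The signature mirrors execute_with_retry(fn, max_retries=2, retry_delay=0.2) from the text, but the body is an assumption: in particular, treating max_retries as "retries after the first attempt" and retrying on any exception are guesses. run_query is a hypothetical helper name.

```python
import asyncio
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_retry(fn: Callable[[], T], max_retries: int = 2,
                       retry_delay: float = 0.2) -> T:
    """Call fn, retrying up to max_retries times after the first failure."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay)
    raise RuntimeError("unreachable")


async def run_query(fn: Callable[[], T], **retry_kwargs) -> T:
    # The blocking call (and its sleeps) run in a worker thread,
    # so the asyncio event loop stays free to serve other requests.
    return await asyncio.to_thread(execute_with_retry, fn, **retry_kwargs)
```

Because the retry loop sleeps with time.sleep(), it should only run inside the worker thread, never directly on the event loop.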

Caching: Three tiers: (1) in-process Python dicts for models/providers/users (not shared across processes), (2) Redis for trial analytics and chat summary caching, (3) functools.lru_cache for the rate limit manager singleton.
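
The first tier can be pictured as a dictionary with a time-to-live, as in the sketch below. TTLCache and its method names are hypothetical. The 300-second value in the usage note matches the 5-minute user cache mentioned under Authentication, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache: visible only to the Python process that owns it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)
```

For example, TTLCache(300) would hold a user record for five minutes. Calling invalidate() in one worker process does nothing for copies held by other processes, which is the cross-process consistency gap listed in the boundaries above.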

Endpoints

User Management

| Endpoint | Description |
| --- | --- |
| GET /admin/users | Paginated, filterable user list with email/API key/active status filters. Fast path via RPC search_users_by_email for email-only searches. |
| GET /admin/users/{user_id} | Comprehensive user profile: all fields, all API keys, 10 recent usage records, 10 recent activity logs. Returns plaintext API key strings. |
| GET /admin/users/by-api-key | Exact-match user lookup by API key string. Uses indexed RPC for O(log n) lookup (~10-20ms). |
| GET /admin/users/count | Total registered user count via server-side COUNT(*). Ultra-fast (5-20ms). |
| GET /admin/users/growth | Daily cumulative user registration counts over configurable time period (1-365 days). Includes growth rate calculation. |
| GET /admin/users/stats | Aggregated user statistics (counts by role, active status, credits totals, subscription breakdown). Up to 5 sequential queries. |
| DELETE /admin/users/by-domain/{domain} | Bulk delete users by email domain. dry_run=true by default. Protected domains (gmail, yahoo, outlook, etc.) blocked. Non-atomic (per-user DELETE in loop). |
| GET /admin/api-keys/{api_key_id} | Full API key details by numeric ID, including plaintext key string and owner profile. |
| GET /admin/balance | Credit balances for ALL users. No pagination -- fetches all records. Intended for small datasets. |

Credit Operations

| Endpoint | Description |
| --- | --- |
| POST /admin/add_credits | Add credits to a user account. Per-transaction cap (ADMIN_MAX_CREDIT_GRANT) and 24h rolling daily limit (ADMIN_DAILY_GRANT_LIMIT). Logs transaction with full audit trail. Invalidates user cache. |
| GET /admin/credit-transactions | Paginated credit transaction history with filtering (user, type, date range, amount, direction). Warning: min_amount/max_amount filters fetch ALL rows into Python memory. |
| POST /admin/limit | Set per-user daily credit spending limits. Upserts rate_limit_configs. Does NOT clear rate limit cache (use /admin/clear-rate-limit-cache separately). |
| POST /admin/assign-plan | Assign a subscription plan to a user. |

System Monitoring

| Endpoint | Description |
| --- | --- |
| GET /admin/monitor | Comprehensive system snapshot: user counts, credit totals, API usage (today + 30 days). 8 sequential Supabase queries, 500ms-2s typical. Merges activity_log + legacy usage_records with deduplication. |
| GET /admin/monitoring/chat-requests | Paginated chat completion request records. Filters: model, provider, date range. JOINs models + providers. Limit up to 100,000. |
| GET /admin/monitoring/chat-requests/summary | Aggregate chat request stats (tokens, latency, cost, success rate). Redis cached (60s TTL, key based on filter hash). |
| GET /admin/monitoring/chat-requests/plot-data | Chart data: last 10 full records + parallel arrays (tokens, latency, timestamps) for all matching requests. No LIMIT on plot query. |
| GET /admin/monitoring/chat-requests/by-api-key | Chat requests for a specific API key. Resolves key string to ID. |
| GET /admin/monitoring/chat-requests/providers | Providers with chat request activity. Primary: RPC path. Fallback: per-provider COUNT queries. |
| GET /admin/monitoring/chat-requests/counts | Request count leaderboard by model. In-memory grouping. |
| GET /admin/monitoring/chat-requests/models | Models with comprehensive request statistics (token totals, averages, latency). |
| GET /admin/model-usage-analytics | Pre-aggregated model usage from database view. Paginated, searchable, sortable by 11 fields. |
| GET /admin/monitoring/api-key-tracking-quality | API key tracking quality analysis. Uses get_admin_key auth (env var). Alert: ok (>=90%), warning (70-90%), critical (<70%). |
| GET /admin/monitoring/api-key-tracking-trend | Daily time-series of API key tracking quality over configurable period. |

Cache Management

| Endpoint | Description |
| --- | --- |
| GET /admin/cache-status | In-process provider catalog cache metadata (age, TTL 1800s, validity, provider count). Sub-millisecond. |
| GET /admin/huggingface-cache-status | HuggingFace cache state with all cached model IDs (can be 1000+ items). |
| GET /admin/debug-models | Diagnostic: sample models/providers from caches, tests provider-slug matching. May trigger cache refresh if stale. |
| POST /admin/refresh-providers | Force refresh provider cache (1s debounce, then immediate fetch). |
| POST /admin/refresh-huggingface-cache | Invalidate HuggingFace cache (lazy -- next request triggers cold fetch, 500ms-2s). |
| POST /admin/clear-rate-limit-cache | Clear in-memory rate limit config cache. Does NOT clear Redis counters. |
| GET /admin/cache/status | Overall cache status. |
| GET /admin/cache/debouncer/stats | Cache debouncer statistics. |
| GET /admin/cache/warmer/stats | Cache warmer statistics. |
| GET /admin/cache/modelz/status | Modelz cache status. |
| POST /admin/cache/refresh/{gateway} | Refresh cache for a specific gateway. |
| POST /admin/cache/clear | Clear all caches. |
| POST /admin/cache/modelz/refresh | Refresh modelz cache. |
| POST /admin/cache/pricing/refresh | Refresh pricing cache. |
| POST /admin/api/cache/invalidate | Invalidate specific cache entry. |
| DELETE /admin/cache/modelz/clear | Clear modelz cache. |

Model Sync

| Endpoint | Description |
| --- | --- |
| GET /admin/model-sync/status | Current model sync job status. |
| GET /admin/model-sync/health | Model sync service health. |
| GET /admin/model-sync/providers | List of 33 syncable providers. No auth enforced. |
| POST /admin/model-sync/trigger | Trigger incremental model sync. |
| POST /admin/model-sync/all | Sync all models from all providers. |
| POST /admin/model-sync/full | Full catalog resync (delete + reimport). |
| POST /admin/model-sync/incremental | Incremental delta sync. |
| POST /admin/model-sync/providers-only | Sync provider metadata only. |
| POST /admin/model-sync/provider/{provider_slug} | Sync a single provider. |
| POST /admin/model-sync/reset-and-resync | Flush and fully resync the catalog. |
| DELETE /admin/model-sync/flush-models | Flush all models from catalog DB. |
| DELETE /admin/model-sync/flush-providers | Flush all providers from catalog DB. |

Rate Limits

| Endpoint | Description |
| --- | --- |
| GET /admin/rate-limits/config | Current rate limit configuration. |
| GET /admin/rate-limits/system | System-wide rate limit stats. |
| GET /admin/rate-limits/users | Per-user rate limit stats. |
| PUT /admin/rate-limits/update | Update rate limit rules. |
| POST /admin/rate-limits/config/reset | Reset rate limit config to defaults. |
| DELETE /admin/rate-limits/delete | Delete rate limit entries for an API key. |

Roles (RBAC)

| Endpoint | Description |
| --- | --- |
| GET /admin/roles/{user_id} | Get user's role info. |
| POST /admin/roles/update | Update a user's role with reason logging. |
| GET /admin/roles/list/{role} | List all users with a specific role. |
| GET /admin/roles/permissions/{role} | Get permissions for a role. |
| GET /admin/roles/audit/log | Role change audit log. |

Trial Analytics

| Endpoint | Description |
| --- | --- |
| GET /admin/trial/analytics | Trial usage analytics. Redis cached (5-min TTL). Paginated fetch of ALL api_keys_new records on cache miss. Computes conversion rate, usage stats, status breakdown. |
| GET /admin/trial/users | List all trial users with status. |
| GET /admin/trial/domain-analysis | Trial sign-up analysis by email domain (abuse detection). |
| GET /admin/trial/conversion-funnel | Trial-to-paid conversion funnel. |
| GET /admin/trial/ip-analysis | Trial sign-ups by IP (fraud detection). |
| GET /admin/trial/cohort-analysis | Trial user cohort retention analysis. |
| POST /admin/trial/save-conversion-metrics | Save conversion metrics snapshot. |

Downtime Tracking

| Endpoint | Description |
| --- | --- |
| GET /admin/downtime/incidents | List downtime incidents with filters (status, severity, environment). Uses execute_with_retry (2 retries). |
| GET /admin/downtime/incidents/ongoing | Currently active incidents only. |
| GET /admin/downtime/statistics | Downtime statistics over time period (MTTR, frequency, by severity/status). Returns zeroed stats on failure. |
| GET /admin/downtime/incidents/{incident_id} | Full incident details including logs, server info, metrics snapshot. |
| GET /admin/downtime/incidents/{incident_id}/logs | Filtered incident logs (by level, logger, search term). In-memory filtering. |
| GET /admin/downtime/incidents/{incident_id}/analysis | Error pattern analysis: error/warning counts, type distribution, top 10 messages. |
| POST /admin/downtime/incidents/{incident_id}/capture-logs | Trigger Loki log capture for ongoing incident. External HTTP to Grafana Loki (30s timeout). Stores up to 10,000 entries. |
| POST /admin/downtime/incidents/{incident_id}/resolve | Resolve incident. Records ended_at, resolved_by, optional notes. Rejects if already resolved. |

Coupons (Admin)

| Endpoint | Description |
| --- | --- |
| GET /admin/coupons | List coupons with filters (scope, type, is_active). Paginated. |
| GET /admin/coupons/{coupon_id} | Single coupon details. |
| GET /admin/coupons/{coupon_id}/analytics | Per-coupon analytics: redemption rate, remaining uses, unique users, total value. |
| GET /admin/coupons/stats/overview | Coupon system overview. Fetches ALL coupons and ALL redemptions (no pagination). |

Notifications (Admin)

| Endpoint | Description |
| --- | --- |
| POST /admin/notifications/process | Process pending notification queue. |
| GET /admin/notifications/stats | Notification delivery statistics. |

Other Admin

| Endpoint | Description |
| --- | --- |
| GET /admin/test-huggingface/{hugging_face_id} | Test HuggingFace API connectivity. Synchronous HTTP call (blocks event loop, 10s timeout). Redis cached (1h TTL). |
| GET /admin/health/optimizations/cache | Health optimization cache status. |
| POST /admin/health/optimizations/cache/clear | Clear health optimization cache. |

2. Analytics

High-Level Overview

The Analytics system provides server-side event forwarding to Statsig and PostHog, allowing the frontend to bypass client-side ad-blockers that would otherwise prevent analytics collection. Events are routed through the backend as a proxy, ensuring reliable tracking of user behavior, feature usage, and session activity.

Boundaries -- what Analytics does NOT do:

  • Does not store events in the Gatewayz database (events are forwarded to external services only)
  • Does not provide its own analytics dashboard or query interface (use Statsig/PostHog dashboards)
  • Does not guarantee event delivery (both services fail silently if unavailable)
  • Does not process events concurrently within a batch (sequential processing -- one slow call delays the rest)
  • Does not track inference/chat usage (that's handled by the activity logging system)

Low-Level Technical Details

Authentication: Optional. Uses get_current_user which allows None. User ID resolution priority: authenticated user ID > event-supplied user_id > "anonymous".
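
The resolution priority amounts to a three-way fallback, sketched below. The function name resolve_user_id is hypothetical, and treating an empty string the same as a missing ID is an assumption.

```python
from typing import Optional


def resolve_user_id(authenticated_user_id: Optional[str],
                    event_user_id: Optional[str]) -> str:
    """Pick the ID attached to an event: auth wins, then the event, then anonymous."""
    # `or` skips both None and "" (assumed: an empty ID counts as missing).
    return authenticated_user_id or event_user_id or "anonymous"
```

With this ordering, a logged-in user cannot be impersonated by an event payload carrying a different user_id, because the authenticated ID always takes precedence.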

External services: Statsig (batched flush: 10s interval or 50-event queue) and PostHog (async capture via SDK). Configured via STATSIG_SERVER_SECRET_KEY, POSTHOG_API_KEY, POSTHOG_HOST env vars.
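
The "10s interval or 50-event queue" policy can be illustrated with a small batcher. This is not the Statsig SDK's code: EventBatcher, its parameters, and the send callback are hypothetical. The sketch also only checks the age condition when add() or flush_if_due() is called, whereas a real SDK would typically use a background timer.

```python
import time
from typing import Callable, List


class EventBatcher:
    """Queue events; flush when the queue is full or its oldest event is too old."""

    def __init__(self, send: Callable[[List[dict]], None], max_events: int = 50,
                 max_age_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._queue: List[dict] = []
        self._oldest_at = 0.0

    def add(self, event: dict) -> None:
        if not self._queue:
            self._oldest_at = self._clock()  # age is measured from the first queued event
        self._queue.append(event)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Send the queued batch if either limit is reached; report whether it did."""
        if not self._queue:
            return False
        full = len(self._queue) >= self._max_events
        stale = self._clock() - self._oldest_at >= self._max_age
        if full or stale:
            batch, self._queue = self._queue, []
            self._send(batch)
            return True
        return False
```

Batching like this trades delivery latency for fewer outbound calls. Events still queued when the process exits are lost unless a final flush runs.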

No database writes, no Redis, no Prometheus metrics, no audit logging. Both services fail silently if not configured -- endpoints always return 200.

Batch processing caveat: Events in a batch are processed sequentially. If any event fails, remaining events in that batch are skipped (HTTP 500).

Endpoints

| Endpoint | Description |
| --- | --- |
| POST /v1/analytics/events | Send a single analytics event. Routes to both Statsig and PostHog. Always returns 200. |
| POST /v1/analytics/batch | Batch send multiple analytics events. Sequential processing per event. |
| POST /v1/analytics/session/start | Start an analytics session. Creates session-level tracking context. |
| GET /v1/analytics/cache | Analytics cache status and data. |
| GET /v1/analytics/cache/summary | Analytics cache summary. |

3. Authentication

High-Level Overview

The Authentication system handles user identity verification through Privy (primary identity provider), supporting multiple auth methods: Google OAuth, GitHub OAuth, email/password, crypto wallet, and phone number. The primary POST /auth endpoint serves as both login and registration -- it validates the user's Privy token and either returns an existing account or creates a new one automatically. New users start with $5 in credits, a 3-day trial, and a "basic" tier.

Boundaries -- what Authentication does NOT do:

  • Does not validate Privy access tokens (the field exists but is not currently checked)
  • Does not handle OAuth flows directly (Privy handles the OAuth redirect and token exchange)
  • Does not support traditional username/password login (only Privy-mediated authentication)
  • Does not provide MFA/2FA management (delegated to Privy)
  • Does not manage user sessions server-side (stateless API key-based authentication post-login)
  • Does not rate-limit at the API key level for auth endpoints (uses IP-based limits only)

Low-Level Technical Details

Rate limiting: Login: 10 attempts per 15 minutes per IP (in-memory sliding window). Registration: 3 attempts per hour per IP.
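
An in-memory sliding window of this kind can be sketched as below. SlidingWindowLimiter is a hypothetical name, and not counting rejected attempts against the window is an assumption about behavior the text does not specify.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the trailing window."""

    def __init__(self, max_attempts: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        attempts = self._attempts[key]
        # Drop timestamps that have slid out of the window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False  # assumed: rejected attempts are not recorded
        attempts.append(now)
        return True


# Limits from the text: 10 logins per 15 minutes, 3 registrations per hour, per IP.
login_limiter = SlidingWindowLimiter(max_attempts=10, window_seconds=15 * 60)
register_limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60 * 60)
```

Because the state lives in process memory, each worker process would enforce its own window, so the effective limit scales with the number of processes.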

User lookup chain: In-memory Redis-backed cache by Privy ID -> Supabase SELECT * FROM users WHERE privy_user_id = ? -> Secondary fallback by username.

Email verification pipeline: Local blocklist check -> temp email service list -> Emailable API verification (external service).

New user provisioning: $5 initial credits, 3-day trial, "basic" tier. Partner codes (e.g., REDBEARD) trigger PartnerTrialService for extended trials. Referral codes update users.referred_by_code.

API key creation: Existing users get their primary key from the api_keys_new table. New users get an auto-created primary key, stored with Fernet encryption (AES-128) plus an HMAC-SHA256 hash.

Background tasks on login/register: Welcome email via Resend, activity log insert.

Auth info priority: request.email > linked "email" account > Google OAuth > phone > GitHub.

Endpoints

| Endpoint | Description |
| --- | --- |
| POST /auth | Primary auth endpoint. Validates Privy token, returns API key + credits + subscription status. Creates new user if none exists. Rate limited: 10/15min per IP. |
| POST /auth/register | Direct registration (alternative to Privy flow). Rate limited: 3/hour per IP. |
| POST /auth/password-reset | Request a password reset email via Resend. |
| POST /auth/reset-password | Reset password using a token from the reset email. |
| GET /auth/health | Auth service health check. 4 sequential checks: DB connectivity (3s timeout), Redis ping, auth cache stats, timeout config. Always returns 200. |

4. Chat & Messaging

High-Level Overview

The Chat & Messaging system is the core of Gatewayz -- it provides unified AI inference across 30+ providers through OpenAI-compatible and Anthropic-compatible APIs. The primary endpoint POST /v1/chat/completions is a full drop-in replacement for OpenAI's Chat Completions API, supporting streaming SSE, tool/function calling, JSON mode, logprobs, and all standard parameters. It automatically routes requests across providers with failover, applies credit billing, enforces rate limits, and records comprehensive observability data.

Beyond inference, the system includes chat session management (create, list, search, delete sessions), message history persistence, user feedback collection, chat sharing via public links, and Vercel AI SDK compatibility endpoints.

Boundaries -- what Chat & Messaging does NOT do:

  • Does not train or fine-tune models (inference only)
  • Does not store raw model responses long-term (activity logs capture metadata, not full response text unless chat history is enabled)
  • Does not provide its own models (routes to external providers: OpenRouter, Featherless, Chutes, DeepInfra, Fireworks, Together, Groq, Cerebras, etc.)
  • Does not support file uploads in chat (images in messages are supported via URL references, not file upload)
  • Does not handle Assistants API or threads (OpenAI Assistants API is not implemented)
  • Does not provide batch/async inference (all requests are synchronous or streaming)
  • Does not guarantee message ordering across concurrent requests to the same session
  • Does not implement conversation memory or context management (the client is responsible for sending conversation history)

Low-Level Technical Details

Authentication: Optional for the main inference endpoint. Supports both anonymous (no key, IP rate-limited, model whitelist restricted) and authenticated (API key) users.

Authenticated request pipeline (10+ steps):

  1. Parallel auth via asyncio.gather (user lookup + api_key_id resolution + trial status)
  2. Trial validation (402 if expired)
  3. Plan check (entitlement enforcement)
  4. Redis rate limiting (INCR rate_limit:{api_key_id}:{minute}, EXPIRE TTL 60s)
  5. Credit balance check (pre-flight)
  6. Auto web search injection (if enabled, prepends search results to prompt)
  7. Router detection (router:general:* or router:code:* model IDs)
  8. Provider detection + model transformation (maps model IDs to provider-specific formats)
  9. Failover chain construction (checks circuit breaker states via Redis per provider)
  10. Health-based provider selection
  11. Streaming or non-streaming dispatch
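
Step 4 describes a fixed-window counter keyed by minute. The sketch below shows the INCR/EXPIRE pattern against a small in-memory stand-in so that it runs without a Redis server. InMemoryCounterStore, check_rate_limit, and the limit_per_minute parameter are hypothetical, and the stand-in records TTLs without actually expiring keys.

```python
import time
from typing import Dict, Optional


class InMemoryCounterStore:
    """Stand-in exposing the two Redis-style operations used below: incr and expire."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.ttls: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key: str, seconds: int) -> bool:
        self.ttls[key] = seconds  # recorded only; this stand-in never evicts
        return True


def check_rate_limit(store, api_key_id: int, limit_per_minute: int,
                     now: Optional[float] = None) -> bool:
    """Return True if the request fits in the current one-minute window."""
    minute = int((time.time() if now is None else now) // 60)
    key = f"rate_limit:{api_key_id}:{minute}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the key clean itself up after 60s.
        store.expire(key, 60)
    return count <= limit_per_minute
```

A fixed window is cheap (one counter per key per minute), but it allows up to twice the limit across a window boundary, which is the usual trade-off against a sliding window.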

Anonymous path: IP-based Redis rate limit (INCR anon_rate:{ip_hash}:{minute}, TTL 60s) + model whitelist check.

Provider failover: build_provider_failover_chain() reads circuit breaker state per provider via Redis GET circuit_breaker:{provider}. Skips providers with OPEN circuit breakers.
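
The skip-if-OPEN rule can be sketched as a filter over an ordered candidate list. This is an illustration rather than the body of build_provider_failover_chain(): the name sketch_failover_chain, the get_state callback (standing in for a Redis GET), and treating a missing key as CLOSED are assumptions.

```python
from typing import Callable, List, Optional


def sketch_failover_chain(candidates: List[str],
                          get_state: Callable[[str], Optional[str]]) -> List[str]:
    """Keep candidate order, dropping providers whose breaker is OPEN."""
    chain: List[str] = []
    for provider in candidates:
        state = get_state(f"circuit_breaker:{provider}")
        if state == "OPEN":
            continue  # unhealthy provider: do not send traffic to it
        # CLOSED, HALF_OPEN, or no stored state (assumed healthy) stay in the chain.
        chain.append(provider)
    return chain
```

For example, with states {"circuit_breaker:groq": "OPEN"} and candidates ["groq", "together", "fireworks"], the chain would be ["together", "fireworks"].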

Streaming: SSE with StreamNormalizer.normalize() across all providers. TTFC (Time to First Chunk) tracked via Prometheus. Background post-processing after stream completes: credit deduction, activity log, chat history save, health capture.

Prometheus metrics: model_inference_requests, model_inference_duration, tokens_used, credits_used, api_cost_usd_total, api_cost_per_request, TTFC.

Token estimation fallback: When providers don't return usage data, estimates at ~1 token per 4 characters.
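
The fallback heuristic reduces to a one-line calculation, sketched here. The function name and the choice to round up are assumptions. Only the ratio of roughly 4 characters per token comes from the text.

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate token count at ~1 token per 4 characters (rounded up)."""
    return math.ceil(len(text) / 4)
```

The estimate is coarse: real tokenizers vary by model and language, so billing based on it is approximate whenever a provider omits usage data.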

Endpoints

Inference

| Endpoint | Description |
| --- | --- |
| POST /v1/chat/completions | Primary inference endpoint. Full OpenAI Chat Completions API compatibility (streaming SSE, tool/function calling, JSON mode, logprobs). Routes to 30+ providers with automatic failover. Credit billing. |
| POST /v1/messages | Anthropic Messages API. Drop-in Claude compatibility. Same routing/billing pipeline. |
| POST /v1/responses | OpenAI v1/responses API. Unified response format. |
| POST /api/chat/ai-sdk | Vercel AI SDK compatible chat endpoint. |
| POST /api/chat/ai-sdk-completions | Vercel AI SDK completions endpoint. |

Chat Sessions & History

| Endpoint | Description |
| --- | --- |
| GET /v1/chat/sessions | List all chat sessions for the authenticated user. |
| GET /v1/chat/sessions/{session_id} | Get session with full message history. |
| POST /v1/chat/sessions/{session_id}/messages | Save a message to a session. |
| POST /v1/chat/sessions/{session_id}/messages/batch | Batch save messages to a session. |
| POST /v1/chat/search | Full-text search across chat sessions. |
| PUT /v1/chat/sessions/{session_id} | Update session metadata (title, etc.). |
| DELETE /v1/chat/sessions/{session_id} | Delete a chat session and its messages. |
| GET /v1/chat/stats | Chat usage statistics for the user. |

Feedback

| Endpoint | Description |
| --- | --- |
| POST /v1/chat/feedback | Submit feedback (thumbs up/down, text) on a response. |
| GET /v1/chat/feedback | Get user's feedback entries. |
| GET /v1/chat/feedback/stats | Feedback statistics. |
| GET /v1/chat/sessions/{session_id}/feedback | Feedback for a specific session. |
| PUT /v1/chat/feedback/{feedback_id} | Update existing feedback. |
| DELETE /v1/chat/feedback/{feedback_id} | Delete feedback. |

Sharing

| Endpoint | Description |
| --- | --- |
| POST /v1/chat/share | Create a shareable public link for a chat session. |
| GET /v1/chat/share | List user's share links. |
| GET /v1/chat/share/{token} | Access a shared chat (public, no auth). |
| DELETE /v1/chat/share/{token} | Delete a share link. |

Chat Metrics

| Endpoint | Description |
| --- | --- |
| GET /v1/chat/completions/metrics/tokens-per-second | Token throughput per model (Prometheus format). |
| GET /v1/chat/completions/metrics/tokens-per-second/all | Aggregate tokens/second across all models. |

5. Circuit Breakers

High-Level Overview

The Circuit Breaker system implements the circuit breaker pattern for provider reliability. Each AI provider has an independent circuit breaker that tracks success/failure counts and automatically transitions between states: CLOSED (normal operation), OPEN (blocking requests after too many failures), and HALF_OPEN (testing recovery). This prevents cascading failures by stopping requests to unhealthy providers and automatically recovering when they come back online.

These endpoints are read/control interfaces for the circuit breaker states -- the actual circuit breaking happens inside the inference pipeline.

Boundaries -- what Circuit Breakers does NOT do:

  • Does not make inference requests (only monitors/controls state; the inference pipeline uses circuit breaker state)
  • Does not persist state to Supabase (Redis + in-memory only; state is lost if both Redis and the process restart)
  • Does not provide historical circuit breaker data (only current state)
  • Does not auto-configure thresholds per provider (all providers use the same default config)
  • Does not trigger alerts or notifications on state transitions (only Prometheus metrics are emitted)

Low-Level Technical Details

State storage: Redis keys circuit_breaker:{provider}:{state|failure_count|success_count|opened_at|consecutive_opens} with 3600s TTL. In-memory fallback if Redis unavailable.

State transitions: CLOSED -> OPEN (after 5 consecutive failures), OPEN -> HALF_OPEN (after 300s recovery timeout), HALF_OPEN -> CLOSED (after 3 consecutive successes), HALF_OPEN -> OPEN (on any failure).
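
These transitions form a small state machine, sketched below with the thresholds from the text as defaults. The class is a single-process illustration, not the project's Redis-backed implementation. The method names, the injectable clock, and performing the OPEN -> HALF_OPEN check inside allow_request() are assumptions.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 300.0,
                 success_threshold: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "CLOSED"
        self.failures = 0   # consecutive failures while CLOSED
        self.successes = 0  # consecutive successes while HALF_OPEN
        self.opened_at = 0.0

    def _open(self) -> None:
        self.state, self.opened_at = "OPEN", self._clock()
        self.failures = self.successes = 0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self._clock() - self.opened_at < self.recovery_timeout:
                return False  # still cooling down: reject
            self.state, self.successes = "HALF_OPEN", 0  # timeout elapsed: probe
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures = "CLOSED", 0
        elif self.state == "CLOSED":
            self.failures = 0  # a success breaks the consecutive-failure streak

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._open()  # any failure while probing reopens the breaker
        elif self.state == "CLOSED":
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._open()
```

A caller would check allow_request() before dispatching to a provider, then report the outcome with record_success() or record_failure().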

Auto-creation: Querying a nonexistent provider creates a new breaker with default config (CLOSED, zero counts). There is no 404 case.

Prometheus metrics (emitted on state changes, not by these endpoints): circuit_breaker_state_transitions_total, circuit_breaker_failures_total, circuit_breaker_successes_total, circuit_breaker_rejected_requests_total, circuit_breaker_current_state.

No Supabase queries. Entirely Redis + in-memory.

Endpoints

| Endpoint | Description |
| --- | --- |
| GET /circuit-breakers | All circuit breaker states with failure/success counts per provider. No auth required. |
| GET /circuit-breakers/{provider} | Circuit breaker state for a specific provider. Auto-creates if not found. |
| POST /circuit-breakers/{provider}/reset | Reset a provider's circuit breaker to CLOSED state (clears failure counts). |
| POST /circuit-breakers/reset-all | Reset all circuit breakers to CLOSED. |

6. Code Router

High-Level Overview

The Code Router provides intelligent, benchmark-driven model selection specifically for coding tasks. It classifies task complexity and matches requests to tiered models scored by SWE-bench and HumanEval benchmarks. Four routing modes are available: auto (complexity-based), price (cheapest capable model), quality (highest benchmark score), and agentic (optimized for multi-step tool-using agents).

Boundaries -- what Code Router does NOT do:

  • Does not execute code or run benchmarks (uses pre-computed static benchmark data from code_quality_priors.json)
  • Does not make inference requests (only selects models; the inference pipeline dispatches to the selected model)
  • Does not learn or adapt from user feedback (static benchmark data, never reloaded after startup)
  • Does not support custom model pools or user-defined tiers
  • Does not interact with any database, Redis, or external service (entirely in-memory/static)

Low-Level Technical Details

Data source: src/services/code_quality_priors.json -- loaded once, then cached at module level (never reloaded). On file read error: Sentry capture + minimal fallback config.

Tier system: 4 tiers of models ranked by SWE-bench/HumanEval benchmark scores and pricing. Fallback model: zai/glm-4.7.

Routing modes: router:code:auto (complexity detection), router:code:price, router:code:quality, router:code:agentic.
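
Detecting a routing mode from a model ID is a prefix match, as in this sketch. The function name is hypothetical, and returning None for an unrecognized mode is an assumption. The text does not say how unknown modes are handled.

```python
from typing import Optional

CODE_ROUTER_PREFIX = "router:code:"
CODE_ROUTER_MODES = {"auto", "price", "quality", "agentic"}


def parse_code_router_mode(model_id: str) -> Optional[str]:
    """Return the routing mode for a code-router model ID, else None."""
    if not model_id.startswith(CODE_ROUTER_PREFIX):
        return None  # ordinary model ID or a different router
    mode = model_id[len(CODE_ROUTER_PREFIX):]
    return mode if mode in CODE_ROUTER_MODES else None
```

For example, "router:code:quality" yields "quality", while "router:general:auto" and plain provider model IDs yield None and fall through to normal handling.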

No Supabase, no Redis, no Prometheus, no external calls. All endpoints serve static/in-memory data.

Endpoints

| Endpoint | Description |
| --- | --- |
| GET /code-router/settings/options | Available code router configuration options (modes, parameters). |
| GET /code-router/tiers | Code model tiers with benchmark scores and pricing per tier. |
| GET /code-router/stats | Code router usage statistics. |
| POST /code-router/test | Test code routing with a sample prompt. Returns selected model and routing rationale. |
| POST /code-router/settings/validate | Validate code router settings before applying. |

7. Coupons

High-Level Overview

The Coupon system allows users to redeem coupon codes for credit bonuses. Coupons can be global (available to all users) or user-specific (assigned to a particular user). Each coupon has a USD value, usage limits, and validity dates. Admin coupon management (creation, analytics) is under the Admin section.

Boundaries -- what Coupons does NOT do:

  • Does not create or manage coupons (that's the Admin coupon endpoints)
  • Does not provide percentage-based discounts (all coupons are fixed USD credit values)
  • Does not apply coupons to specific purchases or subscriptions (credits are added to the general balance)
  • Does not support coupon stacking or combining multiple codes in one redemption
  • Does not send notifications when coupons are about to expire

Low-Level Technical Details

Redemption flow: Validates coupon code -> checks expiration/active/max_uses -> checks scope (global vs user-specific) -> checks if user already redeemed -> adds credits via add_credits_to_user() -> records redemption in coupon_redemptions table.
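
The eligibility checks in this flow can be sketched as a pure function that returns the first failing reason. The Coupon fields follow the coupons table columns listed in this section, but assigned_user_id, the "global" scope value, and the error strings are assumptions made for illustration. Crediting and recording the redemption are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class Coupon:
    id: int
    value_usd: float
    scope: str                 # "global" or user-specific (value names assumed)
    max_uses: int
    times_used: int
    valid_from: datetime
    valid_until: datetime
    is_active: bool
    assigned_user_id: Optional[int] = None  # hypothetical column for user-specific coupons


def redemption_error(coupon: Coupon, user_id: int,
                     redeemed_coupon_ids: Set[int], now: datetime) -> Optional[str]:
    """Return why the coupon cannot be redeemed, or None if it can."""
    if not coupon.is_active:
        return "inactive"
    if not (coupon.valid_from <= now <= coupon.valid_until):
        return "outside validity window"
    if coupon.times_used >= coupon.max_uses:
        return "usage limit reached"
    if coupon.scope != "global" and coupon.assigned_user_id != user_id:
        return "not available to this user"
    if coupon.id in redeemed_coupon_ids:
        return "already redeemed"
    return None
```

If the check and the later credit grant run as separate, non-atomic steps, two concurrent requests could both pass the checks, so the redemption record would ideally be guarded by a uniqueness constraint.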

Tables: coupons (code, value_usd, scope, type, max_uses, times_used, valid_from, valid_until, is_active) and coupon_redemptions (coupon_id, user_id, value_applied, redeemed_at).

Endpoints

| Endpoint | Description |
| --- | --- |
| GET /coupons/available | List available coupons for the authenticated user (global + user-specific). |
| GET /coupons/history | User's coupon redemption history with values and timestamps. |
| POST /coupons/redeem | Redeem a coupon code. Validates eligibility, adds credits, records redemption. |

8. Credits

High-Level Overview

The Credits system manages the credit-based billing model that powers all inference usage. Users have a credit balance that decreases with each API call based on token usage and model pricing. Credits can be added through purchases (Stripe), admin grants, coupon redemptions, referral rewards, and trial allocations. The system provides balance checking, transaction history, and bulk credit operations.

Boundaries -- what Credits does NOT do:

  • Does not handle real-time pricing calculations during inference (that's the pricing service in the inference pipeline)
  • Does not enforce credit limits during streaming (credits are deducted after the response completes)
  • Does not provide credit expiration or time-limited credits
  • Does not support credit transfers between users
  • Does not integrate directly with Stripe (credit purchases go through the Payments system, which then calls credits)
  • Does not expose any endpoint to non-admin users (all credit endpoints require admin authentication)

Low-Level Technical Details

Tables: users (purchased_credits, subscription_allowance), credit_transactions (user_id, amount, transaction_type, description, balance_before, balance_after, metadata, created_by).

Transaction types: trial, purchase, api_usage, admin_credit, admin_debit, refund, bonus, transfer.

Summary endpoint: In system-wide mode (no user_id), fetches ALL users and ALL transactions and aggregates them in application code. Performance warning -- there is no database-side aggregation.

Amount filtering caveat: When min_amount/max_amount are provided, the endpoint fetches ALL rows into Python memory and filters them in application code rather than in the database query.

No Redis, no Prometheus.

Endpoints

| Endpoint | Description |
| --- | --- |
| GET /credits/balance | Current credit balance breakdown (purchased credits + subscription allowance). |
| GET /credits/summary | Credit summary with usage breakdown. System-wide mode fetches all records. |
| GET /credits/transactions | Paginated credit transaction history. Filters: user, type, date range, amount, direction. |
| POST /credits/add | Add credits to an account. |
| POST /credits/adjust | Adjust credits (positive or negative). |
| POST /credits/bulk-add | Bulk add credits to multiple users. |
| POST /credits/refund | Process a credit refund. |

9. Diagnostics

High-Level Overview

The Diagnostics system provides real-time visibility into the gateway's operational state -- specifically concurrency pressure and provider response times. These endpoints are designed for debugging 503 errors (concurrency saturation) and identifying slow providers.

Boundaries -- what Diagnostics does NOT do:

  • Does not diagnose individual request failures (use error monitoring or activity logs for that)
  • Does not provide historical data or trends (only current/recent state)
  • Does not trigger alerts or remediation actions
  • Does not bypass ConcurrencyMiddleware (unlike /health and /metrics, diagnostics requests count against the concurrency limit)
  • No authentication required (public endpoints)

Low-Level Technical Details

Concurrency endpoint: Reads Prometheus gauges (concurrency_active_requests, concurrency_queued_requests) via internal ._value._value access. Status thresholds: >=90% utilization OR >=80% queue fill = "critical"; >=70% utilization OR >=60% queue fill = "warning"; otherwise "normal". Default config: CONCURRENCY_LIMIT=20, CONCURRENCY_QUEUE_SIZE=50, CONCURRENCY_QUEUE_TIMEOUT=10.0s.
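
The status thresholds can be sketched as a small pure function. This is an illustration under assumptions: the function name concurrency_status is hypothetical, the warning tier is read as "either threshold reached" (mirroring the critical tier), and the defaults come from the config values listed above.

```python
def concurrency_status(active: int, queued: int,
                       limit: int = 20, queue_size: int = 50) -> str:
    """Classify concurrency pressure (hypothetical helper, not the gateway's code)."""
    utilization = 100 * active / limit      # percent of CONCURRENCY_LIMIT in use
    queue_fill = 100 * queued / queue_size  # percent of CONCURRENCY_QUEUE_SIZE in use
    if utilization >= 90 or queue_fill >= 80:
        return "critical"
    # Assumption: warning also triggers when either threshold is reached.
    if utilization >= 70 or queue_fill >= 60:
        return "warning"
    return "normal"
```

With the default limits, 18 in-flight requests (90% utilization) or 40 queued requests (80% queue fill) are each enough to report "critical".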

Provider timing: Reads provider_slow_requests_total Prometheus counter (labels: provider, model, severity). Slow threshold: 30-45s. Very slow: >45s.

No Supabase, no Redis. Reads only Prometheus in-memory metrics. Errors return 200 with degraded payload (no HTTPException).

Endpoints

Endpoint Description
GET /api/diagnostics/concurrency Active concurrency stats: in-flight requests, queue depth, shed count, utilization percentage, status (normal/warning/critical).
GET /api/diagnostics/provider-timing Provider response timing summary: counts of slow (30-45s) and very slow (>45s) requests by provider and model.

10. Error Monitoring

High-Level Overview

The Error Monitoring system provides comprehensive error detection, classification, pattern recognition, and AI-generated fix suggestions. It integrates with Grafana Loki for log ingestion, classifies errors by type and severity, identifies recurring patterns, assesses which errors are automatically fixable, and can generate code fix suggestions using the Anthropic API (Claude).

Boundaries -- what Error Monitoring does NOT do:

  • Does not automatically apply fixes (generates suggestions only; human review required)
  • Does not persist error data to a database (all error patterns stored in-memory, lost on restart)
  • Does not integrate with Sentry (Sentry is a separate error tracking system; this monitors Loki logs)
  • Does not monitor provider-level errors in real-time (scans logs on-demand or via autonomous background task)
  • Does not send alerts or notifications on critical errors
  • No authentication required on any endpoint (all public)

Low-Level Technical Details

Data source: Grafana Loki via httpx.AsyncClient (10s timeout). Query: {level="ERROR"} on /loki/api/v1/query_range.

Error classification pipeline: extract_error_details() -> classify_error() (pattern matching for provider/DB/auth/rate limit/timeout/cache/external errors) -> determine_fixability() -> group_similar_errors() (by {category}:{message[:50]}).
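
The grouping step at the end of the pipeline can be sketched as follows. The function name comes from the pipeline above, but the body is an assumption based only on the stated key format; the category and message field names are also assumed.

```python
from collections import defaultdict


def group_similar_errors(errors: list[dict]) -> dict[str, list[dict]]:
    """Group errors by '{category}:{first 50 characters of message}' (sketch)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for err in errors:
        key = f"{err['category']}:{err['message'][:50]}"
        groups[key].append(err)
    return dict(groups)
```

One consequence of the 50-character prefix: messages that differ only after the prefix (request IDs, timestamps at the end) fall into the same pattern, while messages that carry variable data near the start are split into separate patterns.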

Severity mapping: Database = CRITICAL; provider timeout/5xx + auth + external = HIGH; rate limit/timeout/cache = MEDIUM; validation = LOW.

Fixability: Rate limit, timeout, cache, database, auth errors are assessed as fixable. Provider, validation, external service, internal errors are not.

Bug fix generator: Uses Anthropic API (claude-3-5-sonnet-20241022), requires ANTHROPIC_API_KEY and optional GITHUB_TOKEN.

Autonomous monitor: Background scanning singleton. In-memory state only.

No Supabase, no Redis, no Prometheus.

Endpoints

Endpoint Description
GET /error-monitor/autonomous/status Autonomous error monitor status (running, last scan time, error count).
GET /error-monitor/health Error monitor health check. Checks ErrorMonitor, AutonomousMonitor, and BugFixGenerator singletons.
GET /error-monitor/dashboard Dashboard data: error charts, stats, trends, severity breakdown.
GET /error-monitor/errors/recent Recent errors with severity, category, trace info.
GET /error-monitor/errors/critical Critical errors only (database-level failures).
GET /error-monitor/errors/fixable Errors assessed as automatically fixable.
GET /error-monitor/errors/patterns Detected error patterns (recurring issues grouped by category + message prefix).
GET /error-monitor/fixes/generated All AI-generated fix suggestions.
GET /error-monitor/fixes/{fix_id} Details of a specific fix suggestion.
POST /error-monitor/fixes/generate-for-error Generate an AI fix suggestion for a specific error using Claude.
POST /error-monitor/fixes/generate-batch Batch generate fix suggestions for multiple errors.
POST /error-monitor/monitor/start Start continuous background error monitoring.
POST /error-monitor/monitor/scan Trigger a one-time error scan against Loki.

11. General Router

High-Level Overview

The General Router provides ML-powered model selection for general (non-coding) tasks using NotDiamond integration. It analyzes prompt content and selects the model best suited to the chosen strategy: quality, cost, latency, or balanced. Model IDs use the router:general:<mode> syntax, which is intercepted by the inference pipeline.

Boundaries -- what General Router does NOT do:

  • Does not make inference requests (selects models; the inference pipeline dispatches)
  • Does not learn from user feedback or usage patterns (NotDiamond selection, with a static mode-to-model mapping as the fallback)
  • Does not support custom model pools or user-defined routing rules
  • Does not guarantee NotDiamond availability (falls back to mode-specific default models)
  • No database, Redis, or external calls from these info endpoints (entirely static/in-memory)

Low-Level Technical Details

Routing modes: Balanced (anthropic/claude-sonnet-4), Quality (openai/gpt-4o), Cost (openai/gpt-4o-mini), Latency (groq/llama-3.3-70b-versatile). System default fallback: anthropic/claude-sonnet-4.

NotDiamond integration: When notdiamond_enabled is true, prompts are analyzed by NotDiamond for optimal model selection. When unavailable, the system falls back to mode-specific fallback models.
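
A minimal sketch of the fallback behavior, assuming the mode-to-model mapping listed under routing modes. The recommend callable is a hypothetical stand-in for the NotDiamond client and does not reflect NotDiamond's actual API; select_model is likewise an illustrative name.

```python
from typing import Callable, Optional

# Mode-specific fallback models as listed in this section.
MODE_FALLBACKS = {
    "balanced": "anthropic/claude-sonnet-4",
    "quality": "openai/gpt-4o",
    "cost": "openai/gpt-4o-mini",
    "latency": "groq/llama-3.3-70b-versatile",
}
SYSTEM_DEFAULT = "anthropic/claude-sonnet-4"


def select_model(mode: str, prompt: str, notdiamond_enabled: bool,
                 recommend: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
    """Pick a model for a routing mode, falling back to the static mapping."""
    if notdiamond_enabled and recommend is not None:
        try:
            choice = recommend(prompt, mode)  # hypothetical NotDiamond call
            if choice:
                return choice
        except Exception:
            pass  # assumption: any failure is treated as "NotDiamond unavailable"
    return MODE_FALLBACKS.get(mode, SYSTEM_DEFAULT)
```

Keeping the fallback as a plain dictionary lookup means the router can always return a model, even when the external service is down or the mode is unrecognized.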

Syntax normalization: normalize_model_string() handles hyphenated aliases (e.g., router:general:balanced = router:general-balanced).
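
The alias handling might look like the sketch below. The function name matches the one mentioned above, but the body is an assumption: it treats the colon form as canonical and rewrites only the hyphenated router:general-<mode> alias. The routing_mode helper is hypothetical.

```python
from typing import Optional

ROUTER_PREFIX = "router:general"


def normalize_model_string(model: str) -> str:
    """Rewrite 'router:general-<mode>' to 'router:general:<mode>' (assumed canonical form)."""
    hyphenated = ROUTER_PREFIX + "-"
    if model.startswith(hyphenated):
        return ROUTER_PREFIX + ":" + model[len(hyphenated):]
    return model


def routing_mode(model: str) -> Optional[str]:
    """Return the routing mode for a router model ID, or None for ordinary model IDs."""
    canonical = ROUTER_PREFIX + ":"
    normalized = normalize_model_string(model)
    if normalized.startswith(canonical):
        return normalized[len(canonical):]
    return None
```

Ordinary model IDs such as openai/gpt-4o pass through unchanged, so the normalization can run on every request before the router check.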

Endpoints

Endpoint Description
GET /general-router/settings/options Available routing strategies and model pools per mode.
GET /general-router/models Models available for general routing with benchmark data.
GET /general-router/fallback-models Fallback model chain per routing mode.
GET /general-router/stats General router usage statistics.
POST /general-router/test Test general routing with a sample prompt. Returns selected model and reasoning.

12. Health & Monitoring

High-Level Overview

The Health & Monitoring system provides tiered health checks across all platform subsystems: the gateway itself, individual providers, models, databases, Redis, and gateways. It supports multiple check types from lightweight pings (for load balancers) to comprehensive deep checks (for dashboards). The system includes uptime tracking, health scoring (0-100), AI-generated health insights, and auto-fix capabilities for unhealthy gateways.

Boundaries -- what Health & Monitoring does NOT do:

  • Does not perform load testing or synthetic transactions (health checks verify connectivity, not throughput)
  • Does not send alerts or notifications on health degradation (only exposes data; alerting is in Grafana/PagerDuty)
  • Does not automatically reroute traffic away from unhealthy providers (that's the circuit breaker system)
  • Does not provide historical health data beyond what's cached (point-in-time checks)
  • Not all health endpoints are exempt from ConcurrencyMiddleware (only /health, /metrics, /ready are exempt; subsystem health endpoints are NOT)
  • Always returns HTTP 200, with degraded state reported in the response body; load balancers and monitors must inspect the body to distinguish healthy from degraded

Low-Level Technical Details

Health check tiers: Quick (sub-millisecond, static response), Standard (DB + Redis connectivity), Railway (comprehensive: DB, Redis, providers), System (memory, CPU, connections), All (combined).

Provider health: Health scores 0-100 based on success rate, latency, and error patterns. Tracked per-provider with uptime history.

Model health: Per-model health status with history. Stored in model_health table.

Gateway health: Per-gateway health checks with auto-fix capability. Dashboard available in both HTML and JSON formats.

Health monitoring control: Start/stop background monitoring tasks, trigger on-demand checks.

Endpoints

System Health

Endpoint Description
GET /health Primary health check. Returns version, status, timestamp. Used by load balancers.
GET /health/quick Lightweight health check (minimal overhead, static response).
GET /health/railway Railway deployment health check (comprehensive: DB, Redis, providers).
GET /health/system System health: memory, CPU, connections.
GET /health/database Database connectivity and performance.
GET /health/all All health checks combined.
GET /health/status Current system status.
GET /health/summary Health summary with scores.
GET /health/uptime System uptime metrics.
GET /health/insights AI-generated health insights and recommendations.
GET /health/dashboard Health dashboard data.
GET /health/optimizations Current optimization status.
GET /health/optimizations/connection-pools Connection pool health.
GET /health/optimizations/prioritization Request prioritization stats.

Provider Health

Endpoint Description
GET /health/providers All provider health scores (0-100) and statuses.
GET /health/provider/{provider} Single provider health details.
GET /health/providers/stats Provider health statistics.
GET /health/providers/uptime Provider uptime history.
GET /health/providers/import-status Provider data import status.
GET /health/google-vertex Google Vertex AI specific health check.

Model Health

Endpoint Description
GET /health/models All model health scores.
GET /health/model/{model_id} Single model health details.
GET /health/models/stats Model health statistics.
GET /health/models/uptime Model uptime history.

Gateway Health

Endpoint Description
GET /health/gateways All gateway health checks.
GET /health/gateways/dashboard Gateway health dashboard (HTML).
GET /health/gateways/dashboard/data Gateway dashboard data (JSON).
GET /health/{gateway} Single gateway health check.
POST /health/gateways/{gateway}/fix Trigger auto-fix for an unhealthy gateway.

Catalog Health

Endpoint Description
GET /health/catalog/models Catalog model data health.
GET /health/catalog/providers Catalog provider data health.

Health Monitoring Control

Endpoint Description
GET /health/monitoring/status Active monitoring status.
POST /health/monitoring/start Start active health monitoring.
POST /health/monitoring/stop Stop active health monitoring.
POST /health/check Trigger a health check.
POST /health/check/now Trigger an immediate health check.

13. Metrics & Observability

High-Level Overview

The Metrics & Observability system provides the telemetry backbone for Gatewayz. It exposes Prometheus-compatible metrics for scraping, structured JSON metrics for dashboards, Grafana integration (SimpleJSON datasource protocol), OpenTelemetry tracing (Tempo), structured logging (Loki), and instrumentation health checks. The /metrics endpoint is the primary Prometheus scrape target, supporting both standard Prometheus text format and OpenMetrics format with exemplar support for trace-to-metric linking.

Boundaries -- what Metrics & Observability does NOT do:

  • Does not store metric history (Prometheus server handles retention; the gateway only exposes current values)
  • Does not provide alerting (alerting rules are configured in Grafana/Prometheus, not in the gateway)
  • Does not process or query traces (traces are sent to Tempo; querying is done via Grafana)
  • Does not aggregate metrics across multiple gateway instances (each instance exposes its own metrics)
  • Does not provide custom dashboards (Grafana dashboards are configured separately)

Low-Level Technical Details

GET /metrics: Core Prometheus scrape endpoint. Refreshes Redis INFO gauges via asyncio.to_thread(collect_redis_info) with 5s cap per scrape. Content negotiation: Accept: application/openmetrics-text enables OpenMetrics format with exemplar support for trace-to-metric linking via Grafana/Tempo. Hidden from OpenAPI docs.

GET /api/metrics/parsed: Self-calls http://localhost:8000/metrics via httpx.AsyncClient (10s timeout), parses Prometheus text format, computes latency percentiles (p50/p95/p99) via linear interpolation on histogram buckets, extracts request counts and error counts by endpoint.
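
Percentile estimation by linear interpolation over cumulative histogram buckets can be sketched as below. This follows the same general approach as Prometheus's histogram_quantile; it is not the gateway's actual parser. The function name and the sample buckets are illustrative, and the list of buckets is assumed to be non-empty and sorted by upper bound.

```python
import math


def histogram_percentile(buckets: list[tuple[float, float]], q: float) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from (upper_bound, cumulative_count) pairs."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate inside the +Inf bucket
            if count == prev_count:
                return bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative latency buckets in seconds: le=0.1, le=0.5, le=1.0, le=+Inf.
sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
p95 = histogram_percentile(sample, 0.95)
```

The estimate is only as precise as the bucket layout: a p95 that falls in a wide bucket (here 0.5s to 1.0s) is interpolated, not measured.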

Metrics status: Reports live vs synthetic mode based on Supabase availability (cached for 60s).

Redis: INFO command issued on each /metrics scrape. Timeout silently swallowed (stale values served).
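
The "timeout swallowed, stale values served" behavior can be sketched with asyncio.wait_for around asyncio.to_thread. The collect_redis_info body here is a stand-in that returns fixed data; refresh_redis_gauges, the collector parameter, and the module-level cache are hypothetical names used for illustration.

```python
import asyncio
import time

_last_redis_info: dict = {}  # last successfully collected values


def collect_redis_info() -> dict:
    """Stand-in for the blocking Redis INFO call (returns fixed data here)."""
    time.sleep(0.01)
    return {"connected_clients": 3}


async def refresh_redis_gauges(timeout: float = 5.0,
                               collector=collect_redis_info) -> dict:
    """Refresh cached Redis stats, keeping the previous values on timeout."""
    global _last_redis_info
    try:
        _last_redis_info = await asyncio.wait_for(
            asyncio.to_thread(collector), timeout
        )
    except asyncio.TimeoutError:
        pass  # swallow the timeout; the scrape serves stale values
    return _last_redis_info
```

Note that wait_for stops waiting after the timeout but cannot interrupt the worker thread, so a hung Redis call still occupies a thread until it returns.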

Grafana SimpleJSON datasource: Implements the full protocol (test, search, query, annotations, tag-keys, tag-values) for direct Grafana integration.

Endpoints

Core Metrics

Endpoint Description
GET /metrics Raw Prometheus metrics. Supports OpenMetrics format with exemplars for trace-to-metric linking. Hidden from OpenAPI.
GET /api/metrics/parsed Structured JSON: latency percentiles (p50/p95/p99/avg), request counts, error counts by endpoint. Self-calls /metrics.
GET /api/metrics/status Metrics collection status (live vs synthetic mode).
GET /api/metrics/summary Metrics summary.
GET /api/metrics/health Metrics system health.
GET /api/metrics/grafana-queries Grafana-compatible query results.
POST /api/metrics/test Test metrics collection.

Monitoring API

Endpoint Description
GET /api/monitoring/health Provider health scores (0-100) with status per provider.
GET /api/monitoring/health/{provider} Single provider health.
GET /api/monitoring/stats/realtime Real-time stats: requests, cost, health, error rates, latency with hourly breakdown.
GET /api/monitoring/stats/hourly/{provider} Hourly stats for a specific provider.
GET /api/monitoring/error-rates Error rates by provider and model with trend detection.
GET /api/monitoring/errors/{provider} Recent error logs per provider.
GET /api/monitoring/cost-analysis Cost breakdown by provider with cost-per-request.
GET /api/monitoring/latency-trends/{provider} Latency percentiles (p50/p95/p99) over time.
GET /api/monitoring/latency/{provider}/{model} Latency stats for a specific model.
GET /api/monitoring/anomalies Anomaly detection: cost spikes, latency spikes, high error rates.
GET /api/monitoring/circuit-breakers Circuit breaker states per provider.
GET /api/monitoring/circuit-breakers/{provider} Circuit breaker for a specific provider.
GET /api/monitoring/providers/comparison Multi-provider comparison matrix.
GET /api/monitoring/token-efficiency/{provider}/{model} Token efficiency analysis.
GET /api/monitoring/trial-analytics Trial system analytics.
GET /api/monitoring/chat-requests Chat request monitoring.
GET /api/monitoring/chat-requests/counts Chat request counts.
GET /api/monitoring/chat-requests/models Chat requests by model.
GET /api/monitoring/chat-requests/providers Chat requests by provider.
GET /api/monitoring/chat-requests/plot-data Chat request time-series data.
POST /monitoring Sentry tunnel (proxies Sentry events from frontend).

Instrumentation

Endpoint Description
GET /api/instrumentation/health Instrumentation health.
GET /api/instrumentation/config Current instrumentation configuration.
GET /api/instrumentation/environment-variables Instrumentation env vars.
GET /api/instrumentation/loki/status Loki log aggregation status.
GET /api/instrumentation/tempo/status Tempo distributed tracing status.
GET /api/instrumentation/otel/status OpenTelemetry status.
GET /api/instrumentation/trace-context Current trace context.
POST /api/instrumentation/test-log Send test log to Loki.
POST /api/instrumentation/test-trace Send test trace to Tempo.

Prometheus/Grafana Datasource

Endpoint Description
GET /prometheus/datasource Grafana SimpleJSON datasource test endpoint.
POST /prometheus/datasource/search Metric name search.
POST /prometheus/datasource/query Metric query.
POST /prometheus/datasource/annotations Annotation query.
POST /prometheus/datasource/tag-keys Tag key query.
POST /prometheus/datasource/tag-values Tag value query.

Prometheus Data API

Endpoint Description
GET /prometheus/data/metrics Prometheus telemetry data.
GET /prometheus/data/admin/cache/status Cache status via Prometheus API.
DELETE /prometheus/data/admin/cache/invalidate Invalidate cache via Prometheus API.
GET /prometheus/data/instrumentation/health Instrumentation health via Prometheus.
GET /prometheus/data/instrumentation/loki/status Loki status via Prometheus.
GET /prometheus/data/instrumentation/tempo/status Tempo status via Prometheus.
POST /prometheus/data/instrumentation/test-log Test log via Prometheus.
POST /prometheus/data/instrumentation/test-trace Test trace via Prometheus.

14. Models & Catalog

High-Level Overview

The Models & Catalog system is the model discovery and management layer. It provides model search, comparison, trending analytics, provider information, and HuggingFace integration. Users can discover models across all 30+ providers, compare them across dimensions (price, speed, context length), view trending models, and access detailed HuggingFace metadata (downloads, likes, model cards). The system also includes a canonical model registry (Modelz), model availability monitoring with circuit breaker integration, model health tracking, and a ranking leaderboard.

Boundaries -- what Models & Catalog does NOT do:

  • Does not host or serve models (only catalogs and routes to external providers)
  • Does not benchmark models itself (uses external benchmark data: SWE-bench, HumanEval, MMLU)
  • Does not provide model fine-tuning or customization
  • Does not guarantee real-time model availability (availability is tracked via background monitoring with in-memory state)
  • Does not cache model responses (only caches model metadata/catalog data)
  • Availability data is in-memory only (lost on restart, not backed by Redis or Supabase)

Low-Level Technical Details

Model detail enrichment: Searches across all cached gateway model lists. Enriches with provider info (logos, site URLs) and HuggingFace data (downloads, likes). Redis-cached per gateway (5-15 min TTL).

Availability monitoring: ModelAvailabilityService singleton, entirely in-memory. Circuit breaker config: failure_threshold=5, recovery_timeout=300s, success_threshold=3, response_timeout=30s. Three consecutive slow responses (>30s) trigger degradation.
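
A minimal in-memory circuit breaker using the configuration values above. This is a sketch of the general pattern, not the ModelAvailabilityService implementation: the class, the closed/open/half_open state names, and the injectable clock are assumptions, and slow-response degradation is left out.

```python
import time


class CircuitBreaker:
    """Closed -> open after N failures; half-open after a timeout; closed after M successes."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 300.0,
                 success_threshold: int = 3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # let trial requests through
                self.successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive-failure count

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # a failed trial request reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
```

Because all state lives on the instance, it is lost on restart, which matches the in-memory boundary noted for availability data.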

Supported gateways (19): featherless, deepinfra, chutes, groq, fireworks, together, cerebras, nebius, xai, novita, hug, aimo, near, fal, anannas, aihubmix, vercel-ai-gateway, onerouter, helicone. OpenRouter always fetched as baseline.

Gateway registry: Frontend auto-discovers from GET /v1/gateways. Each gateway has a name, color, priority, and site URL.

Endpoints

Model Discovery

Endpoint Description
GET /v1/models List all models. Filter by provider, gateway, private, HuggingFace. Pagination supported.
GET /v1/models/unique Deduplicated model list (one entry per model across providers).
GET /v1/models/search Full-text model search.
GET /v1/models/trending Top models ranked by requests, tokens, users, cost, speed.
GET /v1/models/low-latency Low-latency optimized models.
GET /v1/models/{provider}/{model} Specific model details with provider info.
GET /v1/models/{provider}/{model}/compare Compare a model across all available providers.
GET /v1/models/{developer} Models by developer/organization.
POST /v1/models/batch-compare Batch compare multiple models at once.
GET /api/models/detail Detailed model info enriched with HuggingFace data. Public, no auth.

Modelz (Canonical Registry)

Endpoint Description
GET /v1/modelz/models Full canonical model registry.
GET /v1/modelz/ids All model IDs in the registry.
GET /v1/modelz/check/{model_id} Check if a model exists and get its registry data.

Providers

Endpoint Description
GET /v1/provider List all providers with stats.
GET /v1/provider/{provider_name}/stats Provider statistics (model count, request count, cost).
GET /v1/provider/{provider_name}/top-models Top models for a provider by usage.
GET /v1/routers Available intelligent routing options (code, general).

Gateways

Endpoint Description
GET /v1/gateways List all registered gateways. Frontend auto-discovers from this.
GET /v1/gateways/status Gateway statuses.
GET /v1/gateways/summary Aggregated gateway statistics.
GET /v1/gateway/{gateway}/stats Stats for a specific gateway.

HuggingFace Integration

Endpoint Description
GET /v1/huggingface/discovery Discover HuggingFace models.
GET /v1/huggingface/search Search HuggingFace models.
GET /v1/huggingface/author/{author}/models Models by a HuggingFace author.
GET /v1/huggingface/models/{model_id}/details HuggingFace model details (downloads, likes, parameters).
GET /v1/huggingface/models/{model_id}/card Model card (README).
GET /v1/huggingface/models/{model_id}/files Model file listing.

Model Health

Endpoint Description
GET /v1/model-health All model health data.
GET /v1/model-health/stats Model health statistics.
GET /v1/model-health/providers Provider-level health summary.
GET /v1/model-health/unhealthy Currently unhealthy models.
GET /v1/model-health/{provider}/{model} Health for a specific model.
GET /v1/model-health/provider/{provider}/summary Provider health summary.

Rankings

Endpoint Description
GET /ranking/models Model leaderboard with trend data (direction, percentage, logos).

Model Availability

Endpoint Description
GET /availability/models All model availability statuses with circuit breaker state. In-memory only.
GET /availability/model/{model_id} Availability for a specific model.
GET /availability/summary Availability summary (percentage, gateway breakdown).
GET /availability/status Overall availability status (operational/degraded).
GET /availability/check/{model_id} Quick availability check. Unknown models assumed available (optimistic).
GET /availability/fallback/{model_id} Fallback providers for a model.
GET /availability/best/{model_id} Best available provider for a model.
POST /availability/maintenance/{model_id} Put a model in maintenance mode.
DELETE /availability/maintenance/{model_id} Remove maintenance mode.
POST /availability/monitoring/start Start availability monitoring background task.
POST /availability/monitoring/stop Stop availability monitoring.

Models Catalog Management (CRUD)

Endpoint Description
GET /catalog/models-db/ List all models in the database catalog.
GET /catalog/models-db/{model_id} Get a model from the DB.
GET /catalog/models-db/search Search the model catalog DB.
GET /catalog/models-db/stats Catalog statistics.
GET /catalog/models-db/provider/{provider_slug} Models by provider in DB.
GET /catalog/models-db/health/{health_status} Models by health status in DB.
GET /catalog/models-db/{model_id}/health/history Model health history.
POST /catalog/models-db/bulk Bulk create models.
POST /catalog/models-db/bulk-upsert Bulk upsert models.
POST /catalog/models-db/upsert Upsert a single model.
POST /catalog/models-db/{model_id}/activate Activate a model.
POST /catalog/models-db/{model_id}/deactivate Deactivate a model.
PATCH /catalog/models-db/{model_id}/health Update model health status.

Providers Management (CRUD)

Endpoint Description
GET /providers/ List all providers in DB.
PATCH /providers/{provider_id} Update provider metadata.
GET /providers/{provider_id}/models/stats Provider model statistics.

15. Other

High-Level Overview

This section covers features that don't fit neatly into the other categories: user registration, image generation, audio transcription, server-side tools, payments (Stripe), IP allowlists, Nosana GPU computing, partner trials, provider credit monitoring, user notifications, and system utilities.

Boundaries -- what these features do NOT do (by sub-feature):

  • Image Generation: Does not host image models (proxies to Stability AI, DALL-E, etc.). Does not support image editing or inpainting. Credit-billed per generation.
  • Audio Transcription: Does not support real-time streaming transcription. Proxies to Whisper. Credit-billed per minute.
  • Tools: Only 2 server-side tools available (web_search via Tavily API, text_to_speech). Does not execute arbitrary code. Does not support user-defined tools.
  • Payments: Does not handle cryptocurrency payments. Stripe only. Does not support invoicing or payment plans.
  • Nosana GPU: Proxies all requests to the Nosana external API. Does not manage GPU hardware directly. Requires NOSANA_API_KEY.
  • Partner Trials: Does not support custom trial configurations per partner at runtime (configured in DB). Does not auto-extend trials.
  • IP Allowlist: Does not support IPv6 range matching. CIDR notation supported for IPv4.
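
The IP Allowlist boundary (IPv4 CIDR only) can be illustrated with Python's standard ipaddress module. This is a sketch under assumptions: is_allowlisted is a hypothetical name, entries are plain strings, and for simplicity the sketch rejects every IPv6 address, which is stricter than the stated boundary (it only rules out IPv6 range matching).

```python
import ipaddress


def is_allowlisted(ip: str, entries: list[str]) -> bool:
    """Return True if an IPv4 address matches any IPv4 address or CIDR entry."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP address
    if addr.version != 4:
        return False  # simplification: no IPv6 matching at all
    for entry in entries:
        try:
            # strict=False accepts entries with host bits set, e.g. 10.1.2.3/8
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # skip malformed entries
        if network.version == 4 and addr in network:
            return True
    return False
```

A single address such as 192.168.1.5 is parsed as a /32 network, so exact-match entries and CIDR ranges go through the same code path.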

Low-Level Technical Details

User Registration (POST /create): Creates user with $5 initial credits, 3-day trial, welcome email via Resend. Partner codes trigger PartnerTrialService. Referral codes update users.referred_by_code.

Image Generation (POST /v1/images/generations): Routes to image providers with credit billing based on model and resolution.

Audio Transcription (POST /v1/audio/transcriptions): Supports all major audio formats. Credit billing per minute of audio. Also accepts base64-encoded audio via /base64 endpoint.

Tools: Static tool registry with WebSearchTool (Tavily API, 30s timeout) and TextToSpeechTool. OpenAI-compatible function calling format. POST /v1/tools/search/augment provides web search context formatting for prompt injection.

Stripe Payments: Full checkout flow (create session -> webhook -> credit delivery). Webhook handles payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created. Always returns 200 to Stripe (even on errors) so that Stripe does not retry delivery. Subscription management: upgrade, downgrade, cancel.
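
The "always return 200" webhook behavior can be sketched as below. Stripe retries webhook deliveries that do not receive a 2xx response, so a handler that never returns an error status must record failures itself. The handle_webhook function, the deliver_credits callable, and the response body shape are hypothetical; signature verification is omitted.

```python
import logging

logger = logging.getLogger("stripe_webhook")

# Event types listed for the webhook in this section.
HANDLED_EVENTS = {
    "payment_intent.succeeded",
    "charge.succeeded",
    "invoice.paid",
    "customer.subscription.created",
}


def handle_webhook(event: dict, deliver_credits) -> tuple[int, dict]:
    """Process a Stripe event and always answer 200 (sketch, not the real handler)."""
    event_type = event.get("type")
    if event_type not in HANDLED_EVENTS:
        return 200, {"received": True, "processed": False}
    try:
        deliver_credits(event)  # hypothetical credit-delivery step
    except Exception:
        # The failure is logged, not surfaced to Stripe, so no retry is triggered.
        logger.exception("webhook processing failed for %s", event_type)
        return 200, {"received": True, "processed": False}
    return 200, {"received": True, "processed": True}
```

The trade-off of this design is that a failed credit delivery is not retried by Stripe; recovery depends on the logs or on manual reconciliation.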

Nosana GPU: Pure proxy to Nosana external API (https://dashboard.k8s.prd.nos.ci/api/). Supports LLM, image generation, and Whisper deployments. Deployment lifecycle: create -> start -> stop -> archive. Job management: create, extend, stop.

Partner Trials: Configurable per partner (e.g., Redbeard: 14-day Pro trial, $100 credits, $50/day limit). 5-min in-memory cache per partner code. Status tracking in partner_trial_analytics table.
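
A 5-minute in-memory cache keyed by partner code could look like the sketch below. TTLCache, get_or_load, and the loader callable are hypothetical names; the real PartnerTrialService cache may be structured differently.

```python
import time
from typing import Callable


class TTLCache:
    """Per-key in-memory cache with a fixed time-to-live (default 300s)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], dict]) -> dict:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no database call
        value = loader(key)        # hypothetical DB lookup for the partner config
        self._entries[key] = (now, value)
        return value
```

With this approach a configuration change in the database can take up to five minutes to become visible, and each gateway instance holds its own copy.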

Velocity Mode: Auto-protection during high error rates. Activates when error rate exceeds 25% over 3-min window. Reduces rate limits to 50% of normal.
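
The activation rule can be sketched with a sliding window of request outcomes. This is an assumption-laden illustration: the VelocityMode class and its methods are hypothetical, "exceeds 25%" is read as a strict comparison, and details such as a minimum sample size or a cooldown before deactivation are not described here and are therefore not modeled.

```python
import time
from collections import deque


class VelocityMode:
    """Halve rate limits while the error rate over the last 3 minutes exceeds 25%."""

    def __init__(self, threshold: float = 0.25, window_seconds: float = 180.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window = window_seconds
        self.clock = clock
        self.events: deque = deque()  # (timestamp, is_error) pairs, oldest first

    def record(self, is_error: bool) -> None:
        self.events.append((self.clock(), is_error))

    def error_rate(self) -> float:
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()  # drop outcomes older than the window
        if not self.events:
            return 0.0
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events)

    def effective_limit(self, normal_limit: int) -> int:
        if self.error_rate() > self.threshold:
            return normal_limit // 2  # 50% of the normal limit
        return normal_limit
```

For example, 3 errors out of 10 requests in the window (30%) halves a limit of 100 to 50, while exactly 25% leaves it unchanged.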

Endpoints

User Registration

Endpoint Description
POST /create Create new user account ($5 initial credits, 3-day trial, welcome email).

Image Generation

Endpoint Description
POST /v1/images/generations Generate images using AI models (Stability AI, DALL-E, etc.) with credit billing.

Audio Transcription

Endpoint Description
POST /v1/audio/transcriptions Transcribe audio files via Whisper. All major formats supported. Credit billing per minute.
POST /v1/audio/transcriptions/base64 Transcribe base64-encoded audio.

Server-Side Tools

Endpoint Description
GET /v1/tools List available server-side tools with OpenAI-compatible definitions. Public, no auth.
GET /v1/tools/definitions Tool definitions formatted for function calling. Returns raw list.
GET /v1/tools/{tool_name} Get a specific tool's details. 404 if not found.
POST /v1/tools/execute Execute a server-side tool (web_search via Tavily, text_to_speech). Requires API key auth.
POST /v1/tools/search/augment Web search + formatted context for prompt augmentation. Optional auth. Never raises HTTPException.

Payments (Stripe)

Endpoint Description
POST /api/stripe/checkout-session Create Stripe checkout session for credit purchase.
GET /api/stripe/checkout-session/{session_id} Get checkout session status. Proxies to Stripe API.
GET /api/stripe/credit-packages Available credit packages and pricing. Public, no auth.
POST /api/stripe/payment-intent Create payment intent.
GET /api/stripe/payment-intent/{payment_intent_id} Get payment intent status. Proxies to Stripe API.
GET /api/stripe/payments List user's payment history. Paginated.
GET /api/stripe/payments/{payment_id} Get payment details.
POST /api/stripe/refund Process a refund.
GET /api/stripe/subscription Get current subscription.
POST /api/stripe/subscription-checkout Create subscription checkout.
POST /api/stripe/subscription/upgrade Upgrade subscription plan.
POST /api/stripe/subscription/downgrade Downgrade subscription plan.
POST /api/stripe/subscription/cancel Cancel subscription.
POST /api/stripe/webhook Stripe webhook handler. Always returns 200. Handles: payment_intent.succeeded, charge.succeeded, invoice.paid, customer.subscription.created.

IP Allowlist Management

Endpoint Description
POST /api/admin/ip-whitelist Create IP allowlist entry. Supports CIDR ranges. Admin only.
GET /api/admin/ip-whitelist/{entry_id} Get allowlist entry.
PUT /api/admin/ip-whitelist/{entry_id} Update allowlist entry.
DELETE /api/admin/ip-whitelist/{entry_id} Delete allowlist entry.
GET /api/admin/ip-whitelist List all allowlist entries.
POST /api/admin/ip-whitelist/check Check if an IP is allowlisted.

Nosana GPU Computing

Endpoint Description
GET /nosana/config Nosana platform configuration (deployment strategies, supported frameworks). Static/hardcoded.
GET /nosana/credits/balance Nosana credit balance. Proxies to Nosana API (120s read timeout).
GET /nosana/deployments List all Nosana deployments.
GET /nosana/deployments/{deployment_id} Deployment details.
POST /nosana/deployments/llm Deploy LLM inference on GPU (vllm, ollama, lmdeploy).
POST /nosana/deployments/image-generation Deploy image generation on GPU (stable-diffusion-webui).
POST /nosana/deployments/whisper Deploy Whisper transcription on GPU.
POST /nosana/deployments/{deployment_id}/start Start a deployment.
POST /nosana/deployments/{deployment_id}/stop Stop a deployment.
POST /nosana/deployments/{deployment_id}/archive Archive a deployment.
POST /nosana/deployments/{deployment_id}/revisions Create a deployment revision.
PATCH /nosana/deployments/{deployment_id}/replicas Update replica count.
GET /nosana/markets List GPU markets.
GET /nosana/markets/{market_id} Market details.
GET /nosana/markets/{market_id}/resources Market resource requirements.
POST /nosana/jobs Create a new GPU job.
GET /nosana/jobs/{job_address} Job details.
POST /nosana/jobs/{job_address}/extend Extend job duration.
POST /nosana/jobs/{job_address}/stop Stop a job.

Partner Trials

Endpoint Description
GET /partner-trials/config/{partner_code} Partner trial configuration. 5-min in-memory cache. Public.
GET /partner-trials/check/{code} Check if code is a partner code (vs user referral). Always returns 200.
GET /partner-trials/status Current user's partner trial status (active/expired/converted, days remaining, usage).
GET /partner-trials/daily-limit Partner trial daily usage limit info.
GET /partner-trials/analytics/{partner_code} Partner trial analytics.
POST /partner-trials/start Start a partner trial (e.g., Redbeard 14-day Pro).
POST /partner-trials/expire/{target_user_id} Force-expire a partner trial.

Provider Credit Monitoring

Endpoint Description
GET /api/provider-credits/balance All upstream provider credit balances.
GET /api/provider-credits/balance/{provider} Specific provider credit balance.

Notifications (User)

Endpoint Description
GET /user/notifications/preferences Get notification preferences.
POST /user/notifications/send-usage-report Send usage report email.
POST /user/notifications/test Send test notification.

System Utilities

Endpoint Description
GET /ping System ping (pong response with uptime).
GET /ping/stats Ping statistics.
GET /sentry-debug Test Sentry error tracking integration (intentionally raises an error).
GET /velocity-mode-status Security velocity mode status. Returns current error rate, threshold (25%), limits (normal vs velocity). In-memory only.
GET / Root endpoint (API info, version).
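
The velocity mode check described for GET /velocity-mode-status can be pictured as an in-memory error-rate tracker compared against the 25% threshold. The sketch below is illustrative only: the class name, the window size, and the choice to activate at exactly 25% (rather than strictly above it) are assumptions, not details taken from the Gatewayz code.

```python
from collections import deque

VELOCITY_ERROR_THRESHOLD = 0.25  # 25% threshold, from the endpoint description


class VelocityModeTracker:
    """Hypothetical in-memory tracker of recent request outcomes."""

    def __init__(self, window_size=100):
        # window_size is an assumed parameter; the real window is not documented.
        self._outcomes = deque(maxlen=window_size)

    def record(self, is_error):
        """Record one request outcome (True means the request failed)."""
        self._outcomes.append(bool(is_error))

    def error_rate(self):
        """Fraction of failed requests in the current window (0.0 if empty)."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def status(self):
        rate = self.error_rate()
        return {
            "error_rate": rate,
            "threshold": VELOCITY_ERROR_THRESHOLD,
            # Assumption: velocity mode engages once the rate reaches the threshold.
            "velocity_mode_active": rate >= VELOCITY_ERROR_THRESHOLD,
        }
```

Because the state lives in process memory, each worker process would see only its own traffic and lose its history on restart, which is consistent with the "in-memory only" note.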

System & Cache Management

Endpoint Description
GET /health/gateways/optimized Optimized gateway health.
GET /health/models/optimized Optimized model health.
GET /health/providers/optimized Optimized provider health.
GET /health/dashboard/optimized Optimized dashboard data.

16. Status

High-Level Overview

The Status system provides public-facing status information about model and provider availability. It includes both a lightweight operational check (operational/degraded) and a detailed infrastructure status view with concurrency metrics, circuit breaker states, and database/cache connectivity. These endpoints are designed for status pages and external monitoring.

Boundaries -- what Status does NOT do:

  • Does not provide incident management or incident history (that's the Admin downtime tracking system)
  • Does not send status notifications or updates (only exposes current state)
  • Does not provide per-user status (only system-wide)
  • Does not signal degradation through HTTP status codes: always returns HTTP 200 (even in error states) and reports degradation in the response body
  • Does not track status history or provide uptime percentages over time

Low-Level Technical Details

GET /availability/status: Simple "operational" (>90% availability) or "degraded" (<=90%) based on ModelAvailabilityService in-memory data. Always returns 200.
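
A minimal sketch of the threshold rule above, assuming availability is computed as the fraction of tracked models currently available. The function name and the handling of an empty model set are assumptions; only the 90% cutoff and the two status strings come from the description.

```python
def availability_status(available_models: int, total_models: int) -> str:
    """Return "operational" above 90% availability, otherwise "degraded"."""
    if total_models <= 0:
        # Assumption: with nothing tracked, report degraded rather than guess.
        return "degraded"
    availability = available_models / total_models
    # Strictly greater than 90%: exactly 90% counts as degraded.
    return "operational" if availability > 0.90 else "degraded"
```

Since the endpoint always returns HTTP 200, a monitoring client has to read this value from the response body rather than rely on the status code.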

GET /v1/status/detailed: Combines three sources: Prometheus concurrency gauges; per-provider circuit breaker states in Redis (key pattern circuit_breaker:{provider}:*, 5 keys per provider, 3600s TTL); and infrastructure connectivity (the global Supabase client, plus the Redis health cache read via GET health:system, 360s TTL). Status is "normal" if active requests < 15, otherwise "high_load". All errors degrade gracefully rather than failing the request.
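
The load rule and the graceful-degradation behaviour can be sketched as follows. The helper names and the shape of the fallback dictionary are hypothetical; the threshold of 15 active requests, the two status strings, and the Redis key pattern are taken from the description above.

```python
from typing import Callable

HIGH_LOAD_THRESHOLD = 15  # active requests, from the description above


def load_status(active_requests: int) -> str:
    """Classify load: "normal" below 15 active requests, else "high_load"."""
    return "normal" if active_requests < HIGH_LOAD_THRESHOLD else "high_load"


def circuit_breaker_key_pattern(provider: str) -> str:
    """Redis key pattern that covers one provider's circuit breaker keys."""
    return f"circuit_breaker:{provider}:*"


def safe_section(fetch: Callable[[], dict]) -> dict:
    """Run one status probe; on failure return a placeholder instead of raising.

    The placeholder shape is an assumption made for this sketch.
    """
    try:
        return fetch()
    except Exception as exc:  # degrade gracefully: never fail the whole response
        return {"status": "unknown", "error": str(exc)}
```

Wrapping each probe (Prometheus, Redis, Supabase) separately means one failing dependency blanks out only its own section of the response.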

Endpoints

Endpoint Description
GET /v1/status/ Overall system status (operational/degraded).
GET /v1/status/detailed Detailed status: concurrency, circuit breakers, infrastructure (database, cache).
GET /v1/status/providers Provider availability statuses.
GET /v1/status/models Model availability statuses.
GET /v1/status/models/{provider}/{model_id} Specific model status.
GET /v1/status/incidents Recent incidents.
GET /v1/status/uptime/{provider}/{model_id} Model uptime history.
GET /v1/status/search Search models on status page.
GET /v1/status/stats Status page statistics.

17. Users

High-Level Overview

The Users system provides user-facing account management: profile viewing/editing, credit balance checking, plan/subscription info, rate limit visibility, activity logs, API key management, referral codes, and account deletion. This is the user's self-service interface for managing their Gatewayz account.

Boundaries -- what Users does NOT do:

  • Does not provide admin-level user management (that's the Admin system)
  • Does not handle authentication (that's the Auth system; Users requires an already-authenticated API key)
  • Does not process payments (that's Stripe; Users only views balance and plan info)
  • Does not mask API keys in list responses (full plaintext keys are returned)
  • Does not provide cross-user visibility (each user can only see their own data)
  • Does not support team/organization accounts (single-user only)

Low-Level Technical Details

Auth chain: get_current_user -> get_api_key (Bearer token + validate_api_key_security) -> get_user (5-min in-memory TTLCache, 512 entries max) -> validate_trial_expiration (HTTP 402 if expired).
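
To illustrate the get_user caching step, here is a standard-library stand-in for a TTL cache with the stated limits (5-minute TTL, 512 entries). It is not the TTLCache class the backend uses; the class name, the oldest-inserted eviction policy, and the injectable clock are assumptions made so the sketch is self-contained and testable.

```python
import time
from collections import OrderedDict


class SimpleTTLCache:
    """Hypothetical bounded cache whose entries expire after ttl seconds."""

    def __init__(self, maxsize=512, ttl=300.0, clock=time.monotonic):
        self._maxsize = maxsize
        self._ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop the stale entry on access
            return None
        return value

    def set(self, key, value):
        """Store a value, evicting the oldest inserted entry when full."""
        if key in self._data:
            del self._data[key]
        elif len(self._data) >= self._maxsize:
            self._data.popitem(last=False)
        self._data[key] = (self._clock() + self._ttl, value)
```

One consequence of a cache like this is that changes to a user record may take up to five minutes to be seen by the auth chain unless the entry is invalidated explicitly.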

API key creation (POST /user/api-keys): Rate limited to 10 creations/hour per user_id (sliding window). Key format: gw_{env}_{random43chars} (e.g., gw_live_abc123...). Keys are stored with Fernet encryption (AES-128), an HMAC-SHA-256 hash, and last4 tracking. Plan entitlement enforcement caps max_requests. Schema cache error handling: a PGRST204 error triggers the refresh_postgrest_schema_cache RPC and a retry.
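
The key format and the hash and last4 fields can be sketched with the standard library alone. The function names, the returned field names, and the HMAC secret are hypothetical. The Fernet encryption step is left out because it needs the third-party cryptography package. Using secrets.token_urlsafe(32) is one way to get exactly 43 random characters, but whether the backend does so is an assumption.

```python
import hashlib
import hmac
import secrets


def generate_api_key(env: str = "live") -> str:
    """Build a key shaped like gw_{env}_{random43chars}."""
    # token_urlsafe(32) encodes 32 random bytes as 43 URL-safe characters.
    return f"gw_{env}_{secrets.token_urlsafe(32)}"


def key_fingerprint(api_key: str, hmac_secret: bytes) -> dict:
    """Derive lookup fields that do not reveal the key itself."""
    digest = hmac.new(hmac_secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "key_hash": digest,       # HMAC-SHA-256, hex encoded (64 characters)
        "last4": api_key[-4:],    # shown to users to help identify a key
    }
```

A keyed hash lets the server look a key up by its digest without decrypting every stored key, and an attacker who obtains the table cannot precompute digests without the secret.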

Activity stats: Queries activity_log table with user_id + date range (default 30 days, no pagination limit). Client-side aggregation by date/model/provider.
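
A minimal sketch of this aggregation, reading "client-side" as "in application code after the rows are fetched, rather than in SQL" (an assumption). The row field names (timestamp, model, provider, tokens, cost) are hypothetical and may not match the activity_log schema.

```python
from collections import defaultdict


def aggregate_activity(rows: list) -> dict:
    """Aggregate activity rows by date, model, and provider."""
    totals = {"requests": 0, "tokens": 0, "spend": 0.0}
    by_date = defaultdict(lambda: {"requests": 0, "tokens": 0, "spend": 0.0})
    by_model = defaultdict(int)
    by_provider = defaultdict(int)

    for row in rows:
        day = row["timestamp"][:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
        # Update the overall totals and the bucket for this day together.
        for bucket in (totals, by_date[day]):
            bucket["requests"] += 1
            bucket["tokens"] += row["tokens"]
            bucket["spend"] += row["cost"]
        by_model[row["model"]] += 1
        by_provider[row["provider"]] += 1

    return {
        "totals": totals,
        "daily": dict(by_date),
        "by_model": dict(by_model),
        "by_provider": dict(by_provider),
    }
```

Because the query has no pagination limit, every row in the date range is loaded into memory before this loop runs, so cost grows with the user's request volume.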

Activity log caveat: The total field in the activity log response is the count of returned records, NOT the total in the database.

Audit logging: audit_logger.log_api_key_usage() fires on every authenticated call. ObservabilityMiddleware records http_requests_total per endpoint.

Endpoints

Profile & Balance

Endpoint Description
GET /user/profile User profile (email, username, credits, trial status, plan).
PUT /user/profile Update user profile (email, username, preferences). Email uniqueness enforced.
GET /user/balance Current credit balance and status.
GET /user/monitor User's own usage monitoring data.
GET /user/plan Current subscription plan.
GET /user/plan/entitlements Plan entitlements (what the plan includes).
GET /user/plan/usage Plan usage vs limits.
GET /user/limit Daily spending limit.
GET /user/credit-transactions Credit transaction history.
GET /user/environment-usage Usage by environment (live/test/staging/dev).
GET /user/cache-settings User's cache settings.
DELETE /user/account Delete user account (irreversible).

Activity

Endpoint Description
GET /user/activity/stats Activity statistics: total requests/tokens/spend, daily breakdown, by model/provider. Default: last 30 days. No pagination limit.
GET /user/activity/log Paginated activity log (limit 1-1000). Filters: date range, model, provider (exact match). Note: total field is current page count only.
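
Because total reflects only the current page, a client cannot use it to decide when to stop paging. One workaround is to keep requesting pages until a short page arrives. The sketch below assumes offset-based pagination and a caller-supplied fetch_page(limit, offset) function; both are hypothetical, since the actual pagination parameters are not documented here.

```python
from typing import Callable, Iterator


def iter_activity_log(fetch_page: Callable[[int, int], list],
                      limit: int = 100) -> Iterator[dict]:
    """Yield every record by paging until a page comes back short."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    offset = 0
    while True:
        records = fetch_page(limit, offset)
        yield from records
        # A page shorter than the limit means there is nothing left to fetch.
        if len(records) < limit:
            break
        offset += limit
```

When the record count is an exact multiple of limit, this costs one extra request that returns an empty page.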

API Key Management

Endpoint Description
POST /user/api-keys Create a new API key. Rate limited: 10/hour. Fernet encrypted. Key shown once.
GET /user/api-keys List all API keys (full plaintext key strings, not masked).
GET /user/api-keys/usage API key usage statistics.
GET /user/api-keys/audit-logs API key audit logs.
PUT /user/api-keys/{key_id} Update API key (name, active status).
DELETE /user/api-keys/{key_id} Delete an API key.
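
The 10-per-hour creation limit is described as a sliding window keyed by user_id. A minimal in-memory sketch of that technique follows. The class name, the injectable clock, and the in-process storage are assumptions, and the real limiter may well be backed by Redis.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window limiter: max_events per window per user."""

    def __init__(self, max_events=10, window_seconds=3600.0, clock=time.monotonic):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock
        self._events = defaultdict(deque)  # user_id -> timestamps of allowed events

    def allow(self, user_id):
        """Return True and record the event if the user is under the limit."""
        now = self._clock()
        events = self._events[user_id]
        # Discard timestamps that have slid out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False
        events.append(now)
        return True
```

Unlike a fixed hourly bucket, this approach never admits more than 10 creations in any 60-minute span, at the cost of storing one timestamp per allowed event.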

Rate Limits (User)

Endpoint Description
GET /user/rate-limits User's rate limit configuration and current usage.
GET /user/rate-limits/usage/{key_id} Rate limit usage for a specific key.
PUT /user/rate-limits/{key_id} Update rate limit for a key.
POST /user/rate-limits/bulk-update Bulk update rate limits.

Plans & Trials

Endpoint Description
GET /plans List all available subscription plans.
GET /plans/{plan_id} Plan details.
GET /subscription/plans Subscription plans (alternate path).
GET /trial/status Current trial status (active/expired, days remaining).

Referrals

Endpoint Description
GET /referral/code Get user's referral code.
POST /referral/generate Generate a new referral code.
GET /referral/stats Referral statistics (total referred, conversion rate, rewards earned). Redis cached (5-min).
POST /referral/validate Validate and apply a referral code.

Transaction Analytics

Endpoint Description
GET /analytics/transactions Transaction analytics. Proxies to OpenRouter frontend API using OPENROUTER_COOKIE env var. 503 if cookie not set.
GET /analytics/transactions/summary Transaction summary.

Feature Summary

Category Endpoints Key Capabilities
Admin 80+ User management, credits, caches, model sync, rate limits, RBAC, trials, downtime, coupons, notifications
Analytics 5 Server-side event forwarding (Statsig + PostHog), ad-blocker bypass
Authentication 5 Multi-method auth (Privy, Google, GitHub, email, wallet), auto-registration
Chat & Messaging 20+ OpenAI/Anthropic inference, 30+ provider failover, streaming, sessions, history, feedback, sharing
Circuit Breakers 4 Provider circuit breaker monitoring (CLOSED/OPEN/HALF_OPEN), reset controls
Code Router 5 Benchmark-driven code model selection (SWE-bench, HumanEval), 4 modes
Coupons 3 User coupon redemption, credit bonuses
Credits 7 Balance, transactions, add/adjust/refund/bulk operations
Diagnostics 2 Real-time concurrency and provider timing diagnostics
Error Monitoring 13 Autonomous error detection, Loki integration, AI fix generation (Claude)
General Router 5 ML-powered model selection (NotDiamond), 4 optimization modes
Health & Monitoring 30+ Multi-tier health: system, providers, models, gateways, auto-fix
Metrics & Observability 40+ Prometheus, Grafana, OpenTelemetry, Loki, Tempo, anomaly detection
Models & Catalog 50+ Discovery, search, compare, trending, HuggingFace, availability, CRUD
Other 50+ Images, audio, tools, Stripe, IP allowlists, Nosana GPU, partner trials, notifications
Status 9 Public status page, provider/model availability, incident history
Users 25+ Profile, balance, plan, API keys, rate limits, activity, referrals
Total 450+

Source: API Mappings Wiki | Conceptual Model
