## Summary
Extend Mux from single-provider routing (Anthropic OAuth only) to true multi-provider, policy-driven routing across multiple LLM backends. The policy engine should decide both which model and which provider to use per request, based on prompt complexity, model availability, cost, and caller intent.
## Current State

- Mux routes all requests through a single downstream path: `DOWNSTREAM_MODE=anthropic-sdk` via Anthropic OAuth
- The policy engine (`src/policy.ts`) selects between Haiku/Sonnet/Opus based on prompt analysis (3-tier routing)
- `RouteDecision.provider` and `RouteDecision.backendTarget` fields exist but are hardcoded to `config.defaultProvider` / `config.defaultBackendTarget`, never used for actual dispatch
- `callDownstream()` makes a binary choice based on the global `DOWNSTREAM_MODE` config, not per request
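For concreteness, a minimal sketch of that dispatch, assuming the shape the bullets above imply (the helper names are hypothetical; the actual code may differ):

```typescript
// Assumed shape of today's dispatch: one global env switch picks the path,
// and RouteDecision's provider fields are never consulted.
declare function callAnthropicSdk(req: unknown): Promise<unknown>; // hypothetical stand-in for the existing SDK path
declare function callOpenAICompatible(req: unknown): Promise<unknown>; // hypothetical stand-in for the existing HTTP path

function callDownstream(req: unknown): Promise<unknown> {
  return process.env.DOWNSTREAM_MODE === "anthropic-sdk"
    ? callAnthropicSdk(req)
    : callOpenAICompatible(req);
}
```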
## Target Providers

| Provider | Auth | Backend | Models |
| --- | --- | --- | --- |
| Anthropic OAuth | OAuth token (`sk-ant-oat01-...`) | agentweave-proxy → Anthropic API | Claude Haiku, Sonnet, Opus |
| LiteLLM | API key or none | LiteLLM gateway (OpenAI-compatible) | Any model LiteLLM supports |
| OpenAI | OAuth or API key | OpenAI API directly | GPT-4o, GPT-4o-mini, o1, o3 |
| Google Gemini | API key | Gemini API (or via LiteLLM) | Gemini 2.5 Pro, Flash |
## Design Goals

- Policy-driven provider selection: `resolveRoute()` returns a `provider` that determines which backend handles the request, not a global config switch (sketched after this list)
- Provider registry: Each provider has its own auth config, base URL, and handler. Adding a new provider should be a config + registry entry, not a code change
- Fallback chains: If provider A is unavailable (rate limit, timeout, error), try provider B
- Cost-aware routing: Route to cheaper providers/models for simple queries, premium ones for complex queries
- Model availability awareness: If a model isn't available on the preferred provider, route to one that has it
- Unified observability: All providers emit the same tracing attributes so the AgentWeave dashboard shows routing decisions across providers
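A sketch of the first goal, with `RouteDecision` shaped as described above; the model IDs, length heuristic, and `backendTarget` values are illustrative placeholders, not the real policy in `src/policy.ts`:

```typescript
interface RouteDecision {
  model: string;
  provider: string;
  backendTarget: string;
}

// Illustrative only: the real complexity scoring lives in src/policy.ts.
function resolveRoute(prompt: string): RouteDecision {
  const complex = prompt.length > 2000; // placeholder for prompt analysis
  return complex
    ? { model: "claude-opus", provider: "anthropic-oauth", backendTarget: "anthropic" } // illustrative IDs
    : { model: "gemini-flash", provider: "gemini", backendTarget: "litellm" };          // illustrative IDs
}
```

The point is the return shape: downstream dispatch reads `route.provider`, so the policy, not a global env var, picks the backend.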
## Architecture Sketch

```
Request → Policy Engine → RouteDecision { model, provider, backendTarget }
                                   ↓
                           Provider Registry
                        ┌──────────────────┐
                        │ anthropic-oauth  │ → Anthropic SDK (existing)
                        │ litellm          │ → OpenAI-compatible HTTP
                        │ openai           │ → OpenAI SDK (new)
                        │ gemini           │ → Gemini SDK or via LiteLLM
                        └──────────────────┘
```
## Implementation Phases

### Phase 1: Provider abstraction

- Define a `Provider` interface: `{ name, canHandle(model), call(req, route), stream(req, route, res) }`
- Refactor the existing Anthropic SDK path into a provider
- Refactor the existing OpenAI-compatible path into a provider
- Make `callDownstream()` dispatch based on `route.provider` → provider registry (see the sketch after this list)
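A minimal TypeScript sketch of this phase, reusing `RouteDecision` from the earlier sketch; `MuxRequest` and `MuxResponse` are assumed placeholders for Mux's existing request/response types:

```typescript
import type { ServerResponse } from "node:http";

interface MuxRequest { body: Record<string, unknown>; } // assumed shape
type MuxResponse = unknown;                              // assumed shape

interface Provider {
  name: string;
  canHandle(model: string): boolean;
  call(req: MuxRequest, route: RouteDecision): Promise<MuxResponse>;
  stream(req: MuxRequest, route: RouteDecision, res: ServerResponse): Promise<void>;
}

const registry = new Map<string, Provider>();

// callDownstream() becomes a registry lookup on route.provider instead of a
// branch on the global DOWNSTREAM_MODE.
function callDownstream(req: MuxRequest, route: RouteDecision): Promise<MuxResponse> {
  const provider = registry.get(route.provider);
  if (!provider || !provider.canHandle(route.model)) {
    throw new Error(`no provider registered for ${route.provider} / ${route.model}`);
  }
  return provider.call(req, route);
}
```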
### Phase 2: LiteLLM integration

- Add LiteLLM as a provider (OpenAI-compatible endpoint, sketched after this list)
- Configure model → provider mapping in env/config
- Route non-Anthropic models to LiteLLM
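Because LiteLLM speaks the standard OpenAI wire format, the provider can be plain `fetch` against `/chat/completions`. A sketch against the `Provider` interface above (the factory name is hypothetical):

```typescript
function makeOpenAICompatibleProvider(name: string, baseUrl: string, apiKey?: string): Provider {
  return {
    name,
    // The gateway knows which models it serves; the model → provider mapping
    // in config decides what gets routed here in the first place.
    canHandle: () => true,
    async call(req, route) {
      const resp = await fetch(`${baseUrl}/chat/completions`, {
        method: "POST",
        headers: {
          "content-type": "application/json",
          ...(apiKey ? { authorization: `Bearer ${apiKey}` } : {}),
        },
        body: JSON.stringify({ ...req.body, model: route.model }),
      });
      if (!resp.ok) throw new Error(`${name}: HTTP ${resp.status}`);
      return resp.json();
    },
    async stream() {
      throw new Error("SSE pass-through omitted in this sketch");
    },
  };
}
```

The same factory would cover any OpenAI-compatible backend, which is why the Gemini entry in the config below can simply point at the LiteLLM gateway.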
### Phase 3: Direct OpenAI + Gemini

- Add an OpenAI SDK provider (direct, not through LiteLLM); a sketch follows this list
- Add a Gemini provider (direct or via LiteLLM)
- OAuth support for OpenAI
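A sketch of the direct OpenAI provider using the official `openai` npm package. The cast assumes the incoming body is already chat-completions shaped; OAuth support would swap the API key for a bearer token:

```typescript
import OpenAI from "openai";

function makeOpenAIProvider(apiKey: string): Provider {
  const client = new OpenAI({ apiKey });
  return {
    name: "openai",
    canHandle: (model) => /^(gpt-|o\d)/.test(model), // gpt-4o, gpt-4o-mini, o1, o3
    async call(req, route) {
      const params = {
        ...req.body,
        model: route.model,
        stream: false,
      } as OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming;
      return client.chat.completions.create(params);
    },
    async stream() {
      throw new Error("streaming bridge omitted in this sketch");
    },
  };
}
```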
### Phase 4: Fallback chains + cost routing

- Provider health tracking (error rates, latency)
- Fallback chain config: `anthropic → litellm → openai` (walk sketched after this list)
- Cost-weighted routing for equivalent models across providers
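A sketch of the fallback walk, driven by the `PROVIDER_FALLBACKS` mapping from the config below and the registry from Phase 1:

```typescript
const fallbacks: Record<string, string> = JSON.parse(process.env.PROVIDER_FALLBACKS ?? "{}");

async function callWithFallback(req: MuxRequest, route: RouteDecision): Promise<MuxResponse> {
  let lastError: unknown;
  // Walk the chain (e.g. anthropic-oauth → litellm); assumes chains are acyclic.
  for (let name: string | undefined = route.provider; name; name = fallbacks[name]) {
    const provider = registry.get(name);
    if (!provider?.canHandle(route.model)) continue;
    try {
      return await provider.call(req, { ...route, provider: name });
    } catch (err) {
      lastError = err; // rate limit / timeout / 5xx: fall through to the next hop
    }
  }
  throw lastError ?? new Error(`no provider in the chain can handle ${route.model}`);
}
```

Health tracking (error rates, latency) would plug into the same loop as a pre-check before each `call`.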
## Config Vision

```bash
# Provider registry (JSON)
PROVIDERS={"anthropic-oauth":{"type":"anthropic-sdk","baseUrl":"http://192.168.1.70:30400","oauthToken":"sk-ant-..."},"litellm":{"type":"openai-compatible","baseUrl":"http://127.0.0.1:4001/v1"},"openai":{"type":"openai-sdk","apiKey":"sk-..."},"gemini":{"type":"openai-compatible","baseUrl":"http://127.0.0.1:4001/v1"}}

# Model → provider mapping
MODEL_PROVIDERS={"claude-*":"anthropic-oauth","gpt-*":"openai","gemini-*":"gemini"}

# Fallback chains
PROVIDER_FALLBACKS={"anthropic-oauth":"litellm","openai":"litellm"}
```
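A hypothetical loader for the model → provider mapping, treating the `*` in keys as a trailing prefix glob:

```typescript
const modelProviders: Record<string, string> = JSON.parse(process.env.MODEL_PROVIDERS ?? "{}");

// "claude-*" matches any model ID starting with "claude-".
function providerForModel(model: string): string | undefined {
  for (const [pattern, name] of Object.entries(modelProviders)) {
    if (model.startsWith(pattern.replace(/\*$/, ""))) return name;
  }
  return undefined;
}
```

Under the mapping above, `providerForModel("gpt-4o-mini")` resolves to `"openai"`.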
## Related
`10ce57b`