
Mux

A thin model routing and policy layer for agent runtimes.

Mux sits in front of existing model gateways/providers and makes system-driven routing decisions based on policy, not per-user micromanagement. It is designed to work across multiple agent runtimes (OpenClaw, pi-mono/Max) while emitting rich routing telemetry to AgentWeave.

Why Mux exists

Strong models are expensive. Cheap models are often good enough.

In practice, personal agent stacks end up with the same problem:

  • one runtime uses a strong model by default
  • another runtime has a different provider abstraction
  • fallbacks and routing are hard to reason about
  • token/cost usage becomes visible only after the bill hurts

Mux is the control point for that problem.

What Mux provides

  • an OpenAI-compatible /v1/chat/completions endpoint
  • native POST /v1/responses passthrough for responses-capable downstreams
  • configurable, policy-based routing rules
  • support for routing across models and providers
  • fallback and escalation handling
  • structured routing metadata for observability
  • an explain/dry-run route endpoint for debugging policy decisions

Architecture

Mux sits between heterogeneous agent runtimes and heterogeneous model backends: one OpenAI-compatible endpoint, a policy layer, and a downstream dispatcher selected via DOWNSTREAM_MODE.

(Diagram: Mux architecture)

Inside src/downstream.ts, Mux translates OpenAI chat-completion shape to and from the Anthropic Messages API — content blocks (text + image), tools, and stop reasons all map across:

(Diagram: Shape translation)
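
As a rough illustration of that translation, here is a simplified sketch (not the actual src/downstream.ts code; the real translation also covers image blocks, tool calls, and stop reasons):

// Simplified, illustrative sketch of the OpenAI -> Anthropic request translation.
type OpenAIChatMessage = { role: "system" | "user" | "assistant"; content: string };

function toAnthropicMessagesRequest(
  model: string,
  messages: OpenAIChatMessage[],
  maxTokens = 1024,
) {
  // Anthropic takes the system prompt as a top-level field rather than a message.
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");

  // Remaining turns become role + content-block arrays.
  const turns = messages
    .filter((m) => m.role !== "system")
    .map((m) => ({
      role: m.role as "user" | "assistant",
      content: [{ type: "text" as const, text: m.content }],
    }));

  return { model, max_tokens: maxTokens, ...(system ? { system } : {}), messages: turns };
}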

End-to-end, a single streaming request flows through validation, routing, the Anthropic SDK, and an event mapper that rewrites Anthropic stream events as OpenAI SSE chunks on the way back to the client:

(Diagram: Turn lifecycle)
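
Sketched in the same spirit (a hypothetical helper; the real mapper also carries tool-call deltas, usage, and finish reasons), a single Anthropic text delta becomes one OpenAI-style SSE chunk:

// Illustrative only: rewrite one Anthropic text delta as an OpenAI chat.completion.chunk SSE frame.
type AnthropicTextDelta = {
  type: "content_block_delta";
  delta: { type: "text_delta"; text: string };
};

function toOpenAiSseChunk(event: AnthropicTextDelta, id: string, model: string): string {
  const chunk = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: { content: event.delta.text }, finish_reason: null }],
  };
  // SSE framing: one `data:` line per chunk, terminated by a blank line.
  return `data: ${JSON.stringify(chunk)}\n\n`;
}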

Getting started

git clone https://github.com/arniesaha/mux.git
cd mux
npm install
cp .env.example .env
npm run dev

Server starts on http://localhost:8787 by default.

Configure your downstream

Option 1 — OpenAI-compatible backend (e.g. LiteLLM, OpenRouter)

DOWNSTREAM_MODE=openai-compatible
DOWNSTREAM_BASE_URL=https://your-gateway.com/v1   # your proxy/gateway URL
DOWNSTREAM_API_KEY=sk-...                          # your API key
DOWNSTREAM_AUTH_MODE=bearer                         # bearer | x-api-key | passthrough | none
DOWNSTREAM_EXTRA_HEADERS={}
DOWNSTREAM_TIMEOUT_MS=30000

Option 2 — Anthropic SDK (direct OAuth tokens)

DOWNSTREAM_MODE=anthropic-sdk
ANTHROPIC_OAUTH_TOKEN=sk-ant-oat01-...            # Anthropic OAuth token
ANTHROPIC_BASE_URL=https://api.anthropic.com        # or your proxy URL
DOWNSTREAM_TIMEOUT_MS=30000

Note: even in anthropic-sdk mode, Mux accepts only OpenAI-compatible chat requests. Native /v1/responses is currently supported only for openai-compatible downstreams that expose a real Responses API.

Routing behavior

Mux routes requests using configurable policy rules:

  • Short lightweight prompts (< 80 chars, no task cues) → claude-haiku-4-5-20251001
  • Coding / debugging / execution cues → claude-sonnet-4-6
  • Complex reasoning / planning / architecture cues → claude-opus-4-6
  • Simple prompts requesting gpt-4o → downgraded to gpt-4o-mini

Routing for Max runtime requests is evaluated on the last user message only — system prompts and conversation history are ignored to prevent false escalation.

Route decisions are logged with: runtime, requestedModel, resolvedModel, routeReason, provider, backendTarget.
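
For example, a downgrade decision might be logged roughly like this (field names as above; values and formatting are illustrative, not the exact log output):

{
  "runtime": "openclaw",
  "requestedModel": "gpt-4o",
  "resolvedModel": "gpt-4o-mini",
  "routeReason": "simple-prompt-downgrade",
  "provider": "openai-compatible",
  "backendTarget": "https://your-gateway.com/v1"
}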

Declarative routing rules

If you want something more flexible than the built-in heuristics in src/policy.ts, set ROUTING_RULES to an ordered JSON array. First match wins.

Supported fields per rule:

  • id — required unique identifier
  • protocols — optional array of chat_completions / responses
  • runtime — optional string or string[]
  • requestedModel — optional string or string[]
  • promptIncludesAny — optional string[] keyword match on the last user message
  • maxPromptLength — optional upper bound for last-user-message length
  • resolvedModel — required routed model
  • routeReason — optional custom reason string

Example:

ROUTING_RULES='[
  {
    "id": "simple-openclaw-gpt4o",
    "runtime": "openclaw",
    "requestedModel": "gpt-4o",
    "maxPromptLength": 120,
    "resolvedModel": "gpt-4o-mini"
  }
]'
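
A slightly richer, illustrative example that also uses protocols and promptIncludesAny, escalating architecture-flavored prompts before applying the downgrade rule:

ROUTING_RULES='[
  {
    "id": "escalate-architecture",
    "protocols": ["chat_completions", "responses"],
    "promptIncludesAny": ["architecture", "design doc", "migration plan"],
    "resolvedModel": "claude-opus-4-6",
    "routeReason": "architecture-cue"
  },
  {
    "id": "simple-openclaw-gpt4o",
    "runtime": "openclaw",
    "requestedModel": "gpt-4o",
    "maxPromptLength": 120,
    "resolvedModel": "gpt-4o-mini"
  }
]'
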

Config rules are applied after MODEL_MAP / ANTHROPIC_MODEL_MAP and before the built-in heuristics, so explicit overrides still win and defaults remain as fallback behavior.

Explain / dry-run route decisions

Mux exposes a no-downstream debug endpoint:

curl -s http://localhost:8787/v1/route/resolve \
  -H 'content-type: application/json' \
  -H 'x-runtime: openclaw' \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"say hi"}]}' | jq

For Responses-style callers:

curl -s http://localhost:8787/v1/route/resolve \
  -H 'content-type: application/json' \
  -d '{"protocol":"responses","model":"gpt-4o","input":[{"role":"user","content":[{"type":"input_text","text":"say hi"}]}]}' | jq

This returns the resolved route metadata, including matchedRuleId when a declarative rule fired.
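
The exact response shape may vary, but for the first request above, with the simple-openclaw-gpt4o rule configured, you can expect something along these lines (illustrative values):

{
  "runtime": "openclaw",
  "requestedModel": "gpt-4o",
  "resolvedModel": "gpt-4o-mini",
  "routeReason": "rule:simple-openclaw-gpt4o",
  "matchedRuleId": "simple-openclaw-gpt4o",
  "provider": "openai-compatible",
  "backendTarget": "https://your-gateway.com/v1"
}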

Enable native Responses routing

If you want Mux to accept POST /v1/responses for a legacy/default openai-compatible downstream, set:

DOWNSTREAM_PROTOCOLS=chat_completions,responses

Providers configured via PROVIDERS can also declare protocols: ["chat_completions", "responses"].

Example requests

curl -s http://localhost:8787/v1/chat/completions \
  -H 'content-type: application/json' \
  -H 'x-runtime: openclaw' \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "say hi"}]}' | jq

curl -s http://localhost:8787/v1/responses \
  -H 'content-type: application/json' \
  -H 'x-runtime: openclaw' \
  -d '{"model": "gpt-4o", "input": [{"role": "user", "content": [{"type": "input_text", "text": "say hi"}]}]}' | jq

Environment variables

  • PORT (default 8787): Server port
  • NODE_ENV (default development): Environment
  • MODEL_MAP (default {}): JSON map of requestedModel → resolvedModel
  • ANTHROPIC_MODEL_MAP (default {}): Anthropic-only routing overrides
  • DOWNSTREAM_MODE (default openai-compatible): openai-compatible or anthropic-sdk
  • DOWNSTREAM_BASE_URL: Backend URL (openai-compatible mode)
  • DOWNSTREAM_API_KEY: API key for downstream auth
  • DOWNSTREAM_AUTH_MODE (default bearer): bearer | x-api-key | passthrough | none
  • DOWNSTREAM_EXTRA_HEADERS (default {}): JSON map of extra static headers
  • DOWNSTREAM_TIMEOUT_MS (default 30000): Request timeout in ms
  • DOWNSTREAM_PROTOCOLS (default chat_completions): Comma-separated legacy/default downstream protocols (chat_completions, responses)
  • DOWNSTREAM_MOCK_FALLBACK (default true in dev): Return a mock response when no backend is configured
  • ROUTING_RULES (default []): Ordered JSON array of declarative routing rules applied before the built-in heuristics
  • ANTHROPIC_OAUTH_TOKEN: OAuth token (preferred for anthropic-sdk)
  • ANTHROPIC_API_KEY: API key fallback for anthropic-sdk
  • ANTHROPIC_BASE_URL: Override the Anthropic API URL (supports proxies)
  • MUX_ANTHROPIC_PROMPT_CACHE (default true): Inject Anthropic ephemeral cache_control breakpoints on the translated request (system, tools, history)
  • TRACE_PROMPT_PREVIEW_ENABLED (default false): Enable prompt preview text on tracing span attributes (opt-in; disabled by default for safety)
  • TRACE_PROMPT_PREVIEW_REDACTED_VALUE (default [redacted]): Value written to trace attributes when prompt preview tracing is disabled
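
MODEL_MAP, ANTHROPIC_MODEL_MAP, and DOWNSTREAM_EXTRA_HEADERS take JSON strings, for example (placeholder values):

MODEL_MAP='{"gpt-4o": "claude-sonnet-4-6"}'
DOWNSTREAM_EXTRA_HEADERS='{"x-team": "personal-agents"}'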

Prompt caching (Anthropic)

When the downstream is Anthropic, Mux transparently injects ephemeral cache_control breakpoints on the translated request so multi-turn agents get a 90% discount on re-used input tokens (5-minute TTL).

What gets cached

  • the translated system prompt (one breakpoint on the single text block),
  • the full tools block (breakpoint on the last tool),
  • the conversation history (breakpoint on the last content block of the last message; skipped when there's only one turn).
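
For reference, an ephemeral breakpoint on the translated system block looks like this on the wire (sketch; text shortened):

"system": [
  {
    "type": "text",
    "text": "You are ...",
    "cache_control": { "type": "ephemeral" }
  }
]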

Reading cache stats. Anthropic's cache_read_input_tokens is surfaced on responses as usage.prompt_tokens_details.cached_tokens, and both cache-read and cache-creation tokens are rolled into usage.prompt_tokens so billable-prompt size reflects the true transcript:

{
  "usage": {
    "prompt_tokens": 5400,
    "completion_tokens": 20,
    "total_tokens": 5420,
    "prompt_tokens_details": { "cached_tokens": 5000 }
  }
}

Cost model. Anthropic charges a 25% surcharge on cache writes and a 90% discount on cache reads, with a 5-minute TTL. For a client that re-uses the same system + tools across turns, turn 1 pays the write surcharge and turns 2+ hit cache — net cost drops sharply within a single session.
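
As a rough worked example using those multipliers (1.25x on cache writes, 0.1x on cache reads, ignoring the new uncached tokens each turn adds): a 5,000-token system + tools prefix reused across four turns costs about 5,000 × 1.25 + 3 × 5,000 × 0.1 = 7,750 input-token-equivalents instead of 20,000, roughly a 60% reduction.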

Opt out. Set MUX_ANTHROPIC_PROMPT_CACHE=false to disable.

Running tests

npm test

Architecture diagrams

Diagram sources are in docs/diagrams/ as .excalidraw files. Re-render with the excalidraw-diagram-skill.

License

MIT © 2026 Arnab Saha
