Summary
We want to integrate Chutes models into vllm-proxy while preserving the current production-stable behavior for existing users.
Instead of modifying the current /v1/chat/completions path, add dedicated Chutes routes:
POST /v1/chutes/chat/completions
GET /v1/chutes/models
This lets upstream gateways (e.g., Redpill) continue receiving standard OpenAI-style /v1/chat/completions requests and selectively forward Chutes-bound traffic to the new sidecar route.
Goals
- Keep existing /v1/chat/completions behavior unchanged
- Minimize coupling/risk to current production path
- Add Chutes integration behind explicit route + config flag
Proposed behavior
Existing route (unchanged)
POST /v1/chat/completions
- continues to forward to the local vLLM backend as today
New Chutes route
POST /v1/chutes/chat/completions
- reuses existing E2EE parse/decrypt logic on ingress
- forwards the plaintext request over TLS to the Chutes OpenAI-compatible endpoint: ${CHUTES_BASE_URL}/v1/chat/completions
- with Authorization: Bearer ${CHUTES_API_KEY}
- reuses existing E2EE encrypt logic on egress
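As a hedged sketch of the forwarding step (Python, standard library only; the E2EE decrypt/encrypt helpers already in vllm-proxy are not shown, and the config names are the ones proposed above), building the upstream request might look like:

```python
import json
import os
import urllib.request

# Assumed config names from this proposal; read here only for illustration.
CHUTES_BASE_URL = os.environ.get("CHUTES_BASE_URL", "https://llm.chutes.ai")
CHUTES_API_KEY = os.environ.get("CHUTES_API_KEY", "")


def build_chutes_request(plaintext_body: dict) -> urllib.request.Request:
    """Build the upstream POST sent after E2EE decrypt on ingress.

    The decrypted OpenAI-style body is forwarded as-is; the upstream
    response would then pass through the existing E2EE encrypt logic
    on egress before returning to the client.
    """
    return urllib.request.Request(
        url=f"{CHUTES_BASE_URL}/v1/chat/completions",
        data=json.dumps(plaintext_body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {CHUTES_API_KEY}",
        },
        method="POST",
    )
```

The actual handler would wrap this in the proxy's existing async request path; this only illustrates the URL, body, and auth-header shape of the upstream call.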
New Chutes models route
GET /v1/chutes/models
- proxies ${CHUTES_BASE_URL}/v1/models with the Chutes auth header
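The models route is a thin pass-through; a minimal sketch of the upstream GET it would issue (same assumed config names as above, standard library only):

```python
import os
import urllib.request

CHUTES_BASE_URL = os.environ.get("CHUTES_BASE_URL", "https://llm.chutes.ai")
CHUTES_API_KEY = os.environ.get("CHUTES_API_KEY", "")


def build_chutes_models_request() -> urllib.request.Request:
    """Build the upstream GET that /v1/chutes/models proxies verbatim."""
    return urllib.request.Request(
        url=f"{CHUTES_BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {CHUTES_API_KEY}"},
        method="GET",
    )
```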
Config
- CHUTES_ENABLED (default: false)
- CHUTES_BASE_URL (default: https://llm.chutes.ai)
- CHUTES_API_KEY (required if enabled)
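Startup validation of these flags could look like the following sketch (load_chutes_config and ChutesConfigError are hypothetical names; the point is the enabled/required relationship and that misconfiguration surfaces as 503, per the acceptance criteria):

```python
from typing import Optional


class ChutesConfigError(Exception):
    """Enabled-but-misconfigured state; handlers map this to HTTP 503."""


def load_chutes_config(env: dict) -> Optional[dict]:
    """Validate the three flags above; env is injected to ease testing."""
    if env.get("CHUTES_ENABLED", "false").lower() != "true":
        return None  # Chutes routes answer 503 while the feature is disabled
    api_key = env.get("CHUTES_API_KEY")
    if not api_key:
        raise ChutesConfigError("CHUTES_ENABLED=true but CHUTES_API_KEY is unset")
    return {
        "base_url": env.get("CHUTES_BASE_URL", "https://llm.chutes.ai"),
        "api_key": api_key,
    }
```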
Security note
This is not pure client-to-model cryptographic E2EE.
It is a practical TEE-mediated segmented model:
- Client ↔ Proxy: existing E2EE
- Proxy ↔ Chutes: TLS
- Plaintext is only visible inside trusted runtime boundaries
Acceptance criteria
- Existing route behavior remains unchanged
- Chutes route supports stream + non-stream
- Existing E2EE nonce/replay checks remain in effect
- Misconfiguration (e.g., enabled without CHUTES_API_KEY) returns an explicit 503 error
- No API-key leakage in logs
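One way to enforce the no-key-leakage criterion is a log redaction filter along these lines (redact_bearer is a hypothetical helper, not existing code):

```python
import re


def redact_bearer(line: str) -> str:
    """Mask any bearer token before a log line is emitted."""
    return re.sub(r"Bearer\s+\S+", "Bearer [REDACTED]", line)
```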