background
This page records the design decisions behind DroidProxy and the danger zones that come with them. It is meant for anyone changing the request path, the bundled backend config, or the JSON-editing helpers. Several of these decisions look removable until you understand why they exist, so the goal here is to explain the "why" and flag the changes that will quietly break Claude prompt caching, Anthropic request validation, or Gemini streaming if you make them.
Related pages:
../systems/thinking-proxy.md,
../how-to-contribute/patterns-and-conventions.md,
../reference/configuration.md,
../lore.md.
DroidProxy ships a bundled backend (cli-proxy-api, on 127.0.0.1:8318)
and puts its own proxy, ThinkingProxy, in front of it on 127.0.0.1:8317.
ServerManager (src/Sources/ServerManager.swift) supervises the backend as a
child process; ThinkingProxy (src/Sources/ThinkingProxy.swift) is the
user-facing listener that Droid CLI actually talks to.
The split exists so DroidProxy can inspect and rewrite requests and responses without forking the backend. The backend handles OAuth, provider routing, and upstream transport; the proxy handles the small set of host-specific tweaks (Anthropic-Beta header rewriting, Gemini path rewriting, fast-mode service tier, Claude thinking-block sanitizing). Keeping those tweaks in a thin Swift layer means the vendored Go binary stays unmodified and replaceable.
This is the single most important "do not re-add this" note in the codebase.
Reasoning effort used to be owned by the proxy. The proxy would inject
thinking, reasoning, reasoning_effort, output_config, budget_tokens,
and generationConfig.thinkingConfig into request bodies, and it parsed
"advanced variant" model suffixes like claude-opus-4-8(high) or
gpt-5.2(xhigh) to pick a level. It also had a "Max Budget Mode" override.
All of that was removed. Reasoning is now owned by Droid CLI. Each model is
registered with native reasoning metadata (enableThinking,
supportedReasoningEfforts, defaultReasoningEffort, reasoningEffort) in
src/Sources/DroidProxyModelCatalog.swift, so Droid's per-session selector
exposes every level a model supports and sends the chosen value in the request
body. The proxy forwards that body unchanged.
What this means for anyone editing ThinkingProxy:
- Do not re-introduce injection of thinking, reasoning, reasoning_effort, output_config, budget_tokens, or generationConfig.thinkingConfig. Droid CLI already sets these. Re-adding proxy-side injection would double-write or override the user's selected effort.
- Do not re-add model-suffix parsing for advanced variants. Every level now ships in the single base catalog entry.
The proxy still reads a few fields (model, thinking.type, service_tier)
for routing and header decisions, but it does not author reasoning fields. The
AGENTS.md "what it no longer does" list is the authoritative inventory of
removed behavior.
ThinkingProxy edits request bodies by scanning the raw JSON string and
splicing at byte ranges. It never round-trips through JSONSerialization.
The reason is Anthropic's prompt cache. JSONSerialization.data(...) reorders
object keys, and any change in key order changes the cache key, causing cache
misses on large prompts. The helpers findTopLevelFieldLocations,
consumeJSONValue, parseJSONStringToken, and friends locate exact value
ranges so edits (for example inserting ,"service_tier":"priority" right after
the model value) preserve original ordering.
Danger zones:
- Do not switch any request-body edit to a serialize/deserialize round-trip. This is also called out as a hard convention in AGENTS.md.
- The top-level field scan is intentionally narrow. routingInspectionKeys is a small set (model, service_tier, thinking) so the scan can early-exit before it has to traverse the potentially huge messages array on most requests. Debug-only keys are scanned separately and only when a log line is about to be written. Widening the routing key set or scanning messages eagerly would add parse cost to every request.
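The splice-instead-of-serialize idea can be sketched in a few lines of Swift. This is a minimal illustration, not the code in ThinkingProxy.swift: the function names below are hypothetical, the scanner only handles a string-valued top-level key, and it does not check whether service_tier is already present.

```swift
let quote = UInt8(ascii: "\"")
let backslash = UInt8(ascii: "\\")

/// Returns the index just past the JSON string token whose opening quote is at `start`.
func endOfJSONString(_ bytes: [UInt8], from start: Int) -> Int? {
    var i = start + 1
    while i < bytes.count {
        if bytes[i] == backslash { i += 2; continue }  // skip the escaped byte
        if bytes[i] == quote { return i + 1 }
        i += 1
    }
    return nil  // unterminated string
}

func skipWhitespace(_ bytes: [UInt8], from start: Int) -> Int {
    var i = start
    while i < bytes.count, [0x20, 0x09, 0x0A, 0x0D].contains(bytes[i]) { i += 1 }
    return i
}

/// Hypothetical helper: byte range of the string value stored under a top-level key.
func topLevelStringValueRange(of key: String, in bytes: [UInt8]) -> Range<Int>? {
    let keyToken = Array("\"\(key)\"".utf8)
    var depth = 0
    var i = 0
    while i < bytes.count {
        let b = bytes[i]
        if b == UInt8(ascii: "{") || b == UInt8(ascii: "[") {
            depth += 1; i += 1
        } else if b == UInt8(ascii: "}") || b == UInt8(ascii: "]") {
            depth -= 1; i += 1
        } else if b == quote {
            guard let end = endOfJSONString(bytes, from: i) else { return nil }
            // Depth 1 means "directly inside the root object"; a key is followed by ':'.
            if depth == 1, bytes[i..<end].elementsEqual(keyToken) {
                var j = skipWhitespace(bytes, from: end)
                if j < bytes.count, bytes[j] == UInt8(ascii: ":") {
                    j = skipWhitespace(bytes, from: j + 1)
                    if j < bytes.count, bytes[j] == quote,
                       let valueEnd = endOfJSONString(bytes, from: j) {
                        return j..<valueEnd
                    }
                }
            }
            i = end
        } else {
            i += 1
        }
    }
    return nil
}

/// Inserts the fast-mode field right after the model value; every other byte is untouched.
func insertingServiceTier(_ body: String) -> String {
    var bytes = Array(body.utf8)
    guard let range = topLevelStringValueRange(of: "model", in: bytes) else { return body }
    bytes.insert(contentsOf: Array(",\"service_tier\":\"priority\"".utf8), at: range.upperBound)
    return String(decoding: bytes, as: UTF8.self)
}
```

Because the edit is a pure byte insertion, everything before and after the splice point is identical to what the client sent, which is the property the prompt cache depends on.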
Two distinct pieces of Claude-specific handling live in the proxy.
Anthropic-Beta rewrite. When a Claude request has thinking.type of
enabled, adaptive, or auto, the proxy strips
redact-thinking-2026-02-12 from the Anthropic-Beta header and appends the
visible-thinking beta list (interleaved-thinking, prompt-caching-scope,
fast-mode, and others in Config.claudeVisibleThinkingBetas). Without this
rewrite, Claude emits only signed empty thinking blocks instead of visible
reasoning, so the header surgery is necessary, not cosmetic.
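As a rough sketch of that header rewrite, assuming the header is a comma-separated list: the function name is hypothetical, the beta names are placeholders taken from the prose above (the real values live in Config.claudeVisibleThinkingBetas and may carry version suffixes), and the de-duplication step is an assumption rather than documented behavior.

```swift
import Foundation

// Placeholder names only; the authoritative list is Config.claudeVisibleThinkingBetas.
let visibleThinkingBetas = ["interleaved-thinking", "prompt-caching-scope", "fast-mode"]
let redactThinkingBeta = "redact-thinking-2026-02-12"
let visibleThinkingTypes: Set<String> = ["enabled", "adaptive", "auto"]

/// Hypothetical helper: returns the Anthropic-Beta value to forward upstream.
func rewriteAnthropicBeta(_ header: String?, thinkingType: String?) -> String? {
    // Leave the header alone unless the request asked for visible thinking.
    guard let type = thinkingType, visibleThinkingTypes.contains(type) else { return header }

    var betas = (header ?? "")
        .split(separator: ",")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty && $0 != redactThinkingBeta }  // strip the redaction beta

    // Append the visible-thinking betas, skipping any the client already sent (assumption).
    for beta in visibleThinkingBetas where !betas.contains(beta) {
        betas.append(beta)
    }
    return betas.joined(separator: ",")
}
```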
Thinking-block sanitization. ClaudeThinkingBlockSanitizer
(src/Sources/ClaudeThinkingBlockSanitizer.swift) strips stale thinking and
redacted_thinking blocks from prior assistant turns before forwarding. It
preserves thinking on the latest assistant turn that has trailing tool_results
(that turn's thinking is still live), but removes it from earlier turns, which
Anthropic otherwise rejects or mishandles. A subtle constraint: if stripping
would leave "content":[], the sanitizer substitutes a placeholder content
array ([{"type":"text","text":"..."}]) rather than emptying it, because an
empty content array is invalid and dropping the message would break
user/assistant role alternation.
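The sanitizing rule can be illustrated on a simplified in-memory model. This sketch is not how ClaudeThinkingBlockSanitizer is written (the proxy edits raw JSON rather than decoded structs); the types and function below are hypothetical, and "latest assistant turn with trailing tool_results" is read here as "the last assistant message, provided every message after it is a user message carrying a tool_result".

```swift
struct Block: Equatable {
    var type: String
    var text: String = ""
}

struct Message: Equatable {
    var role: String
    var content: [Block]
}

// Stand-in for the placeholder content described above.
let placeholder = Block(type: "text", text: "...")

func isThinking(_ block: Block) -> Bool {
    block.type == "thinking" || block.type == "redacted_thinking"
}

func sanitize(_ messages: [Message]) -> [Message] {
    // Find the one assistant turn whose thinking is still live, if any.
    var liveIndex: Int? = nil
    if let idx = messages.lastIndex(where: { $0.role == "assistant" }) {
        let trailing = messages[(idx + 1)...]
        let onlyToolResults = !trailing.isEmpty && trailing.allSatisfy { message in
            message.role == "user" && message.content.contains { $0.type == "tool_result" }
        }
        if onlyToolResults { liveIndex = idx }
    }

    return messages.enumerated().map { (index, message) in
        guard message.role == "assistant", index != liveIndex else { return message }
        var cleaned = message
        cleaned.content.removeAll(where: isThinking)
        // Never emit an empty content array and never drop the message:
        // substitute the placeholder so role alternation is preserved.
        if cleaned.content.isEmpty && !message.content.isEmpty {
            cleaned.content = [placeholder]
        }
        return cleaned
    }
}
```

The message count and role order are unchanged by construction, which is the invariant the placeholder exists to protect.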
src/Sources/Resources/config.yaml carries a few non-default choices, each tied
to a real failure mode:
- disable-cooling: true turns off the backend's global auth/model cooldown scheduling. Issue #57 describes long Factory/Droid sessions hitting blackout windows where every Codex/Claude/Gemini auth reports auth_unavailable after a single transient failure. Disabling cooling avoids that whole-fleet blackout.
- routing.session-affinity: true (with session-affinity-ttl: "2h") pins a session to one upstream auth. Issue #58 covers stateful Responses API traffic (encrypted reasoning state, previous_response_id) breaking when a follow-up turn round-robins to a different account. Affinity keeps the conversation on the account that holds its state.
- request-timeout: "10m" raises the upstream timeout well above a typical HTTP default, because extended-thinking requests can run many minutes.
These are deliberate. Reverting any of them re-opens the linked issue.
The Responses API endpoint (/v1/responses, /api/v1/responses) is not
supported by every Gemini path, so the proxy rewrites it conditionally.
- OAuth Code Assist Gemini models (the -preview-suffixed names served by the gemini-cli executor) cannot use the Responses API, so their /v1/responses is rewritten to /v1/chat/completions. The check is isOAuthCodeAssistGeminiModel, which matches a gemini- prefix and a -preview suffix.
- Antigravity-routed Gemini models (for example gemini-3-flash, gemini-pro-agent) do support /v1/responses natively and must not be rewritten. Rewriting them would make the backend return chat-completions SSE that Droid CLI cannot parse, hanging the stream.
So the rule is neither "rewrite Gemini" nor "never rewrite Gemini": rewrite exactly the OAuth Code Assist preview models and nothing else.
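A minimal sketch of the conditional rewrite, assuming the check is a plain prefix/suffix match as described. Only isOAuthCodeAssistGeminiModel is a name from the source; rewrittenPath is a hypothetical helper, and the /api/v1/responses variant is left out because its rewrite target is not stated here.

```swift
/// Matches the OAuth Code Assist naming pattern: a gemini- prefix and a -preview suffix.
func isOAuthCodeAssistGeminiModel(_ model: String) -> Bool {
    model.hasPrefix("gemini-") && model.hasSuffix("-preview")
}

/// Hypothetical helper: returns the path to forward to the backend.
func rewrittenPath(_ path: String, model: String) -> String {
    // Only the Responses endpoint for Code Assist preview models is rewritten.
    guard path == "/v1/responses", isOAuthCodeAssistGeminiModel(model) else { return path }
    return "/v1/chat/completions"
}
```

With this shape, an illustrative name such as gemini-3-pro-preview is rewritten, while gemini-3-flash and gemini-pro-agent pass through untouched, as do non-Gemini models.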
A custom bind address must never be able to leave the proxy permanently down.
startListener(allowCustomBindAddress:) in src/Sources/ThinkingProxy.swift
applies the user-configured bind address, but if the listener init throws or the
listener transitions to .failed (for example the address is malformed or not
assigned to any local interface), it cancels and retries once with
allowCustomBindAddress: false, falling back to the default all-interfaces
bind. Combined with the trim/empty/multiline validation in
AppPreferences.bindAddress, a bad address degrades to a working default rather
than a dead proxy.
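The retry-once shape can be sketched without any networking. In the code below, the bind closure is a hypothetical stand-in for creating and starting the real listener, normalizedBindAddress is an assumed approximation of the validation in AppPreferences.bindAddress, and only the "init throws" failure is modeled (the real code also reacts to an asynchronous .failed state).

```swift
import Foundation

/// Assumed approximation of the preference validation: trim, reject empty and multiline values.
func normalizedBindAddress(_ raw: String?) -> String? {
    guard let raw = raw else { return nil }
    let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty || trimmed.contains(where: { $0.isNewline }) { return nil }
    return trimmed
}

/// Tries the custom address first, then retries once on the default bind.
/// `bind(nil)` means "use the default all-interfaces bind".
/// Returns the address that was actually bound (nil for the default).
func startListener(customAddress: String?,
                   allowCustomBindAddress: Bool,
                   bind: (String?) throws -> Void) throws -> String? {
    let address = allowCustomBindAddress ? normalizedBindAddress(customAddress) : nil
    do {
        try bind(address)
        return address
    } catch {
        // If the default bind itself failed there is nothing left to fall back to.
        guard address != nil else { throw error }
        return try startListener(customAddress: customAddress,
                                 allowCustomBindAddress: false,
                                 bind: bind)
    }
}
```

The recursion is bounded: the second call always passes allowCustomBindAddress: false, so a failure there propagates instead of retrying again.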
| Area | Don't do this | Why |
|---|---|---|
| Reasoning fields | Re-add proxy-side injection of thinking/reasoning/reasoning_effort/output_config/budget_tokens/generationConfig, or model-suffix variant parsing | Droid CLI now owns reasoning via model metadata; injection double-writes or overrides it |
| JSON edits | Switch request-body edits to JSONSerialization round-trips | Key reordering breaks Anthropic prompt-cache matching |
| Field scan | Widen routing keys or eagerly traverse messages | Adds parse cost to every request; the early-exit scan exists to avoid it |
| Anthropic-Beta | Stop stripping redact-thinking-2026-02-12 / dropping the visible-thinking betas | Claude then emits only signed empty thinking blocks |
| Claude content | Leave "content":[] after stripping thinking blocks | Anthropic rejects empty content; use the placeholder and keep role alternation |
| disable-cooling | Re-enable cooling | Re-opens issue #57 (auth_unavailable blackout in long sessions) |
| session-affinity | Turn it off | Re-opens issue #58 (stateful Responses API breaks on round-robin) |
| Gemini paths | Rewrite all Gemini /v1/responses, or none | Only OAuth Code Assist -preview models need it; antigravity models break if rewritten |
| Bind address | Remove the all-interfaces fallback | A bad custom bind would leave the proxy down with no listener |
For the live behavior these notes describe, see
../systems/thinking-proxy.md and
../reference/configuration.md. Contribution
conventions are in
../how-to-contribute/patterns-and-conventions.md.