Skip to content

background

Nik edited this page May 30, 2026 · 2 revisions

Background

Purpose

This page records the design decisions behind DroidProxy and the danger zones that come with them. It is meant for anyone changing the request path, the bundled backend config, or the JSON-editing helpers. Several of these decisions look removable until you understand why they exist, so the goal here is to explain the "why" and flag the changes that will quietly break Claude prompt caching, Anthropic request validation, or Gemini streaming if you make them.

Related pages: ../systems/thinking-proxy.md, ../how-to-contribute/patterns-and-conventions.md, ../reference/configuration.md, ../lore.md.

Why two servers

DroidProxy ships a bundled backend (cli-proxy-api, on 127.0.0.1:8318) and puts its own proxy, ThinkingProxy, in front of it on 127.0.0.1:8317. ServerManager (src/Sources/ServerManager.swift) supervises the backend as a child process; ThinkingProxy (src/Sources/ThinkingProxy.swift) is the user-facing listener that Droid CLI actually talks to.

The split exists so DroidProxy can inspect and rewrite requests and responses without forking the backend. The backend handles OAuth, provider routing, and upstream transport; the proxy handles the small set of host-specific tweaks (Anthropic-Beta header rewriting, Gemini path rewriting, fast-mode service tier, Claude thinking-block sanitizing). Keeping those tweaks in a thin Swift layer means the vendored Go binary stays unmodified and replaceable.

The reasoning-ownership refactor (most important)

This is the single most important "do not re-add this" note in the codebase.

Reasoning effort used to be owned by the proxy. The proxy would inject thinking, reasoning, reasoning_effort, output_config, budget_tokens, and generationConfig.thinkingConfig into request bodies, and it parsed "advanced variant" model suffixes like claude-opus-4-8(high) or gpt-5.2(xhigh) to pick a level. It also had a "Max Budget Mode" override.

All of that was removed. Reasoning is now owned by Droid CLI. Each model is registered with native reasoning metadata (enableThinking, supportedReasoningEfforts, defaultReasoningEffort, reasoningEffort) in src/Sources/DroidProxyModelCatalog.swift, so Droid's per-session selector exposes every level a model supports and sends the chosen value in the request body. The proxy forwards that body unchanged.

What this means for anyone editing ThinkingProxy:

  • Do not re-introduce injection of thinking, reasoning, reasoning_effort, output_config, budget_tokens, or generationConfig.thinkingConfig. Droid CLI already sets these. Re-adding proxy-side injection would double-write or override the user's selected effort.
  • Do not re-add model-suffix parsing for advanced variants. Every level now ships in the single base catalog entry.

The proxy still reads a few fields (model, thinking.type, service_tier) for routing and header decisions, but it does not author reasoning fields. The AGENTS.md "what it no longer does" list is the authoritative inventory of removed behavior.

Surgical JSON editing

ThinkingProxy edits request bodies by scanning the raw JSON string and splicing at byte ranges. It never round-trips through JSONSerialization.

The reason is Anthropic's prompt cache. JSONSerialization.data(...) reorders object keys, and any change in key order changes the cache key, causing cache misses on large prompts. The helpers findTopLevelFieldLocations, consumeJSONValue, parseJSONStringToken, and friends locate exact value ranges so edits (for example inserting ,"service_tier":"priority" right after the model value) preserve original ordering.

Danger zones:

  • Do not switch any request-body edit to a serialize/deserialize round-trip. This is also called out as a hard convention in AGENTS.md.
  • The top-level field scan is intentionally narrow. routingInspectionKeys is a small set (model, service_tier, thinking) so the scan can early-exit before it has to traverse the potentially huge messages array on most requests. Debug-only keys are scanned separately and only when a log line is about to be written. Widening the routing key set or scanning messages eagerly would add parse cost to every request.

Claude thinking handling

Two distinct pieces of Claude-specific handling live in the proxy.

Anthropic-Beta rewrite. When a Claude request has thinking.type of enabled, adaptive, or auto, the proxy strips redact-thinking-2026-02-12 from the Anthropic-Beta header and appends the visible-thinking beta list (interleaved-thinking, prompt-caching-scope, fast-mode, and others in Config.claudeVisibleThinkingBetas). Without this rewrite, Claude emits only signed empty thinking blocks instead of visible reasoning, so the header surgery is necessary, not cosmetic.

Thinking-block sanitization. ClaudeThinkingBlockSanitizer (src/Sources/ClaudeThinkingBlockSanitizer.swift) strips stale thinking and redacted_thinking blocks from prior assistant turns before forwarding. It preserves thinking on the latest assistant turn that has trailing tool_results (that turn's thinking is still live), but removes it from earlier turns, which Anthropic otherwise rejects or mishandles. A subtle constraint: if stripping would leave "content":[], the sanitizer substitutes a placeholder content array ([{"type":"text","text":"..."}]) rather than emptying it, because an empty content array is invalid and dropping the message would break user/assistant role alternation.

Backend config decisions (issues #57 and #58)

src/Sources/Resources/config.yaml carries a few non-default choices, each tied to a real failure mode:

  • disable-cooling: true — the backend's global auth/model cooldown scheduling is turned off. Issue #57 describes long Factory/Droid sessions hitting blackout windows where every Codex/Claude/Gemini auth reports auth_unavailable after a single transient failure. Disabling cooling avoids that whole-fleet blackout.
  • routing.session-affinity: true (with session-affinity-ttl: "2h") — pins a session to one upstream auth. Issue #58 covers stateful Responses API traffic (encrypted reasoning state, previous_response_id) breaking when a follow-up turn round-robins to a different account. Affinity keeps the conversation on the account that holds its state.
  • request-timeout: "10m" — extended-thinking requests can run many minutes, so the upstream timeout is raised well above a typical HTTP default.

These are deliberate. Reverting any of them re-opens the linked issue.

Gemini path handling

The Responses API endpoint (/v1/responses, /api/v1/responses) is not supported by every Gemini path, so the proxy rewrites it conditionally.

  • OAuth Code Assist Gemini models (the -preview-suffixed names served by the gemini-cli executor) cannot use the Responses API, so their /v1/responses is rewritten to /v1/chat/completions. The check is isOAuthCodeAssistGeminiModel, which matches a gemini- prefix and a -preview suffix.
  • Antigravity-routed Gemini models (for example gemini-3-flash, gemini-pro-agent) do support /v1/responses natively and must not be rewritten. Rewriting them would make the backend return chat-completions SSE that Droid CLI cannot parse, hanging the stream.

So the danger is not "rewrite Gemini" or "never rewrite Gemini" — it is rewriting exactly the OAuth Code Assist preview models and nothing else.

Bind-address fallback

A custom bind address must never be able to leave the proxy permanently down. startListener(allowCustomBindAddress:) in src/Sources/ThinkingProxy.swift applies the user-configured bind address, but if the listener init throws or the listener transitions to .failed (for example the address is malformed or not assigned to any local interface), it cancels and retries once with allowCustomBindAddress: false, falling back to the default all-interfaces bind. Combined with the trim/empty/multiline validation in AppPreferences.bindAddress, a bad address degrades to a working default rather than a dead proxy.

Pitfalls and danger zones summary

Area Don't do this Why
Reasoning fields Re-add proxy-side injection of thinking/reasoning/reasoning_effort/output_config/budget_tokens/generationConfig, or model-suffix variant parsing Droid CLI now owns reasoning via model metadata; injection double-writes or overrides it
JSON edits Switch request-body edits to JSONSerialization round-trips Key reordering breaks Anthropic prompt-cache matching
Field scan Widen routing keys or eagerly traverse messages Adds parse cost to every request; the early-exit scan exists to avoid it
Anthropic-Beta Stop stripping redact-thinking-2026-02-12 / dropping the visible-thinking betas Claude then emits only signed empty thinking blocks
Claude content Leave "content":[] after stripping thinking blocks Anthropic rejects empty content; use the placeholder and keep role alternation
disable-cooling Re-enable cooling Re-opens issue #57 (auth_unavailable blackout in long sessions)
session-affinity Turn it off Re-opens issue #58 (stateful Responses API breaks on round-robin)
Gemini paths Rewrite all Gemini /v1/responses, or none Only OAuth Code Assist -preview models need it; antigravity models break if rewritten
Bind address Remove the all-interfaces fallback A bad custom bind would leave the proxy down with no listener

For the live behavior these notes describe, see ../systems/thinking-proxy.md and ../reference/configuration.md. Contribution conventions are in ../how-to-contribute/patterns-and-conventions.md.

Clone this wiki locally