systems thinking proxy
Active contributors: Ran, Nik
The thinking proxy is a thin HTTP proxy, implemented over raw TCP, that listens on localhost:8317 and is the endpoint Droid CLI connects to. It is built directly on Apple's Network.framework (NWListener / NWConnection) rather than a high-level HTTP server, so it can stream large request and response bodies and edit raw bytes without re-serializing JSON.
Its job is to receive each HTTP request, apply a small, closed set of mutations, and forward the request to the bundled cli-proxy-api backend on 127.0.0.1:8318 (or, for Cursor models, to the Cursor API). Reasoning effort is owned by Droid CLI — the proxy forwards whatever reasoning/thinking values the client sends and does not inject them.
The whole proxy lives in src/Sources/ThinkingProxy.swift. Claude thinking-block stripping is delegated to src/Sources/ClaudeThinkingBlockSanitizer.swift, which is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift.
src/Sources/ThinkingProxy.swift # the proxy (listener, parsing, mutations, forwarding)
src/Sources/ClaudeThinkingBlockSanitizer.swift # strips stale Claude thinking/redacted_thinking blocks
src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift # sanitizer unit tests
| Type / function | File | Purpose |
|---|---|---|
| `ThinkingProxy` | `src/Sources/ThinkingProxy.swift` | The whole proxy: listener lifecycle, request parsing, mutation pipeline, forwarding, response streaming. |
| `start()` / `startListener(allowCustomBindAddress:)` / `stop()` | `src/Sources/ThinkingProxy.swift` | Listener lifecycle; custom bind-address handling with fallback to all-interfaces on failure. |
| `receiveNextChunk(from:accumulatedData:)` | `src/Sources/ThinkingProxy.swift` | Iterative request accumulation honoring `Content-Length`. |
| `processRequest(data:connection:)` | `src/Sources/ThinkingProxy.swift` | The decision tree: routing, mutations, path rewrites, logging. |
| `RequestJSONFields` / `inspectRequestJSONFields(in:)` | `src/Sources/ThinkingProxy.swift` | Surgical extraction of `model` / `service_tier` / `thinking` locations without a full JSON parse. |
| `headersForForwarding(_:requestFields:)` / `shouldRequestVisibleClaudeThinking(_:)` | `src/Sources/ThinkingProxy.swift` | `Anthropic-Beta` header rewriting for visible Claude thinking. |
| `processOpenAIFastMode(jsonString:path:fields:)` | `src/Sources/ThinkingProxy.swift` | Injects `service_tier:"priority"` for enabled GPT 5.x fast-mode models. |
| `rewriteAntigravityModelAlias(...)` / `rewriteCursorModelAlias(...)` | `src/Sources/ThinkingProxy.swift` | Surgical model-name aliasing. |
| `forwardRequest(...)` / `streamNextChunk(...)` / `finishStreaming(target:client:)` | `src/Sources/ThinkingProxy.swift` | Forward to the `:8318` backend and stream the response back. |
| `forwardToCursor(...)` / `loadCursorApiKey()` / `receiveCursorResponse(...)` | `src/Sources/ThinkingProxy.swift` | Cursor model routing to `cursor-api.standardagents.ai`. |
| `findObjectFieldLocations(...)` / `consumeJSONValue(...)` / `parseJSONStringToken(...)` | `src/Sources/ThinkingProxy.swift` | Hand-written JSON scanner used for all surgical edits. |
| `ClaudeThinkingBlockSanitizer.sanitize(_:)` | `src/Sources/ClaudeThinkingBlockSanitizer.swift` | Strips stale assistant `thinking` / `redacted_thinking` blocks. |
| `fileLog(_:)` | `src/Sources/ThinkingProxy.swift` | Appends to `/tmp/droidproxy-debug.log`. |
start() calls startListener(allowCustomBindAddress: true). The listener uses NWParameters.tcp with allowLocalEndpointReuse = true on port 8317. If AppPreferences.bindAddress is set to something other than 0.0.0.0, the listener is restricted to that address via requiredLocalEndpoint; otherwise it binds to all interfaces.
Because a user-supplied bind address can be malformed or unassigned to any local interface, the listener has a one-shot fallback: if the stateUpdateHandler reports .failed while useCustomBind is true, it cancels the failed listener and re-invokes startListener(allowCustomBindAddress: false) on a background queue, falling back to the default all-interfaces bind so a bad address can't leave the proxy permanently down. The same fallback runs if NWListener init throws. stop() cancels the listener under stateQueue and clears isRunning.
handleConnection starts the connection and calls receiveRequest, which delegates to receiveNextChunk. That method reads up to 1 MiB per receive call and accumulates bytes. It locates the end of headers with a binary match on \r\n\r\n (Data([13, 10, 13, 10])) rather than re-decoding the whole buffer as UTF-8 on every chunk, which would cost O(N²) over a large request. Once headers are present, it parses Content-Length (parseContentLength) and keeps scheduling more reads while the body received is shorter than advertised and the stream is still open. When the body is complete (or the headers or the stream are truncated), it calls processRequest.
receiveNextChunk re-schedules itself asynchronously via connection.receive's completion handler rather than recursing synchronously, so a long stream of chunks does not build up the call stack.
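The completeness check that drives this accumulation loop can be sketched as a pure function over the bytes received so far. This is a minimal illustration, not the proxy's code: `accumulationState(of:)`, `AccumulationState`, and `contentLength(inHeaders:)` are hypothetical names, and treating a missing `Content-Length` as zero is an assumption.

```swift
import Foundation

/// Byte pattern for the blank line that ends HTTP headers: \r\n\r\n.
let headerTerminator = Data([13, 10, 13, 10])

/// Hypothetical helper: reads Content-Length from the raw header bytes.
/// Only the (small) header block is decoded, never the body.
func contentLength(inHeaders headerData: Data) -> Int? {
    guard let headerText = String(data: headerData, encoding: .utf8) else { return nil }
    for line in headerText.components(separatedBy: "\r\n") {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "content-length"
        else { continue }
        return Int(parts[1].trimmingCharacters(in: .whitespaces))
    }
    return nil
}

enum AccumulationState: Equatable {
    case needMoreHeaders            // header terminator not seen yet
    case needMoreBody(missing: Int) // body shorter than advertised
    case complete                   // ready to be processed
}

/// Decides whether another read should be scheduled.
/// Assumption: a request without Content-Length has an empty body.
func accumulationState(of accumulated: Data) -> AccumulationState {
    // Binary search for the terminator; no UTF-8 decode of the whole buffer.
    guard let terminator = accumulated.range(of: headerTerminator) else {
        return .needMoreHeaders
    }
    let headerData = accumulated[accumulated.startIndex..<terminator.lowerBound]
    let bodyLength = accumulated.endIndex - terminator.upperBound
    let expected = contentLength(inHeaders: headerData) ?? 0
    if bodyLength >= expected { return .complete }
    return .needMoreBody(missing: expected - bodyLength)
}
```

In a receive loop, a caller would schedule another read for `.needMoreHeaders` or `.needMoreBody` while the stream is still open, and hand the buffer to request processing otherwise.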
processRequest decodes the request to a string, splits the request line into method / path / version, collects headers preserving original casing, and finds the body after \r\n\r\n. It then walks a fixed decision tree:
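As a rough illustration of that parsing step, the sketch below splits a decoded request into its request line, headers, and body. `ParsedRequest` and `parseRequest(_:)` are hypothetical names, not the proxy's own; the point is that headers are kept as an ordered list of pairs so the client's original casing survives.

```swift
import Foundation

struct ParsedRequest {
    let method: String
    let path: String
    let version: String
    /// Ordered pairs rather than a dictionary, so casing and order are preserved.
    let headers: [(name: String, value: String)]
    let body: String
}

/// Hypothetical sketch: returns nil when the header terminator or request line is malformed.
func parseRequest(_ raw: String) -> ParsedRequest? {
    guard let split = raw.range(of: "\r\n\r\n") else { return nil }
    let head = String(raw[..<split.lowerBound])
    let body = String(raw[split.upperBound...])

    var lines = head.components(separatedBy: "\r\n")
    guard !lines.isEmpty else { return nil }
    let requestLine = lines.removeFirst().components(separatedBy: " ")
    guard requestLine.count == 3 else { return nil }

    var headers: [(name: String, value: String)] = []
    for line in lines {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let name = String(line[..<colon])
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        headers.append((name: name, value: value))
    }
    return ParsedRequest(method: requestLine[0], path: requestLine[1],
                         version: requestLine[2], headers: headers, body: body)
}
```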
graph TD
A["processRequest"] --> E{"POST with body?"}
E -->|no| K["forwardRequest to :8318"]
E -->|yes| F["rewriteAntigravityModelAlias"]
F --> G{"isCursorModel?"}
G -->|yes| G1["gate on BETA_FLAG + cursor enabled,<br/>rewriteCursorModelAlias,<br/>forwardToCursor"]
G -->|no| H["processOpenAIFastMode<br/>(inject service_tier)"]
H --> I{"isClaudeModel?"}
I -->|yes| I1["ClaudeThinkingBlockSanitizer.sanitize"]
I -->|no| J["reasoningSummaryLog -> fileLog"]
I1 --> J
J --> L{"responses path AND<br/>OAuth Code Assist Gemini?"}
L -->|yes| L1["rewrite /responses -> /chat/completions"]
L -->|no| M["headersForForwarding<br/>(Anthropic-Beta rewrite)"]
L1 --> M
M --> K
Notable branches:
- Antigravity model alias rewrite: `rewriteAntigravityModelAlias` maps client-facing aliases (`ag-c46s-thinking` → `claude-sonnet-4-6`, `ag-c46o-thinking` → `claude-opus-4-6-thinking`) by surgically replacing just the model value.
- Cursor model gating + rewrite + forward: when the model starts with `cursor-`, the proxy requires `BETA_FLAG` and the `cursor` provider to be enabled (`isCursorEnabled`), rewrites the alias (`cursor-composer-2.5` → `composer-2.5`), then forwards directly to the Cursor API with `forwardToCursor`. Failures return `400`.
- OpenAI fast mode: `processOpenAIFastMode` injects `service_tier:"priority"`; see `../features/fast-mode.md`.
- Claude thinking-block sanitization: for Claude models, `ClaudeThinkingBlockSanitizer.sanitize` strips stale assistant thinking; see the subsection below.
- Reasoning summary log: `reasoningSummaryLog` builds the `REQUEST REASONING:` line written to `/tmp/droidproxy-debug.log`.
- Gemini path rewrite: for `/v1/responses` (and `/api/v1/responses`) on an OAuth Code Assist Gemini model, the path is rewritten to `/v1/chat/completions`.
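The routing branches above can be modeled as a small pure function. This is a sketch under stated assumptions, not the proxy's code: `Route`, `RoutingInput`, and `route(_:)` are hypothetical, the alias tables contain only the examples quoted above, and the Gemini check is passed in as a flag because the text does not show how an OAuth Code Assist Gemini model is detected.

```swift
import Foundation

enum Route: Equatable {
    case backend(path: String, model: String) // forward to the :8318 backend
    case cursor(model: String)                // forward to the Cursor API
    case rejected(status: Int)                // gating failed
}

struct RoutingInput {
    var path: String
    var model: String
    var betaFlag: Bool                // stands in for BETA_FLAG
    var cursorEnabled: Bool           // stands in for isCursorEnabled
    var isOAuthCodeAssistGemini: Bool // assumption: detection happens elsewhere
}

// Only the aliases quoted in the text; the real tables may hold more.
let antigravityAliases = [
    "ag-c46s-thinking": "claude-sonnet-4-6",
    "ag-c46o-thinking": "claude-opus-4-6-thinking",
]
let cursorAliases = ["cursor-composer-2.5": "composer-2.5"]

func route(_ input: RoutingInput) -> Route {
    // 1. The Antigravity alias rewrite runs first.
    let model = antigravityAliases[input.model] ?? input.model

    // 2. Cursor models are gated, aliased, and sent to a different upstream.
    if model.hasPrefix("cursor-") {
        guard input.betaFlag, input.cursorEnabled else { return .rejected(status: 400) }
        return .cursor(model: cursorAliases[model] ?? model)
    }

    // 3. The Gemini path rewrite applies only on the responses paths.
    var path = input.path
    let responsesPaths: Set<String> = ["/v1/responses", "/api/v1/responses"]
    if responsesPaths.contains(path), input.isOAuthCodeAssistGemini {
        path = "/v1/chat/completions"
    }
    return .backend(path: path, model: model)
}
```

Fast-mode injection, thinking-block sanitization, and the header rewrite are left out here because they mutate the body or headers rather than choose a destination.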
The proxy applies only this closed set of mutations:
- `Anthropic-Beta` header rewrite for visible Claude thinking.
- `service_tier:"priority"` injection for enabled GPT 5.x fast-mode models.
- Antigravity and Cursor model-name alias rewrites.
- Claude thinking-block sanitization (stripping stale `thinking` / `redacted_thinking`).
- Path rewrite: OAuth Code Assist Gemini `/v1/responses` → `/v1/chat/completions`.
It does not inject reasoning, reasoning_effort, thinking, output_config, budget_tokens, generationConfig.thinkingConfig, or any other reasoning field. Those are owned entirely by Droid CLI and forwarded unchanged. See ../features/reasoning-and-models.md.
When a Claude request enables thinking, Anthropic emits only signed, empty thinking blocks unless the redact-thinking-2026-02-12 beta is dropped and the visible-thinking beta set is present. headersForForwarding gates on shouldRequestVisibleClaudeThinking, which is true only when the model is a Claude model (claude- or gemini-claude- prefix) and the request's thinking.type is enabled, adaptive, or auto.
When that holds, headersWithVisibleClaudeThinkingBetas:
- Collects every existing `Anthropic-Beta` value (comma-split) and appends `Config.claudeVisibleThinkingBetas`: `static let claudeVisibleThinkingBetas = ["claude-code-20250219", "oauth-2025-04-20", "interleaved-thinking-2025-05-14", "context-management-2025-06-27", "prompt-caching-scope-2026-01-05", "structured-outputs-2025-12-15", "fast-mode-2026-02-01", "token-efficient-tools-2026-03-28"]`
- De-duplicates case-insensitively and drops `redact-thinking-2026-02-12`.
- Re-emits a single `Anthropic-Beta` header with the merged, comma-joined list.
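The three steps above amount to an order-preserving merge. The sketch below is one way to write it, assuming that existing client values come first, that the first spelling of a duplicate wins, and that the list is joined with a bare comma; `mergedAnthropicBeta(existingValues:)` is a hypothetical name.

```swift
import Foundation

let claudeVisibleThinkingBetas = [
    "claude-code-20250219", "oauth-2025-04-20", "interleaved-thinking-2025-05-14",
    "context-management-2025-06-27", "prompt-caching-scope-2026-01-05",
    "structured-outputs-2025-12-15", "fast-mode-2026-02-01",
    "token-efficient-tools-2026-03-28",
]
let claudeRedactedThinkingBeta = "redact-thinking-2026-02-12"

/// Hypothetical sketch: merges every Anthropic-Beta header value the client sent
/// with the visible-thinking set and returns the single header value to emit.
func mergedAnthropicBeta(existingValues: [String]) -> String {
    // Step 1: comma-split the existing values, then append the configured betas.
    var candidates: [String] = []
    for value in existingValues {
        for part in value.split(separator: ",") {
            candidates.append(part.trimmingCharacters(in: .whitespaces))
        }
    }
    candidates.append(contentsOf: claudeVisibleThinkingBetas)

    // Step 2: de-duplicate case-insensitively and drop the redaction beta.
    var seen = Set<String>()
    var merged: [String] = []
    for beta in candidates where !beta.isEmpty {
        let key = beta.lowercased()
        if key == claudeRedactedThinkingBeta || seen.contains(key) { continue }
        seen.insert(key)
        merged.append(beta)
    }

    // Step 3: one comma-joined value (separator is an assumption).
    return merged.joined(separator: ",")
}
```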
The proxy never round-trips request bodies through JSONSerialization, because that reorders object keys alphabetically and would break Anthropic's prompt-cache matching (the cache key depends on exact byte ordering). Instead it uses a hand-written scanner that locates fields by String.Index range and edits the raw string in place.
Two performance details matter:
- `routingInspectionKeys` vs `reasoningLogInspectionKeys`. `inspectRequestJSONFields` scans only the small `routingInspectionKeys` set (`model`, `service_tier`, `thinking`). Because `findObjectFieldLocations` early-exits as soon as it has found all requested keys, routing decisions usually finish before the scanner ever reaches the (potentially huge) `messages` array. The debug-only keys (`reasoning`, `reasoning_effort`, `output_config`, `generationConfig`) are scanned separately in `reasoningSummaryLog`, and only when about to log, so a missing optional key never forces routing to traverse `messages`.
- The scanner primitives. `findObjectFieldLocations` walks an object's key/value pairs; `consumeJSONValue` advances past a scalar, string, or composite value (delegating nested `{}` / `[]` to `consumeCompositeJSONValue`, which tracks string/escape state and brace depth); `parseJSONStringToken` reads a quoted string honoring `\` escapes. Each returns the `String.Index` range of the matched span, which callers use with `replaceSubrange` / `insert` to mutate exact bytes.
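To make the scanning idea concrete, here is a much-reduced sketch of the same technique: locate top-level values by `String.Index` range, stop once every requested key is found, and edit the raw string in place. The function names (`skipString`, `skipValue`, `skipWhitespace`, `locateTopLevelValues`) are hypothetical and are not the proxy's scanner; keys are compared without unescaping, and input is assumed to be well-formed JSON.

```swift
import Foundation

/// Skips a quoted string starting at `start` (must be `"`), honoring backslash escapes.
func skipString(in s: String, from start: String.Index) -> String.Index? {
    var i = s.index(after: start)
    while i < s.endIndex {
        if s[i] == "\\" {
            i = s.index(after: i)               // step onto the escaped character
            guard i < s.endIndex else { return nil }
        } else if s[i] == "\"" {
            return s.index(after: i)            // just past the closing quote
        }
        i = s.index(after: i)
    }
    return nil
}

/// Skips any JSON value; nested objects and arrays are skipped by depth counting.
func skipValue(in s: String, from start: String.Index) -> String.Index? {
    guard start < s.endIndex else { return nil }
    if s[start] == "\"" { return skipString(in: s, from: start) }
    if s[start] == "{" || s[start] == "[" {
        var depth = 0
        var i = start
        while i < s.endIndex {
            switch s[i] {
            case "\"":                          // braces inside strings do not count
                guard let next = skipString(in: s, from: i) else { return nil }
                i = next
                continue
            case "{", "[": depth += 1
            case "}", "]":
                depth -= 1
                if depth == 0 { return s.index(after: i) }
            default: break
            }
            i = s.index(after: i)
        }
        return nil
    }
    var i = start                               // scalar: number, true, false, null
    while i < s.endIndex, !",}]".contains(s[i]), !s[i].isWhitespace { i = s.index(after: i) }
    return i
}

func skipWhitespace(in s: String, from start: String.Index) -> String.Index {
    var i = start
    while i < s.endIndex, s[i].isWhitespace { i = s.index(after: i) }
    return i
}

/// Returns the raw value range of each requested top-level key, exiting early
/// once all keys are found so later (possibly huge) fields are never scanned.
func locateTopLevelValues(in s: String, keys: Set<String>) -> [String: Range<String.Index>] {
    var found: [String: Range<String.Index>] = [:]
    var i = skipWhitespace(in: s, from: s.startIndex)
    guard i < s.endIndex, s[i] == "{" else { return found }
    i = s.index(after: i)
    while found.count < keys.count {
        i = skipWhitespace(in: s, from: i)
        guard i < s.endIndex, s[i] == "\"", let keyEnd = skipString(in: s, from: i) else { break }
        let key = String(s[s.index(after: i)..<s.index(before: keyEnd)])
        i = skipWhitespace(in: s, from: keyEnd)
        guard i < s.endIndex, s[i] == ":" else { break }
        let valueStart = skipWhitespace(in: s, from: s.index(after: i))
        guard let valueEnd = skipValue(in: s, from: valueStart) else { break }
        if keys.contains(key) { found[key] = valueStart..<valueEnd }
        i = skipWhitespace(in: s, from: valueEnd)
        guard i < s.endIndex, s[i] == "," else { break }
        i = s.index(after: i)
    }
    return found
}
```

A caller would pass the range for `model` to `replaceSubrange(_:with:)` on the same string, which swaps the value while leaving key order and every other byte untouched. Ranges obtained before a mutation should not be reused afterwards.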
ClaudeThinkingBlockSanitizer.sanitize (in src/Sources/ClaudeThinkingBlockSanitizer.swift) removes stale assistant reasoning blocks of type thinking and redacted_thinking from the messages array before forwarding.
The core rule: the latest active tool-use turn keeps its thinking. latestAssistantIndexWithTrailingToolResults walks backward from the end of messages across a run of trailing user turns that contain a tool_result block (isToolResultTurn); if that run is immediately preceded by an assistant turn, that assistant index is preserved. Anthropic requires the thinking block to remain on the assistant turn whose tool calls are still being answered, so stripping it would break the request. Every other assistant turn has its thinking / redacted_thinking blocks removed.
Two edge cases are handled explicitly:
- Empty content guard. If removing the blocks would leave `"content":[]` (which Anthropic rejects), the content array is replaced with `emptyContentPlaceholder` = `[{"type":"text","text":"..."}]` instead of being emptied or the message dropped, so user/assistant role alternation is preserved.
- Surgical removal. Like the proxy, the sanitizer edits raw string ranges (grouping adjacent removable blocks so commas stay valid) rather than re-serializing.
The sanitizer is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift:
| Test | What it proves |
|---|---|
| `testPreservesThinkingForTrailingToolResults` | Thinking on the assistant turn answered by trailing `tool_result`s is left untouched. |
| `testPreservesRedactedThinkingForTrailingToolResults` | Same for `redacted_thinking`. |
| `testStripsThinkingAfterNormalUserContent` | Thinking is stripped once a normal (non-tool-result) user turn follows. |
| `testStripsRedactedThinkingAfterNormalUserContent` | Same for `redacted_thinking`. |
| `testStripsEarlierAssistantThinkingWhenLatestToolResultCycleIsPreserved` | Only the latest tool-use cycle's thinking (`signature:"new"`) is kept; the older one (`signature:"old"`) is stripped. |
| `testPreservesThinkingWhenTrailingToolResultIncludesText` | A trailing user turn mixing `tool_result` + text still counts as an active tool-result turn, so thinking is preserved. |
| `testReplacesEmptiedAssistantContentWithPlaceholder` | An assistant turn whose only block was thinking becomes the placeholder text instead of `"content":[]`; roles still alternate user, assistant, user. |
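The preserve/strip rule these tests pin down can be modeled on already-parsed messages. The real sanitizer edits raw string ranges, so the sketch below only illustrates the decision logic; `Block`, `Message`, and `sanitizeModel(_:)` are hypothetical types and names, and the placeholder is reduced to a bare text block.

```swift
import Foundation

struct Block: Equatable { let type: String }
struct Message { let role: String; var content: [Block] }

/// A user turn that carries at least one tool_result block (text may be mixed in).
func isToolResultTurn(_ message: Message) -> Bool {
    return message.role == "user" && message.content.contains(where: { $0.type == "tool_result" })
}

/// Walks backward over the trailing run of tool-result turns; if an assistant
/// turn immediately precedes that run, its index is the one to preserve.
func latestAssistantIndexWithTrailingToolResults(_ messages: [Message]) -> Int? {
    var index = messages.count - 1
    var sawToolResult = false
    while index >= 0, isToolResultTurn(messages[index]) {
        sawToolResult = true
        index -= 1
    }
    guard sawToolResult, index >= 0, messages[index].role == "assistant" else { return nil }
    return index
}

/// Model of the sanitizer: strips thinking blocks from every assistant turn
/// except the preserved one, substituting a placeholder for emptied content.
func sanitizeModel(_ messages: [Message]) -> [Message] {
    let keep = latestAssistantIndexWithTrailingToolResults(messages)
    let placeholder = [Block(type: "text")]   // stands in for emptyContentPlaceholder
    var result = messages
    for index in result.indices where result[index].role == "assistant" && index != keep {
        let hadContent = !result[index].content.isEmpty
        result[index].content.removeAll { $0.type == "thinking" || $0.type == "redacted_thinking" }
        if hadContent && result[index].content.isEmpty {
            result[index].content = placeholder // never emit "content":[]
        }
    }
    return result
}
```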
forwardRequest opens a plain TCP NWConnection to 127.0.0.1:8318, rebuilds the request line and headers (excluding content-length, host, transfer-encoding), overrides Host to the backend, always sets Connection: close (the proxy does not support keep-alive/pipelining), recomputes Content-Length from the possibly-mutated body, and sends it. On .failed it returns 502.
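The header rebuilding described here can be sketched as a pure function that returns the bytes to send. `rebuildRequest` is a hypothetical name; the exact `Host` value and the decision to drop any client-supplied `Connection` header (so that only `Connection: close` is sent) are assumptions beyond what the text states.

```swift
import Foundation

/// Hypothetical sketch of rebuilding a request for the backend.
/// Content-Length is recomputed from the UTF-8 byte count of the (possibly mutated) body.
func rebuildRequest(method: String, path: String, version: String,
                    headers: [(name: String, value: String)],
                    body: String,
                    backendHost: String = "127.0.0.1:8318") -> Data {
    // Headers the proxy re-derives itself; "connection" is an added assumption.
    let dropped: Set<String> = ["content-length", "host", "transfer-encoding", "connection"]

    var lines = ["\(method) \(path) \(version)", "Host: \(backendHost)"]
    for header in headers where !dropped.contains(header.name.lowercased()) {
        lines.append("\(header.name): \(header.value)") // original casing kept
    }

    let bodyData = Data(body.utf8)
    lines.append("Connection: close")                   // no keep-alive or pipelining
    lines.append("Content-Length: \(bodyData.count)")   // bytes, not characters

    let head = lines.joined(separator: "\r\n") + "\r\n\r\n"
    return Data(head.utf8) + bodyData
}
```

Counting bytes rather than characters matters once a mutation or the original body contains non-ASCII text, since the two lengths then differ.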
receiveResponse → streamNextChunk streams the backend's response back to the client in ≤64 KiB chunks, again re-scheduling asynchronously rather than recursing. When the stream completes it signals end-of-stream (send(content: nil, isComplete: true)) and cancels both connections. finishStreaming is the shared idempotent teardown.
Cursor forwarding. forwardToCursor opens a TLS connection to cursor-api.standardagents.ai:443, strips the client authorization header, and injects Authorization: Bearer <key> where the key comes from loadCursorApiKey — which scans ~/.cli-proxy-api/ for the first enabled *.json whose type is cursor and reads its apiKey. receiveCursorResponse streams the reply back verbatim.
fileLog appends ISO-8601-timestamped lines to /tmp/droidproxy-debug.log on a dedicated serial queue. Per request it can emit INCOMING REQUEST, REWRITE MODEL, INJECTED service_tier=priority, SANITIZED CLAUDE THINKING BLOCKS, REWRITE PATH, and the reasoning summary. The reasoning line has the form:
REQUEST REASONING: model=<model> reasoning=<snippet> reasoning_effort=<snippet> thinking=<snippet> ...
Fields appear in reasoningSummaryOrder (reasoning, reasoning_effort, thinking, output_config, service_tier, generationConfig), each raw value truncated to reasoningSummarySnippetLimit (512) chars with CR/LF flattened to spaces. If only model= would be present, no line is emitted.
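A minimal sketch of that formatting rule follows, assuming the raw JSON snippets have already been extracted into a dictionary keyed by field name and that each CR or LF becomes one space. `reasoningSummaryLine(model:rawFields:)` is a hypothetical name; the two constants mirror the names and values given above.

```swift
import Foundation

let reasoningSummaryOrder = ["reasoning", "reasoning_effort", "thinking",
                             "output_config", "service_tier", "generationConfig"]
let reasoningSummarySnippetLimit = 512

/// Hypothetical sketch: builds the REQUEST REASONING line, or returns nil
/// when no reasoning-related field is present (only model= would be logged).
func reasoningSummaryLine(model: String, rawFields: [String: String]) -> String? {
    let space: Unicode.Scalar = " "
    var parts = ["model=\(model)"]
    for key in reasoningSummaryOrder {
        guard let raw = rawFields[key] else { continue }
        // Flatten CR and LF to spaces so each request stays on one log line.
        var flattened = ""
        for scalar in raw.unicodeScalars {
            flattened.unicodeScalars.append(scalar == "\r" || scalar == "\n" ? space : scalar)
        }
        // Truncate each snippet to the configured limit.
        parts.append("\(key)=\(String(flattened.prefix(reasoningSummarySnippetLimit)))")
    }
    guard parts.count > 1 else { return nil }
    return "REQUEST REASONING: " + parts.joined(separator: " ")
}
```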
- `cli-proxy-api` backend (`127.0.0.1:8318`). Default forward target via `forwardRequest`. The backend process is owned by ServerManager; see `server-manager.md`.
- `cursor-api.standardagents.ai:443`. Cursor model requests, authenticated with the key from `cursor.json` in the auth directory.
- `AppPreferences`. Reads `bindAddress` (listener bind) and the fast-mode flags `gpt53CodexFastMode`, `gpt54FastMode`, `gpt55FastMode`.
- `UserDefaults` `enabledProviders`. `isCursorEnabled` reads this dictionary to gate Cursor routing.
- `AuthPaths.authDirectory` (`~/.cli-proxy-api/`). Source of the Cursor API key.
- `/tmp/droidproxy-debug.log`. Per-request diagnostics.
- Add/adjust a request mutation: edit the decision tree in `processRequest`. Keep mutations surgical: reuse `inspectRequestJSONFields` and the JSON scanner helpers, and never re-serialize the body. See `../how-to-contribute/patterns-and-conventions.md`.
- New model alias: add to `antigravityModelAliases` / `cursorModelAliases`.
- Change visible-thinking betas: edit `Config.claudeVisibleThinkingBetas` / `claudeRedactedThinkingBeta`.
- Fast-mode model set: edit the `switch` in `processOpenAIFastMode` (and the matching `AppPreferences` flag).
- Thinking-block stripping rules: edit `ClaudeThinkingBlockSanitizer` and extend `ClaudeThinkingBlockSanitizerTests`.
- A new routing target / path rewrite: add a branch in `processRequest` and a dedicated `forwardTo…` if it needs a different upstream.
| File | Role |
|---|---|
| `src/Sources/ThinkingProxy.swift` | The proxy: listener, request parsing, mutation pipeline, forwarding, response streaming, Cursor routing, debug log. |
| `src/Sources/ClaudeThinkingBlockSanitizer.swift` | Strips stale Claude `thinking` / `redacted_thinking` blocks while preserving the latest active tool-use turn. |
| `src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift` | Unit tests pinning the sanitizer's preserve/strip rules and the empty-content placeholder behavior. |