systems thinking proxy

Nik edited this page May 30, 2026 · 2 revisions

Thinking proxy

Active contributors: Ran, Nik

Purpose

The thinking proxy is a thin HTTP proxy that listens on TCP port 8317; Droid CLI connects to it at localhost:8317. It is built directly on Apple's Network.framework (NWListener / NWConnection) rather than a high-level HTTP server, so it can stream large request and response bodies and edit raw bytes without re-serializing JSON.

Its job is to receive each HTTP request, apply a small, closed set of mutations, and forward the request to the bundled cli-proxy-api backend on 127.0.0.1:8318 (or, for Cursor models, to the Cursor API). Reasoning effort is owned by Droid CLI — the proxy forwards whatever reasoning/thinking values the client sends and does not inject them.

The whole proxy lives in src/Sources/ThinkingProxy.swift. Claude thinking-block stripping is delegated to src/Sources/ClaudeThinkingBlockSanitizer.swift, which is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift.

Directory layout

src/Sources/ThinkingProxy.swift                                  # the proxy (listener, parsing, mutations, forwarding)
src/Sources/ClaudeThinkingBlockSanitizer.swift                   # strips stale Claude thinking/redacted_thinking blocks
src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift  # sanitizer unit tests

Key abstractions

| Type / function | File | Purpose |
| --- | --- | --- |
| ThinkingProxy | src/Sources/ThinkingProxy.swift | The whole proxy: listener lifecycle, request parsing, mutation pipeline, forwarding, response streaming. |
| start() / startListener(allowCustomBindAddress:) / stop() | src/Sources/ThinkingProxy.swift | Listener lifecycle; custom bind-address handling with fallback to all-interfaces on failure. |
| receiveNextChunk(from:accumulatedData:) | src/Sources/ThinkingProxy.swift | Iterative request accumulation honoring Content-Length. |
| processRequest(data:connection:) | src/Sources/ThinkingProxy.swift | The decision tree: routing, mutations, path rewrites, logging. |
| RequestJSONFields / inspectRequestJSONFields(in:) | src/Sources/ThinkingProxy.swift | Surgical extraction of model / service_tier / thinking locations without full JSON parse. |
| headersForForwarding(_:requestFields:) / shouldRequestVisibleClaudeThinking(_:) | src/Sources/ThinkingProxy.swift | Anthropic-Beta header rewriting for visible Claude thinking. |
| processOpenAIFastMode(jsonString:path:fields:) | src/Sources/ThinkingProxy.swift | Injects service_tier:"priority" for enabled GPT 5.x fast-mode models. |
| rewriteAntigravityModelAlias(...) / rewriteCursorModelAlias(...) | src/Sources/ThinkingProxy.swift | Surgical model-name aliasing. |
| forwardRequest(...) / streamNextChunk(...) / finishStreaming(target:client:) | src/Sources/ThinkingProxy.swift | Forward to :8318 backend and stream the response back. |
| forwardToCursor(...) / loadCursorApiKey() / receiveCursorResponse(...) | src/Sources/ThinkingProxy.swift | Cursor model routing to cursor-api.standardagents.ai. |
| findObjectFieldLocations(...) / consumeJSONValue(...) / parseJSONStringToken(...) | src/Sources/ThinkingProxy.swift | Hand-written JSON scanner used for all surgical edits. |
| ClaudeThinkingBlockSanitizer.sanitize(_:) | src/Sources/ClaudeThinkingBlockSanitizer.swift | Strips stale assistant thinking / redacted_thinking blocks. |
| fileLog(_:) | src/Sources/ThinkingProxy.swift | Appends to /tmp/droidproxy-debug.log. |

How it works

Starting and stopping the listener

start() calls startListener(allowCustomBindAddress: true). The listener uses NWParameters.tcp with allowLocalEndpointReuse = true on port 8317. If AppPreferences.bindAddress is set to something other than 0.0.0.0, the listener is restricted to that address via requiredLocalEndpoint; otherwise it binds to all interfaces.

A user-supplied bind address can be malformed or not assigned to any local interface, so the listener has a one-shot fallback. If the stateUpdateHandler reports .failed while useCustomBind is true, it cancels the failed listener and re-invokes startListener(allowCustomBindAddress: false) on a background queue. That falls back to the default all-interfaces bind, so a bad address can't leave the proxy permanently down. The same fallback runs if the NWListener initializer throws. stop() cancels the listener under stateQueue and clears isRunning.

Receiving a request

handleConnection starts the connection and calls receiveRequest, which delegates to receiveNextChunk. That method reads up to 1 MiB per receive call and accumulates bytes. It locates the end of headers with a binary match on \r\n\r\n (Data([13, 10, 13, 10])) rather than re-decoding the whole accumulated buffer as UTF-8 on every chunk, which would cost O(N²) overall. Once headers are present, it parses Content-Length (parseContentLength) and keeps scheduling more reads while the body received is shorter than advertised and the stream is still open. When the body is complete (or the header/stream is truncated), it calls processRequest.

receiveNextChunk re-schedules itself asynchronously via connection.receive's completion handler rather than recursing synchronously, so a long stream of chunks does not build up the call stack.
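
The completeness check described above can be sketched without any networking. This is a minimal illustration, not the code in ThinkingProxy.swift: RequestState, inspect, and contentLength(inHeaderBlock:) are hypothetical names, and the sketch treats a missing Content-Length as a zero-length body, which is an assumption. It does not model the truncated-stream case.

```swift
import Foundation

/// What the accumulated bytes tell us so far (hypothetical type).
enum RequestState: Equatable {
    case needMoreHeaders              // "\r\n\r\n" not seen yet
    case needMoreBody(missing: Int)   // headers done, body shorter than advertised
    case complete
}

// CR LF CR LF as raw bytes, so finding it needs no UTF-8 decoding.
let headerTerminator = Data([13, 10, 13, 10])

/// Hypothetical stand-in for parseContentLength: decodes only the header
/// block and returns the advertised body length, if present.
func contentLength(inHeaderBlock headerBlock: Data) -> Int? {
    guard let text = String(data: headerBlock, encoding: .utf8) else { return nil }
    // dropFirst() skips the request line.
    for line in text.components(separatedBy: "\r\n").dropFirst() {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "content-length"
        else { continue }
        return Int(parts[1].trimmingCharacters(in: .whitespaces))
    }
    return nil
}

/// Decides whether another read should be scheduled.
func inspect(_ accumulated: Data) -> RequestState {
    guard let terminator = accumulated.range(of: headerTerminator) else {
        return .needMoreHeaders
    }
    let headerBlock = accumulated[accumulated.startIndex..<terminator.lowerBound]
    let bodyReceived = accumulated.endIndex - terminator.upperBound
    // Assumption: no Content-Length header means no body is expected.
    let advertised = contentLength(inHeaderBlock: headerBlock) ?? 0
    return bodyReceived < advertised
        ? .needMoreBody(missing: advertised - bodyReceived)
        : .complete
}
```

In a receive loop, .needMoreHeaders and .needMoreBody would both lead to another connection.receive call, and .complete would hand the buffer to processRequest.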

The processRequest decision tree

processRequest decodes the request to a string, splits the request line into method / path / version, collects headers preserving original casing, and finds the body after \r\n\r\n. It then walks a fixed decision tree:

graph TD
    A["processRequest"] --> E{"POST with body?"}
    E -->|no| K["forwardRequest to :8318"]
    E -->|yes| F["rewriteAntigravityModelAlias"]
    F --> G{"isCursorModel?"}
    G -->|yes| G1["gate on BETA_FLAG + cursor enabled,<br/>rewriteCursorModelAlias,<br/>forwardToCursor"]
    G -->|no| H["processOpenAIFastMode<br/>(inject service_tier)"]
    H --> I{"isClaudeModel?"}
    I -->|yes| I1["ClaudeThinkingBlockSanitizer.sanitize"]
    I -->|no| J["reasoningSummaryLog -> fileLog"]
    I1 --> J
    J --> L{"responses path AND<br/>OAuth Code Assist Gemini?"}
    L -->|yes| L1["rewrite /responses -> /chat/completions"]
    L -->|no| M["headersForForwarding<br/>(Anthropic-Beta rewrite)"]
    L1 --> M
    M --> K

Notable branches:

  • Antigravity model alias rewrite: rewriteAntigravityModelAlias maps client-facing aliases (ag-c46s-thinking → claude-sonnet-4-6, ag-c46o-thinking → claude-opus-4-6-thinking) by surgically replacing just the model value.
  • Cursor model gating + rewrite + forward: when the model starts with cursor-, the proxy requires BETA_FLAG and the cursor provider to be enabled (isCursorEnabled), rewrites the alias (cursor-composer-2.5 → composer-2.5), then forwards directly to the Cursor API with forwardToCursor. Failures return 400.
  • OpenAI fast mode: processOpenAIFastMode injects service_tier:"priority"; see ../features/fast-mode.md.
  • Claude thinking-block sanitization: for Claude models, ClaudeThinkingBlockSanitizer.sanitize strips stale assistant thinking; see the subsection below.
  • Reasoning summary log: reasoningSummaryLog builds the REQUEST REASONING: line written to /tmp/droidproxy-debug.log.
  • Gemini path rewrite: for /v1/responses (and /api/v1/responses) on an OAuth Code Assist Gemini model, the path is rewritten to /v1/chat/completions.

What it does and does not mutate

The proxy applies only this closed set of mutations:

  1. Anthropic-Beta header rewrite for visible Claude thinking.
  2. service_tier:"priority" injection for enabled GPT 5.x fast-mode models.
  3. Antigravity and Cursor model-name alias rewrites.
  4. Claude thinking-block sanitization (stripping stale thinking / redacted_thinking).
  5. Path rewrite: OAuth Code Assist Gemini /v1/responses → /v1/chat/completions.

It does not inject reasoning, reasoning_effort, thinking, output_config, budget_tokens, generationConfig.thinkingConfig, or any other reasoning field. Those are owned entirely by Droid CLI and forwarded unchanged. See ../features/reasoning-and-models.md.

Anthropic-Beta header rewriting

When a Claude request enables thinking, Anthropic emits only signed, empty thinking blocks unless the redact-thinking-2026-02-12 beta is dropped and the visible-thinking beta set is present. headersForForwarding gates on shouldRequestVisibleClaudeThinking, which is true only when the model is a Claude model (claude- or gemini-claude- prefix) and the request's thinking.type is enabled, adaptive, or auto.

When that holds, headersWithVisibleClaudeThinkingBetas:

  1. Collects every existing Anthropic-Beta value (comma-split) and appends Config.claudeVisibleThinkingBetas:
    static let claudeVisibleThinkingBetas = [
        "claude-code-20250219",
        "oauth-2025-04-20",
        "interleaved-thinking-2025-05-14",
        "context-management-2025-06-27",
        "prompt-caching-scope-2026-01-05",
        "structured-outputs-2025-12-15",
        "fast-mode-2026-02-01",
        "token-efficient-tools-2026-03-28"
    ]
  2. De-duplicates case-insensitively and drops redact-thinking-2026-02-12.
  3. Re-emits a single Anthropic-Beta header with the merged, comma-joined list.
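
The gate and the three merge steps above can be sketched as two pure functions. This is an illustration under assumptions, not the proxy's code: shouldRequestVisibleThinking and mergedAnthropicBeta are hypothetical names, the first-seen spelling of a duplicate is the one kept, and the values are joined with a bare comma.

```swift
import Foundation

// Values copied from Config.claudeVisibleThinkingBetas as listed above.
let visibleThinkingBetas = [
    "claude-code-20250219", "oauth-2025-04-20",
    "interleaved-thinking-2025-05-14", "context-management-2025-06-27",
    "prompt-caching-scope-2026-01-05", "structured-outputs-2025-12-15",
    "fast-mode-2026-02-01", "token-efficient-tools-2026-03-28",
]
let redactedThinkingBeta = "redact-thinking-2026-02-12"

/// Hypothetical gate: Claude-family model prefix plus an active thinking.type.
func shouldRequestVisibleThinking(model: String, thinkingType: String?) -> Bool {
    let isClaude = model.hasPrefix("claude-") || model.hasPrefix("gemini-claude-")
    guard isClaude, let type = thinkingType else { return false }
    return ["enabled", "adaptive", "auto"].contains(type)
}

/// Hypothetical merge: existing Anthropic-Beta values first, then the
/// visible-thinking set, de-duplicated case-insensitively, minus the redact beta.
func mergedAnthropicBeta(existing headerValues: [String]) -> String {
    var seen = Set<String>()
    var merged: [String] = []
    let incoming = headerValues
        .flatMap { $0.split(separator: ",") }
        .map { $0.trimmingCharacters(in: .whitespaces) }
    for beta in incoming + visibleThinkingBetas {
        let key = beta.lowercased()
        guard !beta.isEmpty,
              key != redactedThinkingBeta,
              seen.insert(key).inserted   // false when already seen
        else { continue }
        merged.append(beta)
    }
    // Assumption: joined with a bare comma and no space.
    return merged.joined(separator: ",")
}
```

A caller would emit the returned string as the single Anthropic-Beta header only when the gate returns true, and leave the headers untouched otherwise.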

Surgical JSON helpers

The proxy never round-trips request bodies through JSONSerialization, because re-serializing does not preserve the original object key order (keys come back in dictionary order, or alphabetically with .sortedKeys) and would break Anthropic's prompt-cache matching (the cache key depends on exact byte ordering). Instead it uses a hand-written scanner that locates fields by String.Index range and edits the raw string in place.

Two performance details matter:

  • routingInspectionKeys vs reasoningLogInspectionKeys. inspectRequestJSONFields scans only the small routingInspectionKeys set (model, service_tier, thinking). Because findObjectFieldLocations early-exits as soon as it has found all requested keys, routing decisions usually finish before the scanner ever reaches the (potentially huge) messages array. The debug-only keys (reasoning, reasoning_effort, output_config, generationConfig) are scanned separately in reasoningSummaryLog, and only when about to log, so a missing optional key never forces routing to traverse messages.
  • The scanner primitives. findObjectFieldLocations walks an object's key/value pairs; consumeJSONValue advances past a scalar, string, or composite value (delegating nested {}/[] to consumeCompositeJSONValue, which tracks string/escape state and brace depth); parseJSONStringToken reads a quoted string honoring \ escapes. Each returns the String.Index range of the matched span, which callers use with replaceSubrange / insert to mutate exact bytes.
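
To make the early-exit idea concrete, here is a simplified scanner in the same spirit. It is a sketch under assumptions, not the code in ThinkingProxy.swift: it works on UTF-8 byte offsets instead of String.Index, compares keys without unescaping them, and every name in it (skipWhitespace, skipString, skipValue, findFieldRanges, replacingModel) is hypothetical.

```swift
import Foundation

let quote = UInt8(ascii: "\""), backslash = UInt8(ascii: "\\")
let openBrace = UInt8(ascii: "{"), closeBrace = UInt8(ascii: "}")
let openBracket = UInt8(ascii: "["), closeBracket = UInt8(ascii: "]")
let colon = UInt8(ascii: ":"), comma = UInt8(ascii: ",")
let whitespace: Set<UInt8> = [0x20, 0x09, 0x0A, 0x0D]

func skipWhitespace(_ b: [UInt8], _ start: Int) -> Int {
    var i = start
    while i < b.count, whitespace.contains(b[i]) { i += 1 }
    return i
}

/// `start` is an opening quote; returns the index just past the closing quote.
func skipString(_ b: [UInt8], _ start: Int) -> Int? {
    var i = start + 1
    while i < b.count {
        if b[i] == backslash { i += 2; continue }   // skip the escaped byte
        if b[i] == quote { return i + 1 }
        i += 1
    }
    return nil
}

/// Returns the index just past the JSON value that begins at `start`.
func skipValue(_ b: [UInt8], _ start: Int) -> Int? {
    guard start < b.count else { return nil }
    if b[start] == quote { return skipString(b, start) }
    if b[start] == openBrace || b[start] == openBracket {
        var depth = 0, i = start
        while i < b.count {
            if b[i] == quote {                       // braces inside strings don't count
                guard let end = skipString(b, i) else { return nil }
                i = end
                continue
            }
            if b[i] == openBrace || b[i] == openBracket { depth += 1 }
            if b[i] == closeBrace || b[i] == closeBracket {
                depth -= 1
                if depth == 0 { return i + 1 }
            }
            i += 1
        }
        return nil
    }
    var i = start                                    // number, true, false, null
    while i < b.count, !whitespace.contains(b[i]),
          b[i] != comma, b[i] != closeBrace, b[i] != closeBracket { i += 1 }
    return i > start ? i : nil
}

/// Value byte ranges for `wanted` top-level keys; stops once all are found.
func findFieldRanges(in json: String, wanted: Set<String>) -> [String: Range<Int>] {
    let b = Array(json.utf8)
    var found: [String: Range<Int>] = [:]
    var i = skipWhitespace(b, 0)
    guard i < b.count, b[i] == openBrace else { return found }
    i += 1
    while found.count < wanted.count {               // early exit
        i = skipWhitespace(b, i)
        guard i < b.count, b[i] == quote, let keyEnd = skipString(b, i) else { break }
        let key = String(decoding: b[(i + 1)..<(keyEnd - 1)], as: UTF8.self)
        i = skipWhitespace(b, keyEnd)
        guard i < b.count, b[i] == colon else { break }
        let valueStart = skipWhitespace(b, i + 1)
        guard let valueEnd = skipValue(b, valueStart) else { break }
        if wanted.contains(key) { found[key] = valueStart..<valueEnd }
        i = skipWhitespace(b, valueEnd)
        guard i < b.count, b[i] == comma else { break }
        i += 1
    }
    return found
}

/// Replaces only the model value; assumes `newModel` needs no JSON escaping.
func replacingModel(in json: String, with newModel: String) -> String {
    guard let range = findFieldRanges(in: json, wanted: ["model"])["model"] else { return json }
    var bytes = Array(json.utf8)
    bytes.replaceSubrange(range, with: Array("\"\(newModel)\"".utf8))
    return String(decoding: bytes, as: UTF8.self)
}
```

When model is the first key, replacingModel returns after scanning a single key/value pair and never walks messages. Every byte outside the replaced range is left exactly as the client sent it, which is the property the surgical approach exists to protect.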

Claude thinking-block sanitizer

ClaudeThinkingBlockSanitizer.sanitize (in src/Sources/ClaudeThinkingBlockSanitizer.swift) removes stale assistant reasoning blocks of type thinking and redacted_thinking from the messages array before forwarding.

The core rule: the latest active tool-use turn keeps its thinking. latestAssistantIndexWithTrailingToolResults walks backward from the end of messages across a run of trailing user turns that contain a tool_result block (isToolResultTurn); if that run is immediately preceded by an assistant turn, that assistant index is preserved. Anthropic requires the thinking block to remain on the assistant turn whose tool calls are still being answered, so stripping it would break the request. Every other assistant turn has its thinking / redacted_thinking blocks removed.
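
The preserve/strip rule can be illustrated on a simplified message model. This sketch is not the sanitizer itself, which edits raw JSON ranges: the Message struct and strippingStaleThinking are hypothetical, blocks are reduced to their type strings, and the sketch assumes at least one trailing tool-result turn is needed before an assistant index is preserved.

```swift
import Foundation

/// Hypothetical, simplified view of one entry in `messages`.
struct Message {
    let role: String
    let blockTypes: [String]   // e.g. ["thinking", "tool_use"]
}

func isToolResultTurn(_ message: Message) -> Bool {
    message.role == "user" && message.blockTypes.contains("tool_result")
}

/// Walks backward over trailing tool-result user turns and returns the
/// assistant index immediately before that run, if there is one.
func latestAssistantIndexWithTrailingToolResults(_ messages: [Message]) -> Int? {
    var index = messages.count - 1
    var sawToolResult = false
    while index >= 0, isToolResultTurn(messages[index]) {
        sawToolResult = true
        index -= 1
    }
    // Assumption: an empty run (no trailing tool_result turn) preserves nothing.
    guard sawToolResult, index >= 0, messages[index].role == "assistant" else { return nil }
    return index
}

/// Strips thinking / redacted_thinking from every assistant turn except the
/// preserved one; an emptied turn gets a single text block as a placeholder.
func strippingStaleThinking(_ messages: [Message]) -> [Message] {
    let preserved = latestAssistantIndexWithTrailingToolResults(messages)
    var result: [Message] = []
    for (index, message) in messages.enumerated() {
        guard message.role == "assistant", index != preserved else {
            result.append(message)
            continue
        }
        let kept = message.blockTypes.filter { $0 != "thinking" && $0 != "redacted_thinking" }
        if kept.count == message.blockTypes.count {
            result.append(message)                       // nothing to strip
        } else {
            result.append(Message(role: message.role,
                                  blockTypes: kept.isEmpty ? ["text"] : kept))
        }
    }
    return result
}
```

With two tool-use cycles in a conversation, only the assistant turn directly before the final run of tool_result turns keeps its thinking block; the earlier cycle is stripped.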

Two edge cases are handled explicitly:

  • Empty content guard. If removing the blocks would leave "content":[] (which Anthropic rejects), the content array is replaced with emptyContentPlaceholder = [{"type":"text","text":"..."}] instead of being emptied or the message dropped, so user/assistant role alternation is preserved.
  • Surgical removal. Like the proxy, the sanitizer edits raw string ranges (grouping adjacent removable blocks so commas stay valid) rather than re-serializing.

The sanitizer is covered by src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift:

| Test | What it proves |
| --- | --- |
| testPreservesThinkingForTrailingToolResults | Thinking on the assistant turn answered by trailing tool_results is left untouched. |
| testPreservesRedactedThinkingForTrailingToolResults | Same for redacted_thinking. |
| testStripsThinkingAfterNormalUserContent | Thinking is stripped once a normal (non-tool-result) user turn follows. |
| testStripsRedactedThinkingAfterNormalUserContent | Same for redacted_thinking. |
| testStripsEarlierAssistantThinkingWhenLatestToolResultCycleIsPreserved | Only the latest tool-use cycle's thinking (signature:"new") is kept; the older one (signature:"old") is stripped. |
| testPreservesThinkingWhenTrailingToolResultIncludesText | A trailing user turn mixing tool_result + text still counts as an active tool-result turn, so thinking is preserved. |
| testReplacesEmptiedAssistantContentWithPlaceholder | An assistant turn whose only block was thinking becomes the placeholder text instead of "content":[]; roles still alternate user, assistant, user. |

Forwarding and response streaming

forwardRequest opens a plain TCP NWConnection to 127.0.0.1:8318, rebuilds the request line and headers (excluding content-length, host, transfer-encoding), overrides Host to the backend, always sets Connection: close (the proxy does not support keep-alive/pipelining), recomputes Content-Length from the possibly-mutated body, and sends it. On .failed it returns 502.
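
The header rebuild described above can be sketched as a pure function over the parsed request. This is an illustration, not forwardRequest itself: buildForwardedRequest is a hypothetical name, the header ordering is arbitrary, and the sketch also drops any client Connection header before adding Connection: close, which the source does not state.

```swift
import Foundation

/// Hypothetical helper: serializes the request that would be sent to the backend.
func buildForwardedRequest(method: String, path: String, version: String,
                           headers: [(name: String, value: String)],
                           body: Data) -> Data {
    // content-length, host and transfer-encoding are excluded per the text above.
    // Assumption: a client "Connection" header is dropped too, to avoid a duplicate.
    let excluded: Set<String> = ["content-length", "host", "transfer-encoding", "connection"]
    var lines = ["\(method) \(path) \(version)"]
    for header in headers where !excluded.contains(header.name.lowercased()) {
        lines.append("\(header.name): \(header.value)")   // original casing preserved
    }
    lines.append("Host: 127.0.0.1:8318")
    lines.append("Connection: close")
    // Recomputed from the possibly-mutated body, never copied from the client.
    lines.append("Content-Length: \(body.count)")
    var request = Data((lines.joined(separator: "\r\n") + "\r\n\r\n").utf8)
    request.append(body)
    return request
}
```

Recomputing Content-Length matters because any of the body mutations (alias rewrite, service_tier injection, thinking-block stripping) can change the byte count the client originally advertised.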

receiveResponse → streamNextChunk streams the backend's response back to the client in ≤64 KiB chunks, again re-scheduling asynchronously rather than recursing. When the stream completes it signals end-of-stream (send(content: nil, isComplete: true)) and cancels both connections. finishStreaming is the shared idempotent teardown.

Cursor forwarding. forwardToCursor opens a TLS connection to cursor-api.standardagents.ai:443, strips the client authorization header, and injects Authorization: Bearer <key> where the key comes from loadCursorApiKey — which scans ~/.cli-proxy-api/ for the first enabled *.json whose type is cursor and reads its apiKey. receiveCursorResponse streams the reply back verbatim.

Debug log

fileLog appends ISO-8601-timestamped lines to /tmp/droidproxy-debug.log on a dedicated serial queue. Per request it can emit INCOMING REQUEST, REWRITE MODEL, INJECTED service_tier=priority, SANITIZED CLAUDE THINKING BLOCKS, REWRITE PATH, and the reasoning summary. The reasoning line has the form:

REQUEST REASONING: model=<model> reasoning=<snippet> reasoning_effort=<snippet> thinking=<snippet> ...

Fields appear in reasoningSummaryOrder (reasoning, reasoning_effort, thinking, output_config, service_tier, generationConfig), each raw value truncated to reasoningSummarySnippetLimit (512) chars with CR/LF flattened to spaces. If only model= would be present, no line is emitted.
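
The formatting rules for the reasoning line can be sketched as follows. This is a hypothetical reconstruction from the description above, not the reasoningSummaryLog implementation: reasoningSummaryLine and its rawFields parameter are assumed names, and the sketch replaces each CR and LF with its own space.

```swift
import Foundation

let reasoningSummaryOrder = ["reasoning", "reasoning_effort", "thinking",
                             "output_config", "service_tier", "generationConfig"]
let reasoningSummarySnippetLimit = 512

/// Builds the log line from raw JSON value snippets keyed by field name.
/// Returns nil when no field is present, so a bare "model=" line is never logged.
func reasoningSummaryLine(model: String, rawFields: [String: String]) -> String? {
    let space: Unicode.Scalar = " "
    var parts: [String] = []
    for key in reasoningSummaryOrder {
        guard let raw = rawFields[key] else { continue }
        // Work on Unicode scalars: Swift treats "\r\n" as a single Character,
        // so a Character-level replace could miss the individual CR and LF.
        var flattened = String.UnicodeScalarView()
        for scalar in raw.unicodeScalars {
            flattened.append(scalar == "\r" || scalar == "\n" ? space : scalar)
        }
        let snippet = String(String(flattened).prefix(reasoningSummarySnippetLimit))
        parts.append("\(key)=\(snippet)")
    }
    guard !parts.isEmpty else { return nil }
    return "REQUEST REASONING: model=\(model) " + parts.joined(separator: " ")
}
```

Iterating over reasoningSummaryOrder rather than over the dictionary keeps the field order stable from line to line, which makes the log easier to grep and diff.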

Integration points

  • cli-proxy-api backend (127.0.0.1:8318). Default forward target via forwardRequest. The backend process is owned by ServerManager — see server-manager.md.
  • cursor-api.standardagents.ai:443. Cursor model requests, authenticated with the Cursor API key that loadCursorApiKey reads from the auth directory.
  • AppPreferences. Reads bindAddress (listener bind) and the fast-mode flags gpt53CodexFastMode, gpt54FastMode, gpt55FastMode.
  • UserDefaults enabledProviders. isCursorEnabled reads this dictionary to gate Cursor routing.
  • AuthPaths.authDirectory (~/.cli-proxy-api/). Source of the Cursor API key.
  • /tmp/droidproxy-debug.log. Per-request diagnostics.

Entry points for modification

  • Add/adjust a request mutation: edit the decision tree in processRequest. Keep mutations surgical — reuse inspectRequestJSONFields and the JSON scanner helpers; never re-serialize the body. See ../how-to-contribute/patterns-and-conventions.md.
  • New model alias: add to antigravityModelAliases / cursorModelAliases.
  • Change visible-thinking betas: edit Config.claudeVisibleThinkingBetas / claudeRedactedThinkingBeta.
  • Fast-mode model set: edit the switch in processOpenAIFastMode (and the matching AppPreferences flag).
  • Thinking-block stripping rules: edit ClaudeThinkingBlockSanitizer and extend ClaudeThinkingBlockSanitizerTests.
  • A new routing target / path rewrite: add a branch in processRequest and a dedicated forwardTo… if it needs a different upstream.

Key source files

| File | Role |
| --- | --- |
| src/Sources/ThinkingProxy.swift | The proxy: listener, request parsing, mutation pipeline, forwarding, response streaming, Cursor routing, debug log. |
| src/Sources/ClaudeThinkingBlockSanitizer.swift | Strips stale Claude thinking / redacted_thinking blocks while preserving the latest active tool-use turn. |
| src/Tests/CLIProxyMenuBarTests/ClaudeThinkingBlockSanitizerTests.swift | Unit tests pinning the sanitizer's preserve/strip rules and the empty-content placeholder behavior. |
