feat: add /v1/responses passthrough MVP #54
Merged
…onses
Extends the Anthropic streaming path to capture cache_read_input_tokens
and cache_creation_input_tokens from message_start (and defensively from
message_delta for models that emit mid-stream usage updates), and to
emit a trailing OpenAI-shaped usage chunk (choices: [], usage: {...})
so clients with stream_options.include_usage: true see cache hit rate
even on streaming calls.
Parity with toOpenAIResponse:
- cache_read + cache_creation rolled into prompt_tokens
- cache_read surfaced as prompt_tokens_details.cached_tokens (only when
cache fields are present)
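The usage mapping described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the `AnthropicUsage` and `OpenAIUsage` interfaces and the `toOpenAIUsage` and `trailingUsageChunk` helpers are hypothetical names, and `total_tokens` is assumed to be the sum of prompt and completion tokens.

```typescript
// Hypothetical shapes; only the cache field names come from the commit message.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens: number };
}

// Roll both cache buckets into prompt_tokens; surface cache reads as
// prompt_tokens_details.cached_tokens only when a cache field was present.
function toOpenAIUsage(u: AnthropicUsage): OpenAIUsage {
  const cacheRead = u.cache_read_input_tokens ?? 0;
  const cacheCreation = u.cache_creation_input_tokens ?? 0;
  const hasCacheFields =
    u.cache_read_input_tokens !== undefined ||
    u.cache_creation_input_tokens !== undefined;

  const promptTokens = u.input_tokens + cacheRead + cacheCreation;
  const usage: OpenAIUsage = {
    prompt_tokens: promptTokens,
    completion_tokens: u.output_tokens,
    total_tokens: promptTokens + u.output_tokens, // assumption: plain sum
  };
  if (hasCacheFields) {
    usage.prompt_tokens_details = { cached_tokens: cacheRead };
  }
  return usage;
}

// Trailing chunk with empty choices, sent after the last content chunk
// for clients that set stream_options.include_usage: true.
function trailingUsageChunk(id: string, model: string, u: AnthropicUsage) {
  return {
    id,
    object: "chat.completion.chunk",
    model,
    choices: [],
    usage: toOpenAIUsage(u),
  };
}
```

Keeping the mapping in one helper is one way to keep the streaming and non-streaming paths from drifting apart.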
Also plumbs the buckets through StreamResult into the Anthropic SDK
adapter, which sets two new OTel span attrs (conditionally, when >0):
- prov.llm.cache_read_input_tokens
- prov.llm.cache_creation_input_tokens
The same attrs are now also set on the non-streaming path for
AgentWeave parity across streaming and non-streaming sessions.
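A minimal sketch of the conditional span attributes, assuming a span object that exposes a `setAttribute(key, value)` method. `SpanLike`, `CacheBuckets`, and `setCacheAttrs` are hypothetical names used only for this illustration; the two attribute keys are the ones listed above.

```typescript
// Hypothetical minimal span interface so the sketch stays self-contained.
interface SpanLike {
  setAttribute(key: string, value: number): unknown;
}

// Hypothetical carrier for the two cache buckets.
interface CacheBuckets {
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}

// Set each attribute only when its bucket is greater than zero, so spans
// for uncached calls carry no zero-valued cache attributes.
function setCacheAttrs(span: SpanLike, buckets: CacheBuckets): void {
  const read = buckets.cacheReadInputTokens ?? 0;
  const creation = buckets.cacheCreationInputTokens ?? 0;
  if (read > 0) {
    span.setAttribute("prov.llm.cache_read_input_tokens", read);
  }
  if (creation > 0) {
    span.setAttribute("prov.llm.cache_creation_input_tokens", creation);
  }
}
```

Calling the same helper from both the streaming and non-streaming paths would give the parity the commit message describes.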
Closes #52.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- centralize DOWNSTREAM_PROTOCOLS parsing in config (one source of truth; registry no longer duplicates env parsing inline)
- drop unused ResponsesInputItem import in app.ts
- rename emitDownstreamResponsesAsSse → emitMockResponsesAsSse with proper output[].content[].text walk; clarifies it's the mock-only path
- strip Mux-internal fields (runtime, protocol) before forwarding to downstream; prevents 400s from strict OpenAI endpoints
- promote protocol?: RequestProtocol to ChatCompletionsRequest, drop the intersection-type casts at resolveRoute call sites
- document DOWNSTREAM_PROTOCOLS in .env.example
- add docs/deploy-debian.md covering systemd unit, OpenClaw plumbing (requires openclaw 13085b0bdf for baseUrl honoring), AgentWeave passthrough, SSE proxy-buffering caveat, and troubleshooting
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
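As an illustration of the field-stripping item in this commit, the sketch below removes `runtime` and `protocol` from a request body before it is forwarded. `MUX_INTERNAL_FIELDS` and `stripInternalFields` are hypothetical names, and treating the body as a flat record is an assumption; the actual change may be structured differently.

```typescript
// Field names taken from the commit message; the constant name is hypothetical.
const MUX_INTERNAL_FIELDS = new Set<string>(["runtime", "protocol"]);

// Return a shallow copy of the body without Mux-internal keys, leaving the
// caller's object untouched so routing code can still read those fields.
function stripInternalFields(
  body: Record<string, unknown>,
): Record<string, unknown> {
  const forwarded: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(body)) {
    if (!MUX_INTERNAL_FIELDS.has(key)) {
      forwarded[key] = value;
    }
  }
  return forwarded;
}
```

Copying rather than deleting in place keeps the original request available for route resolution after the downstream payload has been built.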
Summary
- POST /v1/responses handling in the Express app

Testing
Closes #40