
feat: add /v1/responses passthrough MVP #54

Merged
arniesaha merged 3 commits into master from feat/stream-cache-usage
May 4, 2026
Conversation

@arniesaha
Owner

Summary

  • add native POST /v1/responses handling in the Express app
  • allow providers to declare supported protocols and route responses requests only to responses-capable downstreams (a sketch follows this list)
  • add openai-compatible downstream passthrough + tests/docs for the MVP flow
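
For orientation, a minimal sketch of what the passthrough route could look like, assuming an Express app on Node 18+ (global fetch) and a hypothetical Provider shape carrying a protocols list; none of these names are the PR's actual code:

```ts
import express, { Request, Response } from "express";

// Hypothetical provider shape: each downstream declares which wire
// protocols it supports, so routing can filter on capability.
type RequestProtocol = "chat_completions" | "responses";

interface Provider {
  name: string;
  baseUrl: string;
  apiKey: string;
  protocols: RequestProtocol[];
}

const providers: Provider[] = []; // populated from config in a real app

const app = express();
app.use(express.json());

// Passthrough MVP: forward only to a responses-capable downstream.
app.post("/v1/responses", async (req: Request, res: Response) => {
  const target = providers.find((p) => p.protocols.includes("responses"));
  if (!target) {
    res.status(502).json({ error: "no responses-capable downstream" });
    return;
  }
  const upstream = await fetch(`${target.baseUrl}/v1/responses`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${target.apiKey}`,
    },
    body: JSON.stringify(req.body),
  });
  res.status(upstream.status).json(await upstream.json());
});
```

(Streaming and error mapping are omitted; the sketch only illustrates the capability-based routing described above.)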

Testing

  • npm test
  • npm run check

Closes #40

arniesaha and others added 3 commits April 23, 2026 01:10
…onses

Extends the Anthropic streaming path to capture cache_read_input_tokens
and cache_creation_input_tokens from message_start (and defensively from
message_delta for models that emit mid-stream usage updates), and to
emit a trailing OpenAI-shaped usage chunk (choices: [], usage: {...})
so clients with stream_options.include_usage: true see the cache hit
rate even on streaming calls.

Parity with toOpenAIResponse (sketched below):
  - cache_read + cache_creation rolled into prompt_tokens
  - cache_read surfaced as prompt_tokens_details.cached_tokens (only when
    cache fields are present)
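
A hedged sketch of those two rules, under assumed event and bucket shapes (the bucket field names mirror the Anthropic wire format; everything else here is illustrative, not the PR's code):

```ts
// Illustrative usage buckets accumulated while streaming.
interface UsageBuckets {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// message_start carries the initial usage snapshot; message_delta is read
// defensively for models that emit mid-stream usage updates.
function captureUsage(event: any, b: UsageBuckets): void {
  const u =
    event.type === "message_start" ? event.message?.usage
    : event.type === "message_delta" ? event.usage
    : undefined;
  if (!u) return;
  if (typeof u.input_tokens === "number") b.input_tokens = u.input_tokens;
  if (typeof u.output_tokens === "number") b.output_tokens = u.output_tokens;
  if (typeof u.cache_read_input_tokens === "number")
    b.cache_read_input_tokens = u.cache_read_input_tokens;
  if (typeof u.cache_creation_input_tokens === "number")
    b.cache_creation_input_tokens = u.cache_creation_input_tokens;
}

// Trailing OpenAI-shaped chunk: cache buckets rolled into prompt_tokens,
// cached_tokens emitted only when a cache field was actually seen.
function trailingUsageChunk(b: UsageBuckets, model: string) {
  const cacheRead = b.cache_read_input_tokens ?? 0;
  const cacheCreation = b.cache_creation_input_tokens ?? 0;
  const promptTokens = b.input_tokens + cacheRead + cacheCreation;
  const sawCacheFields =
    b.cache_read_input_tokens !== undefined ||
    b.cache_creation_input_tokens !== undefined;
  return {
    object: "chat.completion.chunk",
    model,
    choices: [], // usage-only chunk for stream_options.include_usage clients
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: b.output_tokens,
      total_tokens: promptTokens + b.output_tokens,
      ...(sawCacheFields
        ? { prompt_tokens_details: { cached_tokens: cacheRead } }
        : {}),
    },
  };
}
```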

Also plumbs the buckets through StreamResult into the Anthropic SDK
adapter, which sets two new OTel span attrs (conditionally, when >0):
  - prov.llm.cache_read_input_tokens
  - prov.llm.cache_creation_input_tokens
The same attrs are now also set on the non-streaming path for
AgentWeave parity across streaming and non-streaming sessions.
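
The conditional attribute setting might look roughly like this; the two attribute names come from the commit message above, while the helper and its call site are assumptions:

```ts
import type { Span } from "@opentelemetry/api";

// Only set the cache attrs when the counts are > 0, per the commit note.
function setCacheSpanAttrs(
  span: Span,
  cacheRead: number,
  cacheCreation: number,
): void {
  if (cacheRead > 0) {
    span.setAttribute("prov.llm.cache_read_input_tokens", cacheRead);
  }
  if (cacheCreation > 0) {
    span.setAttribute("prov.llm.cache_creation_input_tokens", cacheCreation);
  }
}
```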

Closes #52.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- centralize DOWNSTREAM_PROTOCOLS parsing in config (one source of truth;
  registry no longer duplicates env parsing inline); a sketch follows this
  commit message
- drop unused ResponsesInputItem import in app.ts
- rename emitDownstreamResponsesAsSse → emitMockResponsesAsSse with
  proper output[].content[].text walk; clarifies it's the mock-only path
- strip Mux-internal fields (runtime, protocol) before forwarding to
  downstream — prevents 400s from strict OpenAI endpoints
- promote protocol?: RequestProtocol to ChatCompletionsRequest, drop the
  intersection-type casts at resolveRoute call sites
- document DOWNSTREAM_PROTOCOLS in .env.example
- add docs/deploy-debian.md covering systemd unit, OpenClaw plumbing
  (requires openclaw 13085b0bdf for baseUrl honoring), AgentWeave
  passthrough, SSE proxy-buffering caveat, and troubleshooting

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
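
Two of the bullets above (centralized DOWNSTREAM_PROTOCOLS parsing, and stripping Mux-internal fields) might look roughly like this; the comma-separated env format, the default, and the helper names are assumptions for the sketch, not the repo's actual contract (see .env.example for that):

```ts
// Hypothetical: one source of truth for DOWNSTREAM_PROTOCOLS parsing.
type RequestProtocol = "chat_completions" | "responses";

function parseDownstreamProtocols(raw: string | undefined): RequestProtocol[] {
  if (!raw) return ["chat_completions"]; // assumed default
  return raw
    .split(",")
    .map((p) => p.trim())
    .filter(
      (p): p is RequestProtocol =>
        p === "chat_completions" || p === "responses",
    );
}

// Strip Mux-internal fields before forwarding: strict OpenAI-compatible
// endpoints can 400 on unknown top-level keys.
function toDownstreamBody<T extends { runtime?: unknown; protocol?: unknown }>(
  body: T,
): Omit<T, "runtime" | "protocol"> {
  const { runtime: _runtime, protocol: _protocol, ...rest } = body;
  return rest;
}
```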
@arniesaha arniesaha merged commit e13d242 into master May 4, 2026
1 check passed
@arniesaha arniesaha deleted the feat/stream-cache-usage branch May 4, 2026 06:25

Development

Successfully merging this pull request may close these issues.

feat(app): add /v1/responses endpoint for OpenAI Responses API clients
