…onses
Extends the Anthropic streaming path to capture cache_read_input_tokens
and cache_creation_input_tokens from message_start (and defensively from
message_delta for models that emit mid-stream usage updates), and to
emit a trailing OpenAI-shaped usage chunk (choices: [], usage: {...})
so clients with stream_options.include_usage: true see the cache hit rate
even on streaming calls.
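In sketch form (local names here are illustrative, not the exact code):

```ts
// Illustrative sketch — event shapes follow Anthropic's SSE payloads;
// these local names are hypothetical.
let cacheReadTokens = 0;
let cacheCreationTokens = 0;

function trackUsage(event: {
  type: string;
  message?: { usage?: Record<string, number> };
  usage?: Record<string, number>;
}) {
  // message_start carries the input-side usage, including cache buckets;
  // message_delta may update it mid-stream on some models.
  const usage =
    event.type === "message_start" ? event.message?.usage :
    event.type === "message_delta" ? event.usage :
    undefined;
  if (!usage) return;
  // Last value wins, matching the existing output_tokens handling.
  cacheReadTokens = usage.cache_read_input_tokens ?? cacheReadTokens;
  cacheCreationTokens = usage.cache_creation_input_tokens ?? cacheCreationTokens;
}
```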
Parity with toOpenAIResponse:
- cache_read + cache_creation rolled into prompt_tokens
- cache_read surfaced as prompt_tokens_details.cached_tokens (only when
  cache fields are present) — see the sketch after this list
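As a sketch, that rollup (helper name and signature here are hypothetical):

```ts
// Hypothetical helper mirroring the parity rule above.
function toUsage(input: number, output: number, cacheRead?: number, cacheCreation?: number) {
  const promptTokens = input + (cacheRead ?? 0) + (cacheCreation ?? 0);
  return {
    prompt_tokens: promptTokens,
    completion_tokens: output,
    total_tokens: promptTokens + output,
    // Only attach details when Anthropic actually reported cache fields.
    ...(cacheRead !== undefined || cacheCreation !== undefined
      ? { prompt_tokens_details: { cached_tokens: cacheRead ?? 0 } }
      : {}),
  };
}
```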
Also plumbs the buckets through StreamResult into the Anthropic SDK
adapter, which sets two new OTel span attrs (conditionally, when >0):
- prov.llm.cache_read_input_tokens
- prov.llm.cache_creation_input_tokens
The same attrs are now also set on the non-streaming path for
AgentWeave parity across streaming and non-streaming sessions.
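Sketch of the conditional attr-setting (helper name hypothetical; attr
keys are the ones above):

```ts
import { Span } from "@opentelemetry/api";

// Hypothetical helper — attrs are added only when > 0, so a disabled or
// cold cache leaves them absent rather than set to zero.
function setCacheSpanAttrs(span: Span, cacheRead: number, cacheCreation: number) {
  span.setAttributes({
    ...(cacheRead > 0 ? { "prov.llm.cache_read_input_tokens": cacheRead } : {}),
    ...(cacheCreation > 0 ? { "prov.llm.cache_creation_input_tokens": cacheCreation } : {}),
  });
}
```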
Closes #52.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #52.
#49/#50 added Anthropic `cache_control` injection and surfaced `cached_tokens` on the non-streaming response path. The streaming path (`streamAnthropicToOpenAI`) was still dropping the cache buckets entirely — AgentWeave and clients saw `cached_tokens = 0` even when the hit rate was ~99%. This PR closes that observability gap.

- `streamAnthropicToOpenAI` now captures `cache_read_input_tokens` and `cache_creation_input_tokens` from `message_start.message.usage`, and defensively from `message_delta.usage` (last value wins, matching the existing `output_tokens` handling)
- Emits a trailing usage chunk in the `include_usage` shape: `{ choices: [], usage: { prompt_tokens, completion_tokens, total_tokens, prompt_tokens_details? } }` — a client-side read of that chunk is sketched after this list
- `prompt_tokens` rolls cache_read + cache_creation in, matching `toOpenAIResponse` — billable prompt size is now symmetric across streaming and non-streaming
- `prompt_tokens_details.cached_tokens` is emitted only when Anthropic actually reports cache fields — plain requests stay clean
- `StreamResult` widened with `cacheReadTokens`/`cacheCreationTokens`; the Anthropic adapter sets two new OTel span attrs (`prov.llm.cache_read_input_tokens`, `prov.llm.cache_creation_input_tokens`) on both streaming and non-streaming paths, only when > 0
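For example, an OpenAI-SDK client pointed at the proxy can read the hit rate off the trailing chunk. A minimal sketch — the base URL, API key, and model name below are placeholders, not values from this repo:

```ts
import OpenAI from "openai";

// Placeholders — point these at your proxy deployment.
const client = new OpenAI({ baseURL: "http://localhost:3000/v1", apiKey: "sk-placeholder" });

const stream = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "hello" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  // The trailing chunk has empty choices and carries the usage object.
  if (chunk.usage) {
    console.log("cached:", chunk.usage.prompt_tokens_details?.cached_tokens ?? 0);
  }
}
```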
Notes for reviewers

- `emitDownstreamResponseAsSse` convention: finish_reason chunk → usage chunk → `[DONE]` (see the wire-format sketch after these notes). The issue's pseudo-code had them flipped; I went with the codebase-consistent order, since any OpenAI client written against `include_usage` tolerates both.
- The new span attrs are spread in conditionally (`...(x > 0 ? {...} : {})`) so they're absent rather than set-to-zero when cache is disabled or inactive — matches the pattern already used for `callerAgentId`.
- `MUX_ANTHROPIC_PROMPT_CACHE=false` naturally makes the new code emit a plain usage chunk with no `prompt_tokens_details`, since Anthropic won't return cache fields when we don't send `cache_control`.
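Concretely, the tail of a streamed response now looks like this (ids omitted and token counts invented for illustration):

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":1234,"completion_tokens":56,"total_tokens":1290,"prompt_tokens_details":{"cached_tokens":1200}}}

data: [DONE]
```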
Test plan

- `npm test` — 92/92 passing (5 new tests covering the cache-on/off usage chunk, last-value-wins on `message_delta`, `cacheReadTokens`/`cacheCreationTokens` in `StreamResult`, and log-event fields); 2 existing chunk-count assertions updated for the new trailing usage chunk
- `npm run check` — clean
- `cached_tokens` > 0 on the trailing SSE chunk of agent-max turn 2+
- `prov.llm.cache_read_input_tokens` span attribute appears on streamed anthropic spans

🤖 Generated with Claude Code