feat(downstream): surface cache_read/cache_creation on streaming responses (closes #52) #53

Merged
arniesaha merged 1 commit into master from feat/stream-cache-usage
Apr 23, 2026

Conversation

@arniesaha
Owner

Summary

Closes #52.

#49/#50 added Anthropic cache_control injection and surfaced cached_tokens on the non-streaming response path. The streaming path (streamAnthropicToOpenAI) was still dropping the cache buckets entirely — AgentWeave and clients saw cached_tokens = 0 even when hit rate was ~99%. This PR closes that observability gap.

  • streamAnthropicToOpenAI now captures cache_read_input_tokens and cache_creation_input_tokens from message_start.message.usage, and defensively from message_delta.usage (last value wins, matching the existing output_tokens handling)
  • Trailing usage chunk emitted on success in OpenAI include_usage shape: { choices: [], usage: { prompt_tokens, completion_tokens, total_tokens, prompt_tokens_details? } }
  • prompt_tokens rolls cache_read + cache_creation in, matching toOpenAIResponse — billable prompt size is now symmetric across streaming and non-streaming
  • prompt_tokens_details.cached_tokens emitted only when Anthropic actually reports cache fields — plain requests stay clean
  • StreamResult widened with cacheReadTokens / cacheCreationTokens; Anthropic adapter sets two new OTel span attrs (prov.llm.cache_read_input_tokens, prov.llm.cache_creation_input_tokens) on both streaming and non-streaming paths, only when >0
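The capture-and-emit behavior described above can be sketched roughly as follows. This is a minimal TypeScript sketch, not the actual streamAnthropicToOpenAI implementation; `captureUsage` and `buildUsageChunk` are hypothetical names, while the usage field names are the real Anthropic/OpenAI ones this PR deals with:

```typescript
// Simplified shapes for illustration only.
type AnthropicUsage = {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

type StreamEvent = {
  type: string; // "message_start" | "message_delta" | ...
  message?: { usage?: AnthropicUsage };
  usage?: AnthropicUsage;
};

// Last value wins across message_start and message_delta, matching the
// existing output_tokens handling described in the PR.
function captureUsage(acc: AnthropicUsage, event: StreamEvent): AnthropicUsage {
  const u = event.type === "message_start" ? event.message?.usage : event.usage;
  if (!u) return acc;
  const next = { ...acc };
  if (u.input_tokens !== undefined) next.input_tokens = u.input_tokens;
  if (u.output_tokens !== undefined) next.output_tokens = u.output_tokens;
  if (u.cache_read_input_tokens !== undefined) next.cache_read_input_tokens = u.cache_read_input_tokens;
  if (u.cache_creation_input_tokens !== undefined) next.cache_creation_input_tokens = u.cache_creation_input_tokens;
  return next;
}

// Trailing usage chunk in OpenAI include_usage shape. prompt_tokens rolls
// the cache buckets in; prompt_tokens_details appears only when Anthropic
// actually reported cache fields, so plain requests stay clean.
function buildUsageChunk(u: AnthropicUsage) {
  const cacheRead = u.cache_read_input_tokens ?? 0;
  const cacheCreation = u.cache_creation_input_tokens ?? 0;
  const hasCache =
    u.cache_read_input_tokens !== undefined ||
    u.cache_creation_input_tokens !== undefined;
  const promptTokens = (u.input_tokens ?? 0) + cacheRead + cacheCreation;
  const completionTokens = u.output_tokens ?? 0;
  return {
    choices: [],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
      ...(hasCache ? { prompt_tokens_details: { cached_tokens: cacheRead } } : {}),
    },
  };
}
```

Note the asymmetry: cache_creation inflates prompt_tokens (it is billable prompt size) but only cache_read shows up as cached_tokens.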

Notes for reviewers

  • Chunk ordering follows the existing emitDownstreamResponseAsSse convention: finish_reason chunk → usage chunk → [DONE]. The issue's pseudo-code had them flipped; I went with the codebase-consistent order since any OpenAI client written against include_usage tolerates both.
  • Span attrs use conditional spread (...(x > 0 ? {...} : {})) so they're absent rather than set-to-zero when cache is disabled or inactive — matches the pattern already used for callerAgentId.
  • No new config flag. MUX_ANTHROPIC_PROMPT_CACHE=false naturally makes the new code emit a plain usage chunk with no prompt_tokens_details, since Anthropic won't return cache fields when we don't send cache_control.
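The conditional-spread pattern for the span attrs can be sketched like this (`cacheSpanAttrs` is a hypothetical helper name; the two attribute keys are the ones this PR adds):

```typescript
type SpanAttrs = Record<string, number>;

// Attrs are absent, not set-to-zero, when caching is disabled or inactive,
// matching the existing callerAgentId pattern mentioned in the PR.
function cacheSpanAttrs(cacheRead: number, cacheCreation: number): SpanAttrs {
  return {
    ...(cacheRead > 0 ? { "prov.llm.cache_read_input_tokens": cacheRead } : {}),
    ...(cacheCreation > 0 ? { "prov.llm.cache_creation_input_tokens": cacheCreation } : {}),
  };
}
```

Emitting nothing rather than zero keeps dashboards from treating "cache off" and "cache on, 0 hits" as the same series.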

Test plan

  • npm test — 92/92 passing (5 new tests covering cache-on/off usage chunk, last-value-wins on message_delta, cacheReadTokens/cacheCreationTokens in StreamResult, log-event fields); 2 existing chunk-count assertions updated for the new trailing usage chunk
  • npm run check — clean
  • Deploy to prod, confirm cached_tokens > 0 on the trailing SSE chunk of agent-max turn 2+
  • AgentWeave: confirm prov.llm.cache_read_input_tokens span attribute appears on streamed anthropic spans
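For the manual prod check, something like the following could pull the trailing usage chunk out of a captured SSE transcript (e.g. saved from `curl -N`). `lastUsageFromSse` is a hypothetical helper, assuming the finish_reason chunk, then usage chunk, then [DONE] ordering described above:

```typescript
// Scan an SSE transcript and return the usage object from the last chunk
// that carries one; skips the [DONE] sentinel and blank keep-alive lines.
function lastUsageFromSse(transcript: string): unknown {
  let usage: unknown;
  for (const line of transcript.split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const payload = JSON.parse(line.slice("data: ".length));
    if (payload.usage) usage = payload.usage;
  }
  return usage;
}
```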

🤖 Generated with Claude Code

feat(downstream): surface cache_read/cache_creation on streaming responses

Extends the Anthropic streaming path to capture cache_read_input_tokens
and cache_creation_input_tokens from message_start (and defensively from
message_delta for models that emit mid-stream usage updates), and to
emit a trailing OpenAI-shaped usage chunk (choices: [], usage: {...})
so clients with stream_options.include_usage: true see cache hit rate
even on streaming calls.

Parity with toOpenAIResponse:
  - cache_read + cache_creation rolled into prompt_tokens
  - cache_read surfaced as prompt_tokens_details.cached_tokens (only when
    cache fields are present)

Also plumbs the buckets through StreamResult into the Anthropic SDK
adapter, which sets two new OTel span attrs (conditionally, when >0):
  - prov.llm.cache_read_input_tokens
  - prov.llm.cache_creation_input_tokens
The same attrs are now also set on the non-streaming path for
AgentWeave parity across streaming and non-streaming sessions.

Closes #52.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arniesaha arniesaha merged commit 5f5ef7f into master Apr 23, 2026
1 check passed
@arniesaha arniesaha deleted the feat/stream-cache-usage branch April 23, 2026 08:12
Development

Successfully merging this pull request may close these issues.

Streaming path drops Anthropic cache_control usage stats
