…onses
Extends the Anthropic streaming path to capture cache_read_input_tokens
and cache_creation_input_tokens from message_start (and defensively from
message_delta for models that emit mid-stream usage updates), and to
emit a trailing OpenAI-shaped usage chunk (choices: [], usage: {...})
so clients with stream_options.include_usage: true see the cache hit rate
even on streaming calls.
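In sketch form (local names here are illustrative, not the exact code):

```ts
// Illustrative sketch — event shapes follow Anthropic's SSE payloads;
// these local names are hypothetical.
let cacheReadTokens = 0;
let cacheCreationTokens = 0;

function trackUsage(event: {
  type: string;
  message?: { usage?: Record<string, number> };
  usage?: Record<string, number>;
}) {
  // message_start carries the input-side usage, including cache buckets;
  // message_delta may update it mid-stream on some models.
  const usage =
    event.type === "message_start" ? event.message?.usage :
    event.type === "message_delta" ? event.usage :
    undefined;
  if (!usage) return;
  // Last value wins, matching the existing output_tokens handling.
  cacheReadTokens = usage.cache_read_input_tokens ?? cacheReadTokens;
  cacheCreationTokens = usage.cache_creation_input_tokens ?? cacheCreationTokens;
}
```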
Parity with toOpenAIResponse:
- cache_read + cache_creation rolled into prompt_tokens
- cache_read surfaced as prompt_tokens_details.cached_tokens (only when
  cache fields are present) — see the sketch after this list
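As a sketch, that rollup (helper name and signature here are hypothetical):

```ts
// Hypothetical helper mirroring the parity rule above.
function toUsage(input: number, output: number, cacheRead?: number, cacheCreation?: number) {
  const promptTokens = input + (cacheRead ?? 0) + (cacheCreation ?? 0);
  return {
    prompt_tokens: promptTokens,
    completion_tokens: output,
    total_tokens: promptTokens + output,
    // Only attach details when Anthropic actually reported cache fields.
    ...(cacheRead !== undefined || cacheCreation !== undefined
      ? { prompt_tokens_details: { cached_tokens: cacheRead ?? 0 } }
      : {}),
  };
}
```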
Also plumbs the buckets through StreamResult into the Anthropic SDK
adapter, which sets two new OTel span attrs (conditionally, when >0):
- prov.llm.cache_read_input_tokens
- prov.llm.cache_creation_input_tokens
The same attrs are now also set on the non-streaming path for
AgentWeave parity across streaming and non-streaming sessions.
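Sketch of the conditional attr-setting (helper name hypothetical; attr
keys are the ones above):

```ts
import { Span } from "@opentelemetry/api";

// Hypothetical helper — attrs are added only when > 0, so a disabled or
// cold cache leaves them absent rather than set to zero.
function setCacheSpanAttrs(span: Span, cacheRead: number, cacheCreation: number) {
  span.setAttributes({
    ...(cacheRead > 0 ? { "prov.llm.cache_read_input_tokens": cacheRead } : {}),
    ...(cacheCreation > 0 ? { "prov.llm.cache_creation_input_tokens": cacheCreation } : {}),
  });
}
```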
Closes #52.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #52.
#49/#50 added Anthropic `cache_control` injection and surfaced `cached_tokens` on the non-streaming response path. The streaming path (`streamAnthropicToOpenAI`) was still dropping the cache buckets entirely — AgentWeave and clients saw `cached_tokens = 0` even when the hit rate was ~99%. This PR closes that observability gap.

- `streamAnthropicToOpenAI` now captures `cache_read_input_tokens` and `cache_creation_input_tokens` from `message_start.message.usage`, and defensively from `message_delta.usage` (last value wins, matching the existing `output_tokens` handling)
- Emits a trailing usage chunk in the `include_usage` shape: `{ choices: [], usage: { prompt_tokens, completion_tokens, total_tokens, prompt_tokens_details? } }` — a client-side read of that chunk is sketched after this list
- `prompt_tokens` rolls cache_read + cache_creation in, matching `toOpenAIResponse` — billable prompt size is now symmetric across streaming and non-streaming
- `prompt_tokens_details.cached_tokens` is emitted only when Anthropic actually reports cache fields — plain requests stay clean
- `StreamResult` widened with `cacheReadTokens`/`cacheCreationTokens`; the Anthropic adapter sets two new OTel span attrs (`prov.llm.cache_read_input_tokens`, `prov.llm.cache_creation_input_tokens`) on both streaming and non-streaming paths, only when > 0
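For example, an OpenAI-SDK client pointed at the proxy can read the hit rate off the trailing chunk. A minimal sketch — the base URL, API key, and model name below are placeholders, not values from this repo:

```ts
import OpenAI from "openai";

// Placeholders — point these at your proxy deployment.
const client = new OpenAI({ baseURL: "http://localhost:3000/v1", apiKey: "sk-placeholder" });

const stream = await client.chat.completions.create({
  model: "claude-sonnet",
  messages: [{ role: "user", content: "hello" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  // The trailing chunk has empty choices and carries the usage object.
  if (chunk.usage) {
    console.log("cached:", chunk.usage.prompt_tokens_details?.cached_tokens ?? 0);
  }
}
```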
Notes for reviewers

- `emitDownstreamResponseAsSse` convention: finish_reason chunk → usage chunk → `[DONE]` (see the wire-format sketch after these notes). The issue's pseudo-code had them flipped; I went with the codebase-consistent order, since any OpenAI client written against `include_usage` tolerates both.
- The new span attrs are spread in conditionally (`...(x > 0 ? {...} : {})`) so they're absent rather than set-to-zero when cache is disabled or inactive — matches the pattern already used for `callerAgentId`.
- `MUX_ANTHROPIC_PROMPT_CACHE=false` naturally makes the new code emit a plain usage chunk with no `prompt_tokens_details`, since Anthropic won't return cache fields when we don't send `cache_control`.
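Concretely, the tail of a streamed response now looks like this (ids omitted and token counts invented for illustration):

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":1234,"completion_tokens":56,"total_tokens":1290,"prompt_tokens_details":{"cached_tokens":1200}}}

data: [DONE]
```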
Test plan

- `npm test` — 92/92 passing (5 new tests covering the cache-on/off usage chunk, last-value-wins on `message_delta`, `cacheReadTokens`/`cacheCreationTokens` in `StreamResult`, and log-event fields); 2 existing chunk-count assertions updated for the new trailing usage chunk
- `npm run check` — clean
- `cached_tokens` > 0 on the trailing SSE chunk of agent-max turn 2+
- `prov.llm.cache_read_input_tokens` span attribute appears on streamed anthropic spans

🤖 Generated with Claude Code