feat(anthropic): prompt caching with cache_control: ephemeral #40
Merged
Closes #35.

When `AnthropicModel(prompt_cache=True)` is set, locus marks the system prompt and the tool catalog with Anthropic's `cache_control: ephemeral` so subsequent turns reuse the cached input at ~1/10x cost (Anthropic ephemeral cache TTL ~5 min). Default is `False` for backward compatibility; opt in per-model.

Implementation:

- New `prompt_cache: bool` field on `AnthropicConfig` and a matching constructor kwarg on `AnthropicModel`.
- When enabled, `params["system"]` becomes a block list:
      [{"type": "text", "text": <prompt>, "cache_control": {"type": "ephemeral"}}]
  When disabled, the bare-string form is preserved (no behaviour change for existing callers).
- The last entry in the tool catalog gets the same `cache_control` marker so the catalog is cached too (Anthropic walks the markers in order; the last one anchors the cache point).
- `cache_creation_input_tokens` / `cache_read_input_tokens` extracted from `response.usage` (when present) and surfaced on `ModelResponse.usage`. Old SDK responses without the fields work unchanged: the keys simply don't appear in usage.
- `AgentState.with_token_usage()` extended with optional `cache_creation_tokens` / `cache_read_tokens` kwargs (default 0). All three call sites in `agent.py` updated to thread the values through from the response.
- `ExecutionMetrics` gains `cache_creation_input_tokens` and `cache_read_input_tokens` fields (default 0) so cache hits/misses show up on `AgentResult.metrics`.

Seven unit tests in `tests/unit/test_anthropic_prompt_caching.py` mock the Anthropic client and verify:

- system prompt block-list shape with cache_control when enabled
- bare-string preservation when disabled (back-compat)
- cache_control on last tool entry only (catalog tail-marker)
- cache token extraction from response.usage
- old SDK response shape (no cache fields) doesn't break
- ExecutionMetrics carries the new fields
- AgentState.with_token_usage accumulates cache counts

All 3,212 unit tests pass; pre-commit clean; mkdocs --strict clean.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Closes #35.
Problem
Locus's `AnthropicModel` didn't expose Anthropic's prompt caching. Long system prompts and large tool catalogs paid full input-token cost on every turn — even when 4-5k tokens of stable content could be cached at ~1/10× cost for ~5 minutes.
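To make the cost claim concrete, here is a rough back-of-the-envelope sketch. The ~1/10 read multiplier comes from the description above. The 1.25 write multiplier for the first, cache-creating turn is an assumption that this PR does not state, and the function name `stable_prefix_cost` is hypothetical. It also assumes every turn lands inside the ~5 minute TTL.

```python
def stable_prefix_cost(
    stable_tokens: int,
    turns: int,
    read_multiplier: float = 0.1,    # ~1/10 cost for cache reads (from the PR text)
    write_multiplier: float = 1.25,  # ASSUMPTION: surcharge on the cache-writing turn
) -> tuple[float, float]:
    """Return (uncached, cached) cost of the stable prefix in token-equivalents.

    Assumes turn 1 writes the cache and turns 2..N all hit it within the TTL.
    """
    if stable_tokens < 0 or turns < 1:
        raise ValueError("stable_tokens must be >= 0 and turns must be >= 1")

    # Without caching, the stable prefix is billed in full on every turn.
    uncached = float(stable_tokens * turns)

    # With caching, one write followed by (turns - 1) discounted reads.
    cached = stable_tokens * (write_multiplier + read_multiplier * (turns - 1))
    return uncached, cached


if __name__ == "__main__":
    uncached, cached = stable_prefix_cost(stable_tokens=5000, turns=10)
    print(f"uncached: {uncached:.0f} token-equivalents")
    print(f"cached:   {cached:.0f} token-equivalents")
```

Under these assumptions a single-turn conversation is slightly more expensive with caching on, which is one reason an opt-in default is reasonable.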
Fix
Opt-in via `AnthropicModel(prompt_cache=True)`. Default `False` for backward compatibility.
When enabled:
```python
params["system"] = [
    {
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},
    }
]
```
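The tool catalog gets the same marker on its last entry only. A minimal sketch of how both pieces of the request could be shaped follows. `build_cache_params` is a hypothetical helper written for illustration, not the function used in this PR, and the tool dicts are placeholder data.

```python
from typing import Any


def build_cache_params(
    system_prompt: str,
    tools: list[dict[str, Any]],
    prompt_cache: bool,
) -> dict[str, Any]:
    """Hypothetical helper: shape `system` and `tools` for a Messages request."""
    params: dict[str, Any] = {}

    if prompt_cache:
        # Block-list form so the prompt can carry a cache_control marker.
        params["system"] = [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ]
    else:
        # Bare-string form, unchanged for existing callers.
        params["system"] = system_prompt

    if tools:
        # Shallow-copy each entry so the caller's tool dicts are not mutated.
        catalog = [dict(tool) for tool in tools]
        if prompt_cache:
            # Tail marker: only the last tool entry is tagged.
            catalog[-1]["cache_control"] = {"type": "ephemeral"}
        params["tools"] = catalog

    return params


if __name__ == "__main__":
    demo_tools = [
        {"name": "search", "description": "placeholder", "input_schema": {"type": "object"}},
        {"name": "fetch", "description": "placeholder", "input_schema": {"type": "object"}},
    ]
    print(build_cache_params("You are helpful.", demo_tools, prompt_cache=True))
```

Copying the tool entries before tagging keeps a shared catalog reusable across models that have caching turned off.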
Backward compatibility
Tests
Seven new unit tests in `tests/unit/test_anthropic_prompt_caching.py` mock the Anthropic client and verify each layer of the wiring. All 3,212 pre-existing unit tests still pass; pre-commit clean; `mkdocs build --strict` clean.
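One of the layers those tests cover is cache-token extraction from `response.usage`, including responses from older SDKs that lack the cache fields. The sketch below shows one way that check could look. `extract_usage` is a hypothetical stand-in for the real extraction code, and `SimpleNamespace` objects stand in for the mocked Anthropic response.

```python
from types import SimpleNamespace
from typing import Any

CACHE_KEYS = ("cache_creation_input_tokens", "cache_read_input_tokens")


def extract_usage(usage: Any) -> dict[str, int]:
    """Hypothetical extractor: copy token counts, adding cache keys only if present."""
    out = {
        "input_tokens": getattr(usage, "input_tokens", 0),
        "output_tokens": getattr(usage, "output_tokens", 0),
    }
    for key in CACHE_KEYS:
        value = getattr(usage, key, None)
        if value is not None:
            # Old SDK responses never reach this branch, so the key stays absent.
            out[key] = value
    return out


def test_cache_fields_are_extracted() -> None:
    usage = SimpleNamespace(
        input_tokens=12,
        output_tokens=34,
        cache_creation_input_tokens=4000,
        cache_read_input_tokens=0,
    )
    result = extract_usage(usage)
    assert result["cache_creation_input_tokens"] == 4000
    assert result["cache_read_input_tokens"] == 0


def test_old_sdk_shape_still_works() -> None:
    usage = SimpleNamespace(input_tokens=12, output_tokens=34)
    result = extract_usage(usage)
    assert result == {"input_tokens": 12, "output_tokens": 34}


if __name__ == "__main__":
    test_cache_fields_are_extracted()
    test_old_sdk_shape_still_works()
    print("ok")
```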
How to use
```python
from locus import Agent
from locus.models.native.anthropic import AnthropicModel

agent = Agent(
    model=AnthropicModel(
        model="claude-sonnet-4-20250514",
        prompt_cache=True,  # turn it on
    ),
    tools=[...],
    system_prompt="",
)

result = agent.run_sync("...")
print(f"cache writes: {result.metrics.cache_creation_input_tokens}")
print(f"cache reads: {result.metrics.cache_read_input_tokens}")
```
Test plan