feat(anthropic): prompt caching with cache_control: ephemeral by fede-kamel · Pull Request #40 · oracle-samples/locus

fede-kamel · 2026-05-02T05:17:07Z

Closes #35.

Problem

Locus's `AnthropicModel` didn't expose Anthropic's prompt caching. Long system prompts and large tool catalogs paid full input-token cost on every turn — even when 4-5k tokens of stable content could be cached at ~1/10× cost for ~5 minutes.

Fix

Opt-in via `AnthropicModel(prompt_cache=True)`. Default `False` for backward compatibility.

When enabled:

- System prompt is sent as a block list with `cache_control: ephemeral`:
  ```python
  params["system"] = [
      {"type": "text", "text": system_prompt,
       "cache_control": {"type": "ephemeral"}}
  ]
  ```
- Tool catalog: the last tool entry carries the same `cache_control` marker. A cache breakpoint covers the request prefix up to and including the marked block, so a single marker on the last entry caches the whole catalog.
- Cache token counts flow through:
  - `response.usage.cache_creation_input_tokens` / `cache_read_input_tokens` extracted in the Anthropic provider
  - Surfaced on `ModelResponse.usage`
  - Accumulated in `AgentState` via new `with_token_usage(..., cache_creation_tokens=, cache_read_tokens=)` kwargs
  - Surfaced on `ExecutionMetrics.cache_creation_input_tokens` and `.cache_read_input_tokens`
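
The request shaping described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `build_cached_params` is a hypothetical helper, and it assumes tool definitions are plain dicts in Anthropic's tool format.

```python
from copy import deepcopy
from typing import Any

EPHEMERAL = {"type": "ephemeral"}


def build_cached_params(
    system_prompt: str,
    tools: list[dict[str, Any]],
    prompt_cache: bool,
) -> dict[str, Any]:
    """Hypothetical helper: shape `system` and `tools` for a request."""
    # Copy so the caller's tool definitions are never mutated.
    params: dict[str, Any] = {"tools": deepcopy(tools)}

    if not prompt_cache:
        # Disabled: bare-string system prompt, untagged tools.
        params["system"] = system_prompt
        return params

    # Enabled: system prompt becomes a one-element block list with a marker.
    params["system"] = [
        {"type": "text", "text": system_prompt, "cache_control": dict(EPHEMERAL)}
    ]
    # Only the last tool entry is tagged; earlier entries stay untouched.
    if params["tools"]:
        params["tools"][-1]["cache_control"] = dict(EPHEMERAL)
    return params
```

Copying the tool list before tagging keeps the marker out of any shared tool registry, so toggling `prompt_cache` between models cannot leak state.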

Backward compatibility

- `prompt_cache=False` (default) preserves the bare-string `system` and untagged tools: zero behaviour change for existing callers.
- Old Anthropic SDK responses without the cache fields work unchanged; the keys just don't appear in `usage`.
- `AgentState.with_token_usage()` cache kwargs default to 0, so non-Anthropic providers see no change.
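
A minimal sketch of how the tolerant extraction and the zero-default accumulation might fit together. `extract_usage` and `TokenTotals` are hypothetical stand-ins written for illustration; only the `cache_creation_tokens` / `cache_read_tokens` kwarg names and the two `usage` field names come from this PR, and the other parameter names are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Any


def extract_usage(usage: Any) -> dict[str, int]:
    """Hypothetical: copy token counts, adding cache keys only when present."""
    out = {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
    }
    for key in ("cache_creation_input_tokens", "cache_read_input_tokens"):
        value = getattr(usage, key, None)  # absent on old SDK responses
        if value is not None:
            out[key] = value
    return out


@dataclass(frozen=True)
class TokenTotals:
    """Hypothetical stand-in for the token counters kept on AgentState."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0

    def with_token_usage(
        self,
        prompt_tokens: int,
        completion_tokens: int,
        cache_creation_tokens: int = 0,  # defaults keep old call sites valid
        cache_read_tokens: int = 0,
    ) -> "TokenTotals":
        # Return a new instance with the per-call counts added on.
        return replace(
            self,
            prompt_tokens=self.prompt_tokens + prompt_tokens,
            completion_tokens=self.completion_tokens + completion_tokens,
            cache_creation_tokens=self.cache_creation_tokens + cache_creation_tokens,
            cache_read_tokens=self.cache_read_tokens + cache_read_tokens,
        )
```

A caller would then pass `usage.get("cache_read_input_tokens", 0)` through, so a provider that never reports cache fields accumulates zeros.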

Tests

Seven new unit tests in `tests/unit/test_anthropic_prompt_caching.py` mock the Anthropic client and verify each layer of the wiring. All 3,212 pre-existing unit tests still pass; pre-commit clean; `mkdocs build --strict` clean.

How to use

```python
from locus import Agent
from locus.models.native.anthropic import AnthropicModel

agent = Agent(
    model=AnthropicModel(
        model="claude-sonnet-4-20250514",
        prompt_cache=True,  # turn it on
    ),
    tools=[...],
    system_prompt="",
)

result = agent.run_sync("...")
print(f"cache writes: {result.metrics.cache_creation_input_tokens}")
print(f"cache reads: {result.metrics.cache_read_input_tokens}")
```
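
To judge whether caching pays off for a given workload, the two counters can be turned into a rough relative input cost. The sketch below is a back-of-the-envelope estimate, not part of locus: `relative_input_cost` is a hypothetical helper, the 0.1 read multiplier mirrors the ~1/10× figure above, the 1.25 write multiplier is an assumption about Anthropic's pricing for 5-minute cache writes, and the three counts are assumed to be disjoint.

```python
def relative_input_cost(
    uncached_tokens: int,
    cache_write_tokens: int,
    cache_read_tokens: int,
    write_multiplier: float = 1.25,  # assumption: premium for cache writes
    read_multiplier: float = 0.1,    # ~1/10x, as quoted in this PR
) -> float:
    """Billed input cost as a fraction of the same request with no caching."""
    total = uncached_tokens + cache_write_tokens + cache_read_tokens
    if total == 0:
        return 1.0  # nothing sent, nothing saved
    billed = (
        uncached_tokens
        + write_multiplier * cache_write_tokens
        + read_multiplier * cache_read_tokens
    )
    return billed / total


# First turn writes the cache (slightly more expensive than uncached) ...
first_turn = relative_input_cost(500, 4500, 0)
# ... later turns within the TTL read it back at the reduced rate.
later_turn = relative_input_cost(500, 0, 4500)
print(f"first turn: {first_turn:.2f}x, later turns: {later_turn:.2f}x")
```

Under these assumptions the first turn costs more than an uncached request, so caching only helps when the same prefix is reused before the cache expires.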

Test plan

- `hatch run test tests/unit/`: 3,212 pass + 1 skip (OCIOpenAIModel needs real config)
- `pre-commit run --all-files`: clean
- `hatch -e docs run mkdocs build --strict`: clean
- DCO sign-off

Closes #35.

When `AnthropicModel(prompt_cache=True)` is set, locus marks the system prompt and the tool catalog with Anthropic's `cache_control: ephemeral` so subsequent turns reuse the cached input at ~1/10x cost (Anthropic ephemeral cache TTL ~5 min). Default is `False` for backward compatibility; opt in per-model.

Implementation:
- New `prompt_cache: bool` field on `AnthropicConfig` and a matching constructor kwarg on `AnthropicModel`.
- When enabled, `params["system"]` becomes a block list: `[{"type": "text", "text": <prompt>, "cache_control": {"type": "ephemeral"}}]`. When disabled, the bare-string form is preserved (no behaviour change for existing callers).
- The last entry in the tool catalog gets the same `cache_control` marker so the catalog is cached too (Anthropic walks the markers in order; the last one anchors the cache point).
- `cache_creation_input_tokens` / `cache_read_input_tokens` extracted from `response.usage` (when present) and surfaced on `ModelResponse.usage`. Old SDK responses without the fields work unchanged; the keys simply don't appear in usage.
- `AgentState.with_token_usage()` extended with optional `cache_creation_tokens` / `cache_read_tokens` kwargs (default 0). All three call sites in `agent.py` updated to thread the values through from the response.
- `ExecutionMetrics` gains `cache_creation_input_tokens` and `cache_read_input_tokens` fields (default 0) so cache hits/misses show up on `AgentResult.metrics`.

Seven unit tests in `tests/unit/test_anthropic_prompt_caching.py` mock the Anthropic client and verify:
- system prompt block-list shape with cache_control when enabled
- bare-string preservation when disabled (back-compat)
- cache_control on last tool entry only (catalog tail-marker)
- cache token extraction from response.usage
- old SDK response shape (no cache fields) doesn't break
- ExecutionMetrics carries the new fields
- AgentState.with_token_usage accumulates cache counts

All 3,212 unit tests pass; pre-commit clean; mkdocs --strict clean.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 2, 2026

fede-kamel force-pushed the feat/anthropic-prompt-caching branch from 63269a9 to cb41fd3 Compare May 2, 2026 14:01

fede-kamel merged commit 04fc6c3 into main May 2, 2026
10 checks passed

fede-kamel deleted the feat/anthropic-prompt-caching branch May 2, 2026 14:12

fede-kamel merged 1 commit into main from feat/anthropic-prompt-caching
