feat(anthropic): prompt caching with cache_control: ephemeral #40

Merged
fede-kamel merged 1 commit into main from feat/anthropic-prompt-caching
May 2, 2026
Conversation

@fede-kamel
Contributor

Closes #35.

Problem

Locus's `AnthropicModel` didn't expose Anthropic's prompt caching. Long system prompts and large tool catalogs paid full input-token cost on every turn — even when 4-5k tokens of stable content could be cached at ~1/10× cost for ~5 minutes.
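
To make the saving concrete, here is a rough arithmetic sketch. It assumes Anthropic's published multipliers for the 5-minute ephemeral cache (about 1.25× the base input price for the first write, about 0.1× for later reads) and that every turn lands inside the cache TTL. The function name `relative_input_cost` is hypothetical and not part of locus.

```python
def relative_input_cost(
    stable_tokens: int,
    turns: int,
    write_multiplier: float = 1.25,  # assumed cache-write price vs. base input
    read_multiplier: float = 0.10,   # assumed cache-read price vs. base input
) -> tuple[float, float]:
    """Return (uncached, cached) cost of the stable prefix, in base-token units."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    # Without caching, the stable prefix is billed in full on every turn.
    uncached = float(stable_tokens * turns)
    # With caching, the first turn writes the cache and later turns read it.
    cached = stable_tokens * write_multiplier + stable_tokens * read_multiplier * (turns - 1)
    return uncached, cached


uncached, cached = relative_input_cost(stable_tokens=5000, turns=10)
print(f"uncached: {uncached:.0f}, cached: {cached:.0f}")
```

Under these assumptions, a 5k-token prefix over 10 turns costs roughly a fifth of the uncached total. The first turn is slightly more expensive than without caching, so a single-turn call does not benefit.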

Fix

Opt-in via `AnthropicModel(prompt_cache=True)`. Default `False` for backward compatibility.

When enabled:

  1. System prompt is sent as a block list with `cache_control: ephemeral`:
    ```python
    params["system"] = [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ]
    ```
  2. Tool catalog — the last tool entry carries the same `cache_control` marker (Anthropic walks markers in order; the last one anchors the cache point, covering everything before it).
  3. Cache token counts flow through:
    • `response.usage.cache_creation_input_tokens` / `cache_read_input_tokens` extracted in the Anthropic provider
    • Surfaced on `ModelResponse.usage`
    • Accumulated in `AgentState` via new `with_token_usage(..., cache_creation_tokens=, cache_read_tokens=)` kwargs
    • Surfaced on `ExecutionMetrics.cache_creation_input_tokens` and `.cache_read_input_tokens`

Backward compatibility

  • `prompt_cache=False` (default) preserves the bare-string `system` and untagged tools — zero behaviour change for existing callers.
  • Old Anthropic SDK responses without the cache fields work unchanged — the keys just don't appear in `usage`.
  • `AgentState.with_token_usage()` cache kwargs default to 0 — non-Anthropic providers see no change.
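
The tolerant extraction described in the second bullet might look like the following. This is a minimal sketch: `extract_cache_usage` is a hypothetical name, and `SimpleNamespace` stands in for the SDK's usage object.

```python
from types import SimpleNamespace
from typing import Any

CACHE_FIELDS = ("cache_creation_input_tokens", "cache_read_input_tokens")


def extract_cache_usage(usage: Any) -> dict[str, int]:
    """Collect cache token counts that are present; omit keys that are missing."""
    found: dict[str, int] = {}
    for field in CACHE_FIELDS:
        value = getattr(usage, field, None)
        # Older responses lack the attribute, and some set it to None.
        if isinstance(value, int):
            found[field] = value
    return found


new_style = SimpleNamespace(
    input_tokens=12,
    output_tokens=34,
    cache_creation_input_tokens=4096,
    cache_read_input_tokens=0,
)
old_style = SimpleNamespace(input_tokens=12, output_tokens=34)

print(extract_cache_usage(new_style))
print(extract_cache_usage(old_style))
```

Callers that accumulate the counts can then read them with `.get(field, 0)`, which lines up with the zero defaults on the `with_token_usage()` kwargs.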

Tests

Seven new unit tests in `tests/unit/test_anthropic_prompt_caching.py` mock the Anthropic client and verify each layer of the wiring. All 3,212 pre-existing unit tests still pass; pre-commit clean; `mkdocs build --strict` clean.

How to use

```python
from locus import Agent
from locus.models.native.anthropic import AnthropicModel

agent = Agent(
    model=AnthropicModel(
        model="claude-sonnet-4-20250514",
        prompt_cache=True,  # turn it on
    ),
    tools=[...],
    system_prompt="",
)

result = agent.run_sync("...")
print(f"cache writes: {result.metrics.cache_creation_input_tokens}")
print(f"cache reads: {result.metrics.cache_read_input_tokens}")
```
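
One way to read the two counters together is as a share of cacheable tokens served from cache. The helper below is a hypothetical convenience, not part of locus; it takes plain integers so it can be fed from `result.metrics`.

```python
def cache_read_share(cache_creation_input_tokens: int, cache_read_input_tokens: int) -> float:
    """Fraction of cache-eligible input tokens that were read rather than written."""
    total = cache_creation_input_tokens + cache_read_input_tokens
    # No cache activity at all (for example prompt_cache=False) yields 0.0.
    return cache_read_input_tokens / total if total else 0.0


print(cache_read_share(1000, 3000))
```

A value near zero after several turns would suggest that the prefix is changing between calls or that turns are spaced further apart than the cache TTL.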

Test plan

  • `hatch run test tests/unit/` — 3,212 pass + 1 skip (OCIOpenAIModel needs real config)
  • `pre-commit run --all-files` — clean
  • `hatch -e docs run mkdocs build --strict` — clean
  • DCO sign-off

@oracle-contributor-agreement (bot) added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on May 2, 2026

Closes #35.

When `AnthropicModel(prompt_cache=True)` is set, locus marks the
system prompt and the tool catalog with Anthropic's `cache_control:
ephemeral` so subsequent turns reuse the cached input at ~1/10x cost
(Anthropic ephemeral cache TTL ~5 min).

Default is `False` for backward compatibility — opt in per-model.

Implementation:
- New `prompt_cache: bool` field on `AnthropicConfig` and a matching
  constructor kwarg on `AnthropicModel`.
- When enabled, `params["system"]` becomes a block list:
    [{"type": "text", "text": <prompt>,
      "cache_control": {"type": "ephemeral"}}]
  When disabled, the bare-string form is preserved (no behaviour
  change for existing callers).
- The last entry in the tool catalog gets the same `cache_control`
  marker so the catalog is cached too (Anthropic walks the markers
  in order; the last one anchors the cache point).
- `cache_creation_input_tokens` / `cache_read_input_tokens` extracted
  from `response.usage` (when present) and surfaced on
  `ModelResponse.usage`. Old SDK responses without the fields work
  unchanged — the keys simply don't appear in usage.
- `AgentState.with_token_usage()` extended with optional
  `cache_creation_tokens` / `cache_read_tokens` kwargs (default 0).
  All three call sites in `agent.py` updated to thread the values
  through from the response.
- `ExecutionMetrics` gains `cache_creation_input_tokens` and
  `cache_read_input_tokens` fields (default 0) so cache hits/misses
  show up on `AgentResult.metrics`.

Seven unit tests in `tests/unit/test_anthropic_prompt_caching.py`
mock the Anthropic client and verify:
- system prompt block-list shape with cache_control when enabled
- bare-string preservation when disabled (back-compat)
- cache_control on last tool entry only (catalog tail-marker)
- cache token extraction from response.usage
- old SDK response shape (no cache fields) doesn't break
- ExecutionMetrics carries the new fields
- AgentState.with_token_usage accumulates cache counts

All 3,212 unit tests pass; pre-commit clean; mkdocs --strict clean.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

@fede-kamel force-pushed the feat/anthropic-prompt-caching branch from 63269a9 to cb41fd3 on May 2, 2026 14:01
@fede-kamel merged commit 04fc6c3 into main on May 2, 2026
10 checks passed
@fede-kamel deleted the feat/anthropic-prompt-caching branch on May 2, 2026 14:12
Development

Successfully merging this pull request may close these issues.

Anthropic prompt caching support
