Add Anthropic prompt caching breakpoints#1017

Merged
smolpaws merged 5 commits into develop from codex/anthropic-prompt-caching
May 11, 2026
Conversation

@smolpaws
Collaborator

@smolpaws smolpaws commented May 11, 2026

Summary

  • split cacheable and dynamic system prompt content in the TypeScript agent-sdk runtime
  • apply Anthropic prompt caching breakpoints for native Anthropic and LiteLLM Claude request payloads
  • add focused tests for prompt splitting, provider support, and cache marker placement
  • add a reusable live smoke runner for validating Anthropic prompt caching against a local profile
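
As a minimal sketch of the first two bullets, the Anthropic Messages API accepts `system` as an array of text blocks, and a `cache_control: { type: "ephemeral" }` marker on a block sets a cache breakpoint at the end of that block. The type and function names below (`SystemTextBlock`, `buildSystemBlocks`) are hypothetical and not taken from the SDK; only the block shape follows the public API.

```typescript
// Hypothetical types for illustration; not the SDK's actual declarations.
type CacheControl = { type: "ephemeral" };

interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

/**
 * Builds the `system` array for an Anthropic Messages API request.
 * The stable prefix carries the cache breakpoint; the dynamic suffix
 * follows it without a marker, so changes there leave the prefix cacheable.
 */
function buildSystemBlocks(
  cacheablePrompt: string,
  dynamicPrompt: string | null,
): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [
    { type: "text", text: cacheablePrompt, cache_control: { type: "ephemeral" } },
  ];
  if (dynamicPrompt !== null && dynamicPrompt.trim().length > 0) {
    blocks.push({ type: "text", text: dynamicPrompt });
  }
  return blocks;
}

// Example: a stable base prompt plus a per-run suffix (both strings are made up).
const systemBlocks = buildSystemBlocks(
  "You are a coding agent. Tools: terminal, file_editor.",
  "Current file: src/index.ts",
);
console.log(JSON.stringify(systemBlocks, null, 2));
```

Ordering matters here: the cached block has to come first, because the cache key covers everything up to the breakpoint.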

Verification

  • npm test -w @smolpaws/agent-sdk
  • npm run build -w @smolpaws/agent-sdk
  • npm run typecheck
  • npm run lint
  • npm test
  • npm run typecheck
  • npm run lint
  • node scripts/anthropic-cache-smoke.mjs

Review

  • Agent Mail review was requested from PinkStone; no additional blocking findings came back before merge.
  • CodeRabbit and Gemini both focused on cache marker placement. The native Anthropic path was corrected to put cache_control on the tool_result content block, while the LiteLLM/OpenAI-compatible tool-message envelope behavior was intentionally kept for Python parity and the LiteLLM tool-result quirk.
  • Devin raised prompt-ordering and allowlist coverage questions. We kept the prompt split ordering intentionally and expanded the Claude 3 alias allowlist where it was worth doing.
  • Final live smoke validation used the local opus-46 profile and showed cache markers on outgoing requests plus cache-read tokens on all five turns of a single conversation, including turns with terminal and file_editor tool use.
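
The marker-placement difference described in the review notes can be sketched as follows. This is an illustration under assumptions, not the merged code: the interfaces and the two builder functions are hypothetical, and the envelope-level placement for the OpenAI-compatible path is inferred from the description above rather than from the diff. `EPHEMERAL_CACHE_CONTROL` is the constant name mentioned in the walkthrough.

```typescript
type CacheControl = { type: "ephemeral" };
const EPHEMERAL_CACHE_CONTROL: CacheControl = { type: "ephemeral" };

// Native Anthropic shape: a tool result is a content block inside a user message.
interface AnthropicToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  cache_control?: CacheControl;
}
interface AnthropicUserMessage {
  role: "user";
  content: AnthropicToolResultBlock[];
}

// OpenAI-compatible shape (as sent through LiteLLM): a tool result is its own message.
interface OpenAIToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
  cache_control?: CacheControl;
}

// Hypothetical builder: the marker goes on the tool_result block itself.
function anthropicToolResultMessage(
  toolUseId: string,
  output: string,
  cache: boolean,
): AnthropicUserMessage {
  const block: AnthropicToolResultBlock = {
    type: "tool_result",
    tool_use_id: toolUseId,
    content: output,
  };
  if (cache) block.cache_control = EPHEMERAL_CACHE_CONTROL;
  return { role: "user", content: [block] };
}

// Hypothetical builder: the marker goes on the message envelope.
function openAiCompatibleToolMessage(
  toolCallId: string,
  output: string,
  cache: boolean,
): OpenAIToolMessage {
  const message: OpenAIToolMessage = {
    role: "tool",
    tool_call_id: toolCallId,
    content: output,
  };
  if (cache) message.cache_control = EPHEMERAL_CACHE_CONTROL;
  return message;
}

console.log(JSON.stringify(anthropicToolResultMessage("toolu_1", "done", true)));
console.log(JSON.stringify(openAiCompatibleToolMessage("call_1", "done", true)));
```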

Summary by CodeRabbit

  • New Features

    • Prompt caching enabled for Anthropic and OpenAI-compatible LLMs to reduce latency and token usage.
    • System prompts are split into cacheable and dynamic parts so stable context is preserved while dynamic context can change.
  • Tests

    • Added tests covering prompt-caching behavior, cache-marker placement, and system-prompt composition and stability across providers.

Review Change Stack

Split static and dynamic system prompt content so Anthropic-compatible providers can cache only the stable prefix, matching the Python agent-sdk behavior. Mark the last user/tool turn for cache extension and cover the native Anthropic and LiteLLM Claude request shapes with focused tests.
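
One way to read "mark the last user/tool turn" is sketched below: walk the history backwards, pick the most recent message whose role is `user` or `tool`, and flag only that one so the provider layer can attach a breakpoint there. The `ChatMessage` type and the `markLastCacheableTurn` helper are hypothetical; the `cachePrompt` flag name comes from the walkthrough, but its exact semantics in the SDK are an assumption.

```typescript
type Role = "system" | "user" | "assistant" | "tool";

// Hypothetical message shape; the SDK's real type is richer.
interface ChatMessage {
  role: Role;
  content: string;
  cachePrompt?: boolean;
}

/**
 * Returns a copy of `messages` in which only the last user or tool
 * message is flagged for caching. The input array is not mutated.
 */
function markLastCacheableTurn(messages: ChatMessage[]): ChatMessage[] {
  let target = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const role = messages[i]?.role;
    if (role === "user" || role === "tool") {
      target = i;
      break;
    }
  }
  return messages.map((message, i) =>
    i === target ? { ...message, cachePrompt: true } : message,
  );
}

const marked = markLastCacheableTurn([
  { role: "user", content: "List the files." },
  { role: "assistant", content: "Calling terminal." },
  { role: "tool", content: "README.md\nsrc/" },
  { role: "assistant", content: "Two entries found." },
]);
console.log(marked.map((m) => `${m.role}:${m.cachePrompt === true}`).join(" "));
// prints "user:false assistant:false tool:true assistant:false"
```

Moving the breakpoint forward on every request lets each turn reuse the prefix cached by the previous one, which is consistent with the cache-read tokens reported on all five turns of the smoke run.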
@coderabbitai
Contributor

coderabbitai Bot commented May 11, 2026

Walkthrough

Splits system prompts into cacheable and dynamic segments, detects Anthropic models that support prompt caching, emits ephemeral cache_control in Anthropic and OpenAI-compatible request payloads, updates Agent/condensation plumbing to carry split prompts, and adds tests validating marker placement and prompt stability.

Changes

Prompt Caching Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Type Contract for Cacheable Prompts<br>packages/agent-sdk/src/sdk/llm/types.ts | ChatCompletionRequest gains optional cacheableSystemPrompt and dynamicSystemPrompt fields to carry split prompt segments through the LLM pipeline. |
| Provider Cache Support Detection<br>packages/agent-sdk/src/sdk/llm/providerQuirks.ts | Adds PROMPT_CACHE_MODELS list and exports supportsPromptCaching(config) which returns true for Anthropic cache-capable models. |
| Anthropic Provider Cache Integration<br>packages/agent-sdk/src/sdk/llm/anthropic.ts | Introduces EPHEMERAL_CACHE_CONTROL, extends content/message types with optional cache_control, updates toAnthropicMessages to mark the last eligible user/tool content block when requested, and adjusts requestBody to emit a cached system segment plus optional dynamic segment. |
| OpenAI-Compatible Provider Cache Integration<br>packages/agent-sdk/src/sdk/llm/openai-compatible.ts | Adds EPHEMERAL_CACHE_CONTROL and cache_control to content/message shapes, extends toOpenAIMessage with per-message cachePrompt to mark final user/tool blocks (ensuring text blocks for images), and rewrites request-body generation to include cached system content and per-message cache flags. |
| Agent System Prompt Refactoring<br>packages/agent-sdk/src/sdk/runtime/Agent.ts | Splits system-prompt generation into buildCacheableSystemPrompt() (stable base with tools) and buildDynamicSystemPrompt() (context-specific suffix), recomposes for non-condensation uses, and passes both into condensation requests. |
| Condensation Function Integration<br>packages/agent-sdk/src/sdk/runtime/condensation.ts | buildChatRequestWithCondensation accepts dynamicSystemPrompt, composes the dynamic portion with the conversation summary block, and returns cacheableSystemPrompt and optional dynamicSystemPrompt alongside the final systemPrompt. |
| Provider Support Detection Tests<br>packages/agent-sdk/src/sdk/llm/__tests__/providerQuirks.test.ts | Adds supportsPromptCaching test cases for allowed Anthropic models, LiteLLM Anthropic routing, excluded Anthropic variants, and non-Anthropic models. |
| End-to-End Provider Cache Tests<br>packages/agent-sdk/src/sdk/llm/__tests__/promptCaching.test.ts | New Vitest suite mocking fetch/streaming that asserts cache_control: { type: 'ephemeral' } placements: static system blocks and final user/tool messages for Anthropic, and cached system content plus final tool/user message marking for OpenAI-compatible requests. |
| Tests Adjustment<br>packages/agent-sdk/src/sdk/llm/__tests__/thinkingBlocks.test.ts | Loosens an Anthropic tool-result assertion from toEqual to toMatchObject. |
| Agent and Condensation Tests<br>packages/agent-sdk/src/sdk/runtime/__tests__/Agent.system-prompt.test.ts, packages/agent-sdk/src/sdk/runtime/__tests__/condensation.test.ts | Agent test verifies cacheableSystemPrompt stability across runs while dynamicSystemPrompt changes; condensation test verifies separated cacheable/dynamic segments and exact newline/summary formatting. |
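
The provider-detection row can be illustrated with a rough sketch. The names `PROMPT_CACHE_MODELS` and `supportsPromptCaching` come from the summary above, but everything else is an assumption: the `LlmConfig` shape, the allowlist entries (illustrative placeholders, not the PR's actual list), and the prefix-matching rule are all invented for this example.

```typescript
// Hypothetical config shape; the SDK's real config has more fields.
interface LlmConfig {
  model: string;
}

// Illustrative entries only. The PR's actual allowlist is not shown in this thread.
const PROMPT_CACHE_MODELS: readonly string[] = [
  "claude-3-5-sonnet",
  "claude-3-haiku",
  "claude-3-opus",
];

/**
 * Returns true when the model name, after stripping any routing prefix
 * such as "anthropic/" or "litellm_proxy/anthropic/", starts with an
 * allowlisted model family.
 */
function supportsPromptCaching(config: LlmConfig): boolean {
  const bareName = config.model.toLowerCase().split("/").pop() ?? "";
  return PROMPT_CACHE_MODELS.some((family) => bareName.startsWith(family));
}

console.log(supportsPromptCaching({ model: "anthropic/claude-3-5-sonnet-20241022" })); // true
console.log(supportsPromptCaching({ model: "gpt-4o" })); // false
```

An allowlist fails closed: a model that is not listed simply gets no cache markers, which is the safer default when a provider might reject unknown fields.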

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • enyst/OpenHands-Tab#706: Modifies condensation helpers and Agent usage related to splitting cacheable vs dynamic system prompts.
  • enyst/OpenHands-Tab#986: Also touches Agent system-prompt construction and LLM-context extraction, overlapping system prompt responsibilities.
  • enyst/OpenHands-Tab#584: Modifies LLM provider message builders (toAnthropicMessages / toOpenAIMessage) and related message formats.

Suggested labels

codex

"I hopped through prompts both old and new,
Static stays cozy, dynamic hops through.
Ephemeral crumbs tucked safe in a trail,
Anthropic and OpenAI follow the tale.
🐇✨ Cache kept neat, while context prevails."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding prompt caching breakpoints for Anthropic LLM provider support. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@smolpaws
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented May 11, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@smolpaws
Collaborator Author

/gemini review

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

gemini-code-assist[bot]

This comment was marked as resolved.

gemini-code-assist[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
packages/agent-sdk/src/sdk/runtime/Agent.ts (3)

1021-1049: 💤 Low value

Optional: Consider clarifying JSDoc terminology.

The JSDoc mentions "registered tool summaries" but the code actually uses tool definitions (name + description from getToolDefinitions()). Consider rewording to "tool definitions" for clarity, though this is a minor nitpick.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/agent-sdk/src/sdk/runtime/Agent.ts` around lines 1021 - 1049, Update
the JSDoc on buildCacheableSystemPrompt to replace the phrase "registered tool
summaries" with clearer wording such as "tool definitions (name + description)";
locate the comment above the private buildCacheableSystemPrompt() method and
change the description to reference getToolDefinitions() semantics so the doc
matches the implementation that extracts tool.function.name and
tool.function.description.

1051-1066: ⚡ Quick win

Consider adding JSDoc for consistency.

Since buildCacheableSystemPrompt() has comprehensive JSDoc explaining the caching strategy, adding similar documentation to buildDynamicSystemPrompt() would improve consistency and help developers understand the split. For example, explain that this returns runtime-mutated context (editor state, secrets, etc.) that should not be cached.

📝 Example JSDoc
+  /**
+   * Builds the dynamic system-prompt suffix that changes at runtime.
+   *
+   * This suffix contains runtime-mutated context such as the current editor state,
+   * available secrets, and LLM provider details. It should NOT be cached by
+   * Anthropic prompt caching, as it varies across runs.
+   *
+   * @returns The dynamic suffix, or null if no agentContext is configured.
+   */
   private buildDynamicSystemPrompt(): string | null {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/agent-sdk/src/sdk/runtime/Agent.ts` around lines 1051 - 1066, Add a
JSDoc block above the buildDynamicSystemPrompt method (matching the style used
for buildCacheableSystemPrompt) that explains this function returns the
runtime-mutated system prompt pieces (editor state, secrets, runtime context)
and therefore must not be cached; mention the inputs used (agentContext,
secrets.getRegisteredNames(), resolved llmModel/llmProvider/llmBaseUrl via
resolveSystemPromptLlmContext) and the intended usage so readers understand why
caching is split between buildCacheableSystemPrompt and
buildDynamicSystemPrompt.

1068-1073: ⚡ Quick win

Consider adding JSDoc for clarity.

This method recombines the cacheable and dynamic prompt parts for use in system prompt events. Adding JSDoc would clarify its role in the caching strategy and improve code documentation consistency.

📝 Example JSDoc
+  /**
+   * Recombines cacheable and dynamic system prompt parts into a single string.
+   *
+   * Used by ensureSystemPrompt() to emit the full system prompt event.
+   * The split parts are sent separately to LLM clients that support prompt caching.
+   */
   private buildSystemPrompt(): string {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/agent-sdk/src/sdk/runtime/Agent.ts` around lines 1068 - 1073, The
buildSystemPrompt method lacks documentation explaining its purpose and relation
to caching; add a concise JSDoc comment above the buildSystemPrompt method
describing that it recombines cacheable and dynamic prompt parts (via
buildCacheableSystemPrompt and buildDynamicSystemPrompt), explains the returned
string format (joined with double newlines), and notes its role in system prompt
events and caching strategy so callers understand when caching applies versus
when dynamic content is included.
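
The recombination step described in this comment can be sketched as a standalone function. This is a minimal illustration, assuming the dynamic part may be `null` (as the `string | null` return type of `buildDynamicSystemPrompt()` suggests) and that the parts are joined with a blank line. `composeSystemPrompt` is a hypothetical name, not the method in `Agent.ts`.

```typescript
/**
 * Joins the cacheable prefix and the optional dynamic suffix with a
 * blank line between them. Missing or empty parts are skipped so the
 * result never starts or ends with stray newlines.
 */
function composeSystemPrompt(cacheable: string, dynamic: string | null): string {
  return [cacheable, dynamic]
    .filter((part): part is string => typeof part === "string" && part.length > 0)
    .join("\n\n");
}

console.log(JSON.stringify(composeSystemPrompt("Base prompt", "Editor: src/index.ts")));
// prints "Base prompt\n\nEditor: src/index.ts"
console.log(JSON.stringify(composeSystemPrompt("Base prompt", null)));
// prints "Base prompt"
```

Keeping the join rule in one place means the full-prompt event and the split fields sent to providers cannot drift apart in formatting.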
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c11df5ac-36e6-4ca5-8762-6ee769f1abd7

📥 Commits

Reviewing files that changed from the base of the PR and between a841c0f and 139fcf6.

📒 Files selected for processing (7)
  • packages/agent-sdk/src/sdk/llm/__tests__/promptCaching.test.ts
  • packages/agent-sdk/src/sdk/llm/__tests__/thinkingBlocks.test.ts
  • packages/agent-sdk/src/sdk/llm/anthropic.ts
  • packages/agent-sdk/src/sdk/llm/openai-compatible.ts
  • packages/agent-sdk/src/sdk/llm/providerQuirks.ts
  • packages/agent-sdk/src/sdk/runtime/Agent.ts
  • packages/agent-sdk/src/sdk/runtime/__tests__/Agent.system-prompt.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/agent-sdk/src/sdk/runtime/__tests__/Agent.system-prompt.test.ts
  • packages/agent-sdk/src/sdk/llm/providerQuirks.ts
  • packages/agent-sdk/src/sdk/llm/openai-compatible.ts
  • packages/agent-sdk/src/sdk/llm/anthropic.ts
  • packages/agent-sdk/src/sdk/llm/__tests__/promptCaching.test.ts

@github-actions

🔧 VSCode Extension Built Successfully

• File: openhands-tab-0.9.3.vsix (548 KB)
• Download: https://github.com/enyst/OpenHands-Tab/actions/runs/25686354917

To install:

  1. Download the artifact from the run page above
  2. VS Code → Command Palette → "Extensions: Install from VSIX..."
  3. Select the downloaded .vsix

Built with Node 22. Commit 63b0c15.

@smolpaws smolpaws merged commit 41d4f04 into develop May 11, 2026
9 checks passed
@smolpaws smolpaws deleted the codex/anthropic-prompt-caching branch May 11, 2026 17:37