fix anthropic PTC #53

Merged
j4ys0n merged 1 commit into main from ptc on Mar 13, 2026
Conversation

Contributor

@j4ys0n j4ys0n commented Mar 13, 2026

No description provided.

Copilot AI review requested due to automatic review settings March 13, 2026 02:40
Contributor Author

j4ys0n commented Mar 13, 2026

Automated review 🤖

Summary of Changes
This PR fixes Anthropic programmatic tool calling (PTC) behavior by adjusting how code_execution_tool_result blocks are handled during message replay and result mapping. Key changes include: (1) introducing CodeExecutionResultInfo as a shared type for code execution diagnostics, (2) filtering out code_execution_tool_result blocks from assistant message replay when resuming pending programmatic tool calls, and (3) surfacing codeExecutionResults in both streaming and non-streaming GenerateResult. Version bumped to 1.11.1, and a new debug example (examples/anthropic-ptc-debug.ts) is added to validate PTC flows.

Key Changes & Positives

  • Introduces CodeExecutionResultInfo type in src/types/common.types.ts to unify code execution diagnostics across providers 🟢
  • Adds filterAnthropicReplayContentBlocksForProgrammaticToolResultResume to exclude code_execution_tool_result blocks during PTC resume, preventing duplicate execution attempts in src/core/mapping/anthropic.mapper.ts (lines 182–197) 🟢
  • Surfaces codeExecutionResults in GenerateResult (non-streaming and streaming) with proper aggregation in mapFromProviderResponse and mapProviderStream (e.g., src/core/mapping/anthropic.mapper.ts, lines 681, 837–840) 🟢
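
The replay filtering described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from src/core/mapping/anthropic.mapper.ts: the ContentBlock shape, the function name filterReplayBlocksForResume, the isResume flag, and the sample block ids are all assumptions made for the example.

```typescript
// Hypothetical, simplified content-block shape; the mapper's real types are not shown on this page.
type ContentBlock = { type: string; [key: string]: unknown };

// Drops code_execution_tool_result blocks when the replay is a PTC resume,
// and returns the blocks unchanged otherwise.
function filterReplayBlocksForResume(blocks: ContentBlock[], isResume: boolean): ContentBlock[] {
  if (!isResume) return blocks;
  return blocks.filter((block) => block.type !== 'code_execution_tool_result');
}

// Made-up assistant content for illustration only.
const assistantBlocks: ContentBlock[] = [
  { type: 'text', text: 'Calling the tool from code.' },
  { type: 'server_tool_use', id: 'srvtoolu_example', name: 'code_execution' },
  { type: 'tool_use', id: 'toolu_example', name: 'listProviders' },
  { type: 'code_execution_tool_result', tool_use_id: 'srvtoolu_example' },
];

const replayed = filterReplayBlocksForResume(assistantBlocks, true);
console.log(replayed.map((block) => block.type));
```

Keeping the filter a pure function over the block array makes it straightforward to unit test without constructing a full provider request.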

Potential Issues & Recommendations

  1. Issue / Risk: The isProgrammaticToolResultResume method checks only the immediate next message (messages[assistantIndex + 1]) for role 'tool', which may miss cases where tool messages are interleaved or delayed.
    Impact: Could incorrectly include/exclude code_execution_tool_result blocks, leading to malformed replay or missed tool results.
    Recommendation: Add unit tests covering multi-tool-call sequences and verify behavior when tool messages appear after multiple assistant turns.
    Status: 🟡 Needs review

  2. Issue / Risk: In mapCodeExecutionResultData, the fallback case returns returnCode: 1 and errorCode without populating stdout/stderr, but the original implementation defaulted to empty strings for all fields—this may alter downstream error handling expectations.
    Impact: Consumers expecting empty stdout/stderr on error may need updates.
    Recommendation: Document error-case semantics explicitly in CodeExecutionResultInfo JSDoc and ensure tests cover all three content types (code_execution_result, encrypted_code_execution_result, error fallback).
    Status: 🟡 Needs review
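
To make the first issue above concrete, the sketch below contrasts a check that inspects only the next message with one that scans forward to the next assistant turn. Both functions, the Message type, and the sample history are hypothetical; the actual isProgrammaticToolResultResume implementation is not shown here, and whether the scanning variant is appropriate for the provider's message rules would need to be verified.

```typescript
// Hypothetical message shape for illustration.
type Message = { role: 'user' | 'assistant' | 'tool' | 'system'; content: string };

// Behavior as the review describes it: only messages[assistantIndex + 1] is inspected.
function nextMessageIsTool(messages: Message[], assistantIndex: number): boolean {
  return messages[assistantIndex + 1]?.role === 'tool';
}

// Alternative for comparison: look for a tool message anywhere before the next assistant turn.
function anyToolBeforeNextAssistant(messages: Message[], assistantIndex: number): boolean {
  for (const message of messages.slice(assistantIndex + 1)) {
    if (message.role === 'assistant') return false;
    if (message.role === 'tool') return true;
  }
  return false;
}

// A made-up history in which the tool message does not directly follow the assistant turn.
const history: Message[] = [
  { role: 'user', content: 'Summarize the providers.' },
  { role: 'assistant', content: '(tool call issued from code execution)' },
  { role: 'user', content: 'Any update?' },
  { role: 'tool', content: '{"providers":[]}' },
];

console.log(nextMessageIsTool(history, 1), anyToolBeforeNextAssistant(history, 1));
```

A unit test built on a history like this one would show which of the two behaviors the mapper actually has.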

Language/Framework Checks

  • CodeExecutionResultInfo correctly imported into result.types.ts and stream.types.ts (lines 2, 3 in each)
  • Zod schema usage preserved in examples/anthropic-ptc-debug.ts (z.object({})) for tool parameters
  • TypeScript strictness maintained: CodeExecutionResultInfo fields are properly typed as optional (encryptedStdout?, errorCode?, contentFileIds?)
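
As a rough picture of the shared type and the three content types mentioned in the review, the sketch below maps each one into a CodeExecutionResultInfo-like object. Only encryptedStdout, errorCode, and contentFileIds are named as optional fields in the review; the stdout, stderr, and returnCode fields, the ProviderContent payload shapes, and the function name mapCodeExecutionResultDataSketch are assumptions.

```typescript
// Assumed shape of the shared diagnostics type (see the caveats above).
interface CodeExecutionResultInfo {
  stdout?: string;
  stderr?: string;
  returnCode?: number;
  encryptedStdout?: string;
  errorCode?: string;
  contentFileIds?: string[];
}

// Hypothetical provider payloads for the three cases the review lists.
type OutputFile = { file_id: string };
type ProviderContent =
  | { type: 'code_execution_result'; stdout: string; stderr: string; return_code: number; content?: OutputFile[] }
  | { type: 'encrypted_code_execution_result'; encrypted_stdout: string; stderr: string; return_code: number; content?: OutputFile[] }
  | { type: 'code_execution_tool_result_error'; error_code: string };

function mapCodeExecutionResultDataSketch(content: ProviderContent): CodeExecutionResultInfo {
  switch (content.type) {
    case 'code_execution_result':
      return {
        stdout: content.stdout,
        stderr: content.stderr,
        returnCode: content.return_code,
        contentFileIds: (content.content ?? []).map((file) => file.file_id),
      };
    case 'encrypted_code_execution_result':
      return {
        encryptedStdout: content.encrypted_stdout,
        stderr: content.stderr,
        returnCode: content.return_code,
        contentFileIds: (content.content ?? []).map((file) => file.file_id),
      };
    default:
      // Error fallback as described in the review: returnCode 1 plus errorCode,
      // with stdout and stderr left unset rather than defaulted to empty strings.
      return { returnCode: 1, errorCode: content.error_code };
  }
}

const errorInfo = mapCodeExecutionResultDataSketch({
  type: 'code_execution_tool_result_error',
  error_code: 'unavailable',
});
console.log(errorInfo);
```

Consumers that previously relied on empty-string stdout and stderr in the error case would need a fallback such as `info.stdout ?? ''`.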

Security & Privacy

  • No sensitive data exposure introduced; debug example uses mock data and writes artifacts to configurable outputRoot (examples/anthropic-ptc-debug.ts, lines 56–59)
  • cloneJson utility used consistently for deep copying messages and artifacts, avoiding accidental mutation

Build/CI & Ops

  • New example script added to package.json scripts (example:anthropic-ptc-debug)—ensure CI runs it or equivalent smoke test
  • Version bump to 1.11.1 requires release notes highlighting PTC fix and codeExecutionResults field

Tests

  • Unit tests added for code_execution_tool_result exclusion during replay (tests/unit/core/mapping/anthropic.mapper.spec.ts, lines 268–342)
  • Tests added for non-streaming and streaming codeExecutionResults surfaced in results (lines 803–820, 1234–1275)
  • Coverage need: Add an integration test verifying the end-to-end PTC flow with code execution (e.g., getCoreConfig → listProviders → summary) using the debug harness pattern

@j4ys0n j4ys0n merged commit 8c61c3f into main Mar 13, 2026
3 checks passed
@j4ys0n j4ys0n deleted the ptc branch March 13, 2026 02:41

Copilot AI left a comment

Pull request overview

This PR addresses Anthropic programmatic tool calling (PTC) behavior around code execution blocks, ensuring replay/resume flows don’t resend code_execution_tool_result blocks while also surfacing code execution diagnostics consistently in both streaming and non-streaming results.

Changes:

  • Filter code_execution_tool_result blocks out of assistant replay content when resuming programmatic tool results.
  • Surface code execution diagnostics via a shared CodeExecutionResultInfo type on both GenerateResult and stream chunks.
  • Add a debug example harness for Anthropic PTC and bump package version to 1.11.1.
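
On the consumer side, collecting diagnostics from stream chunks might look like the sketch below. The StreamChunk union, the `result` field name, and the trimmed CodeExecutionResultInfo fields are hypothetical; the PR states only that code_execution_result chunks reuse the shared shape. A real stream would typically be asynchronous, and a plain array is used here to keep the example small.

```typescript
// Trimmed, assumed version of the shared diagnostics type.
interface CodeExecutionResultInfo {
  stdout?: string;
  returnCode?: number;
  errorCode?: string;
}

// Hypothetical chunk union; the field carrying the diagnostics is an assumption.
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'code_execution_result'; result: CodeExecutionResultInfo };

// Gathers every code execution diagnostic seen in the chunk sequence, in order.
function collectCodeExecutionResults(chunks: Iterable<StreamChunk>): CodeExecutionResultInfo[] {
  const results: CodeExecutionResultInfo[] = [];
  for (const chunk of chunks) {
    if (chunk.type === 'code_execution_result') {
      results.push(chunk.result);
    }
  }
  return results;
}

// Made-up chunk sequence for illustration.
const sampleChunks: StreamChunk[] = [
  { type: 'text', text: 'Running code.' },
  { type: 'code_execution_result', result: { stdout: 'ok', returnCode: 0 } },
  { type: 'text', text: 'Done.' },
];

const collected = collectCodeExecutionResults(sampleChunks);
console.log(collected.length);
```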

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/unit/core/mapping/anthropic.mapper.spec.ts | Adds coverage for filtering replay blocks and for surfacing code execution diagnostics in streaming/non-streaming mappings. |
| src/types/stream.types.ts | Reuses the shared CodeExecutionResultInfo shape for code_execution_result stream chunks. |
| src/types/result.types.ts | Adds codeExecutionResults?: CodeExecutionResultInfo[] to GenerateResult. |
| src/types/common.types.ts | Introduces CodeExecutionResultInfo as the provider-agnostic diagnostics payload. |
| src/core/mapping/anthropic.mapper.ts | Implements replay filtering logic and collects code execution diagnostics into results/chunks. |
| package.json | Bumps version and adds an example script entry. |
| examples/anthropic-ptc-debug.ts | Adds a runnable debug harness for investigating Anthropic PTC streaming behavior. |

Comment thread src/types/stream.types.ts
Comment on lines +2 to 5
import { Citation, ProviderKey, CodeExecutionResultInfo } from './common.types' // Import ProviderKey
// Remove unused Provider import
// import { Provider } from './common.types' // Removed