[WIP]: Tool hardening fixes #149

jherr · 2025-12-15T05:31:45Z

🎯 Changes

Fixing tool call issues and adding a lot of tests to make sure they are 100% stable and reliable.

✅ Checklist

I have followed the steps in the Contributing guide.
I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

This change affects published code, and I have generated a changeset.
This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

Release Notes

New Features
- Added support for tool execution approvals, allowing tools to require user consent before running.
- Enhanced tool orchestration to properly handle sequential and parallel tool execution flows.
- Improved async handling for client and server tool executions with better state management.
Tests
- Added comprehensive test suites covering approval flows, client tools, server tools, and tool sequencing scenarios.
- Introduced a test simulator adapter for deterministic tool execution testing.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-15T05:31:55Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces an approval-based tool execution workflow with async state management. Adds a queuing system for post-stream actions, tracks in-flight tool executions, converts onToolCall callback to synchronous, and defers continuation logic. Includes comprehensive LLM simulator testing infrastructure and end-to-end test suites for tool execution flows.

Changes

Cohort / File(s)	Summary
Core Chat Client Orchestration `packages/typescript/ai-client/src/chat-client.ts`	Introduced queuing system (`postStreamActions`), in-flight tool execution tracking (`pendingToolExecutions`), and continuation deferral (`continuationPending`). Changed `onToolCall` signature from async to synchronous with internal promise orchestration. Added `drainPostStreamActions`, `checkForContinuation`, and `queuePostStreamAction` helpers. Modified stream lifecycle to wait for pending executions before finalization and drain queued actions in finally blocks.
Type System Extensions `packages/typescript/ai/src/types.ts`	Added optional `approval?: { id: string; approved: boolean }` field to `ToolCall` public API to carry user-approval metadata.
Message Conversion & State Collection `packages/typescript/ai/src/message-converters.ts`, `packages/typescript/ai/src/core/chat.ts`	Enhanced `collectClientState` to inspect both UIMessage (parts array) and ModelMessage (toolCalls array) formats for approval data. Updated message converters to conditionally include approval data in tool-call entries when approval state is 'approval-responded'.
LLM Simulator Adapter `packages/typescript/smoke-tests/adapters/src/llm-simulator.ts`	New comprehensive simulator for deterministic tool-call testing. Exports `LLMSimulatorAdapter` class, `SimulatorScript`/`SimulatorToolCall`/`SimulatorIteration` interfaces, `createLLMSimulator` factory, and `SimulatorScripts` utility with preset patterns (single/multi-tool, approvals, sequential/parallel). Supports stateless and stateful iteration selection with ID generation.
Adapter Package Configuration `packages/typescript/smoke-tests/adapters/package.json`, `packages/typescript/smoke-tests/adapters/vitest.config.ts`	Added exports field for `./src/llm-simulator.ts`, test scripts (`test`, `test:watch`), and `vitest` dev dependency. Created Vitest configuration with path alias (`@tanstack/ai`), globals, Node environment, coverage settings, and exclusion patterns.
Adapter Test Suites `packages/typescript/smoke-tests/adapters/src/tests/tools/*.test.ts`	Added six test modules: `server-tool.test.ts`, `client-tool.test.ts`, `approval.test.ts`, `multi-tool.test.ts`, `sequences.test.ts`, `error-handling.test.ts`. Cover server/client tool execution, approvals, parallel/sequential flows, mixed tool types, error propagation, and iteration tracking via simulator.
E2E API & Page Route `packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts`, `packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx`	Added `/api/tools-test` handler supporting multiple predefined scenarios (text-only, single/multi-tool, approval, sequential/parallel). Defined server tools (get_weather, fetch_data, get_time, delete_file with approval) and client tools (show_notification, display_chart). Added React component `ToolsTestPage` with scenario selector, event logging, approvals UI, tool call tracking, and integration with `useChat`.
E2E Dependencies `packages/typescript/smoke-tests/e2e/package.json`	Added `@tanstack/tests-adapters` (workspace:*) and `zod` (^4.1.13) dependencies.
E2E Playwright Tests `packages/typescript/smoke-tests/e2e/tests/tools/*.spec.ts`	Added four test suites: `approval-flow.spec.ts`, `client-tool.spec.ts`, `race-conditions.spec.ts`, `server-client-sequence.spec.ts`. Cover approval workflows (single, sequential, parallel), client tool execution, mixed server-client sequences, race conditions, and stress tests. Each includes helpers for scenario selection, completion detection, metadata extraction, and event log parsing.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/App
    participant CC as ChatClient
    participant Queue as Action Queue
    participant Tool as Tool Executor
    participant Stream as Stream Lifecycle

    Client->>CC: streamResponse(messages)
    CC->>Stream: Initialize stream
    Stream->>Stream: Reset pendingToolExecutions
    
    loop During Stream
        Stream->>Tool: onToolCall (sync)
        Tool->>Tool: Create executionPromise
        Tool->>Queue: Store in pendingToolExecutions
        Tool->>Tool: Execute async (detached)
        Tool-->>Queue: Remove from pendingToolExecutions on completion
    end
    
    Stream->>Stream: Stream completes
    Stream->>Stream: Wait for all pendingToolExecutions
    Stream->>Stream: Check streamCompletedSuccessfully
    
    alt Auto-send criteria met
        Stream->>CC: checkForContinuation()
        CC->>Queue: queuePostStreamAction(continuation)
    end
    
    Stream->>Queue: drainPostStreamActions()
    Queue->>Queue: Execute all queued actions
    Queue-->>Client: Finally block runs
    
    Note over CC,Queue: Tool result added while loading
    Client->>CC: addToolResult(result)
    alt Stream in progress
        CC->>Queue: queuePostStreamAction(continuation)
    else No active stream
        CC->>CC: Trigger continuation directly
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

packages/typescript/ai-client/src/chat-client.ts: Dense async orchestration logic with state management across multiple new data structures (postStreamActions, pendingToolExecutions, continuationPending), control flow changes to onToolCall, and integration into stream lifecycle. Requires careful analysis of timing, race conditions, and state transitions.
Type system changes (types.ts, message-converters.ts, chat.ts): Moderate complexity; requires tracing approval field propagation across message formats.
LLM Simulator (llm-simulator.ts): Large new public API surface (~15 exports); comprehensive implementation supporting multiple iteration patterns and tool types. Needs validation of contract consistency across all preset scripts.
Adapter & E2E tests: High volume (~6 test modules + 4 E2E specs) with heterogeneous scenarios (server tools, client tools, approvals, race conditions, sequencing). Each test suite has distinct logic and assertions that demand individual scrutiny.
Interdependencies: Changes span chat-client → types → message converters → simulator usage in tests; tracing data flow through all layers is essential.

Areas requiring extra attention:

Async orchestration guards and edge cases in chat-client.ts (e.g., concurrent streams, pending execution cleanup, continuation re-entrance)
Approval state propagation through collectClientState and message-converters under various message formats
LLM simulator contract compliance with all SimulatorScripts presets
E2E test reliability under race conditions and timing edge cases

Poem

🐰 Queues and tools dance in the sun,
Approvals bloom when work is done,
We hitch the stream with post-stream cheer,
Orchestration whispers, loud and clear,
Async hops through fields so green,
The finest tool-flow ever seen! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title uses '[WIP]' prefix and is vague ('Tool hardening fixes') without clearly conveying the main changes or their scope.	Replace '[WIP]' with a clear, specific title describing the primary change, such as 'Add tool execution queuing and approval support' or similar.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The PR description addresses the template structure but lacks specific details about what 'tool call issues' were fixed and what improvements the added tests validate.
Docstring Coverage	✅ Passed	Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

nx-cloud · 2025-12-15T05:33:22Z

View your CI Pipeline Execution ↗ for commit 20511e6

Command	Status	Duration	Result
`nx affected --targets=test:sherif,test:knip,tes...`	✅ Succeeded	7s	View ↗
`nx run-many --targets=build --exclude=examples/**`	✅ Succeeded	2s	View ↗

☁️ Nx Cloud last updated this comment at 2025-12-16 17:07:44 UTC

pkg-pr-new · 2025-12-15T05:33:35Z

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai@149

@tanstack/ai-anthropic

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-anthropic@149

@tanstack/ai-client

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-client@149

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-devtools-core@149

@tanstack/ai-gemini

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-gemini@149

@tanstack/ai-ollama

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-ollama@149

@tanstack/ai-openai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-openai@149

@tanstack/ai-react

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react@149

@tanstack/ai-react-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react-ui@149

@tanstack/ai-solid

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid@149

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid-ui@149

@tanstack/ai-svelte

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-svelte@149

@tanstack/ai-vue

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue@149

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue-ui@149

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/react-ai-devtools@149

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/solid-ai-devtools@149

commit: 3752db0

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1)

7-16: Duplicate collectChunks helper - see earlier comment.

This is the same helper duplicated from error-handling.test.ts. Consider extracting to shared test utilities.

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1)

7-16: Duplicate collectChunks helper - see earlier comment.

Third instance of this helper. Extracting to a shared module would reduce maintenance burden.

🧹 Nitpick comments (16)

packages/typescript/ai-client/src/chat-client.ts (1)
134-169: Consider defensive error message extraction.

The async execution tracking pattern is solid. However, at line 159, error.message assumes the caught value is an Error object. If a non-Error value is thrown, this could result in undefined.
           } catch (error: any) {
             await this.addToolResult({
               toolCallId: args.toolCallId,
               tool: args.toolName,
               output: null,
               state: 'output-error',
-              errorText: error.message,
+              errorText: error?.message ?? String(error),
             })
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
148-169: Consider edge case: empty content string.

The content streaming splits on spaces, but an empty string '' would result in words = [''], yielding a single empty delta. This is unlikely to cause issues but worth noting.
// Current behavior with empty string:
''.split(' ') // [''] - yields one empty content chunk
If empty content should yield no chunks, add a guard:
-    if (iteration.content) {
+    if (iteration.content && iteration.content.length > 0) {
223-231: Stub implementation for structuredOutput.

This is a minimal mock that always returns empty data. This is acceptable for a test simulator, but consider documenting that it's a stub.
 async structuredOutput(
   _options: StructuredOutputOptions<Record<string, unknown>>,
 ): Promise<StructuredOutputResult<unknown>> {
-  // Simple mock implementation
+  // Stub implementation - always returns empty object
+  // Real structured output scenarios should use a different approach
   return {
     data: {},
     rawText: '{}',
   }
 }
379-420: singleServerTool and singleClientTool are identical.

Both methods produce the exact same script structure. This appears intentional (the distinction is in how the tool is defined, not in the script), but the naming could mislead users into thinking the scripts differ.

Consider either:

Adding a comment clarifying that the script is the same, or

Consolidating into a single singleTool method with a comment
 /**
  * Script for a single server tool call
+ * Note: Script structure is identical to singleClientTool - 
+ * the server/client distinction is in the tool definition, not the script.
  */
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)
103-115: Silent JSON parse failure returns empty array.

The empty catch block silently handles parse errors. Consider logging parse failures in debug mode to aid troubleshooting.
     try {
       return JSON.parse(el.textContent || '[]')
     } catch {
+      console.warn('Failed to parse event-log-json:', el.textContent)
       return []
     }
215-232: Test expectations are intentionally relaxed.

The "triple client sequence" test notes a known issue with the 3rd tool. The comment explains this well. Consider adding a TODO or tracking issue reference.
-    // NOTE: This tests a complex multi-step continuation flow that requires
-    // additional investigation. Currently verifying at least 2 tools complete.
+    // TODO: Track issue for 3rd tool not triggering in complex continuation flows
+    // Currently verifying at least 2 tools complete while this is investigated.
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)
76-82: Consider using typed chunk filtering.

The chunk type assertions rely on string matching. Consider using type guards for stronger typing.
// Could define a type guard for safer filtering:
const isToolResultChunk = (c: StreamChunk): c is ToolResultStreamChunk => 
  c.type === 'tool_result'

const toolResultChunks = chunks.filter(isToolResultChunk)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)
14-133: Helper functions are duplicated across test files.

The selectScenario, waitForTestComplete, getMetadata, getEventLog, and getToolCalls helpers are nearly identical to those in client-tool.spec.ts. This creates maintenance burden.

Extract shared helpers to a common test utilities file:
// packages/typescript/smoke-tests/e2e/tests/tools/test-helpers.ts
import { type Page } from '@playwright/test'

export async function selectScenario(page: Page, scenario: string): Promise<void> {
  // ... shared implementation
}

export async function waitForTestComplete(
  page: Page,
  timeout?: number,
  expectedToolCount?: number,
): Promise<void> {
  // ... shared implementation
}

// etc.
Then import in both test files:
import { selectScenario, waitForTestComplete, getMetadata } from './test-helpers'
104-116: Return type includes timestamp field unlike client-tool.spec.ts.

The getEventLog return type here includes timestamp: number, while client-tool.spec.ts only has type and toolName. This inconsistency could cause confusion.

If extracting to shared helpers, standardize the return type to include all fields:
Promise<Array<{ type: string; toolName: string; timestamp?: number }>>
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1)
355-410: Non-null assertions on tc.approval!.id! rely on filter guarantees.

The pendingApprovals filter at line 229-231 checks tc.approval?.needsApproval but doesn't explicitly verify tc.approval.id exists. While this is likely safe in practice for a test page, consider optional chaining for defensive coding:
-                if (!respondedApprovals.current.has(tc.approval!.id!)) {
-                    respondedApprovals.current.add(tc.approval!.id!)
+                const approvalId = tc.approval?.id
+                if (approvalId && !respondedApprovals.current.has(approvalId)) {
+                    respondedApprovals.current.add(approvalId)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)
19-47: Consider reducing arbitrary timeouts in selectScenario.

The function uses multiple waitForTimeout calls (500ms, 200ms, 300ms) that could make tests flaky or slow. While hydration waits are sometimes necessary, consider using more deterministic waits where possible:
 async function selectScenario(page: Page, scenario: string): Promise<void> {
-  // Wait for hydration
-  await page.waitForTimeout(500)
+  // Wait for component to be interactive
+  await page.waitForSelector('#scenario-select:not([disabled])', { timeout: 3000 })
However, given this is e2e test infrastructure and React hydration timing can be unpredictable, the current approach is acceptable.
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1)
7-16: Consider extracting collectChunks to a shared test utility.

This helper function is duplicated across multiple test files (error-handling.test.ts, sequences.test.ts, client-tool.test.ts). Consider extracting it to a shared test utilities module to reduce duplication.
// e.g., packages/typescript/smoke-tests/adapters/src/tests/test-utils.ts
export async function collectChunks<T>(stream: AsyncIterable<T>): Promise<Array<T>> {
  const chunks: Array<T> = []
  for await (const chunk of stream) {
    chunks.push(chunk)
  }
  return chunks
}
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1)
7-16: Consider extracting collectChunks helper to a shared test utility.

This helper is duplicated across multiple test files (also in approval.test.ts). Consider extracting it to a shared test utilities module to reduce duplication and ensure consistency.

For example, create a file like packages/typescript/smoke-tests/adapters/src/test-utils.ts:
export async function collectChunks<T>(stream: AsyncIterable<T>): Promise<Array<T>> {
  const chunks: Array<T> = []
  for await (const chunk of stream) {
    chunks.push(chunk)
  }
  return chunks
}
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)
427-489: Abort signal propagation may be incomplete.

The abortController is created independently but isn't linked to the incoming requestSignal. If the client aborts the request, the chat stream won't be notified to stop.

Consider linking them:
         const abortController = new AbortController()
 
+        // Link request abort to our controller
+        requestSignal?.addEventListener('abort', () => {
+          abortController.abort()
+        })
+
         try {
Alternatively, pass requestSignal directly if toStreamResponse and chat support it.
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (2)
15-46: Consider using more robust waiting strategies instead of fixed timeouts.

The selectScenario function uses multiple waitForTimeout calls (500ms, 200ms, 300ms) which can be flaky in CI environments. While the retry loop and waitForFunction help mitigate this, the hardcoded delays may cause unnecessary slowness or intermittent failures.

Consider replacing fixed timeouts with condition-based waits where possible:
 async function selectScenario(page: Page, scenario: string): Promise<void> {
-  // Wait for hydration
-  await page.waitForTimeout(500)
+  // Wait for hydration - wait for select to be interactive
+  await page.waitForSelector('#scenario-select:not([disabled])', { state: 'visible' })
 
   // Try selecting multiple times
   for (let attempt = 0; attempt < 3; attempt++) {
     await page.focus('#scenario-select')
     await page.selectOption('#scenario-select', scenario)
-    await page.waitForTimeout(200)
 
     const currentScenario = await page.evaluate(() =>
       document.getElementById('test-metadata')?.getAttribute('data-scenario'),
     )
     if (currentScenario === scenario) break
+    await page.waitForTimeout(100) // Small delay only on retry
   }
180-218: Potential race condition in sequential approvals test.

The test waits for the approval section and buttons but the condition on lines 196-201 might pass when the first approval button is still visible after clicking. Consider explicitly waiting for the first approval to be processed before checking for the second.
     // Approve the first one
     await page.click('.approve-button')
 
+    // Wait for first approval to be processed (button should disappear)
+    await page.waitForFunction(
+      () => document.querySelectorAll('.approve-button').length === 0,
+      { timeout: 5000 },
+    ).catch(() => {}); // May not disappear if second comes quickly
+
     // Wait for second approval to appear
     await page.waitForFunction(
       () => {
         const section = document.getElementById('approval-section')
         const buttons = document.querySelectorAll('.approve-button')
-        // Either new approval appeared or section is visible
-        return section !== null && buttons.length > 0
+        // New approval should have appeared
+        return section !== null && buttons.length >= 1
       },
       { timeout: 10000 },
     )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e011bf8 and 3752db0.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (20)

packages/typescript/ai-client/src/chat-client.ts (8 hunks)
packages/typescript/ai/src/core/chat.ts (2 hunks)
packages/typescript/ai/src/message-converters.ts (1 hunks)
packages/typescript/ai/src/types.ts (1 hunks)
packages/typescript/smoke-tests/adapters/package.json (2 hunks)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/vitest.config.ts (1 hunks)
packages/typescript/smoke-tests/e2e/package.json (1 hunks)
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1 hunks)
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from /adapters subpath rather than monolithic adapters
Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions with toolDefinition() and Zod schema inference
Implement isomorphic tool system using toolDefinition() with .server() and .client() implementations for dual-environment execution
Use type-safe per-model configuration with provider options typed based on selected model to ensure compile-time safety
Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai/src/message-converters.ts
packages/typescript/ai/src/core/chat.ts
packages/typescript/ai/src/types.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

**/*.test.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Write unit tests using Vitest alongside source files with .test.ts naming convention

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use camelCase for function and variable names throughout the codebase

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai/src/message-converters.ts
packages/typescript/ai/src/core/chat.ts
packages/typescript/ai/src/types.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

🧠 Learnings (10)

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Implement isomorphic tool system using `toolDefinition()` with `.server()` and `.client()` implementations for dual-environment execution

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions with `toolDefinition()` and Zod schema inference

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/typescript/*/src/adapters/*.ts : Create individual adapter implementations for each provider capability (text, embed, summarize, image) with separate exports to enable tree-shaking

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/typescript/*/src/index.ts : Export tree-shakeable adapters with clear subpath exports in package.json (e.g., `tanstack/ai/adapters`, `tanstack/ai-openai/adapters`) to minimize bundle size

Applied to files:

packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/*/package.json : Use `workspace:*` protocol for internal package dependencies in package.json (e.g., `"tanstack/ai": "workspace:*"`)

Applied to files:

packages/typescript/smoke-tests/e2e/package.json

📚 Learning: 2025-12-13T17:09:09.784Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.784Z
Learning: Use Nx workspace with affected commands to optimize testing and building only changed packages and their dependents

Applied to files:

packages/typescript/smoke-tests/e2e/package.json

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.test.ts : Write unit tests using Vitest alongside source files with `.test.ts` naming convention

Applied to files:

packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from `/adapters` subpath rather than monolithic adapters

Applied to files:

packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to .eslintrc* : Use ESLint with custom TanStack config for linting all TypeScript and JavaScript files

Applied to files:

packages/typescript/smoke-tests/adapters/vitest.config.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Applied to files:

packages/typescript/ai-client/src/chat-client.ts

🧬 Code graph analysis (10)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (2)

packages/typescript/ai-client/src/chat-client.ts (3)

sendMessage (270-280)

stop (402-409)

addToolApprovalResponse (456-493)

packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)

Route (424-490)

packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)

packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)

isLoading (119-121)

packages/typescript/ai/src/tools/tool-calls.ts (1)

getToolCalls (111-115)

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)

SimulatorScripts (379-500)

createLLMSimulator (366-370)

SimulatorScript (35-40)

packages/typescript/ai/src/core/chat.ts (2)

chat (80-107)

chat (750-780)

packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)

SimulatorScript (35-40)

createLLMSimulator (366-370)

packages/typescript/ai/src/core/chat.ts (2)

chat (80-107)

chat (750-780)

packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (4)

SimulatorScripts (379-500)

createLLMSimulator (366-370)

SimulatorScript (35-40)

approvalTool (425-441)

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1)

packages/typescript/ai/src/tools/tool-calls.ts (1)

getToolCalls (111-115)

packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)

packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)

isLoading (119-121)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1)

packages/typescript/ai/src/types.ts (2)

ChatOptions (553-587)

StreamChunk (677-685)

packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)

SimulatorScript (35-40)

createLLMSimulator (366-370)

packages/typescript/ai/src/core/chat.ts (2)

chat (80-107)

chat (750-780)

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)

SimulatorScripts (379-500)

createLLMSimulator (366-370)

SimulatorScript (35-40)

packages/typescript/ai/src/core/chat.ts (2)

chat (80-107)

chat (750-780)

🪛 ast-grep (0.40.0)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[warning] 591-591: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

[warning] 598-598: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

🪛 Biome (2.1.2)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[error] 592-592: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)

[error] 599-600: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Preview
GitHub Check: Test

🔇 Additional comments (70)

packages/typescript/smoke-tests/e2e/package.json (3)

25-25: Correct use of workspace protocol for internal package.

The @tanstack/tests-adapters dependency correctly uses the workspace:* protocol for internal monorepo dependencies, aligning with established practices.

29-29: Verify vite-tsconfig-paths configuration.

The vite-tsconfig-paths dependency is newly added. Confirm that it's configured in the Vite config file (vite.config.ts) and is actually needed for the e2e tests, particularly if TypeScript path aliases are used in test files.

25-25: @tanstack/tests-adapters is properly configured and zod versions are consistent across core packages.

The dependency is correctly defined in the workspace and uses workspace:* as expected. Zod versions are aligned at ^4.1.13 across main packages including this smoke test, tests-adapters, and other core AI packages. Minor version variations exist in test-specific packages (ai-anthropic-live-tests uses ^3.24.1) and the root ai package uses a range (^3.0.0 || ^4.0.0), which is acceptable for those contexts.

packages/typescript/ai/src/types.ts (1)

69-73: LGTM! Clean extension of ToolCall for approval data.

The optional approval field provides a backward-compatible way to carry approval decisions back to the server. The simpler structure (compared to ToolCallPart.approval) appropriately contains only what's needed in the response payload.

packages/typescript/smoke-tests/adapters/vitest.config.ts (1)

1-26: LGTM! Well-structured Vitest configuration for the smoke-tests package.

The path alias, coverage settings, and exclusion patterns are appropriate. Based on learnings, this follows the project's convention for test file naming (.test.ts).

packages/typescript/ai/src/message-converters.ts (1)

110-117: LGTM! Correctly propagates approval data for responded tool calls.

The conditional spread ensures approval metadata is only included when the tool call has been explicitly approved or denied (approval-responded state with a defined approved value). This aligns with the new ToolCall.approval type.

packages/typescript/ai/src/core/chat.ts (1)

517-546: LGTM! Enhanced approval collection supports both message formats.

The dual-format handling (UIMessage with parts array and ModelMessage with toolCalls array) provides necessary flexibility. Note that both code paths can potentially add to the same approvals Map for the same message if it happens to have both structures, but this is harmless since the approval decision would be the same value.

packages/typescript/ai-client/src/chat-client.ts (5)

29-33: LGTM! Appropriate state variables for async tool execution tracking.

The pendingToolExecutions Map keyed by toolCallId enables proper lifecycle management, and the continuationPending flag prevents duplicate continuation attempts during action draining.

238-242: LGTM! Proper synchronization before stream finalization.

Awaiting all pending tool executions ensures client tools complete before the stream is finalized and continuation checks occur. The outer try/catch in processStream will handle any rejections.

310-373: LGTM! Well-structured stream lifecycle with proper guards and continuation logic.

The concurrent stream guard, state reset, and success flag pattern correctly handle the stream lifecycle. The finally block order is intentional: setIsLoading(false) before drainPostStreamActions allows queued actions to trigger continuation without being blocked by the loading check.

444-450: LGTM! Proper deferral of continuation when stream is in progress.

Queueing the continuation check when isLoading is true ensures we don't attempt concurrent streams, and the action will be executed after the current stream completes.

498-529: LGTM! Robust action queue and continuation helpers.

The sequential draining and dual-guard (continuationPending and isLoading) in checkForContinuation correctly prevent concurrent continuation attempts. The finally block ensures the flag is always reset.

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (5)

1-40: Well-structured type definitions for the simulator.

The interfaces are clear and well-documented. The SimulatorIteration interface properly captures the possible LLM turn outcomes (content, tool calls, finish reasons).

67-94: Clean adapter class design following the established pattern.

The adapter correctly implements the expected structure with kind, name, and models properties. The reset() method provides a way to reuse the adapter instance across tests.

95-130: Complex iteration detection logic with good documentation.

The stateless operation logic for E2E tests is well thought out, enabling both stateful (unit tests) and stateless (E2E with fresh adapter instances) operation modes. The comments clearly explain the two modes.

172-194: Tool call streaming looks correct.

Each tool call gets a unique ID (either from script or auto-generated), and the chunk structure matches the expected ToolCallStreamChunk format with proper index values for parallel tools.

363-370: Factory function is clean and straightforward.

The createLLMSimulator factory provides a clean API for creating adapter instances.

packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (3)

16-44: Robust scenario selection with retry logic.

The selectScenario helper handles React hydration issues well with retries and verification. The timeouts are reasonable for test stability.

50-77: Flexible completion detection with dual conditions.

The waitForTestComplete helper smartly checks both the testComplete flag and tool counts, which provides resilience against race conditions where the flag might not be set.

253-271: Good debugging support on test failure.

Capturing screenshots and logging events/metadata on failure will significantly help with debugging flaky E2E tests.

packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (5)

1-16: Proper imports and reusable helper.

The collectChunks helper is a clean utility for consuming async iterables in tests. Follows Vitest conventions as per coding guidelines.

19-83: Comprehensive parallel tool execution test.

The test properly verifies that all three server tools execute, and validates both tool call and result chunk counts. Good use of vi.fn() for mocking.

85-154: Good coverage of mixed tool types.

This test exercises server, approval, and client tools simultaneously, verifying the correct behavior for each type:

Server tool executes immediately

Approval tool waits for approval

Client tool emits tool-input-available

Well-structured test using toolDefinition() with Zod schemas as per coding guidelines.

157-213: Solid test for repeated tool invocations with different arguments.

The test correctly verifies that the same tool can be called multiple times with different arguments, and that each call receives the correct city-specific data.

215-267: Clean tool selection test.

Verifies that only the requested tool executes when multiple tools are available. Good negative assertions for non-called tools.

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)

197-209: Good sequence verification using timestamps.

The test properly verifies the order of tool executions by comparing timestamps. The conditional check handles cases where timestamps might not be present.

263-284: Text-only test uses different completion strategy.

This test waits for isLoading to become false rather than using waitForTestComplete. This is appropriate since there are no tools to count, but the additional waitForTimeout(500) seems like a workaround.

packages/typescript/smoke-tests/adapters/package.json (3)

17-18: Test scripts added correctly.

The test and test:watch scripts follow standard Vitest patterns.

34-34: No issues found. The vitest version ^4.0.14 is consistent across the entire monorepo, including the root package.json and all other packages. The version aligns properly with the project's testing setup.

9-14: Exports field correctly exposes the simulator for cross-package imports using .ts files.

This is appropriate for a private test package. The TypeScript configuration supports direct .ts imports via allowImportingTsExtensions, and the e2e test package successfully imports from this export. No built output is needed since the package is private and used internally during development.

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (5)

1-6: LGTM!

Imports are appropriate for the test page functionality. Uses the recommended patterns: toolDefinition from @tanstack/ai, Zod for schema validation, and React hooks as expected. Based on learnings, this correctly implements the isomorphic tool system with client implementations.

27-97: LGTM!

The createTrackedTools function correctly implements the isomorphic tool system using toolDefinition() with .client() implementations. Input and output schemas use Zod as per coding guidelines. The event tracking pattern is well-suited for e2e test observability.

156-157: LGTM!

The useRef pattern ensures tools are created once with the stable addEvent callback. Since addEvent is memoized with an empty dependency array, this approach correctly avoids unnecessary re-creation of tools on re-renders.

172-198: LGTM!

The completion detection logic appropriately handles multiple completion states (complete, output-available, or having output/result). The dependency array is correct, and the early return guard at line 173 prevents premature completion detection.

589-609: Static analysis warning is a false positive in this context.

The dangerouslySetInnerHTML usage here serializes application-controlled state (toolEvents, toolCalls) that doesn't contain unsanitized user input. The JSON data originates from internal tool execution tracking and is safe. This pattern of embedding JSON in <script type="application/json"> tags for test harnesses is appropriate for e2e test infrastructure.

packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (4)

53-80: LGTM!

The completion detection logic is well-designed with two conditions (explicit completion flag OR sufficient completed tools without loading). The trailing 200ms buffer for React state propagation is a pragmatic approach for e2e tests.

143-180: LGTM!

This test effectively validates the specific bug where sequential client tool calls could block each other. The execution order verification via event log timestamps and the duration upper bound (< 10s) are good regression guards against blocking behavior.

246-276: LGTM!

Smart handling of the parallel approvals scenario - capturing button IDs before clicking prevents issues with React re-renders removing buttons from the DOM. The brief wait between clicks is appropriate for test stability.

399-414: LGTM!

The failure hook provides excellent debugging support with screenshots and state dumps. This will be valuable for investigating flaky tests or race condition failures.

packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (5)

19-103: LGTM!

Comprehensive error handling tests covering both synchronous throws and async rejections. The assertions correctly verify that error details propagate to tool_result chunks with the expected error messages.

106-145: LGTM!

Good coverage of the edge case where the LLM calls a tool that isn't registered. The test verifies graceful error handling rather than crashes.

147-183: LGTM!

Correctly tests the scenario where a tool definition has no execute function, verifying that tool-input-available is emitted for client-side handling. This aligns with the isomorphic tool system pattern.

224-268: LGTM!

Effective test for the maxIterations strategy. Verifies that the agent loop terminates at the specified limit even when the script would continue indefinitely, and confirms proper completion signaling via done chunks.

270-349: LGTM!

Good coverage of the auto-stringification behavior for non-string tool returns. Tests verify that objects are properly JSON-serialized and primitives like numbers are stringified correctly.

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (5)

19-133: LGTM!

Thorough testing of server tool sequencing with proper call order verification using mock.invocationCallOrder. The assertions correctly verify both execution count and order.

135-194: LGTM!

Excellent test of the server→client tool sequence pattern. Correctly verifies that server tools produce tool_result chunks while client tools (without execute) produce tool-input-available chunks for client-side handling.

196-276: LGTM!

Well-designed test for client→server continuation flow. The message structure correctly simulates a scenario where a client tool result has been injected, allowing verification that the server tool executes as expected in the continuation.

278-347: LGTM!

Clean implementation of the three-tool sequence test using a shared array to track execution order. The toEqual(['A', 'B', 'C']) assertion provides strict order verification.

349-418: LGTM!

Effective test for the parallel→sequential tool pattern. The indexOf comparisons correctly verify that both parallel tools (A, B) complete before the sequential tool (combine) executes.

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (4)

19-109: LGTM!

Comprehensive testing of client tools without execute functions. The getCurrentIteration() check is a clever way to verify the stream paused waiting for client input rather than advancing to the next iteration.

111-154: LGTM!

Good test demonstrating that client tools with execute functions behave similarly to server tools - they auto-execute and produce tool_result chunks. The output schema with Zod provides type safety for the returned data.

156-223: LGTM!

Well-structured test for the message injection pattern. The message structure with parts containing the complete tool call correctly simulates a client that has already provided a tool result, allowing verification of the continuation flow.

225-278: LGTM!

Effective test of mixed client tool behavior where one tool has an execute function and one doesn't. The assertions correctly verify that only the tool without execute emits tool-input-available.

packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (6)

1-16: LGTM! Well-structured test file with proper imports and helper.

The imports are appropriate for the test suite, and the generic collectChunks helper is cleanly implemented for collecting async iterable chunks.

18-111: LGTM! Comprehensive approval-requested test coverage.

The tests properly verify:

Approval-requested chunks are emitted for tools with needsApproval: true

Tool execution is deferred until approval is granted

Iteration stops when approval is pending

The as any casts on chunk types are acceptable for test assertions.

113-187: LGTM! Good test for approval acceptance flow.

The test correctly verifies that when an approval is pre-granted in the message history (via parts with approved: true), the tool executes and produces a tool_result chunk.

189-259: LGTM! Denial flow properly tested.

The test correctly verifies that when approval is denied (approved: false), the tool is not executed and the LLM produces a content response instead.

261-316: LGTM! Multiple approval tools test is well-designed.

The test verifies that when multiple tools requiring approval are called in the same iteration, each emits an approval-requested chunk and neither executes until approved.

318-380: LGTM! Critical edge case for mixed approval scenarios.

This test is particularly valuable as it verifies that non-approval tools execute immediately while approval-requiring tools wait, even when both are called in the same iteration.

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (4)

18-143: LGTM! Comprehensive server tool execution tests.

Good coverage of:

Basic tool execution with simple arguments

Complex nested argument schemas with Zod validation

Proper tool call and result chunk generation

The tests follow the coding guidelines by using Zod for runtime schema validation.

145-227: LGTM! Good coverage for different tool return types.

Tests properly verify:

Tools returning JSON objects with outputSchema

Tools returning plain objects (framework handles stringification)

Result content contains expected values

229-299: LGTM! Important ID tracking verification.

These tests ensure proper correlation between tool_call and tool_result chunks via IDs, which is critical for the agent loop to function correctly. The auto-generated ID format test (/^call-\d+-\d+$/) validates the fallback behavior.

301-343: LGTM! Good test for combined content and tool call handling.

This test verifies that content and tool calls can coexist in the same iteration, which is important for realistic LLM response patterns.

packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (3)

14-339: LGTM! Well-structured test scenario definitions.

The scenarios provide comprehensive coverage for:

Basic flows (text-only, single server/client tools)

Approval workflows

Sequential and parallel tool executions

Mixed server/client sequences

Race condition scenarios

The scenario structure aligns well with the LLM simulator's SimulatorScript interface.

341-422: LGTM! Tool definitions follow best practices.

The tool definitions properly:

Use Zod for runtime schema validation as per coding guidelines

Implement the isomorphic tool system with .server() and .client() implementations

Include needsApproval: true for the delete_file tool to test approval flows

492-553: LGTM! Clean scenario-to-tools mapping.

The switch statement clearly maps each scenario to its required tools. The default case returning an empty array is consistent with the text-only scenario behavior.

packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (4)

48-125: LGTM! Well-designed test helper functions.

The helpers provide good abstraction for:

waitForApproval: Simple selector-based wait

waitForTestComplete: Multi-condition wait with good fallback logic

getMetadata/getEventLog: Clean data extraction from DOM

The 200ms buffer in waitForTestComplete (line 85) is a pragmatic approach for React state settlement.

127-178: LGTM! Single approval flow tests are well-structured.

Both approve and deny cases properly:

Select the scenario

Wait for approval UI

Perform the action

Verify metadata counts

The 500ms wait on line 172 for denial processing is acceptable as denial may not set testComplete immediately.

220-314: LGTM! Parallel approvals and mixed flow tests are robust.

The parallel approvals test (lines 220-253) correctly:

Captures button IDs before clicking to handle DOM re-renders

Verifies both approvals are granted

The mixed flow tests properly verify event log entries for both approval and client tool execution.

316-332: LGTM! Good debugging support for test failures.

The afterEach hook provides valuable debugging information:

Full-page screenshots with descriptive filenames

Console output of metadata and events

This will be helpful for diagnosing flaky tests in CI.

coderabbitai · 2025-12-15T05:36:52Z

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

+  private determineIterationFromMessages(
+    messages: Array<{
+      role: string
+      toolCalls?: Array<{
+        id: string
+        approval?: { id: string; approved: boolean }
+      }>
+      toolCallId?: string
+      parts?: Array<{
+        type: string
+        id?: string
+        output?: any
+        approval?: { approved?: boolean }
+      }>
+    }>,
+  ): number | null {
+    if (!messages || messages.length === 0) {
+      return 0 // Fresh conversation, start at iteration 0
+    }
+
+    // Find all assistant messages with tool calls
+    const assistantToolCallMessages = messages.filter(
+      (m) => m.role === 'assistant' && m.toolCalls && m.toolCalls.length > 0,
+    )
+
+    if (assistantToolCallMessages.length === 0) {
+      // No tool calls in history, might be first iteration or continuation
+      // Check if there's a user message (fresh start)
+      const hasUserMessage = messages.some((m) => m.role === 'user')
+      return hasUserMessage ? 0 : null
+    }
+
+    // Get all completed tool call IDs from:
+    // 1. Separate tool result messages (role: 'tool')
+    // 2. Parts array with output set (client tool results) - UIMessage format
+    // 3. Parts array with approval.approved set (approval responses) - UIMessage format
+    // 4. toolCalls array with approval.approved set (approval responses) - ModelMessage format
+    const completedToolIds = new Set<string>()
+
+    for (const msg of messages) {
+      // Check for role: 'tool' messages (server tool results)
+      if (msg.role === 'tool' && msg.toolCallId) {
+        completedToolIds.add(msg.toolCallId)
+      }
+
+      // Check for UIMessage format: parts with output or approval responses
+      if (msg.parts) {
+        for (const part of msg.parts) {
+          if (part.type === 'tool-call' && part.id) {
+            // Client tool results have output set
+            if (part.output !== undefined) {
+              completedToolIds.add(part.id)
+            }
+            // Approval tools are complete when approval.approved is set (true or false)
+            if (part.approval?.approved !== undefined) {
+              completedToolIds.add(part.id)
+            }
+          }
+        }
+      }
+
+      // Check for ModelMessage format: toolCalls with approval info
+      if (msg.role === 'assistant' && msg.toolCalls) {
+        for (const tc of msg.toolCalls) {
+          // Approval tools are complete when approval.approved is set
+          if (tc.approval?.approved !== undefined) {
+            completedToolIds.add(tc.id)
+          }
+        }
+      }
+    }
+
+    // Count how many complete tool call rounds we have
+    let completedRounds = 0
+    for (const assistantMsg of assistantToolCallMessages) {
+      const toolCalls = assistantMsg.toolCalls as Array<{ id: string }>
+      const allToolsComplete = toolCalls.every((tc) =>
+        completedToolIds.has(tc.id),
+      )
+      if (allToolsComplete) {
+        completedRounds++
+      }
+    }
+
+    // The next iteration is completedRounds (0-indexed)
+    // e.g., if we've completed 1 round, we're on iteration 1
+    return completedRounds
+  }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find ModelMessage and UIMessage type definitions echo "=== Searching for ModelMessage type definition ===" rg -n "type ModelMessage|interface ModelMessage" --type=ts -A 20 | head -100 echo -e "\n=== Searching for UIMessage type definition ===" rg -n "type UIMessage|interface UIMessage" --type=ts -A 20 | head -100 echo -e "\n=== Searching for toolCalls in message types ===" rg -n "toolCalls" --type=ts -B 2 -A 2 | head -80

Repository: TanStack/ai

Length of output: 13022

🏁 Script executed:

#!/bin/bash # Find MessagePart type definition to understand tool-call parts echo "=== Searching for MessagePart type definition ===" rg -n "type MessagePart|interface MessagePart" --type=ts -A 30 | head -120 # Check for ToolCall definition echo -e "\n=== Searching for ToolCall type definition ===" rg -n "type ToolCall|interface ToolCall" --type=ts -A 15 | head -80 # Check the actual llm-simulator.ts file to see imports and context echo -e "\n=== Checking llm-simulator.ts imports and context ===" head -50 packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 13699

🏁 Script executed:

#!/bin/bash # Get complete ToolCallPart and ToolResultPart definitions echo "=== Complete ToolCallPart definition ===" rg -n "export interface ToolCallPart" --type=ts -A 20 | head -50 echo -e "\n=== Complete ToolResultPart definition ===" rg -n "export interface ToolResultPart" --type=ts -A 10 | head -40 # Check if there's a different message type being used in llm-simulator echo -e "\n=== Search for message types in llm-simulator context ===" grep -n "messages\|Message" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -40

Repository: TanStack/ai

Length of output: 5228

🏁 Script executed:

#!/bin/bash # Check how determineIterationFromMessages is called and what messages are passed echo "=== Checking method calls and context ===" grep -B 5 -A 5 "determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -80 # Check the options type to see what message format is expected echo -e "\n=== Checking options and message types ===" grep -n "options\." packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | grep -E "messages|options" | head -20

Repository: TanStack/ai

Length of output: 1246

🏁 Script executed:

#!/bin/bash # Find the ChatOptions interface to see what message types are expected echo "=== ChatOptions in llm-simulator ===" rg -n "ChatOptions" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts -B 2 -A 2 # Find the actual ChatOptions type definition echo -e "\n=== ChatOptions type definition ===" rg -n "interface ChatOptions|type ChatOptions" --type=ts -A 30 | head -100 # Check what the adapter signature expects echo -e "\n=== Check adapter method signature ===" grep -n "adapter.*(" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -20 </parameter>

Repository: TanStack/ai

Length of output: 3072

🏁 Script executed:

#!/bin/bash # Check if there's any documentation or usage showing what message types are actually passed echo "=== Checking how messages are used in llm-simulator ===" sed -n '273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts # Also check if there are tests that show actual usage echo -e "\n=== Check test usage ===" find packages/typescript/smoke-tests -name "*.test.*" -o -name "*.spec.*" | head -5 </parameter>

Repository: TanStack/ai

Length of output: 3705

🏁 Script executed:

#!/bin/bash # Check for all calls to determineIterationFromMessages echo "=== All calls to determineIterationFromMessages ===" rg -n "determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts # Check if messages could come from different sources with different types echo -e "\n=== Check other adapters or similar patterns ===" find packages/typescript/smoke-tests/adapters/src -name "*.ts" -type f | xargs grep -l "messages" | head -5 # Verify the actual usage context echo -e "\n=== Check full context around line 108 ===" sed -n '100,130p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 1938

🏁 Script executed:

#!/bin/bash # Get the complete ToolCall type definition from types.ts echo "=== ToolCall type in ModelMessage ===" sed -n '62,74p' packages/typescript/ai/src/types.ts # Compare with what the method expects echo -e "\n=== What the method expects vs actual ToolCall ===" echo "Method expects: toolCalls with approval?.approved" echo "Actual ToolCall has:" rg -n "export interface ToolCall" --type=ts -A 12 | grep -A 12 "packages/typescript/ai/src"

Repository: TanStack/ai

Length of output: 3808

🏁 Script executed:

#!/bin/bash # Check for camelCase compliance in the method echo "=== Checking variable/function names in determineIterationFromMessages ===" sed -n '273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | grep -E "^\s*(const|let|var|for|function)" | head -30

Repository: TanStack/ai

Length of output: 631

Remove unreachable code that checks for UIMessage format.

The method only receives Array<ModelMessage> per ChatOptions, which doesn't have a parts property. Lines 318-330 that check msg.parts for UIMessage format are dead code—ModelMessage has toolCalls (not parts), and UIMessage has parts (but is never passed here). The logic for handling ModelMessage format with toolCalls and approval.approved is correct, but the defensive UIMessage checks should be removed to avoid confusion about supported message types.

🤖 Prompt for AI Agents

In packages/typescript/smoke-tests/adapters/src/llm-simulator.ts around lines 273-360 (specifically remove the unreachable UIMessage handling at ~lines 318-330), remove the block that checks msg.parts and iterates parts for type 'tool-call' (the UIMessage format) because this method only receives ModelMessage instances per ChatOptions; keep the remaining logic that handles role === 'tool' and assistant msg.toolCalls approvals so completedToolIds is still populated, and delete any now-unused variables or comments tied only to the removed block.

coderabbitai · 2025-12-15T05:36:52Z

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts

+  test.beforeEach(async ({ page }) => {
+    await page.goto('/tools-test')
+    await page.waitForSelector('#run-test-button')
+  })


⚠️ Potential issue | 🟡 Minor

Missing :not([disabled]) check in beforeEach.

Unlike client-tool.spec.ts which waits for #run-test-button:not([disabled]), this file only waits for #run-test-button. This could cause test flakiness if the button is initially disabled.

test.beforeEach(async ({ page }) => { await page.goto('/tools-test') - await page.waitForSelector('#run-test-button') + await page.waitForSelector('#run-test-button:not([disabled])', { + timeout: 10000, + }) })

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

test.beforeEach(async ({ page }) => {

await page.goto('/tools-test')

await page.waitForSelector('#run-test-button')

})

test.beforeEach(async ({ page }) => {

await page.goto('/tools-test')

await page.waitForSelector('#run-test-button:not([disabled])', {

timeout: 10000,

})

})

🤖 Prompt for AI Agents

In packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts around lines 136-139, the beforeEach uses await page.waitForSelector('#run-test-button') which can pass while the button is still disabled; change the wait to target the enabled button (e.g. await page.waitForSelector('#run-test-button:not([disabled])') or otherwise wait until the element is not disabled) so the test only proceeds when the run button is interactive.

initial tool hardening fixes

3752db0

jherr requested a review from a team December 15, 2025 05:31

jherr marked this pull request as draft December 15, 2025 05:32

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

AlemTuzlak approved these changes Dec 15, 2025

View reviewed changes

tool test router

20511e6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[WIP]: Tool hardening fixes #149

[WIP]: Tool hardening fixes #149

Uh oh!

jherr commented Dec 15, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 15, 2025 •

edited

Loading

Review skipped

Uh oh!

nx-cloud bot commented Dec 15, 2025 •

edited

Loading

Uh oh!

pkg-pr-new bot commented Dec 15, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 15, 2025

Uh oh!

coderabbitai bot Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

[WIP]: Tool hardening fixes #149

Are you sure you want to change the base?

[WIP]: Tool hardening fixes #149

Uh oh!

Conversation

jherr commented Dec 15, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 Changes

✅ Checklist

🚀 Release Impact

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

nx-cloud bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkg-pr-new bot commented Dec 15, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jherr commented Dec 15, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 15, 2025 •

edited

Loading

nx-cloud bot commented Dec 15, 2025 •

edited

Loading