Skip to content

Conversation

@jherr
Copy link
Contributor

@jherr jherr commented Dec 15, 2025

🎯 Changes

Fixing tool call issues and adding a lot of tests to make sure they are 100% stable and reliable.

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for tool execution approvals, allowing tools to require user consent before running.
    • Enhanced tool orchestration to properly handle sequential and parallel tool execution flows.
    • Improved async handling for client and server tool executions with better state management.
  • Tests

    • Added comprehensive test suites covering approval flows, client tools, server tools, and tool sequencing scenarios.
    • Introduced a test simulator adapter for deterministic tool execution testing.

✏️ Tip: You can customize this high-level summary in your review settings.

@jherr jherr requested a review from a team December 15, 2025 05:31
@coderabbitai
Copy link

coderabbitai bot commented Dec 15, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces an approval-based tool execution workflow with async state management. Adds a queuing system for post-stream actions, tracks in-flight tool executions, converts onToolCall callback to synchronous, and defers continuation logic. Includes comprehensive LLM simulator testing infrastructure and end-to-end test suites for tool execution flows.

Changes

Cohort / File(s) Summary
Core Chat Client Orchestration
packages/typescript/ai-client/src/chat-client.ts
Introduced queuing system (postStreamActions), in-flight tool execution tracking (pendingToolExecutions), and continuation deferral (continuationPending). Changed onToolCall signature from async to synchronous with internal promise orchestration. Added drainPostStreamActions, checkForContinuation, and queuePostStreamAction helpers. Modified stream lifecycle to wait for pending executions before finalization and drain queued actions in finally blocks.
Type System Extensions
packages/typescript/ai/src/types.ts
Added optional approval?: { id: string; approved: boolean } field to ToolCall public API to carry user-approval metadata.
Message Conversion & State Collection
packages/typescript/ai/src/message-converters.ts, packages/typescript/ai/src/core/chat.ts
Enhanced collectClientState to inspect both UIMessage (parts array) and ModelMessage (toolCalls array) formats for approval data. Updated message converters to conditionally include approval data in tool-call entries when approval state is 'approval-responded'.
LLM Simulator Adapter
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
New comprehensive simulator for deterministic tool-call testing. Exports LLMSimulatorAdapter class, SimulatorScript/SimulatorToolCall/SimulatorIteration interfaces, createLLMSimulator factory, and SimulatorScripts utility with preset patterns (single/multi-tool, approvals, sequential/parallel). Supports stateless and stateful iteration selection with ID generation.
Adapter Package Configuration
packages/typescript/smoke-tests/adapters/package.json, packages/typescript/smoke-tests/adapters/vitest.config.ts
Added exports field for ./src/llm-simulator.ts, test scripts (test, test:watch), and vitest dev dependency. Created Vitest configuration with path alias (@tanstack/ai), globals, Node environment, coverage settings, and exclusion patterns.
Adapter Test Suites
packages/typescript/smoke-tests/adapters/src/tests/tools/*.test.ts
Added six test modules: server-tool.test.ts, client-tool.test.ts, approval.test.ts, multi-tool.test.ts, sequences.test.ts, error-handling.test.ts. Cover server/client tool execution, approvals, parallel/sequential flows, mixed tool types, error propagation, and iteration tracking via simulator.
E2E API & Page Route
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts, packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
Added /api/tools-test handler supporting multiple predefined scenarios (text-only, single/multi-tool, approval, sequential/parallel). Defined server tools (get_weather, fetch_data, get_time, delete_file with approval) and client tools (show_notification, display_chart). Added React component ToolsTestPage with scenario selector, event logging, approvals UI, tool call tracking, and integration with useChat.
E2E Dependencies
packages/typescript/smoke-tests/e2e/package.json
Added @tanstack/tests-adapters (workspace:*) and zod (^4.1.13) dependencies.
E2E Playwright Tests
packages/typescript/smoke-tests/e2e/tests/tools/*.spec.ts
Added four test suites: approval-flow.spec.ts, client-tool.spec.ts, race-conditions.spec.ts, server-client-sequence.spec.ts. Cover approval workflows (single, sequential, parallel), client tool execution, mixed server-client sequences, race conditions, and stress tests. Each includes helpers for scenario selection, completion detection, metadata extraction, and event log parsing.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/App
    participant CC as ChatClient
    participant Queue as Action Queue
    participant Tool as Tool Executor
    participant Stream as Stream Lifecycle

    Client->>CC: streamResponse(messages)
    CC->>Stream: Initialize stream
    Stream->>Stream: Reset pendingToolExecutions
    
    loop During Stream
        Stream->>Tool: onToolCall (sync)
        Tool->>Tool: Create executionPromise
        Tool->>Queue: Store in pendingToolExecutions
        Tool->>Tool: Execute async (detached)
        Tool-->>Queue: Remove from pendingToolExecutions on completion
    end
    
    Stream->>Stream: Stream completes
    Stream->>Stream: Wait for all pendingToolExecutions
    Stream->>Stream: Check streamCompletedSuccessfully
    
    alt Auto-send criteria met
        Stream->>CC: checkForContinuation()
        CC->>Queue: queuePostStreamAction(continuation)
    end
    
    Stream->>Queue: drainPostStreamActions()
    Queue->>Queue: Execute all queued actions
    Queue-->>Client: Finally block runs
    
    Note over CC,Queue: Tool result added while loading
    Client->>CC: addToolResult(result)
    alt Stream in progress
        CC->>Queue: queuePostStreamAction(continuation)
    else No active stream
        CC->>CC: Trigger continuation directly
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • packages/typescript/ai-client/src/chat-client.ts: Dense async orchestration logic with state management across multiple new data structures (postStreamActions, pendingToolExecutions, continuationPending), control flow changes to onToolCall, and integration into stream lifecycle. Requires careful analysis of timing, race conditions, and state transitions.
  • Type system changes (types.ts, message-converters.ts, chat.ts): Moderate complexity; requires tracing approval field propagation across message formats.
  • LLM Simulator (llm-simulator.ts): Large new public API surface (~15 exports); comprehensive implementation supporting multiple iteration patterns and tool types. Needs validation of contract consistency across all preset scripts.
  • Adapter & E2E tests: High volume (~6 test modules + 4 E2E specs) with heterogeneous scenarios (server tools, client tools, approvals, race conditions, sequencing). Each test suite has distinct logic and assertions that demand individual scrutiny.
  • Interdependencies: Changes span chat-client → types → message converters → simulator usage in tests; tracing data flow through all layers is essential.

Areas requiring extra attention:

  • Async orchestration guards and edge cases in chat-client.ts (e.g., concurrent streams, pending execution cleanup, continuation re-entrance)
  • Approval state propagation through collectClientState and message-converters under various message formats
  • LLM simulator contract compliance with all SimulatorScripts presets
  • E2E test reliability under race conditions and timing edge cases

Poem

🐰 Queues and tools dance in the sun,
Approvals bloom when work is done,
We hitch the stream with post-stream cheer,
Orchestration whispers, loud and clear,
Async hops through fields so green,
The finest tool-flow ever seen!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title uses '[WIP]' prefix and is vague ('Tool hardening fixes') without clearly conveying the main changes or their scope. Replace '[WIP]' with a clear, specific title describing the primary change, such as 'Add tool execution queuing and approval support' or similar.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The PR description addresses the template structure but lacks specific details about what 'tool call issues' were fixed and what improvements the added tests validate.
Docstring Coverage ✅ Passed Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jherr jherr marked this pull request as draft December 15, 2025 05:32
@nx-cloud
Copy link

nx-cloud bot commented Dec 15, 2025

View your CI Pipeline Execution ↗ for commit 20511e6

Command Status Duration Result
nx affected --targets=test:sherif,test:knip,tes... ✅ Succeeded 7s View ↗
nx run-many --targets=build --exclude=examples/** ✅ Succeeded 2s View ↗

☁️ Nx Cloud last updated this comment at 2025-12-16 17:07:44 UTC

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 15, 2025

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai@149

@tanstack/ai-anthropic

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-anthropic@149

@tanstack/ai-client

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-client@149

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-devtools-core@149

@tanstack/ai-gemini

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-gemini@149

@tanstack/ai-ollama

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-ollama@149

@tanstack/ai-openai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-openai@149

@tanstack/ai-react

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react@149

@tanstack/ai-react-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react-ui@149

@tanstack/ai-solid

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid@149

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid-ui@149

@tanstack/ai-svelte

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-svelte@149

@tanstack/ai-vue

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue@149

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue-ui@149

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/react-ai-devtools@149

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/solid-ai-devtools@149

commit: 3752db0

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1)

7-16: Duplicate collectChunks helper - see earlier comment.

This is the same helper duplicated from error-handling.test.ts. Consider extracting to shared test utilities.

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1)

7-16: Duplicate collectChunks helper - see earlier comment.

Third instance of this helper. Extracting to a shared module would reduce maintenance burden.

🧹 Nitpick comments (16)
packages/typescript/ai-client/src/chat-client.ts (1)

134-169: Consider defensive error message extraction.

The async execution tracking pattern is solid. However, at line 159, error.message assumes the caught value is an Error object. If a non-Error value is thrown, this could result in undefined.

           } catch (error: any) {
             await this.addToolResult({
               toolCallId: args.toolCallId,
               tool: args.toolName,
               output: null,
               state: 'output-error',
-              errorText: error.message,
+              errorText: error?.message ?? String(error),
             })
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)

148-169: Consider edge case: empty content string.

The content streaming splits on spaces, but an empty string '' would result in words = [''], yielding a single empty delta. This is unlikely to cause issues but worth noting.

// Current behavior with empty string:
''.split(' ') // [''] - yields one empty content chunk

If empty content should yield no chunks, add a guard:

-    if (iteration.content) {
+    if (iteration.content && iteration.content.length > 0) {

223-231: Stub implementation for structuredOutput.

This is a minimal mock that always returns empty data. This is acceptable for a test simulator, but consider documenting that it's a stub.

 async structuredOutput(
   _options: StructuredOutputOptions<Record<string, unknown>>,
 ): Promise<StructuredOutputResult<unknown>> {
-  // Simple mock implementation
+  // Stub implementation - always returns empty object
+  // Real structured output scenarios should use a different approach
   return {
     data: {},
     rawText: '{}',
   }
 }

379-420: singleServerTool and singleClientTool are identical.

Both methods produce the exact same script structure. This appears intentional (the distinction is in how the tool is defined, not in the script), but the naming could mislead users into thinking the scripts differ.

Consider either:

  1. Adding a comment clarifying that the script is the same, or
  2. Consolidating into a single singleTool method with a comment
 /**
  * Script for a single server tool call
+ * Note: Script structure is identical to singleClientTool - 
+ * the server/client distinction is in the tool definition, not the script.
  */
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)

103-115: Silent JSON parse failure returns empty array.

The empty catch block silently handles parse errors. Consider logging parse failures in debug mode to aid troubleshooting.

     try {
       return JSON.parse(el.textContent || '[]')
     } catch {
+      console.warn('Failed to parse event-log-json:', el.textContent)
       return []
     }

215-232: Test expectations are intentionally relaxed.

The "triple client sequence" test notes a known issue with the 3rd tool. The comment explains this well. Consider adding a TODO or tracking issue reference.

-    // NOTE: This tests a complex multi-step continuation flow that requires
-    // additional investigation. Currently verifying at least 2 tools complete.
+    // TODO: Track issue for 3rd tool not triggering in complex continuation flows
+    // Currently verifying at least 2 tools complete while this is investigated.
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)

76-82: Consider using typed chunk filtering.

The chunk type assertions rely on string matching. Consider using type guards for stronger typing.

// Could define a type guard for safer filtering:
const isToolResultChunk = (c: StreamChunk): c is ToolResultStreamChunk => 
  c.type === 'tool_result'

const toolResultChunks = chunks.filter(isToolResultChunk)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)

14-133: Helper functions are duplicated across test files.

The selectScenario, waitForTestComplete, getMetadata, getEventLog, and getToolCalls helpers are nearly identical to those in client-tool.spec.ts. This creates maintenance burden.

Extract shared helpers to a common test utilities file:

// packages/typescript/smoke-tests/e2e/tests/tools/test-helpers.ts
import { type Page } from '@playwright/test'

export async function selectScenario(page: Page, scenario: string): Promise<void> {
  // ... shared implementation
}

export async function waitForTestComplete(
  page: Page,
  timeout?: number,
  expectedToolCount?: number,
): Promise<void> {
  // ... shared implementation
}

// etc.

Then import in both test files:

import { selectScenario, waitForTestComplete, getMetadata } from './test-helpers'

104-116: Return type includes timestamp field unlike client-tool.spec.ts.

The getEventLog return type here includes timestamp: number, while client-tool.spec.ts only has type and toolName. This inconsistency could cause confusion.

If extracting to shared helpers, standardize the return type to include all fields:

Promise<Array<{ type: string; toolName: string; timestamp?: number }>>
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1)

355-410: Non-null assertions on tc.approval!.id! rely on filter guarantees.

The pendingApprovals filter at line 229-231 checks tc.approval?.needsApproval but doesn't explicitly verify tc.approval.id exists. While this is likely safe in practice for a test page, consider optional chaining for defensive coding:

-                if (!respondedApprovals.current.has(tc.approval!.id!)) {
-                    respondedApprovals.current.add(tc.approval!.id!)
+                const approvalId = tc.approval?.id
+                if (approvalId && !respondedApprovals.current.has(approvalId)) {
+                    respondedApprovals.current.add(approvalId)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)

19-47: Consider reducing arbitrary timeouts in selectScenario.

The function uses multiple waitForTimeout calls (500ms, 200ms, 300ms) that could make tests flaky or slow. While hydration waits are sometimes necessary, consider using more deterministic waits where possible:

 async function selectScenario(page: Page, scenario: string): Promise<void> {
-  // Wait for hydration
-  await page.waitForTimeout(500)
+  // Wait for component to be interactive
+  await page.waitForSelector('#scenario-select:not([disabled])', { timeout: 3000 })

However, given this is e2e test infrastructure and React hydration timing can be unpredictable, the current approach is acceptable.

packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1)

7-16: Consider extracting collectChunks to a shared test utility.

This helper function is duplicated across multiple test files (error-handling.test.ts, sequences.test.ts, client-tool.test.ts). Consider extracting it to a shared test utilities module to reduce duplication.

// e.g., packages/typescript/smoke-tests/adapters/src/tests/test-utils.ts
export async function collectChunks<T>(stream: AsyncIterable<T>): Promise<Array<T>> {
  const chunks: Array<T> = []
  for await (const chunk of stream) {
    chunks.push(chunk)
  }
  return chunks
}
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1)

7-16: Consider extracting collectChunks helper to a shared test utility.

This helper is duplicated across multiple test files (also in approval.test.ts). Consider extracting it to a shared test utilities module to reduce duplication and ensure consistency.

For example, create a file like packages/typescript/smoke-tests/adapters/src/test-utils.ts:

export async function collectChunks<T>(stream: AsyncIterable<T>): Promise<Array<T>> {
  const chunks: Array<T> = []
  for await (const chunk of stream) {
    chunks.push(chunk)
  }
  return chunks
}
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)

427-489: Abort signal propagation may be incomplete.

The abortController is created independently but isn't linked to the incoming requestSignal. If the client aborts the request, the chat stream won't be notified to stop.

Consider linking them:

         const abortController = new AbortController()
 
+        // Link request abort to our controller
+        requestSignal?.addEventListener('abort', () => {
+          abortController.abort()
+        })
+
         try {

Alternatively, pass requestSignal directly if toStreamResponse and chat support it.

packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (2)

15-46: Consider using more robust waiting strategies instead of fixed timeouts.

The selectScenario function uses multiple waitForTimeout calls (500ms, 200ms, 300ms) which can be flaky in CI environments. While the retry loop and waitForFunction help mitigate this, the hardcoded delays may cause unnecessary slowness or intermittent failures.

Consider replacing fixed timeouts with condition-based waits where possible:

 async function selectScenario(page: Page, scenario: string): Promise<void> {
-  // Wait for hydration
-  await page.waitForTimeout(500)
+  // Wait for hydration - wait for select to be interactive
+  await page.waitForSelector('#scenario-select:not([disabled])', { state: 'visible' })
 
   // Try selecting multiple times
   for (let attempt = 0; attempt < 3; attempt++) {
     await page.focus('#scenario-select')
     await page.selectOption('#scenario-select', scenario)
-    await page.waitForTimeout(200)
 
     const currentScenario = await page.evaluate(() =>
       document.getElementById('test-metadata')?.getAttribute('data-scenario'),
     )
     if (currentScenario === scenario) break
+    await page.waitForTimeout(100) // Small delay only on retry
   }

180-218: Potential race condition in sequential approvals test.

The test waits for the approval section and buttons but the condition on lines 196-201 might pass when the first approval button is still visible after clicking. Consider explicitly waiting for the first approval to be processed before checking for the second.

     // Approve the first one
     await page.click('.approve-button')
 
+    // Wait for first approval to be processed (button should disappear)
+    await page.waitForFunction(
+      () => document.querySelectorAll('.approve-button').length === 0,
+      { timeout: 5000 },
+    ).catch(() => {}); // May not disappear if second comes quickly
+
     // Wait for second approval to appear
     await page.waitForFunction(
       () => {
         const section = document.getElementById('approval-section')
         const buttons = document.querySelectorAll('.approve-button')
-        // Either new approval appeared or section is visible
-        return section !== null && buttons.length > 0
+        // New approval should have appeared
+        return section !== null && buttons.length >= 1
       },
       { timeout: 10000 },
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e011bf8 and 3752db0.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (20)
  • packages/typescript/ai-client/src/chat-client.ts (8 hunks)
  • packages/typescript/ai/src/core/chat.ts (2 hunks)
  • packages/typescript/ai/src/message-converters.ts (1 hunks)
  • packages/typescript/ai/src/types.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/package.json (2 hunks)
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1 hunks)
  • packages/typescript/smoke-tests/adapters/vitest.config.ts (1 hunks)
  • packages/typescript/smoke-tests/e2e/package.json (1 hunks)
  • packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1 hunks)
  • packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1 hunks)
  • packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (1 hunks)
  • packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (1 hunks)
  • packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1 hunks)
  • packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from /adapters subpath rather than monolithic adapters
Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions with toolDefinition() and Zod schema inference
Implement isomorphic tool system using toolDefinition() with .server() and .client() implementations for dual-environment execution
Use type-safe per-model configuration with provider options typed based on selected model to ensure compile-time safety
Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
  • packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/ai/src/message-converters.ts
  • packages/typescript/ai/src/core/chat.ts
  • packages/typescript/ai/src/types.ts
  • packages/typescript/ai-client/src/chat-client.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
**/*.test.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Write unit tests using Vitest alongside source files with .test.ts naming convention

Files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use camelCase for function and variable names throughout the codebase

Files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
  • packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/ai/src/message-converters.ts
  • packages/typescript/ai/src/core/chat.ts
  • packages/typescript/ai/src/types.ts
  • packages/typescript/ai-client/src/chat-client.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
🧠 Learnings (10)
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Implement isomorphic tool system using `toolDefinition()` with `.server()` and `.client()` implementations for dual-environment execution

Applied to files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
  • packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/ai-client/src/chat-client.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions with `toolDefinition()` and Zod schema inference

Applied to files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/typescript/*/src/adapters/*.ts : Create individual adapter implementations for each provider capability (text, embed, summarize, image) with separate exports to enable tree-shaking

Applied to files:

  • packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
  • packages/typescript/smoke-tests/e2e/package.json
  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/typescript/*/src/index.ts : Export tree-shakeable adapters with clear subpath exports in package.json (e.g., `tanstack/ai/adapters`, `tanstack/ai-openai/adapters`) to minimize bundle size

Applied to files:

  • packages/typescript/smoke-tests/e2e/package.json
  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to packages/*/package.json : Use `workspace:*` protocol for internal package dependencies in package.json (e.g., `"tanstack/ai": "workspace:*"`)

Applied to files:

  • packages/typescript/smoke-tests/e2e/package.json
📚 Learning: 2025-12-13T17:09:09.784Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.784Z
Learning: Use Nx workspace with affected commands to optimize testing and building only changed packages and their dependents

Applied to files:

  • packages/typescript/smoke-tests/e2e/package.json
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.test.ts : Write unit tests using Vitest alongside source files with `.test.ts` naming convention

Applied to files:

  • packages/typescript/smoke-tests/e2e/package.json
  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
  • packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from `/adapters` subpath rather than monolithic adapters

Applied to files:

  • packages/typescript/smoke-tests/adapters/vitest.config.ts
  • packages/typescript/smoke-tests/adapters/package.json
  • packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
  • packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to .eslintrc* : Use ESLint with custom TanStack config for linting all TypeScript and JavaScript files

Applied to files:

  • packages/typescript/smoke-tests/adapters/vitest.config.ts
📚 Learning: 2025-12-13T17:09:09.783Z
Learnt from: CR
Repo: TanStack/ai PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-13T17:09:09.783Z
Learning: Applies to **/*.{ts,tsx} : Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Applied to files:

  • packages/typescript/ai-client/src/chat-client.ts
🧬 Code graph analysis (10)
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (2)
packages/typescript/ai-client/src/chat-client.ts (3)
  • sendMessage (270-280)
  • stop (402-409)
  • addToolApprovalResponse (456-493)
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)
  • Route (424-490)
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)
packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)
  • isLoading (119-121)
packages/typescript/ai/src/tools/tool-calls.ts (1)
  • getToolCalls (111-115)
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (2)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
  • SimulatorScripts (379-500)
  • createLLMSimulator (366-370)
  • SimulatorScript (35-40)
packages/typescript/ai/src/core/chat.ts (2)
  • chat (80-107)
  • chat (750-780)
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (2)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)
  • SimulatorScript (35-40)
  • createLLMSimulator (366-370)
packages/typescript/ai/src/core/chat.ts (2)
  • chat (80-107)
  • chat (750-780)
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (4)
  • SimulatorScripts (379-500)
  • createLLMSimulator (366-370)
  • SimulatorScript (35-40)
  • approvalTool (425-441)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1)
packages/typescript/ai/src/tools/tool-calls.ts (1)
  • getToolCalls (111-115)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)
packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)
  • isLoading (119-121)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1)
packages/typescript/ai/src/types.ts (2)
  • ChatOptions (553-587)
  • StreamChunk (677-685)
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (2)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)
  • SimulatorScript (35-40)
  • createLLMSimulator (366-370)
packages/typescript/ai/src/core/chat.ts (2)
  • chat (80-107)
  • chat (750-780)
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (2)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
  • SimulatorScripts (379-500)
  • createLLMSimulator (366-370)
  • SimulatorScript (35-40)
packages/typescript/ai/src/core/chat.ts (2)
  • chat (80-107)
  • chat (750-780)
🪛 ast-grep (0.40.0)
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[warning] 591-591: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)


[warning] 598-598: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

🪛 Biome (2.1.2)
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[error] 592-592: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)


[error] 599-600: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Preview
  • GitHub Check: Test
🔇 Additional comments (70)
packages/typescript/smoke-tests/e2e/package.json (3)

25-25: Correct use of workspace protocol for internal package.

The @tanstack/tests-adapters dependency correctly uses the workspace:* protocol for internal monorepo dependencies, aligning with established practices.


29-29: Verify vite-tsconfig-paths configuration.

The vite-tsconfig-paths dependency is newly added. Confirm that it's configured in the Vite config file (vite.config.ts) and is actually needed for the e2e tests, particularly if TypeScript path aliases are used in test files.


25-25: @tanstack/tests-adapters is properly configured and zod versions are consistent across core packages.

The dependency is correctly defined in the workspace and uses workspace:* as expected. Zod versions are aligned at ^4.1.13 across main packages including this smoke test, tests-adapters, and other core AI packages. Minor version variations exist in test-specific packages (ai-anthropic-live-tests uses ^3.24.1) and the root ai package uses a range (^3.0.0 || ^4.0.0), which is acceptable for those contexts.

packages/typescript/ai/src/types.ts (1)

69-73: LGTM! Clean extension of ToolCall for approval data.

The optional approval field provides a backward-compatible way to carry approval decisions back to the server. The simpler structure (compared to ToolCallPart.approval) appropriately contains only what's needed in the response payload.

packages/typescript/smoke-tests/adapters/vitest.config.ts (1)

1-26: LGTM! Well-structured Vitest configuration for the smoke-tests package.

The path alias, coverage settings, and exclusion patterns are appropriate. Based on learnings, this follows the project's convention for test file naming (.test.ts).

packages/typescript/ai/src/message-converters.ts (1)

110-117: LGTM! Correctly propagates approval data for responded tool calls.

The conditional spread ensures approval metadata is only included when the tool call has been explicitly approved or denied (approval-responded state with a defined approved value). This aligns with the new ToolCall.approval type.

packages/typescript/ai/src/core/chat.ts (1)

517-546: LGTM! Enhanced approval collection supports both message formats.

The dual-format handling (UIMessage with parts array and ModelMessage with toolCalls array) provides necessary flexibility. Note that both code paths can potentially add to the same approvals Map for the same message if it happens to have both structures, but this is harmless since the approval decision would be the same value.

packages/typescript/ai-client/src/chat-client.ts (5)

29-33: LGTM! Appropriate state variables for async tool execution tracking.

The pendingToolExecutions Map keyed by toolCallId enables proper lifecycle management, and the continuationPending flag prevents duplicate continuation attempts during action draining.


238-242: LGTM! Proper synchronization before stream finalization.

Awaiting all pending tool executions ensures client tools complete before the stream is finalized and continuation checks occur. The outer try/catch in processStream will handle any rejections.


310-373: LGTM! Well-structured stream lifecycle with proper guards and continuation logic.

The concurrent stream guard, state reset, and success flag pattern correctly handle the stream lifecycle. The finally block order is intentional: setIsLoading(false) before drainPostStreamActions allows queued actions to trigger continuation without being blocked by the loading check.


444-450: LGTM! Proper deferral of continuation when stream is in progress.

Queueing the continuation check when isLoading is true ensures we don't attempt concurrent streams, and the action will be executed after the current stream completes.


498-529: LGTM! Robust action queue and continuation helpers.

The sequential draining and dual-guard (continuationPending and isLoading) in checkForContinuation correctly prevent concurrent continuation attempts. The finally block ensures the flag is always reset.

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (5)

1-40: Well-structured type definitions for the simulator.

The interfaces are clear and well-documented. The SimulatorIteration interface properly captures the possible LLM turn outcomes (content, tool calls, finish reasons).


67-94: Clean adapter class design following the established pattern.

The adapter correctly implements the expected structure with kind, name, and models properties. The reset() method provides a way to reuse the adapter instance across tests.


95-130: Complex iteration detection logic with good documentation.

The stateless operation logic for E2E tests is well thought out, enabling both stateful (unit tests) and stateless (E2E with fresh adapter instances) operation modes. The comments clearly explain the two modes.


172-194: Tool call streaming looks correct.

Each tool call gets a unique ID (either from script or auto-generated), and the chunk structure matches the expected ToolCallStreamChunk format with proper index values for parallel tools.


363-370: Factory function is clean and straightforward.

The createLLMSimulator factory provides a clean API for creating adapter instances.

packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (3)

16-44: Robust scenario selection with retry logic.

The selectScenario helper handles React hydration issues well with retries and verification. The timeouts are reasonable for test stability.


50-77: Flexible completion detection with dual conditions.

The waitForTestComplete helper smartly checks both the testComplete flag and tool counts, which provides resilience against race conditions where the flag might not be set.


253-271: Good debugging support on test failure.

Capturing screenshots and logging events/metadata on failure will significantly help with debugging flaky E2E tests.

packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (5)

1-16: Proper imports and reusable helper.

The collectChunks helper is a clean utility for consuming async iterables in tests. Follows Vitest conventions as per coding guidelines.


19-83: Comprehensive parallel tool execution test.

The test properly verifies that all three server tools execute, and validates both tool call and result chunk counts. Good use of vi.fn() for mocking.


85-154: Good coverage of mixed tool types.

This test exercises server, approval, and client tools simultaneously, verifying the correct behavior for each type:

  • Server tool executes immediately
  • Approval tool waits for approval
  • Client tool emits tool-input-available

Well-structured test using toolDefinition() with Zod schemas as per coding guidelines.


157-213: Solid test for repeated tool invocations with different arguments.

The test correctly verifies that the same tool can be called multiple times with different arguments, and that each call receives the correct city-specific data.


215-267: Clean tool selection test.

Verifies that only the requested tool executes when multiple tools are available. Good negative assertions for non-called tools.

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)

197-209: Good sequence verification using timestamps.

The test properly verifies the order of tool executions by comparing timestamps. The conditional check handles cases where timestamps might not be present.


263-284: Text-only test uses different completion strategy.

This test waits for isLoading to become false rather than using waitForTestComplete. This is appropriate since there are no tools to count, but the additional waitForTimeout(500) seems like a workaround.

packages/typescript/smoke-tests/adapters/package.json (3)

17-18: Test scripts added correctly.

The test and test:watch scripts follow standard Vitest patterns.


34-34: No issues found. The vitest version ^4.0.14 is consistent across the entire monorepo, including the root package.json and all other packages. The version aligns properly with the project's testing setup.


9-14: Exports field correctly exposes the simulator for cross-package imports using .ts files.

This is appropriate for a private test package. The TypeScript configuration supports direct .ts imports via allowImportingTsExtensions, and the e2e test package successfully imports from this export. No built output is needed since the package is private and used internally during development.

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (5)

1-6: LGTM!

Imports are appropriate for the test page functionality. Uses the recommended patterns: toolDefinition from @tanstack/ai, Zod for schema validation, and React hooks as expected. Based on learnings, this correctly implements the isomorphic tool system with client implementations.


27-97: LGTM!

The createTrackedTools function correctly implements the isomorphic tool system using toolDefinition() with .client() implementations. Input and output schemas use Zod as per coding guidelines. The event tracking pattern is well-suited for e2e test observability.


156-157: LGTM!

The useRef pattern ensures tools are created once with the stable addEvent callback. Since addEvent is memoized with an empty dependency array, this approach correctly avoids unnecessary re-creation of tools on re-renders.


172-198: LGTM!

The completion detection logic appropriately handles multiple completion states (complete, output-available, or having output/result). The dependency array is correct, and the early return guard at line 173 prevents premature completion detection.


589-609: Static analysis warning is a false positive in this context.

The dangerouslySetInnerHTML usage here serializes application-controlled state (toolEvents, toolCalls) that doesn't contain unsanitized user input. The JSON data originates from internal tool execution tracking and is safe. This pattern of embedding JSON in <script type="application/json"> tags for test harnesses is appropriate for e2e test infrastructure.

packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (4)

53-80: LGTM!

The completion detection logic is well-designed with two conditions (explicit completion flag OR sufficient completed tools without loading). The trailing 200ms buffer for React state propagation is a pragmatic approach for e2e tests.


143-180: LGTM!

This test effectively validates the specific bug where sequential client tool calls could block each other. The execution order verification via event log timestamps and the duration upper bound (< 10s) are good regression guards against blocking behavior.


246-276: LGTM!

Smart handling of the parallel approvals scenario - capturing button IDs before clicking prevents issues with React re-renders removing buttons from the DOM. The brief wait between clicks is appropriate for test stability.


399-414: LGTM!

The failure hook provides excellent debugging support with screenshots and state dumps. This will be valuable for investigating flaky tests or race condition failures.

packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (5)

19-103: LGTM!

Comprehensive error handling tests covering both synchronous throws and async rejections. The assertions correctly verify that error details propagate to tool_result chunks with the expected error messages.


106-145: LGTM!

Good coverage of the edge case where the LLM calls a tool that isn't registered. The test verifies graceful error handling rather than crashes.


147-183: LGTM!

Correctly tests the scenario where a tool definition has no execute function, verifying that tool-input-available is emitted for client-side handling. This aligns with the isomorphic tool system pattern.


224-268: LGTM!

Effective test for the maxIterations strategy. Verifies that the agent loop terminates at the specified limit even when the script would continue indefinitely, and confirms proper completion signaling via done chunks.


270-349: LGTM!

Good coverage of the auto-stringification behavior for non-string tool returns. Tests verify that objects are properly JSON-serialized and primitives like numbers are stringified correctly.

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (5)

19-133: LGTM!

Thorough testing of server tool sequencing with proper call order verification using mock.invocationCallOrder. The assertions correctly verify both execution count and order.


135-194: LGTM!

Excellent test of the server→client tool sequence pattern. Correctly verifies that server tools produce tool_result chunks while client tools (without execute) produce tool-input-available chunks for client-side handling.


196-276: LGTM!

Well-designed test for client→server continuation flow. The message structure correctly simulates a scenario where a client tool result has been injected, allowing verification that the server tool executes as expected in the continuation.


278-347: LGTM!

Clean implementation of the three-tool sequence test using a shared array to track execution order. The toEqual(['A', 'B', 'C']) assertion provides strict order verification.


349-418: LGTM!

Effective test for the parallel→sequential tool pattern. The indexOf comparisons correctly verify that both parallel tools (A, B) complete before the sequential tool (combine) executes.

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (4)

19-109: LGTM!

Comprehensive testing of client tools without execute functions. The getCurrentIteration() check is a clever way to verify the stream paused waiting for client input rather than advancing to the next iteration.


111-154: LGTM!

Good test demonstrating that client tools with execute functions behave similarly to server tools - they auto-execute and produce tool_result chunks. The output schema with Zod provides type safety for the returned data.


156-223: LGTM!

Well-structured test for the message injection pattern. The message structure with parts containing the complete tool call correctly simulates a client that has already provided a tool result, allowing verification of the continuation flow.


225-278: LGTM!

Effective test of mixed client tool behavior where one tool has an execute function and one doesn't. The assertions correctly verify that only the tool without execute emits tool-input-available.

packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (6)

1-16: LGTM! Well-structured test file with proper imports and helper.

The imports are appropriate for the test suite, and the generic collectChunks helper is cleanly implemented for collecting async iterable chunks.


18-111: LGTM! Comprehensive approval-requested test coverage.

The tests properly verify:

  1. Approval-requested chunks are emitted for tools with needsApproval: true
  2. Tool execution is deferred until approval is granted
  3. Iteration stops when approval is pending

The as any casts on chunk types are acceptable for test assertions.


113-187: LGTM! Good test for approval acceptance flow.

The test correctly verifies that when an approval is pre-granted in the message history (via parts with approved: true), the tool executes and produces a tool_result chunk.


189-259: LGTM! Denial flow properly tested.

The test correctly verifies that when approval is denied (approved: false), the tool is not executed and the LLM produces a content response instead.


261-316: LGTM! Multiple approval tools test is well-designed.

The test verifies that when multiple tools requiring approval are called in the same iteration, each emits an approval-requested chunk and neither executes until approved.


318-380: LGTM! Critical edge case for mixed approval scenarios.

This test is particularly valuable as it verifies that non-approval tools execute immediately while approval-requiring tools wait, even when both are called in the same iteration.

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (4)

18-143: LGTM! Comprehensive server tool execution tests.

Good coverage of:

  • Basic tool execution with simple arguments
  • Complex nested argument schemas with Zod validation
  • Proper tool call and result chunk generation

The tests follow the coding guidelines by using Zod for runtime schema validation.


145-227: LGTM! Good coverage for different tool return types.

Tests properly verify:

  • Tools returning JSON objects with outputSchema
  • Tools returning plain objects (framework handles stringification)
  • Result content contains expected values

229-299: LGTM! Important ID tracking verification.

These tests ensure proper correlation between tool_call and tool_result chunks via IDs, which is critical for the agent loop to function correctly. The auto-generated ID format test (/^call-\d+-\d+$/) validates the fallback behavior.


301-343: LGTM! Good test for combined content and tool call handling.

This test verifies that content and tool calls can coexist in the same iteration, which is important for realistic LLM response patterns.

packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (3)

14-339: LGTM! Well-structured test scenario definitions.

The scenarios provide comprehensive coverage for:

  • Basic flows (text-only, single server/client tools)
  • Approval workflows
  • Sequential and parallel tool executions
  • Mixed server/client sequences
  • Race condition scenarios

The scenario structure aligns well with the LLM simulator's SimulatorScript interface.


341-422: LGTM! Tool definitions follow best practices.

The tool definitions properly:

  • Use Zod for runtime schema validation as per coding guidelines
  • Implement the isomorphic tool system with .server() and .client() implementations
  • Include needsApproval: true for the delete_file tool to test approval flows

492-553: LGTM! Clean scenario-to-tools mapping.

The switch statement clearly maps each scenario to its required tools. The default case returning an empty array is consistent with the text-only scenario behavior.

packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (4)

48-125: LGTM! Well-designed test helper functions.

The helpers provide good abstraction for:

  • waitForApproval: Simple selector-based wait
  • waitForTestComplete: Multi-condition wait with good fallback logic
  • getMetadata/getEventLog: Clean data extraction from DOM

The 200ms buffer in waitForTestComplete (line 85) is a pragmatic approach for React state settlement.


127-178: LGTM! Single approval flow tests are well-structured.

Both approve and deny cases properly:

  • Select the scenario
  • Wait for approval UI
  • Perform the action
  • Verify metadata counts

The 500ms wait on line 172 for denial processing is acceptable as denial may not set testComplete immediately.


220-314: LGTM! Parallel approvals and mixed flow tests are robust.

The parallel approvals test (lines 220-253) correctly:

  • Captures button IDs before clicking to handle DOM re-renders
  • Verifies both approvals are granted

The mixed flow tests properly verify event log entries for both approval and client tool execution.


316-332: LGTM! Good debugging support for test failures.

The afterEach hook provides valuable debugging information:

  • Full-page screenshots with descriptive filenames
  • Console output of metadata and events

This will be helpful for diagnosing flaky tests in CI.

Comment on lines +273 to +360
private determineIterationFromMessages(
messages: Array<{
role: string
toolCalls?: Array<{
id: string
approval?: { id: string; approved: boolean }
}>
toolCallId?: string
parts?: Array<{
type: string
id?: string
output?: any
approval?: { approved?: boolean }
}>
}>,
): number | null {
if (!messages || messages.length === 0) {
return 0 // Fresh conversation, start at iteration 0
}

// Find all assistant messages with tool calls
const assistantToolCallMessages = messages.filter(
(m) => m.role === 'assistant' && m.toolCalls && m.toolCalls.length > 0,
)

if (assistantToolCallMessages.length === 0) {
// No tool calls in history, might be first iteration or continuation
// Check if there's a user message (fresh start)
const hasUserMessage = messages.some((m) => m.role === 'user')
return hasUserMessage ? 0 : null
}

// Get all completed tool call IDs from:
// 1. Separate tool result messages (role: 'tool')
// 2. Parts array with output set (client tool results) - UIMessage format
// 3. Parts array with approval.approved set (approval responses) - UIMessage format
// 4. toolCalls array with approval.approved set (approval responses) - ModelMessage format
const completedToolIds = new Set<string>()

for (const msg of messages) {
// Check for role: 'tool' messages (server tool results)
if (msg.role === 'tool' && msg.toolCallId) {
completedToolIds.add(msg.toolCallId)
}

// Check for UIMessage format: parts with output or approval responses
if (msg.parts) {
for (const part of msg.parts) {
if (part.type === 'tool-call' && part.id) {
// Client tool results have output set
if (part.output !== undefined) {
completedToolIds.add(part.id)
}
// Approval tools are complete when approval.approved is set (true or false)
if (part.approval?.approved !== undefined) {
completedToolIds.add(part.id)
}
}
}
}

// Check for ModelMessage format: toolCalls with approval info
if (msg.role === 'assistant' && msg.toolCalls) {
for (const tc of msg.toolCalls) {
// Approval tools are complete when approval.approved is set
if (tc.approval?.approved !== undefined) {
completedToolIds.add(tc.id)
}
}
}
}

// Count how many complete tool call rounds we have
let completedRounds = 0
for (const assistantMsg of assistantToolCallMessages) {
const toolCalls = assistantMsg.toolCalls as Array<{ id: string }>
const allToolsComplete = toolCalls.every((tc) =>
completedToolIds.has(tc.id),
)
if (allToolsComplete) {
completedRounds++
}
}

// The next iteration is completedRounds (0-indexed)
// e.g., if we've completed 1 round, we're on iteration 1
return completedRounds
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find ModelMessage and UIMessage type definitions
echo "=== Searching for ModelMessage type definition ==="
rg -n "type ModelMessage|interface ModelMessage" --type=ts -A 20 | head -100

echo -e "\n=== Searching for UIMessage type definition ==="
rg -n "type UIMessage|interface UIMessage" --type=ts -A 20 | head -100

echo -e "\n=== Searching for toolCalls in message types ==="
rg -n "toolCalls" --type=ts -B 2 -A 2 | head -80

Repository: TanStack/ai

Length of output: 13022


🏁 Script executed:

#!/bin/bash
# Find MessagePart type definition to understand tool-call parts
echo "=== Searching for MessagePart type definition ==="
rg -n "type MessagePart|interface MessagePart" --type=ts -A 30 | head -120

# Check for ToolCall definition
echo -e "\n=== Searching for ToolCall type definition ==="
rg -n "type ToolCall|interface ToolCall" --type=ts -A 15 | head -80

# Check the actual llm-simulator.ts file to see imports and context
echo -e "\n=== Checking llm-simulator.ts imports and context ==="
head -50 packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 13699


🏁 Script executed:

#!/bin/bash
# Get complete ToolCallPart and ToolResultPart definitions
echo "=== Complete ToolCallPart definition ==="
rg -n "export interface ToolCallPart" --type=ts -A 20 | head -50

echo -e "\n=== Complete ToolResultPart definition ==="
rg -n "export interface ToolResultPart" --type=ts -A 10 | head -40

# Check if there's a different message type being used in llm-simulator
echo -e "\n=== Search for message types in llm-simulator context ==="
grep -n "messages\|Message" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -40

Repository: TanStack/ai

Length of output: 5228


🏁 Script executed:

#!/bin/bash
# Check how determineIterationFromMessages is called and what messages are passed
echo "=== Checking method calls and context ==="
grep -B 5 -A 5 "determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -80

# Check the options type to see what message format is expected
echo -e "\n=== Checking options and message types ==="
grep -n "options\." packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | grep -E "messages|options" | head -20

Repository: TanStack/ai

Length of output: 1246


🏁 Script executed:

#!/bin/bash
# Find the ChatOptions interface to see what message types are expected
echo "=== ChatOptions in llm-simulator ==="
rg -n "ChatOptions" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts -B 2 -A 2

# Find the actual ChatOptions type definition
echo -e "\n=== ChatOptions type definition ==="
rg -n "interface ChatOptions|type ChatOptions" --type=ts -A 30 | head -100

# Check what the adapter signature expects
echo -e "\n=== Check adapter method signature ==="
grep -n "adapter.*(" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | head -20
</parameter>

Repository: TanStack/ai

Length of output: 3072


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation or usage showing what message types are actually passed
echo "=== Checking how messages are used in llm-simulator ==="
sed -n '273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

# Also check if there are tests that show actual usage
echo -e "\n=== Check test usage ==="
find packages/typescript/smoke-tests -name "*.test.*" -o -name "*.spec.*" | head -5
</parameter>

Repository: TanStack/ai

Length of output: 3705


🏁 Script executed:

#!/bin/bash
# Check for all calls to determineIterationFromMessages
echo "=== All calls to determineIterationFromMessages ==="
rg -n "determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

# Check if messages could come from different sources with different types
echo -e "\n=== Check other adapters or similar patterns ==="
find packages/typescript/smoke-tests/adapters/src -name "*.ts" -type f | xargs grep -l "messages" | head -5

# Verify the actual usage context
echo -e "\n=== Check full context around line 108 ==="
sed -n '100,130p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 1938


🏁 Script executed:

#!/bin/bash
# Get the complete ToolCall type definition from types.ts
echo "=== ToolCall type in ModelMessage ==="
sed -n '62,74p' packages/typescript/ai/src/types.ts

# Compare with what the method expects
echo -e "\n=== What the method expects vs actual ToolCall ==="
echo "Method expects: toolCalls with approval?.approved"
echo "Actual ToolCall has:"
rg -n "export interface ToolCall" --type=ts -A 12 | grep -A 12 "packages/typescript/ai/src"

Repository: TanStack/ai

Length of output: 3808


🏁 Script executed:

#!/bin/bash
# Check for camelCase compliance in the method
echo "=== Checking variable/function names in determineIterationFromMessages ==="
sed -n '273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts | grep -E "^\s*(const|let|var|for|function)" | head -30

Repository: TanStack/ai

Length of output: 631


Remove unreachable code that checks for UIMessage format.

The method only receives Array<ModelMessage> per ChatOptions, which doesn't have a parts property. Lines 318-330 that check msg.parts for UIMessage format are dead code—ModelMessage has toolCalls (not parts), and UIMessage has parts (but is never passed here). The logic for handling ModelMessage format with toolCalls and approval.approved is correct, but the defensive UIMessage checks should be removed to avoid confusion about supported message types.

🤖 Prompt for AI Agents
In packages/typescript/smoke-tests/adapters/src/llm-simulator.ts around lines
273-360 (specifically remove the unreachable UIMessage handling at ~lines
318-330), remove the block that checks msg.parts and iterates parts for type
'tool-call' (the UIMessage format) because this method only receives
ModelMessage instances per ChatOptions; keep the remaining logic that handles
role === 'tool' and assistant msg.toolCalls approvals so completedToolIds is
still populated, and delete any now-unused variables or comments tied only to
the removed block.

Comment on lines +136 to +139
test.beforeEach(async ({ page }) => {
await page.goto('/tools-test')
await page.waitForSelector('#run-test-button')
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing :not([disabled]) check in beforeEach.

Unlike client-tool.spec.ts which waits for #run-test-button:not([disabled]), this file only waits for #run-test-button. This could cause test flakiness if the button is initially disabled.

 test.beforeEach(async ({ page }) => {
   await page.goto('/tools-test')
-  await page.waitForSelector('#run-test-button')
+  await page.waitForSelector('#run-test-button:not([disabled])', {
+    timeout: 10000,
+  })
 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test.beforeEach(async ({ page }) => {
await page.goto('/tools-test')
await page.waitForSelector('#run-test-button')
})
test.beforeEach(async ({ page }) => {
await page.goto('/tools-test')
await page.waitForSelector('#run-test-button:not([disabled])', {
timeout: 10000,
})
})
🤖 Prompt for AI Agents
In
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
around lines 136-139, the beforeEach uses await
page.waitForSelector('#run-test-button') which can pass while the button is
still disabled; change the wait to target the enabled button (e.g. await
page.waitForSelector('#run-test-button:not([disabled])') or otherwise wait until
the element is not disabled) so the test only proceeds when the run button is
interactive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants