
Agentic loop fails with maxIter > 0 for both Anthropic and OpenAI providers #276

@amshowman

Description


AgentKit Bug Report: Agentic Loop Message Formatting Issues in v0.13.2

This report was generated with Claude Code while debugging AgentKit integration with Inngest.


Environment

  • @inngest/agent-kit: 0.13.2
  • inngest: 3.x (latest)
  • Node.js: 20.x
  • Framework: Next.js 14 (App Router)
  • Model Providers Tested: Anthropic (Claude Sonnet 4), OpenAI (GPT-4o)

Bug 1: Agentic Loop Message Formatting Error with maxIter > 0

Description

When using agent.run() with maxIter > 0, the second inference call (after tool execution) fails with both the Anthropic and OpenAI providers. The error messages differ, but the root cause is the same: AgentKit incorrectly formats the conversation history when preparing the second inference request after a tool has been executed.

Anthropic error:

messages.2: `tool_use` blocks can only be in `assistant` messages

OpenAI error:

An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx

Both errors indicate that the tool call/response pairing is malformed in the conversation history sent to the second inference.

Reproduction Steps

  1. Create an agent with an Anthropic model and a simple tool:
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'

const testTool = createTool({
  name: 'get_current_time',
  description: 'Get the current time',
  parameters: z.object({
    timezone: z.string().optional(),
  }),
  handler: async (input) => {
    return { time: new Date().toISOString() }
  },
})

const agent = createAgent({
  name: 'test-agent',
  system: 'You are a helpful assistant. Use tools when asked.',
  model: anthropic({
    model: 'claude-sonnet-4-20250514',
    apiKey: process.env.ANTHROPIC_API_KEY,
    defaultParameters: { max_tokens: 1024 },
  }),
  tools: [testTool],
})
  2. Run with maxIter: 0 (works):
// This works - tool executes, but no follow-up response
const result = await agent.run("What time is it?", { maxIter: 0 })
// Output includes text + tool_call, toolCalls includes tool_result with actual time
  3. Run with maxIter: 2 (fails):
// This fails on the second inference
const result = await agent.run("What time is it?", { maxIter: 2 })
// Error: messages.2: `tool_use` blocks can only be in `assistant` messages

Expected Behavior

With maxIter: 2, the agent should:

  1. Make first inference → model returns tool_use
  2. Execute tool → get result
  3. Make second inference with tool result → model returns final text response

Actual Behavior

The second inference fails because the Anthropic API receives a malformed message array where tool_use content blocks are placed in a non-assistant message.

Root Cause Analysis

Looking at the AgentKit source (chunk-BSWKEFTT.js), the Anthropic request parser at the tool_call case uses role: m.role:

case "tool_call":
  return [
    ...acc,
    {
      role: m.role,  // <-- This should always be "assistant" for tool_use blocks
      content: m.tools.map((tool) => ({
        type: "tool_use",
        id: tool.id,
        input: tool.input,
        name: tool.name
      }))
    }
  ];

The Anthropic API requires that tool_use content blocks ONLY appear in messages with role: "assistant". If m.role is anything other than "assistant", the API rejects the request.
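For reference, a hand-written sketch (types simplified, ids hypothetical) of the history shape the Anthropic Messages API accepts after a tool call: the tool_use block sits in an assistant message, and the matching tool_result comes back in a user message referencing the same id.

```typescript
// Sketch of the post-tool-call history Anthropic expects. The key rule:
// tool_use blocks may only appear under role "assistant"; tool_result
// blocks are sent back under role "user" with a matching tool_use_id.
type AnthropicMessage = {
  role: 'user' | 'assistant'
  content: string | Array<{ type: string; [key: string]: unknown }>
}

const history: AnthropicMessage[] = [
  { role: 'user', content: 'What time is it?' },
  {
    role: 'assistant', // must be "assistant" -- reusing m.role here is the bug
    content: [
      { type: 'tool_use', id: 'toolu_01', name: 'get_current_time', input: {} },
    ],
  },
  {
    role: 'user', // tool results go back under the user role
    content: [
      { type: 'tool_result', tool_use_id: 'toolu_01', content: '2026-01-01T00:00:00Z' },
    ],
  },
]

// Valid only if every tool_use block lives in an assistant message.
const toolUseRolesValid = history.every(
  (m) =>
    typeof m.content === 'string' ||
    m.content.every((b) => b.type !== 'tool_use' || m.role === 'assistant'),
)
```

With role: m.role, the second element above would be emitted with whatever role the internal message carries, and the API rejects the request as soon as that role is not "assistant".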

Workaround

Use maxIter: 0 and manually append tool results to the response:

const agentResult = await agent.run(userInput, { maxIter: 0 })

// Extract tool results and append to response
let responseText = extractTextFromOutput(agentResult.output)
if (agentResult.toolCalls.length > 0) {
  const toolResultsText = agentResult.toolCalls
    .map(tc => `\n\n**Tool: ${tc.tool.name}**\n${JSON.stringify(tc.content.data, null, 2)}`)
    .join('')
  responseText += toolResultsText
}

Bug 2: Step Memoization Returns Stale Results in Inngest Functions

Description

When running agent.run() inside an Inngest function, AgentKit's getStepTools() picks up the step context via AsyncLocalStorage. This causes step.ai.infer() to be used instead of direct fetch, resulting in memoized/cached responses that don't reflect actual API calls.

Symptoms

  • Agent responses return in 3-8ms (impossibly fast for actual API calls)
  • Same responses are returned for different inputs
  • Tool handlers are not actually executed

Reproduction Steps

import { inngest } from './client'
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'

export const chatFunction = inngest.createFunction(
  { id: 'agent-chat' },
  { event: 'agent/chat.requested' },
  async ({ event, step }) => {
    const agent = createAgent({
      name: 'chat',
      model: anthropic({ model: 'claude-sonnet-4-20250514', apiKey }),
      tools: [myTool],
    })

    // This returns cached/stale results because getStepTools()
    // finds the step from Inngest's async context
    const result = await agent.run(userInput)

    // Duration: 3-8ms (should be 2000-5000ms for real API call)
  }
)

Expected Behavior

agent.run() should make actual API calls when invoked, returning fresh responses.

Actual Behavior

The internal getStepTools() call finds the Inngest step context via AsyncLocalStorage:

var getStepTools = async () => {
  const asyncCtx = await getAsyncCtx();
  const ctx = asyncCtx?.ctx || asyncCtx?.execution?.ctx;
  return ctx?.step;  // Returns Inngest step even if not explicitly passed
};

This causes step.ai.infer() to be used, which memoizes results per function run.

Workaround

Explicitly pass step: undefined to bypass the async context lookup:

const result = await agent.run(userInput, { step: undefined })

This forces AgentKit to use direct fetch instead of step.ai.infer().


Combined Workaround

For production use with Anthropic inside Inngest functions, both workarounds must be applied:

const agentResult = await agent.run(userInput, {
  maxIter: 0,        // Avoid Anthropic message formatting bug
  step: undefined    // Avoid step memoization
})

Impact

These bugs prevent using AgentKit's agentic loop with Anthropic models inside Inngest functions. Users cannot:

  1. Have multi-turn tool conversations (model calls tool → gets result → responds with summary)
  2. Rely on durable execution benefits for AI inference calls
  3. Get accurate timing/logging for agent runs

Suggested Fixes

Bug 1 Fix

The issue appears to be in how AgentKit builds the conversation history for the second inference. Both providers require:

  1. Anthropic: tool_use blocks must be in assistant role messages, followed by tool_result in user role
  2. OpenAI: tool_calls in assistant message must be followed by tool role messages with matching tool_call_id

The agentic loop needs to correctly format the history including:

  • Original user message
  • Assistant response with tool call
  • Tool result message (with correct role for each provider)

This likely requires fixing the message conversion logic in the core agent loop, not just individual adapters.
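The OpenAI-side pairing rule can be checked mechanically. The following is a hypothetical helper (not AgentKit code; Chat Completions shapes simplified) that finds tool_call ids with no answering tool message, which is exactly what the OpenAI error reports:

```typescript
// Hypothetical checker for the OpenAI pairing rule: every tool_call id in
// an assistant message must be answered by a later message with role "tool"
// carrying the same tool_call_id.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content?: string
  tool_calls?: { id: string }[]
  tool_call_id?: string
}

function unansweredToolCallIds(messages: ChatMessage[]): string[] {
  const answered = new Set(
    messages.filter((m) => m.role === 'tool').map((m) => m.tool_call_id),
  )
  return messages
    .filter((m) => m.role === 'assistant')
    .flatMap((m) => m.tool_calls ?? [])
    .map((tc) => tc.id)
    .filter((id) => !answered.has(id))
}

// The shape of the failing history: the assistant's tool_call is never answered.
const broken: ChatMessage[] = [
  { role: 'user', content: 'What is 5 + 3?' },
  { role: 'assistant', tool_calls: [{ id: 'call_01' }] },
]

// Appending the tool response with the matching id satisfies the rule.
const fixed: ChatMessage[] = [
  ...broken,
  { role: 'tool', tool_call_id: 'call_01', content: '{"result":8}' },
]
```

A check like this in the core loop (before dispatching the second inference) would catch the malformed history for both providers, since the Anthropic tool_result pairing is the same idea with different roles.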

Bug 2 Fix

Consider one of:

  1. Only use getStepTools() if step option is explicitly provided (not undefined)
  2. Add an option like useInngestSteps: false to disable step integration
  3. Document that step: undefined is required when users don't want memoization
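Option 1 could be implemented by distinguishing "option omitted" from an explicit step: undefined. A minimal sketch, assuming hypothetical names (resolveStep, RunOpts) rather than AgentKit internals:

```typescript
// Sketch of suggested fix 1: only fall back to the async-context lookup
// when the caller did not mention `step` at all. `'step' in opts`
// distinguishes an omitted key from an explicit `step: undefined`.
type RunOpts = { maxIter?: number; step?: unknown }

async function resolveStep(
  opts: RunOpts,
  getStepFromAsyncCtx: () => Promise<unknown>,
): Promise<unknown> {
  if ('step' in opts) {
    // Caller chose explicitly -- honor it, even if it is undefined,
    // forcing direct fetch instead of step.ai.infer().
    return opts.step
  }
  // Otherwise keep today's behavior: pick up the Inngest step context.
  return getStepFromAsyncCtx()
}
```

This keeps the current implicit integration for existing users while making the documented workaround (step: undefined) an intentional, supported escape hatch.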

Test Case

Here's a minimal reproduction that can be run outside Inngest to verify Bug 1:

// test-agentkit-anthropic.ts
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'

const testTool = createTool({
  name: 'add_numbers',
  description: 'Add two numbers',
  parameters: z.object({
    a: z.number(),
    b: z.number(),
  }),
  handler: async (input) => ({ result: input.a + input.b }),
})

const agent = createAgent({
  name: 'test',
  system: 'You are a calculator. Always use tools for math.',
  model: anthropic({
    model: 'claude-sonnet-4-20250514',
    apiKey: process.env.ANTHROPIC_API_KEY!,
    defaultParameters: { max_tokens: 1024 },
  }),
  tools: [testTool],
})

async function test() {
  console.log('Testing maxIter=0...')
  const r0 = await agent.run("What is 5 + 3?", { maxIter: 0 })
  console.log('maxIter=0 result:', JSON.stringify(r0, null, 2))

  console.log('\nTesting maxIter=2...')
  try {
    const r2 = await agent.run("What is 5 + 3?", { maxIter: 2 })
    console.log('maxIter=2 result:', JSON.stringify(r2, null, 2))
  } catch (error) {
    console.error('maxIter=2 error:', (error as Error).message)
  }
}

test().catch(console.error)

Expected output:

  • maxIter=0: Success, shows tool_call and tool_result
  • maxIter=2: Error: "messages.2: tool_use blocks can only be in assistant messages"

OpenAI Test Results

We also tested with OpenAI GPT-4o:

maxIter=0: Success (duration: 1242ms)
- Tool called, result captured in toolCalls

maxIter=2: Error
- "An assistant message with 'tool_calls' must be followed by tool messages
   responding to each 'tool_call_id'. The following tool_call_ids did not
   have response messages: call_GjurWNhU2pjxwqF78udYbJln"

This confirms the bug is in AgentKit's core agentic loop, not provider-specific adapters.


Report generated: January 2026
AgentKit version tested: 0.13.2
Tested providers: Anthropic Claude Sonnet 4, OpenAI GPT-4o
