Description
AgentKit Bug Report: Agentic Loop Message Formatting Issues in v0.13.2
This report was generated with Claude Code while debugging AgentKit integration with Inngest.
Environment
- @inngest/agent-kit: 0.13.2
- inngest: 3.x (latest)
- Node.js: 20.x
- Framework: Next.js 14 (App Router)
- Model Providers Tested: Anthropic (Claude Sonnet 4), OpenAI (GPT-4o)
Bug 1: Agentic Loop Message Formatting Error with maxIter > 0
Description
When using agent.run() with maxIter > 0, the second inference call (after tool execution) fails with both Anthropic and OpenAI providers. The error messages differ but the root cause is the same - AgentKit incorrectly formats the conversation history when preparing the second inference request after a tool has been executed.
Anthropic error:
messages.2: `tool_use` blocks can only be in `assistant` messages
OpenAI error:
An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx
Both errors indicate that the tool call/response pairing is malformed in the conversation history sent to the second inference.
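To make the pairing rule concrete, here is a minimal sketch of a checker for Anthropic-style message arrays. The types and findPairingViolations are hypothetical names for illustration, not part of AgentKit or the Anthropic SDK. The block shapes follow the Anthropic Messages API as I understand it (tool_use in assistant messages, tool_result in the user message that follows), and the sample history is made up rather than captured from AgentKit.

```typescript
// Hypothetical, simplified types modelled on the Anthropic Messages API.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Returns human-readable violations of the tool_use / tool_result pairing rules.
function findPairingViolations(messages: Message[]): string[] {
  const violations: string[] = [];
  messages.forEach((msg, i) => {
    const hasToolUse = msg.content.some((b) => b.type === "tool_use");
    if (!hasToolUse) return;
    if (msg.role !== "assistant") {
      violations.push(`messages.${i}: tool_use block in a "${msg.role}" message`);
      return;
    }
    // Collect the tool_result ids found in the next message, if it is a user message.
    const next = messages[i + 1];
    const answered = new Set<string>();
    if (next !== undefined && next.role === "user") {
      for (const b of next.content) {
        if (b.type === "tool_result") answered.add(b.tool_use_id);
      }
    }
    // Every tool_use id must have a matching tool_result.
    for (const b of msg.content) {
      if (b.type === "tool_use" && !answered.has(b.id)) {
        violations.push(`messages.${i}: tool_use ${b.id} has no tool_result`);
      }
    }
  });
  return violations;
}

// A made-up malformed history: the tool_use block sits in a user message.
const bad: Message[] = [
  { role: "user", content: [{ type: "text", text: "What time is it?" }] },
  { role: "assistant", content: [{ type: "text", text: "Let me check." }] },
  {
    role: "user",
    content: [{ type: "tool_use", id: "toolu_1", name: "get_current_time", input: {} }],
  },
];

console.log(findPairingViolations(bad));
// → [ 'messages.2: tool_use block in a "user" message' ]
```

Running a check like this on the outgoing request body just before the second inference would show which of the two rules is being broken.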
Reproduction Steps
- Create an agent with an Anthropic model and a simple tool:
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'
const testTool = createTool({
name: 'get_current_time',
description: 'Get the current time',
parameters: z.object({
timezone: z.string().optional(),
}),
handler: async (input) => {
return { time: new Date().toISOString() }
},
})
const agent = createAgent({
name: 'test-agent',
system: 'You are a helpful assistant. Use tools when asked.',
model: anthropic({
model: 'claude-sonnet-4-20250514',
apiKey: process.env.ANTHROPIC_API_KEY,
defaultParameters: { max_tokens: 1024 },
}),
tools: [testTool],
})
- Run with maxIter: 0 (works):
// This works - tool executes, but no follow-up response
const result = await agent.run("What time is it?", { maxIter: 0 })
// Output includes text + tool_call, toolCalls includes tool_result with actual time
- Run with maxIter: 2 (fails):
// This fails on the second inference
const result = await agent.run("What time is it?", { maxIter: 2 })
// Error: messages.2: `tool_use` blocks can only be in `assistant` messages
Expected Behavior
With maxIter: 2, the agent should:
- Make first inference → model returns tool_use
- Execute tool → get result
- Make second inference with tool result → model returns final text response
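The three steps above can be sketched as a loop. This is an illustration under assumptions, not AgentKit's implementation: runLoop, Infer, ToolHandler, and the Item type are hypothetical, and the model is a stub. It treats maxIter as the number of follow-up inferences allowed after the first, which is the reading this report relies on (with maxIter: 0 only the first inference runs).

```typescript
// Hypothetical provider-neutral history items; not AgentKit's internal types.
type Item =
  | { type: "text"; role: "user" | "assistant"; content: string }
  | { type: "tool_call"; role: "assistant"; id: string; name: string; input: unknown }
  | { type: "tool_result"; role: "tool_result"; id: string; content: unknown };

type Infer = (history: Item[]) => Promise<Item[]>;
type ToolHandler = (input: unknown) => Promise<unknown>;

async function runLoop(
  prompt: string,
  infer: Infer,
  tools: Record<string, ToolHandler>,
  maxIter: number,
): Promise<Item[]> {
  const history: Item[] = [{ type: "text", role: "user", content: prompt }];
  // One initial inference plus up to maxIter follow-ups.
  for (let iter = 0; iter <= maxIter; iter++) {
    const output = await infer(history);
    history.push(...output);
    let calledTool = false;
    for (const item of output) {
      if (item.type !== "tool_call") continue;
      const handler = tools[item.name];
      if (handler === undefined) throw new Error(`Unknown tool: ${item.name}`);
      // Record the result under the same id so the next inference can pair it.
      history.push({
        type: "tool_result",
        role: "tool_result",
        id: item.id,
        content: await handler(item.input),
      });
      calledTool = true;
    }
    if (!calledTool) break; // plain text answer: the loop is done
  }
  return history;
}

// Stub model: asks for the tool once, then answers in text.
const stubInfer: Infer = async (history) =>
  history.some((h) => h.type === "tool_result")
    ? [{ type: "text", role: "assistant", content: "It is 12:00 UTC." }]
    : [{ type: "tool_call", role: "assistant", id: "call_1", name: "get_current_time", input: {} }];

const stubTools: Record<string, ToolHandler> = {
  get_current_time: async () => ({ time: "2026-01-01T12:00:00.000Z" }),
};

runLoop("What time is it?", stubInfer, stubTools, 2).then((h) =>
  console.log(h.map((i) => i.type).join(" -> ")),
);
// → text -> tool_call -> tool_result -> text
```

With maxIter set to 0 the same stub stops after the tool result, which matches the working case described in this report.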
Actual Behavior
The second inference fails because the Anthropic API receives a malformed message array where tool_use content blocks are placed in a non-assistant message.
Root Cause Analysis
Looking at the AgentKit source (chunk-BSWKEFTT.js), the Anthropic request parser at the tool_call case uses role: m.role:
case "tool_call":
return [
...acc,
{
role: m.role, // <-- This should always be "assistant" for tool_use blocks
content: m.tools.map((tool) => ({
type: "tool_use",
id: tool.id,
input: tool.input,
name: tool.name
}))
}
];
The Anthropic API requires that tool_use content blocks ONLY appear in messages with role: "assistant". If m.role is anything other than "assistant", the API rejects the request.
Workaround
Use maxIter: 0 and manually append tool results to the response:
const agentResult = await agent.run(userInput, { maxIter: 0 })
// Extract tool results and append to response
let responseText = extractTextFromOutput(agentResult.output)
if (agentResult.toolCalls.length > 0) {
const toolResultsText = agentResult.toolCalls
.map(tc => `\n\n**Tool: ${tc.tool.name}**\n${JSON.stringify(tc.content.data, null, 2)}`)
.join('')
responseText += toolResultsText
}
Bug 2: Step Memoization Returns Stale Results in Inngest Functions
Description
When running agent.run() inside an Inngest function, AgentKit's getStepTools() picks up the step context via AsyncLocalStorage. This causes step.ai.infer() to be used instead of direct fetch, resulting in memoized/cached responses that don't reflect actual API calls.
Symptoms
- Agent responses return in 3-8ms (impossibly fast for actual API calls)
- Same responses are returned for different inputs
- Tool handlers are not actually executed
Reproduction Steps
import { inngest } from './client'
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
export const chatFunction = inngest.createFunction(
{ id: 'agent-chat' },
{ event: 'agent/chat.requested' },
async ({ event, step }) => {
const agent = createAgent({
name: 'chat',
model: anthropic({ model: 'claude-sonnet-4-20250514', apiKey }),
tools: [myTool],
})
// This returns cached/stale results because getStepTools()
// finds the step from Inngest's async context
const result = await agent.run(userInput)
// Duration: 3-8ms (should be 2000-5000ms for real API call)
}
)
Expected Behavior
agent.run() should make actual API calls when invoked, returning fresh responses.
Actual Behavior
The internal getStepTools() call finds the Inngest step context via AsyncLocalStorage:
var getStepTools = async () => {
const asyncCtx = await getAsyncCtx();
const ctx = asyncCtx?.ctx || asyncCtx?.execution?.ctx;
return ctx?.step; // Returns Inngest step even if not explicitly passed
};
This causes step.ai.infer() to be used, which memoizes results per function run.
Workaround
Explicitly pass step: undefined to bypass the async context lookup:
const result = await agent.run(userInput, { step: undefined })
This forces AgentKit to use direct fetch instead of step.ai.infer().
Combined Workaround
For production use with Anthropic inside Inngest functions, both workarounds must be applied:
const agentResult = await agent.run(userInput, {
maxIter: 0, // Avoid Anthropic message formatting bug
step: undefined // Avoid step memoization
})
Impact
These bugs prevent using AgentKit's agentic loop with Anthropic models inside Inngest functions. Users cannot:
- Have multi-turn tool conversations (model calls tool → gets result → responds with summary)
- Rely on durable execution benefits for AI inference calls
- Get accurate timing/logging for agent runs
Suggested Fixes
Bug 1 Fix
The issue appears to be in how AgentKit builds the conversation history for the second inference. Both providers require:
- Anthropic: tool_use blocks must be in assistant role messages, followed by tool_result in user role
- OpenAI: tool_calls in assistant message must be followed by tool role messages with matching tool_call_id
The agentic loop needs to correctly format the history including:
- Original user message
- Assistant response with tool call
- Tool result message (with correct role for each provider)
This likely requires fixing the message conversion logic in the core agent loop, not just individual adapters.
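As a rough sketch of what correct conversion could look like, the code below maps one provider-neutral history to both wire formats. Entry, toAnthropic, and toOpenAI are hypothetical names, not AgentKit internals; the output shapes follow the public Anthropic Messages and OpenAI Chat Completions formats as I understand them. It assumes one tool call per assistant turn; parallel tool calls would need to be grouped into a single assistant message before their results.

```typescript
// Hypothetical provider-neutral history entries (not AgentKit's internal types).
type Entry =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool_call"; id: string; name: string; input: Record<string, unknown> }
  | { kind: "tool_result"; id: string; output: unknown };

function toAnthropic(history: Entry[]) {
  return history.map((e) => {
    switch (e.kind) {
      case "text":
        return { role: e.role, content: [{ type: "text" as const, text: e.text }] };
      case "tool_call":
        // tool_use blocks always go in an assistant message, whatever the stored role.
        return {
          role: "assistant" as const,
          content: [{ type: "tool_use" as const, id: e.id, name: e.name, input: e.input }],
        };
      case "tool_result":
        // Anthropic expects tool results as content blocks in a user message.
        return {
          role: "user" as const,
          content: [
            { type: "tool_result" as const, tool_use_id: e.id, content: JSON.stringify(e.output) },
          ],
        };
    }
  });
}

function toOpenAI(history: Entry[]) {
  return history.map((e) => {
    switch (e.kind) {
      case "text":
        return { role: e.role, content: e.text };
      case "tool_call":
        return {
          role: "assistant" as const,
          content: null,
          tool_calls: [
            {
              id: e.id,
              type: "function" as const,
              function: { name: e.name, arguments: JSON.stringify(e.input) },
            },
          ],
        };
      case "tool_result":
        // OpenAI expects a dedicated tool message carrying the matching tool_call_id.
        return { role: "tool" as const, tool_call_id: e.id, content: JSON.stringify(e.output) };
    }
  });
}

const sampleHistory: Entry[] = [
  { kind: "text", role: "user", text: "What is 5 + 3?" },
  { kind: "tool_call", id: "call_1", name: "add_numbers", input: { a: 5, b: 3 } },
  { kind: "tool_result", id: "call_1", output: { result: 8 } },
];

console.log(toAnthropic(sampleHistory).map((m) => m.role)); // → [ 'user', 'assistant', 'user' ]
console.log(toOpenAI(sampleHistory).map((m) => m.role)); // → [ 'user', 'assistant', 'tool' ]
```

The point of the sketch is that the role is derived from the entry kind rather than copied from a stored role field, so a tool call can never end up in a non-assistant message.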
Bug 2 Fix
Consider one of:
- Only use getStepTools() if the step option is explicitly provided (not undefined)
- Add an option like useInngestSteps: false to disable step integration
- Document that step: undefined is required when users don't want memoization
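The opt-out option could look roughly like the sketch below. resolveStep, RunOptions, and StepTools are hypothetical names for illustration, and useInngestSteps is only the option name floated in this report; none of this reflects how AgentKit resolves steps today. The ambient lookup is passed in as a function so the example stays self-contained.

```typescript
// Stand-in for Inngest's step tooling; the real object has a much larger surface.
type StepTools = { readonly label: string };

interface RunOptions {
  step?: StepTools;
  useInngestSteps?: boolean; // hypothetical opt-out flag
}

async function resolveStep(
  options: RunOptions,
  getAmbientStep: () => Promise<StepTools | undefined>,
): Promise<StepTools | undefined> {
  // An explicitly passed step always wins.
  if (options.step !== undefined) return options.step;
  // Explicit opt-out: never fall back to the async-context lookup.
  if (options.useInngestSteps === false) return undefined;
  // Default: keep the current behaviour of picking up the ambient step.
  return getAmbientStep();
}

// Usage with a fake ambient context.
const ambientStep: StepTools = { label: "ambient" };
const lookup = async () => ambientStep;

resolveStep({ useInngestSteps: false }, lookup).then((s) => console.log(s)); // → undefined
resolveStep({}, lookup).then((s) => console.log(s?.label)); // → ambient
```

Keeping the ambient lookup as the default would avoid changing behaviour for users who rely on durable inference today, while giving an explicit way to bypass it.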
Test Case
Here's a minimal reproduction that can be run outside Inngest to verify Bug 1:
// test-agentkit-anthropic.ts
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'
const testTool = createTool({
name: 'add_numbers',
description: 'Add two numbers',
parameters: z.object({
a: z.number(),
b: z.number(),
}),
handler: async (input) => ({ result: input.a + input.b }),
})
const agent = createAgent({
name: 'test',
system: 'You are a calculator. Always use tools for math.',
model: anthropic({
model: 'claude-sonnet-4-20250514',
apiKey: process.env.ANTHROPIC_API_KEY!,
defaultParameters: { max_tokens: 1024 },
}),
tools: [testTool],
})
async function test() {
console.log('Testing maxIter=0...')
const r0 = await agent.run("What is 5 + 3?", { maxIter: 0 })
console.log('maxIter=0 result:', JSON.stringify(r0, null, 2))
console.log('\nTesting maxIter=2...')
try {
const r2 = await agent.run("What is 5 + 3?", { maxIter: 2 })
console.log('maxIter=2 result:', JSON.stringify(r2, null, 2))
} catch (error) {
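// `error` is `unknown` in strict TypeScript, so narrow it before reading .message
console.error('maxIter=2 error:', error instanceof Error ? error.message : error)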
}
}
test()
Expected output:
- maxIter=0: Success, shows tool_call and tool_result
- maxIter=2: Error: "messages.2: `tool_use` blocks can only be in `assistant` messages"
OpenAI Test Results
We also tested with OpenAI GPT-4o:
maxIter=0: Success (duration: 1242ms)
- Tool called, result captured in toolCalls
maxIter=2: Error
- "An assistant message with 'tool_calls' must be followed by tool messages
responding to each 'tool_call_id'. The following tool_call_ids did not
have response messages: call_GjurWNhU2pjxwqF78udYbJln"
This confirms the bug is in AgentKit's core agentic loop, not provider-specific adapters.
Report generated: January 2026
AgentKit version tested: 0.13.2
Tested providers: Anthropic Claude Sonnet 4, OpenAI GPT-4o