
Agentic loop fails with maxIter > 0 for both Anthropic and OpenAI providers #276

@amshowman

Description


AgentKit Bug Report: Agentic Loop Message Formatting Issues in v0.13.2

This report was generated with Claude Code while debugging AgentKit integration with Inngest.


Environment

  • @inngest/agent-kit: 0.13.2
  • inngest: 3.x (latest)
  • Node.js: 20.x
  • Framework: Next.js 14 (App Router)
  • Model Providers Tested: Anthropic (Claude Sonnet 4), OpenAI (GPT-4o)

Bug 1: Agentic Loop Message Formatting Error with maxIter > 0

Description

When using agent.run() with maxIter > 0, the second inference call (after tool execution) fails with both the Anthropic and OpenAI providers. The error messages differ, but the root cause is the same: AgentKit incorrectly formats the conversation history when preparing the second inference request after a tool has been executed.

Anthropic error:

messages.2: `tool_use` blocks can only be in `assistant` messages

OpenAI error:

An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx

Both errors indicate that the tool call/response pairing is malformed in the conversation history sent to the second inference.

Reproduction Steps

  1. Create an agent with an Anthropic model and a simple tool:
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'

const testTool = createTool({
  name: 'get_current_time',
  description: 'Get the current time',
  parameters: z.object({
    timezone: z.string().optional(),
  }),
  handler: async (input) => {
    return { time: new Date().toISOString() }
  },
})

const agent = createAgent({
  name: 'test-agent',
  system: 'You are a helpful assistant. Use tools when asked.',
  model: anthropic({
    model: 'claude-sonnet-4-20250514',
    apiKey: process.env.ANTHROPIC_API_KEY,
    defaultParameters: { max_tokens: 1024 },
  }),
  tools: [testTool],
})
  2. Run with maxIter: 0 (works):
// This works - tool executes, but no follow-up response
const result = await agent.run("What time is it?", { maxIter: 0 })
// Output includes text + tool_call, toolCalls includes tool_result with actual time
  3. Run with maxIter: 2 (fails):
// This fails on the second inference
const result = await agent.run("What time is it?", { maxIter: 2 })
// Error: messages.2: `tool_use` blocks can only be in `assistant` messages

Expected Behavior

With maxIter: 2, the agent should:

  1. Make first inference → model returns tool_use
  2. Execute tool → get result
  3. Make second inference with tool result → model returns final text response

Actual Behavior

The second inference fails because the Anthropic API receives a malformed message array where tool_use content blocks are placed in a non-assistant message.

Root Cause Analysis

Looking at the AgentKit source (chunk-BSWKEFTT.js), the Anthropic request parser at the tool_call case uses role: m.role:

case "tool_call":
  return [
    ...acc,
    {
      role: m.role,  // <-- This should always be "assistant" for tool_use blocks
      content: m.tools.map((tool) => ({
        type: "tool_use",
        id: tool.id,
        input: tool.input,
        name: tool.name
      }))
    }
  ];

The Anthropic API requires that tool_use content blocks ONLY appear in messages with role: "assistant". If m.role is anything other than "assistant", the API rejects the request.
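For reference, a hand-written sketch (types simplified, ids hypothetical) of the history shape the Anthropic Messages API accepts after a tool call: the tool_use block sits in an assistant message, and the matching tool_result comes back in a user message referencing the same id.

```typescript
// Sketch of the post-tool-call history Anthropic expects. The key rule:
// tool_use blocks may only appear under role "assistant"; tool_result
// blocks are sent back under role "user" with a matching tool_use_id.
type AnthropicMessage = {
  role: 'user' | 'assistant'
  content: string | Array<{ type: string; [key: string]: unknown }>
}

const history: AnthropicMessage[] = [
  { role: 'user', content: 'What time is it?' },
  {
    role: 'assistant', // must be "assistant" -- reusing m.role here is the bug
    content: [
      { type: 'tool_use', id: 'toolu_01', name: 'get_current_time', input: {} },
    ],
  },
  {
    role: 'user', // tool results go back under the user role
    content: [
      { type: 'tool_result', tool_use_id: 'toolu_01', content: '2026-01-01T00:00:00Z' },
    ],
  },
]

// Valid only if every tool_use block lives in an assistant message.
const toolUseRolesValid = history.every(
  (m) =>
    typeof m.content === 'string' ||
    m.content.every((b) => b.type !== 'tool_use' || m.role === 'assistant'),
)
```

With role: m.role, the second element above would be emitted with whatever role the internal message carries, and the API rejects the request as soon as that role is not "assistant".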

Workaround

Use maxIter: 0 and manually append tool results to the response:

const agentResult = await agent.run(userInput, { maxIter: 0 })

// Extract tool results and append to response
let responseText = extractTextFromOutput(agentResult.output)
if (agentResult.toolCalls.length > 0) {
  const toolResultsText = agentResult.toolCalls
    .map(tc => `\n\n**Tool: ${tc.tool.name}**\n${JSON.stringify(tc.content.data, null, 2)}`)
    .join('')
  responseText += toolResultsText
}

Bug 2: Step Memoization Returns Stale Results in Inngest Functions

Description

When running agent.run() inside an Inngest function, AgentKit's getStepTools() picks up the step context via AsyncLocalStorage. This causes step.ai.infer() to be used instead of direct fetch, resulting in memoized/cached responses that don't reflect actual API calls.

Symptoms

  • Agent responses return in 3-8ms (impossibly fast for actual API calls)
  • Same responses are returned for different inputs
  • Tool handlers are not actually executed

Reproduction Steps

import { inngest } from './client'
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'

export const chatFunction = inngest.createFunction(
  { id: 'agent-chat' },
  { event: 'agent/chat.requested' },
  async ({ event, step }) => {
    const agent = createAgent({
      name: 'chat',
      model: anthropic({ model: 'claude-sonnet-4-20250514', apiKey }),
      tools: [myTool],
    })

    // This returns cached/stale results because getStepTools()
    // finds the step from Inngest's async context
    const result = await agent.run(userInput)

    // Duration: 3-8ms (should be 2000-5000ms for real API call)
  }
)

Expected Behavior

agent.run() should make actual API calls when invoked, returning fresh responses.

Actual Behavior

The internal getStepTools() call finds the Inngest step context via AsyncLocalStorage:

var getStepTools = async () => {
  const asyncCtx = await getAsyncCtx();
  const ctx = asyncCtx?.ctx || asyncCtx?.execution?.ctx;
  return ctx?.step;  // Returns Inngest step even if not explicitly passed
};

This causes step.ai.infer() to be used, which memoizes results per function run.

Workaround

Explicitly pass step: undefined to bypass the async context lookup:

const result = await agent.run(userInput, { step: undefined })

This forces AgentKit to use direct fetch instead of step.ai.infer().


Combined Workaround

For production use with Anthropic inside Inngest functions, both workarounds must be applied:

const agentResult = await agent.run(userInput, {
  maxIter: 0,        // Avoid Anthropic message formatting bug
  step: undefined    // Avoid step memoization
})

Impact

These bugs prevent using AgentKit's agentic loop with Anthropic models inside Inngest functions. Users cannot:

  1. Have multi-turn tool conversations (model calls tool → gets result → responds with summary)
  2. Rely on durable execution benefits for AI inference calls
  3. Get accurate timing/logging for agent runs

Suggested Fixes

Bug 1 Fix

The issue appears to be in how AgentKit builds the conversation history for the second inference. Both providers require:

  1. Anthropic: tool_use blocks must be in assistant role messages, followed by tool_result in user role
  2. OpenAI: tool_calls in assistant message must be followed by tool role messages with matching tool_call_id

The agentic loop needs to correctly format the history including:

  • Original user message
  • Assistant response with tool call
  • Tool result message (with correct role for each provider)

This likely requires fixing the message conversion logic in the core agent loop, not just individual adapters.
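The OpenAI-side pairing rule can be checked mechanically. The following is a hypothetical helper (not AgentKit code; Chat Completions shapes simplified) that finds tool_call ids with no answering tool message, which is exactly what the OpenAI error reports:

```typescript
// Hypothetical checker for the OpenAI pairing rule: every tool_call id in
// an assistant message must be answered by a later message with role "tool"
// carrying the same tool_call_id.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content?: string
  tool_calls?: { id: string }[]
  tool_call_id?: string
}

function unansweredToolCallIds(messages: ChatMessage[]): string[] {
  const answered = new Set(
    messages.filter((m) => m.role === 'tool').map((m) => m.tool_call_id),
  )
  return messages
    .filter((m) => m.role === 'assistant')
    .flatMap((m) => m.tool_calls ?? [])
    .map((tc) => tc.id)
    .filter((id) => !answered.has(id))
}

// The shape of the failing history: the assistant's tool_call is never answered.
const broken: ChatMessage[] = [
  { role: 'user', content: 'What is 5 + 3?' },
  { role: 'assistant', tool_calls: [{ id: 'call_01' }] },
]

// Appending the tool response with the matching id satisfies the rule.
const fixed: ChatMessage[] = [
  ...broken,
  { role: 'tool', tool_call_id: 'call_01', content: '{"result":8}' },
]
```

A check like this in the core loop (before dispatching the second inference) would catch the malformed history for both providers, since the Anthropic tool_result pairing is the same idea with different roles.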

Bug 2 Fix

Consider one of:

  1. Only use getStepTools() if step option is explicitly provided (not undefined)
  2. Add an option like useInngestSteps: false to disable step integration
  3. Document that step: undefined is required when users don't want memoization
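Option 1 could be implemented by distinguishing "option omitted" from an explicit step: undefined. A minimal sketch, assuming hypothetical names (resolveStep, RunOpts) rather than AgentKit internals:

```typescript
// Sketch of suggested fix 1: only fall back to the async-context lookup
// when the caller did not mention `step` at all. `'step' in opts`
// distinguishes an omitted key from an explicit `step: undefined`.
type RunOpts = { maxIter?: number; step?: unknown }

async function resolveStep(
  opts: RunOpts,
  getStepFromAsyncCtx: () => Promise<unknown>,
): Promise<unknown> {
  if ('step' in opts) {
    // Caller chose explicitly -- honor it, even if it is undefined,
    // forcing direct fetch instead of step.ai.infer().
    return opts.step
  }
  // Otherwise keep today's behavior: pick up the Inngest step context.
  return getStepFromAsyncCtx()
}
```

This keeps the current implicit integration for existing users while making the documented workaround (step: undefined) an intentional, supported escape hatch.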

Test Case

Here's a minimal reproduction that can be run outside Inngest to verify Bug 1:

// test-agentkit-anthropic.ts
import { createAgent, anthropic, createTool } from '@inngest/agent-kit'
import { z } from 'zod'

const testTool = createTool({
  name: 'add_numbers',
  description: 'Add two numbers',
  parameters: z.object({
    a: z.number(),
    b: z.number(),
  }),
  handler: async (input) => ({ result: input.a + input.b }),
})

const agent = createAgent({
  name: 'test',
  system: 'You are a calculator. Always use tools for math.',
  model: anthropic({
    model: 'claude-sonnet-4-20250514',
    apiKey: process.env.ANTHROPIC_API_KEY!,
    defaultParameters: { max_tokens: 1024 },
  }),
  tools: [testTool],
})

async function test() {
  console.log('Testing maxIter=0...')
  const r0 = await agent.run("What is 5 + 3?", { maxIter: 0 })
  console.log('maxIter=0 result:', JSON.stringify(r0, null, 2))

  console.log('\nTesting maxIter=2...')
  try {
    const r2 = await agent.run("What is 5 + 3?", { maxIter: 2 })
    console.log('maxIter=2 result:', JSON.stringify(r2, null, 2))
  } catch (error) {
    console.error('maxIter=2 error:', (error as Error).message)
  }
}

test().catch(console.error)

Expected output:

  • maxIter=0: Success, shows tool_call and tool_result
  • maxIter=2: Error: "messages.2: tool_use blocks can only be in assistant messages"

OpenAI Test Results

We also tested with OpenAI GPT-4o:

maxIter=0: Success (duration: 1242ms)
- Tool called, result captured in toolCalls

maxIter=2: Error
- "An assistant message with 'tool_calls' must be followed by tool messages
   responding to each 'tool_call_id'. The following tool_call_ids did not
   have response messages: call_GjurWNhU2pjxwqF78udYbJln"

This confirms the bug is in AgentKit's core agentic loop, not provider-specific adapters.


Report generated: January 2026
AgentKit version tested: 0.13.2
Tested providers: Anthropic Claude Sonnet 4, OpenAI GPT-4o
