
deepagent: max_tokens parameter is the run-level budget but reads like the per-completion cap — empty-output bug #278

@fede-kamel

TL;DR

create_deepagent(max_tokens=N) controls the total-run token budget (cumulative input + output across every iteration), wired into the TokenLimit(max_tokens) termination condition. But the name max_tokens matches the per-completion output cap in widely used LLM APIs such as OpenAI's and Anthropic's (Google's Gemini API calls the same cap max_output_tokens). Callers who reasonably pass max_tokens=65536 to get long-form output get silent empty runs instead.

The default max_tokens=80_000 is harmful: any agent with a ~50K-token system prompt (graph-grounded research, evaluator prompts, multi-datastore RAG context) is one iteration away from being silently terminated.
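
To make the cumulative semantics concrete, here is a minimal sketch of how a run-level budget behaves. It is not locus code: the TokenLimit class and the iterations_run helper below are hypothetical stand-ins, and the model assumes each iteration re-sends the full prompt and ignores history growth.

```python
from dataclasses import dataclass


@dataclass
class TokenLimit:
    """Hypothetical stand-in for a run-level termination condition."""

    max_tokens: int

    def should_stop(self, cumulative_tokens: int) -> bool:
        # Fires once input + output tokens summed over all iterations reach the budget.
        return cumulative_tokens >= self.max_tokens


def iterations_run(prompt_tokens: int, output_per_iter: int, budget: int) -> int:
    """Count the iterations executed before the cumulative budget stops the run."""
    if prompt_tokens + output_per_iter <= 0:
        raise ValueError("each iteration must consume at least one token")
    limit = TokenLimit(budget)
    used = 0
    iterations = 0
    while True:
        # Simplifying assumption: every iteration pays for the prompt again.
        used += prompt_tokens + output_per_iter
        iterations += 1
        if limit.should_stop(used):
            return iterations


# A 50K-token prompt exhausts an 80K budget on its second iteration,
# while a 1K-token prompt gets 40 iterations out of the same budget.
long_prompt_iterations = iterations_run(50_000, 1_000, 80_000)
short_prompt_iterations = iterations_run(1_000, 1_000, 80_000)
```

Under these assumptions the budget is consumed mostly by the prompt, not by the output the caller was trying to allow.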

Repro

from locus import create_deepagent
from locus.tools.decorator import tool

@tool
def search(q: str) -> str:
    return "result"

agent = create_deepagent(
    model="oci:google.gemini-2.5-flash",
    tools=[search],
    # Realistic graph-grounded research prompt — ~50K tokens
    system_prompt="You are a research agent. " + ("CONTEXT. " * 6_000),
    reflexion=False,
    grounding=False,
    max_tokens=65536,  # caller intent: "let the model write up to 65K output tokens"
)
result = agent.run_sync("Summarize the available data.")
# result.text == ''
# metrics: iterations=1, completion_tokens=5, total_tokens=2905
# What happened: TokenLimit(65536) fired on iteration 1 because the
# 50K-token system prompt + 12K iteration overhead exceeded 65K. The
# model wrote 5 completion tokens, then the termination check killed
# the run before it could keep going.

Why this is bad

  1. The name misleads: other LLM SDKs use max_tokens for the per-completion output cap, so Locus's run-level meaning contradicts what callers expect.
  2. The default kills: 80K cumulative is fine for short prompts but instantly murders long-prompt deep research, which is create_deepagent's primary use case.
  3. The failure is silent: no warning, no log line at termination — result.text='' and you have to dig through metrics to find iterations=1, total_tokens=N near limit to understand why.
  4. It cost real debugging time in observai/optic AFS DeepAgent integration — the symptom (body_md=0 chars after 5min of work) looked like a model bug, then a kernel termination bug, then a structured-output bug, before bisecting down to this.
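
As an illustration of point 3, a termination path could emit one log line when a run ends on its budget with no output. This is a sketch only: termination_warning, the "token_limit" reason string, and the logger name are hypothetical and not part of locus.

```python
import logging
from typing import Optional

logger = logging.getLogger("deepagent_sketch")  # hypothetical logger name


def termination_warning(
    reason: str, text: str, total_tokens: int, budget: Optional[int]
) -> Optional[str]:
    """Build and log a warning when a run ends on its token budget with no output."""
    # Only the silent case is interesting: budget termination plus empty text.
    if reason != "token_limit" or text:
        return None
    message = (
        "run stopped by the token budget with empty output: "
        f"total_tokens={total_tokens}, budget={budget}"
    )
    logger.warning(message)
    return message
```

A single line like this would point straight at the budget instead of leaving the caller to infer it from metrics.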

Proposed fix (breaking — beta SDK, no migrators)

  1. Remove max_tokens= kwarg.
  2. Add total_token_budget: int | None = None (new run-level cap; the default attaches no TokenLimit termination).
  3. Keep max_output_tokens: int | None = None (per-completion cap — already exists).
  4. Reject max_tokens= loudly with a TypeError pointing to the new names.
  5. Docstring: explicit "naming note" block + "breaking change" block.
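
The steps above could look roughly like the following sketch. It is a stand-in, not the actual implementation: create_deepagent_sketch is a hypothetical function that returns a plain dict rather than an agent, and the error wording is illustrative.

```python
from typing import Any, Optional


def create_deepagent_sketch(
    *,
    total_token_budget: Optional[int] = None,
    max_output_tokens: Optional[int] = None,
    **kwargs: Any,
) -> dict:
    """Hypothetical stand-in for the proposed signature; returns a dict, not an agent."""
    if "max_tokens" in kwargs:
        # Fail loudly and name both replacements so the caller can pick the right one.
        raise TypeError(
            "max_tokens= was removed: use total_token_budget= for the run-level "
            "budget or max_output_tokens= for the per-completion cap."
        )
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")

    terminations = []
    if total_token_budget is not None:
        # A run-level budget is attached only when the caller asks for one.
        terminations.append(("TokenLimit", total_token_budget))
    return {
        "terminations": terminations,
        "per_completion_max_tokens": max_output_tokens,
    }
```

With this shape, max_output_tokens=65536 expresses the caller's original intent and no run-level limit is attached unless total_token_budget is set.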

PR opening shortly with implementation + 12 unit tests + 5 integration tests (stub + live).

Related bugs surfaced during this work (NOT fixed by this issue)

These contributed to the same "empty output" UX in the field but are distinct from the token-budget naming bug:

Bug #2 (separate issue to follow): runtime_loop drops the final assistant message when the agent terminates via MaxIterations / TokenLimit without calling submit_tool. AgentResult.text returns empty even when metrics.completion_tokens > 0.

Bug #3 (separate issue to follow): OCIModel + Gemini rejects Pydantic-derived structured-output schemas containing additionalProperties: false. Need vendor-aware schema munging in the OCI provider.
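
For Bug #3, one possible shape of the schema munging is a recursive pass that drops additionalProperties before the schema reaches the provider. This is a sketch under assumptions: strip_additional_properties is a hypothetical helper, not an existing OCI provider function, and it has not been checked against what Gemini actually accepts.

```python
from typing import Any


def strip_additional_properties(schema: Any) -> Any:
    """Return a copy of a JSON-schema-like structure without 'additionalProperties' keys."""
    if isinstance(schema, dict):
        # Caveat: this also drops a user field literally named "additionalProperties".
        return {
            key: strip_additional_properties(value)
            for key, value in schema.items()
            if key != "additionalProperties"
        }
    if isinstance(schema, list):
        return [strip_additional_properties(item) for item in schema]
    return schema


example_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "items": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": False},
        }
    },
}
cleaned = strip_additional_properties(example_schema)
```

The function returns a new structure and leaves the input untouched, so the original Pydantic-derived schema remains available for providers that accept it.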

Acceptance criteria

  • max_tokens= removed from create_deepagent signature
  • total_token_budget defaults to None (no TokenLimit termination attached)
  • max_output_tokens flows to AgentConfig.max_tokens (per-completion)
  • TypeError on legacy max_tokens= with migration message + version
  • Unit tests covering: default no-TokenLimit, explicit-budget attaches TokenLimit, legacy rejected, max_output_tokens independent
  • Stub integration tests for the bug shape end-to-end
  • Live OCI Gemini test (gated by RUN_LIVE_OCI=1) reproducing the bug shape with explicit budget + verifying fix happy path
