TL;DR
create_deepagent(max_tokens=N) controls the TOTAL-RUN token budget (cumulative input+output across every iteration), wired into TokenLimit(max_tokens) termination. But the name max_tokens collides with the per-completion output cap in the major LLM SDKs: OpenAI's Chat Completions and Anthropic's Messages API both call that cap max_tokens, and Google's Gemini API calls it max_output_tokens. Callers who reasonably pass max_tokens=65536 to get long-form output get silent empty runs instead.
The default max_tokens=80_000 is harmful: any agent with a ~50K-token system prompt (graph-grounded research, evaluator prompts, multi-datastore RAG context) is one iteration away from being silently killed.
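To make the budget arithmetic concrete, here is a rough sketch of how a cumulative cap gets consumed. It assumes the full prompt is re-sent and counted on every iteration, and that the check fires once the running total reaches the budget. The helper names and the 500-token per-iteration output are illustrative, not Locus internals.

```python
from __future__ import annotations


def cumulative_usage(prompt_tokens: int, output_tokens: int, iterations: int) -> list[int]:
    """Running input+output total after each iteration (assumes the prompt is re-sent each time)."""
    totals: list[int] = []
    used = 0
    for _ in range(iterations):
        used += prompt_tokens + output_tokens
        totals.append(used)
    return totals


def first_iteration_over(
    prompt_tokens: int,
    output_tokens: int,
    budget: int,
    max_iterations: int = 100,
) -> int | None:
    """1-based iteration at which the running total reaches the budget, or None if it never does."""
    totals = cumulative_usage(prompt_tokens, output_tokens, max_iterations)
    for iteration, used in enumerate(totals, start=1):
        if used >= budget:
            return iteration
    return None


# ~50K-token prompt against the 80K default: the cap is reached on iteration 2.
print(first_iteration_over(50_000, 500, 80_000))  # 2
# ~1K-token prompt against the same default: dozens of iterations fit.
print(first_iteration_over(1_000, 500, 80_000))  # 54
```

Under these assumptions the same 80K default that allows a short-prompt agent 50+ iterations leaves a long-prompt agent with a single completed iteration.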
Repro
from locus import create_deepagent
from locus.tools.decorator import tool


@tool
def search(q: str) -> str:
    return "result"


agent = create_deepagent(
    model="oci:google.gemini-2.5-flash",
    tools=[search],
    # Realistic graph-grounded research prompt, ~50K tokens
    system_prompt="You are a research agent. " + ("CONTEXT. " * 6_000),
    reflexion=False,
    grounding=False,
    max_tokens=65536,  # caller intent: "let the model write up to 65K output tokens"
)

result = agent.run_sync("Summarize the available data.")
# result.text == ''
# metrics: iterations=1, completion_tokens=5, total_tokens=2905

# What happened: TokenLimit(65536) fired on iteration 1 because the
# 50K-token system prompt + 12K iteration overhead exceeded 65K. The
# model wrote 5 completion tokens, then the termination check killed
# the run before it could keep going.
Why this is bad
- The name lies: major LLM SDKs use max_tokens for the per-completion cap. Locus's meaning is unique and opposite.
- The default kills: 80K cumulative is fine for short prompts but instantly kills long-prompt deep research, which is create_deepagent's primary use case.
- The failure is silent: no warning and no log line at termination. result.text='' and you have to dig through metrics to find iterations=1, total_tokens=N near the limit to understand why.
- It cost real debugging time in the observai/optic AFS DeepAgent integration: the symptom (body_md=0 chars after 5min of work) looked like a model bug, then a kernel termination bug, then a structured-output bug, before bisecting down to this.
Proposed fix (breaking — beta SDK, no migrators)
- Remove the max_tokens= kwarg.
- Add total_token_budget: int | None = None (new run-level cap; the default adds no TokenLimit term).
- Keep max_output_tokens: int | None = None (per-completion cap, already exists).
- Reject max_tokens= loudly with a TypeError pointing to the new names.
- Docstring: explicit "naming note" block + "breaking change" block.
PR opening shortly with implementation + 12 unit tests + 5 integration tests (stub + live).
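As an illustration of the proposed behaviour, a minimal sketch of the keyword handling might look like the following. create_deepagent_sketch is a hypothetical stand-in that returns a plain dict, not the real factory; the error wording and the ("TokenLimit", N) tuple are placeholders for whatever the PR actually implements.

```python
from __future__ import annotations

from typing import Any


def create_deepagent_sketch(
    *,
    total_token_budget: int | None = None,
    max_output_tokens: int | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical stand-in for the proposed create_deepagent keyword handling."""
    # Reject the legacy name loudly instead of silently reinterpreting it.
    if "max_tokens" in kwargs:
        raise TypeError(
            "max_tokens= is no longer accepted. Use total_token_budget= for the "
            "run-level cap or max_output_tokens= for the per-completion cap."
        )
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")

    # Only add a run-level termination when the caller asked for one.
    terminations: list[tuple[str, int]] = []
    if total_token_budget is not None:
        terminations.append(("TokenLimit", total_token_budget))

    return {"terminations": terminations, "max_output_tokens": max_output_tokens}


# Long-form output without any run-level cap.
config = create_deepagent_sketch(max_output_tokens=65536)
print(config["terminations"])  # []
```

The point of the design is that the two caps become independent: setting one never implies a value for the other, and the old ambiguous name fails immediately at construction time rather than at the end of an empty run.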
Related bugs surfaced during this work (NOT fixed by this issue)
These contributed to the same "empty output" UX in the field but are distinct from the token-budget naming bug:
Bug #2 (separate issue to follow): runtime_loop drops the final assistant message when the agent terminates via MaxIterations / TokenLimit without calling submit_tool. AgentResult.text returns empty even when metrics.completion_tokens > 0.
Bug #3 (separate issue to follow): OCIModel + Gemini rejects Pydantic-derived structured-output schemas containing additionalProperties: false. Need vendor-aware schema munging in the OCI provider.
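For Bug #3, one possible shape of the schema munging is a recursive pass that drops additionalProperties before the schema is sent. This is only a sketch: the function name is hypothetical, and whether stripping the key (rather than rewriting it) is the right fix for the OCI + Gemini path is an assumption that the follow-up issue would need to confirm.

```python
from typing import Any


def strip_additional_properties(schema: Any) -> Any:
    """Return a copy of a JSON-schema-like structure without any "additionalProperties" keys.

    Caveat: this also drops a model field that is literally named
    "additionalProperties" under "properties"; a real fix would need to
    distinguish schema keywords from property names.
    """
    if isinstance(schema, dict):
        return {
            key: strip_additional_properties(value)
            for key, value in schema.items()
            if key != "additionalProperties"
        }
    if isinstance(schema, list):
        return [strip_additional_properties(item) for item in schema]
    return schema


example = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "items": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": False},
        }
    },
}
cleaned = strip_additional_properties(example)
```

The input is left unmodified, so the same Pydantic-derived schema can still be passed untouched to providers that accept it.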
Acceptance criteria
- max_tokens= removed from create_deepagent signature
- total_token_budget defaults to None (no TokenLimit term)
- max_output_tokens flows to AgentConfig.max_tokens (per-completion)
- TypeError on legacy max_tokens= with migration message + version
- […] max_output_tokens independent
- […] RUN_LIVE_OCI=1) reproducing the bug shape with explicit budget + verifying fix happy path