fix(deepagent): rename max_tokens → total_token_budget (closes #278) #279
Merged
…278)

BREAKING: ``create_deepagent(max_tokens=...)`` is removed. Use ``total_token_budget=N`` for the run-level TokenLimit termination, or ``max_output_tokens=N`` for the per-completion output cap on each LLM call.

Background (this is what was silently failing):

The old ``max_tokens`` parameter controlled the TOTAL-RUN token budget (cumulative input+output across every iteration of one run), wired into the typed-termination algebra as ``TokenLimit(max_tokens)``. The name clashed with every LLM SDK on earth: OpenAI, Anthropic, Google all use ``max_tokens`` for the per-completion output cap.

Callers reasonably passing ``max_tokens=65536`` expecting Gemini-style "max 65K output tokens per call" got Locus's ``TokenLimit(65536)`` termination instead. On any agent with a long system prompt (graph-grounded research, evaluator prompts, multi-datastore RAG context), the input alone exceeded the cap on iteration 1 → ``TokenLimit`` fired → run exited via ``TerminateEvent`` with empty output. No warning, no diagnostic.

The old 80_000 default was harmful: any agent with a ~50K-token prompt was 1-2 iterations away from being silently killed. Cost real debugging hours in the observai/optic AFS DeepAgent integration before bisecting down to this.

What changes:

- ``max_tokens=`` kwarg removed entirely (beta SDK, no migration needed). Rejected loud via TypeError with a message pointing to the new names + version so any straggler call sites fail at the bound boundary instead of silently mis-behaving.
- ``total_token_budget: int | None = None`` is the new name for the run-level TokenLimit cap. ``None`` (default) means no TokenLimit term in the termination algebra: the run is bounded only by ToolCalled+Confidence or MaxIterations.
- ``max_output_tokens: int | None = None`` stays. This is the per-completion cap forwarded to ``AgentConfig.max_tokens`` (and from there to the model provider's per-request max_tokens field). This is the knob callers usually meant when they passed the old name.
- Docstring carries a loud "naming note" + "breaking change" block so anyone hitting the TypeError finds the migration path immediately.

Tests added:

Unit (tests/unit/test_deepagent.py):
- test_token_limit_omitted_when_budget_none: default doesn't add TokenLimit term (the foot-gun fix)
- test_legacy_max_tokens_kwarg_rejected: TypeError with clear message; can't silently flow through to AgentConfig
- test_max_output_tokens_propagated_independently_of_budget: per-completion cap lands on AgentConfig.max_tokens regardless of the run-budget setting
- Updated test_typed_termination_attached to use the new name

Integration stub (tests/integration/test_deepagent_token_budget.py):
- 5 stub-mode tests covering bug shape + fix + loud rejection
- No model calls; inspects termination tree directly

Integration live (tests/integration/test_deepagent_token_budget_live.py):
- Real OCI Gemini calls gated by RUN_LIVE_OCI=1
- Reproduces the bug shape with explicit 80K opt-in (passes)
- Verifies real-output happy path (xfailed pending Locus bug #2: runtime_loop drops final assistant message when agent terminates via MaxIterations without calling submit_tool; tracked in a follow-up issue)

Two related Locus bugs discovered during this work, NOT fixed here (will be filed as separate issues):

#2: runtime_loop's final-message flush path is conditional on the submit_tool exit branch; agents that terminate via MaxIterations / TokenLimit return AgentResult.text='' even when the model emitted completion tokens. Live test marks this xfail.

#3: OCIModel + Gemini rejects Pydantic-derived structured-output schemas containing additionalProperties:false ("Unsupported JSON Schema feature for Gemini"). Vendor-aware schema munging needed in the OCI provider. Out of scope here; live tests omit output_schema to avoid hitting this.

Refs: #278 (this issue), observai/optic AFS DeepAgent integration end-to-end testing surfaced all three bugs.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
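
The silent-kill behaviour described in the background can be illustrated with a toy loop. This is a minimal sketch, not Locus code: ``run_until_budget`` is a hypothetical helper, and it assumes the full prompt is re-sent and counted on every iteration, which may not match how Locus actually accounts for tokens.

```python
from __future__ import annotations


def run_until_budget(
    prompt_tokens: int,
    output_tokens_per_call: int,
    total_token_budget: int | None,
    max_iterations: int = 5,
) -> tuple[int, str]:
    """Toy agent loop (hypothetical): returns (iterations run, exit reason)."""
    used = 0
    for iteration in range(1, max_iterations + 1):
        # Assumption: every iteration pays for the whole prompt plus its output.
        used += prompt_tokens + output_tokens_per_call
        # A run-level budget is cumulative, so a long prompt burns it quickly.
        if total_token_budget is not None and used >= total_token_budget:
            return iteration, "TokenLimit"
    return max_iterations, "MaxIterations"


# Old 80K default with a ~50K-token prompt: killed on the second iteration.
print(run_until_budget(50_000, 2_000, 80_000))   # (2, 'TokenLimit')
# Caller meant "65K output per call" but got a 65K run budget: dies at once.
print(run_until_budget(70_000, 2_000, 65_536))   # (1, 'TokenLimit')
# New default (None): no TokenLimit term, bounded by MaxIterations only.
print(run_until_budget(50_000, 2_000, None))     # (5, 'MaxIterations')
```

The numbers are illustrative only; the point is that a cumulative cap and a per-completion cap behave very differently for the same integer.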
Closes #278.
What changed
- Removed ``create_deepagent(max_tokens=...)`` (BREAKING: beta SDK, no migrators).
- ``total_token_budget: int | None = None``: explicit name for the run-level ``TokenLimit(total_token_budget)`` termination cap. Default ``None`` means no TokenLimit term in the algebra (was the silent-kill default at 80K).
- ``max_tokens=`` raises ``TypeError`` with a migration message pointing at both ``total_token_budget`` (run-level) and ``max_output_tokens`` (per-completion).
- ``max_output_tokens``: per-completion output cap, forwarded to ``AgentConfig.max_tokens`` → model provider's per-request ``max_tokens`` field.

Why
See #278.
``max_tokens`` had semantics unique to Locus and opposite to every other LLM SDK (run-level cap, not per-completion). The default of 80K silently killed any agent with a ~50K-token system prompt. Callers reasonably passing ``max_tokens=65536`` for long-form output got empty results.

Tests
- Unit (``tests/unit/test_deepagent.py``): 12 tests pass. Added 4 new; updated 1.
- Integration stub (``tests/integration/test_deepagent_token_budget.py``): 5 new stub-mode tests covering bug shape + fix + loud rejection. No model calls.
- Integration live (``tests/integration/test_deepagent_token_budget_live.py``): 2 new tests gated by ``RUN_LIVE_OCI=1``. Verified against API_FREE_TIER + Gemini 2.5 Flash: ``test_long_prompt_with_old_default_would_have_died`` passes; ``test_long_prompt_with_default_none_produces_real_output`` xfails pending Locus bug #2 (separate issue: ``runtime_loop`` drops the final assistant message on non-submit-tool termination).

Related bugs found but NOT fixed here
Two more Locus bugs surfaced during this work. Filing as separate issues:
- ``runtime_loop`` drops the final assistant message when the agent terminates via ``MaxIterations`` / ``TokenLimit`` without calling ``submit_tool``. ``AgentResult.text=''`` even when ``metrics.completion_tokens > 0``.
- ``OCIModel`` + Gemini rejects Pydantic-derived structured-output schemas containing ``additionalProperties: false``. Need vendor-aware schema munging in the OCI provider.

Migration
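
As a rough sketch of what the call-site change looks like, the stand-in below mimics the new keyword handling. ``create_deepagent_sketch`` and ``SketchConfig`` are hypothetical names invented for this illustration; the real ``create_deepagent`` signature, error text, and termination types live in Locus and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the agent config; not the Locus class."""

    max_tokens: int | None = None                 # per-completion output cap
    termination: list[str] = field(default_factory=list)


def create_deepagent_sketch(
    *,
    total_token_budget: int | None = None,
    max_output_tokens: int | None = None,
    **legacy_kwargs: object,
) -> SketchConfig:
    # Reject the removed name loudly instead of letting it flow through.
    if "max_tokens" in legacy_kwargs:
        raise TypeError(
            "max_tokens= was removed: use total_token_budget= for the "
            "run-level cap or max_output_tokens= for the per-completion cap"
        )
    if legacy_kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(legacy_kwargs)}")

    # Termination terms are plain strings here, purely for illustration.
    termination = ["MaxIterations"]
    if total_token_budget is not None:
        termination.append(f"TokenLimit({total_token_budget})")
    return SketchConfig(max_tokens=max_output_tokens, termination=termination)


# Before: create_deepagent(max_tokens=65536)  -> now raises TypeError.
# After, if the per-completion cap was intended:
per_call = create_deepagent_sketch(max_output_tokens=65536)
# After, if a run-level budget was really intended:
budgeted = create_deepagent_sketch(total_token_budget=500_000)
```

Callers who passed the old name for long-form output most likely want the first form; the second opts back in to a run-level budget explicitly.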