fix(deepagent): rename max_tokens → total_token_budget (closes #278) #279

Merged
fede-kamel merged 1 commit into main from fix/deepagent-token-budget-naming
May 28, 2026

@fede-kamel
Contributor

Closes #278.

What changed

  • Removed: create_deepagent(max_tokens=...) (BREAKING — beta SDK, no migrators).
  • Added: total_token_budget: int | None = None — explicit name for the run-level TokenLimit(total_token_budget) termination cap. Default None means no TokenLimit term in the algebra (was the silent-kill default at 80K).
  • Loud rejection: passing the old max_tokens= raises TypeError with a migration message pointing at both total_token_budget (run-level) and max_output_tokens (per-completion).
  • Kept: max_output_tokens — per-completion output cap, forwarded to AgentConfig.max_tokens → model provider's per-request max_tokens field.
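
The loud-rejection behaviour in the list above could look roughly like the following. This is a minimal sketch, not the Locus implementation: `create_deepagent_sketch` is a hypothetical stand-in that models only the keyword handling, and the error wording is an assumption.

```python
from __future__ import annotations

from typing import Any


def create_deepagent_sketch(
    *,
    total_token_budget: int | None = None,
    max_output_tokens: int | None = None,
    **legacy: Any,
) -> dict[str, int | None]:
    """Hypothetical stand-in for create_deepagent; only kwarg handling is modelled."""
    if "max_tokens" in legacy:
        # Fail loudly and name both replacements so the caller can pick the right one.
        raise TypeError(
            "max_tokens= was removed. Use total_token_budget= for the run-level "
            "cap or max_output_tokens= for the per-completion output cap."
        )
    if legacy:
        # Any other unknown keyword is still an ordinary error.
        raise TypeError(f"unexpected keyword arguments: {sorted(legacy)}")
    return {
        "total_token_budget": total_token_budget,
        "max_output_tokens": max_output_tokens,
    }


# A straggler call site fails at call time instead of silently misbehaving.
try:
    create_deepagent_sketch(max_tokens=65_536)
except TypeError as exc:
    rejection_message = str(exc)
```

Catching the legacy name through `**legacy` rather than keeping it as a named parameter means the old spelling can never flow through to the per-completion setting by accident.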

Why

See #278. The semantics of max_tokens were unique to Locus and the opposite of what the name means in other LLM SDKs: a run-level cap, not a per-completion one. The 80K default silently killed any agent with a ~50K-token system prompt, and callers reasonably passing max_tokens=65536 for long-form output got empty results.
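
The arithmetic behind that failure can be sketched as follows. This is a simplified model under stated assumptions: every iteration re-sends a fixed-size prompt, input and output tokens both count toward the cap, and the cap fires once cumulative usage reaches it. The function name and the 2,000-token output figure are hypothetical; real runs usually grow their context each iteration, which only makes the cap fire sooner.

```python
from __future__ import annotations


def first_iteration_over_budget(
    prompt_tokens: int,
    output_tokens_per_call: int,
    budget: int | None,
    max_iterations: int = 10,
) -> int | None:
    """Return the 1-based iteration at which cumulative usage reaches the budget.

    Returns None when there is no budget or it is never reached.
    """
    if budget is None:
        return None
    used = 0
    for iteration in range(1, max_iterations + 1):
        # Assumption: the prompt is re-sent each iteration, and input + output both count.
        used += prompt_tokens + output_tokens_per_call
        if used >= budget:
            return iteration
    return None


# ~50K-token prompt under the old 80K default: the cap is reached on iteration 2.
old_default = first_iteration_over_budget(50_000, 2_000, 80_000)

# A 70K-token prompt with max_tokens=65536 misread as an output cap: reached on iteration 1.
misread_cap = first_iteration_over_budget(70_000, 2_000, 65_536)

# New default (None): no token-based termination at all.
new_default = first_iteration_over_budget(50_000, 2_000, None)
```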

Tests

  • Unit (tests/unit/test_deepagent.py): 12 tests pass. Added 4 new; updated 1.
  • Integration stub (tests/integration/test_deepagent_token_budget.py): 5 new stub-mode tests covering bug shape + fix + loud rejection. No model calls.
  • Integration live (tests/integration/test_deepagent_token_budget_live.py): 2 new tests gated by RUN_LIVE_OCI=1. Verified against API_FREE_TIER + Gemini 2.5 Flash: test_long_prompt_with_old_default_would_have_died passes; test_long_prompt_with_default_none_produces_real_output xfails pending Locus bug #2 (separate issue: runtime_loop drops the final assistant message on non-submit-tool termination).
$ .venv/bin/python -m pytest tests/unit/test_deepagent.py tests/integration/test_deepagent_token_budget.py -q
17 passed

$ RUN_LIVE_OCI=1 OCI_PROFILE=API_FREE_TIER OCI_REGION=us-chicago-1 \
    .venv/bin/python -m pytest tests/integration/test_deepagent_token_budget_live.py -v
1 passed, 1 xfailed

Related bugs found but NOT fixed here

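Two more Locus bugs surfaced during this work; both will be filed as separate issues:

  • #2: runtime_loop drops the final assistant message when the agent terminates without calling submit_tool (for example via MaxIterations or TokenLimit), so AgentResult.text comes back empty.
  • #3: OCIModel + Gemini rejects Pydantic-derived structured-output schemas containing additionalProperties:false.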

Migration

# Before (silent foot-gun)
create_deepagent(model=..., tools=..., system_prompt=..., max_tokens=65_536)
#                                                          ^^^^^^^^^^^^^^^^
#                                                          interpreted as run-level cap

# After (explicit + clear)
create_deepagent(
    model=...,
    tools=...,
    system_prompt=...,
    max_output_tokens=65_536,  # per-completion cap
    # total_token_budget defaults to None — no TokenLimit termination
)

# Or if you really want a cumulative run cap:
create_deepagent(..., total_token_budget=500_000, max_output_tokens=65_536)

…278)

BREAKING: ``create_deepagent(max_tokens=...)`` is removed. Use
``total_token_budget=N`` for the run-level TokenLimit termination, or
``max_output_tokens=N`` for the per-completion output cap on each LLM
call.

Background — this is what was silently failing:

The old ``max_tokens`` parameter controlled the TOTAL-RUN token
budget (cumulative input+output across every iteration of one run),
wired into the typed-termination algebra as ``TokenLimit(max_tokens)``.
The name clashed with the wider LLM SDK convention: OpenAI and
Anthropic use ``max_tokens`` for the per-completion output cap, and
Google exposes the same knob as ``max_output_tokens``.

Callers reasonably passing ``max_tokens=65536`` expecting Gemini-
style "max 65K output tokens per call" got Locus's
``TokenLimit(65536)`` termination instead. On any agent with a long
system prompt (graph-grounded research, evaluator prompts, multi-
datastore RAG context), the input alone exceeded the cap on
iteration 1 → ``TokenLimit`` fired → run exited via
``TerminateEvent`` with empty output. No warning, no diagnostic.

The old 80_000 default was harmful: any agent with a ~50K-token
prompt was 1-2 iterations away from being silently killed. This cost
real debugging hours in the observai/optic AFS DeepAgent
integration before the problem was bisected down to this.

What changes:

  - ``max_tokens=`` kwarg removed entirely (beta SDK, so no migration
    shim is provided). Rejected loudly via TypeError with a message
    pointing to the new names + version, so any straggler call sites
    fail at the call boundary instead of silently misbehaving.
  - ``total_token_budget: int | None = None`` is the new name for
    the run-level TokenLimit cap. ``None`` (default) means no
    TokenLimit term in the termination algebra — the run is bounded
    only by ToolCalled+Confidence or MaxIterations.
  - ``max_output_tokens: int | None = None`` stays — this is the
    per-completion cap forwarded to ``AgentConfig.max_tokens`` (and
    from there to the model provider's per-request max_tokens
    field). This is the knob callers usually meant when they
    passed the old name.
  - Docstring carries a loud "naming note" + "breaking change"
    block so anyone hitting the TypeError finds the migration path
    immediately.
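
A rough sketch of how a ``None`` budget can translate into "no TokenLimit
term" is shown below. The dataclasses and ``build_termination_terms`` are
hypothetical stand-ins, not the Locus termination algebra: the real types
compose differently, and the ToolCalled+Confidence term is left out for
brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaxIterations:
    """Hypothetical stand-in for an iteration-count termination term."""
    limit: int


@dataclass(frozen=True)
class TokenLimit:
    """Hypothetical stand-in for the run-level cumulative token cap."""
    limit: int


def build_termination_terms(
    total_token_budget: int | None,
    max_iterations: int,
) -> list[MaxIterations | TokenLimit]:
    """Collect the termination terms for one run (illustrative only)."""
    terms: list[MaxIterations | TokenLimit] = [MaxIterations(max_iterations)]
    if total_token_budget is not None:
        # The cap is opt-in: it only exists when the caller asked for one.
        terms.append(TokenLimit(total_token_budget))
    return terms


default_terms = build_termination_terms(None, 20)
capped_terms = build_termination_terms(500_000, 20)
```

Omitting the term, rather than substituting a very large default, keeps
the termination tree honest: inspecting it shows exactly which limits the
caller chose.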

Tests added:

  Unit (tests/unit/test_deepagent.py):
    - test_token_limit_omitted_when_budget_none — default doesn't
      add TokenLimit term (the foot-gun fix)
    - test_legacy_max_tokens_kwarg_rejected — TypeError with clear
      message; can't silently flow through to AgentConfig
    - test_max_output_tokens_propagated_independently_of_budget —
      per-completion cap lands on AgentConfig.max_tokens regardless
      of the run-budget setting
    - Updated test_typed_termination_attached to use the new name

  Integration stub (tests/integration/test_deepagent_token_budget.py):
    - 5 stub-mode tests covering bug shape + fix + loud rejection
    - No model calls; inspects termination tree directly

  Integration live (tests/integration/test_deepagent_token_budget_live.py):
    - Real OCI Gemini calls gated by RUN_LIVE_OCI=1
    - Reproduces the bug shape with explicit 80K opt-in (passes)
    - Verifies real-output happy path (xfailed pending Locus bug #2:
      runtime_loop drops final assistant message when agent
      terminates via MaxIterations without calling submit_tool —
      tracked in a follow-up issue)

Two related Locus bugs discovered during this work, NOT fixed here
(will be filed as separate issues):

  #2 — runtime_loop's final-message flush path is conditional on
       the submit_tool exit branch; agents that terminate via
       MaxIterations / TokenLimit return AgentResult.text='' even
       when the model emitted completion tokens. Live test marks
       this xfail.

  #3 — OCIModel + Gemini rejects Pydantic-derived structured-output
       schemas containing additionalProperties:false ("Unsupported
       JSON Schema feature for Gemini"). Vendor-aware schema
       munging needed in the OCI provider. Out of scope here;
       live tests omit output_schema to avoid hitting this.
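
For bug #3, one possible shape of the schema munging is sketched below.
This is an assumption about what such a fix might look like, not the
planned OCI provider change: ``strip_additional_properties`` is a
hypothetical helper that recursively drops every ``additionalProperties``
key from a JSON-Schema-like dict.

```python
from typing import Any


def strip_additional_properties(schema: Any) -> Any:
    """Return a copy of a JSON-Schema-like structure without additionalProperties keys."""
    if isinstance(schema, dict):
        return {
            key: strip_additional_properties(value)
            for key, value in schema.items()
            if key != "additionalProperties"
        }
    if isinstance(schema, list):
        return [strip_additional_properties(item) for item in schema]
    # Scalars (strings, numbers, booleans, None) pass through unchanged.
    return schema


example_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "findings": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": False},
        }
    },
}
cleaned_schema = strip_additional_properties(example_schema)
```

One caveat of this naive version: a model field that is itself named
``additionalProperties`` would also be dropped, so a real fix would need
to distinguish schema keywords from property names.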

Refs: #278 (this issue), observai/optic AFS DeepAgent integration
end-to-end testing surfaced all three bugs.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
fede-kamel force-pushed the fix/deepagent-token-budget-naming branch from 17dbbaf to 89bafbe May 28, 2026 11:32
fede-kamel merged commit 6304505 into main May 28, 2026
10 checks passed
fede-kamel deleted the fix/deepagent-token-budget-naming branch May 28, 2026 11:35
Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Development

Successfully merging this pull request may close these issues.

deepagent: max_tokens parameter is the run-level budget but reads like the per-completion cap — empty-output bug
