deepagent: max_tokens parameter is the run-level budget but reads like the per-completion cap — empty-output bug

## TL;DR

`create_deepagent(max_tokens=N)` controls the **TOTAL-RUN** token budget (cumulative input+output across every iteration), wired into `TokenLimit(max_tokens)` termination. But the parameter name `max_tokens` matches every LLM SDK on earth (OpenAI, Anthropic, Google), where it means **per-completion output cap**. Callers reasonably passing `max_tokens=65536` to get long-form output get silent empty runs instead.

Default `max_tokens=80_000` is harmful — any agent with a ~50K-token system prompt (graph-grounded research, evaluator prompts, multi-datastore RAG context) is one iteration away from being silently killed.

## Repro

```python
from locus import create_deepagent
from locus.tools.decorator import tool

@tool
def search(q: str) -> str:
    return "result"

agent = create_deepagent(
    model="oci:google.gemini-2.5-flash",
    tools=[search],
    # Realistic graph-grounded research prompt — ~50K tokens
    system_prompt="You are a research agent. " + ("CONTEXT. " * 6_000),
    reflexion=False,
    grounding=False,
    max_tokens=65536,  # caller intent: "let the model write up to 65K output tokens"
)
result = agent.run_sync("Summarize the available data.")
# result.text == ''
# metrics: iterations=1, completion_tokens=5, total_tokens=2905
# What happened: TokenLimit(65536) fired on iteration 1 because the
# 50K-token system prompt + 12K iteration overhead exceeded 65K. The
# model wrote 5 completion tokens, then the termination check killed
# the run before it could keep going.
```

## Why this is bad

1. **The name lies**: every other LLM SDK uses `max_tokens` for per-completion. Locus's meaning is unique-and-opposite.
2. **The default kills**: 80K cumulative is fine for short prompts but instantly murders long-prompt deep research, which is `create_deepagent`'s primary use case.
3. **The failure is silent**: no warning, no log line at termination — `result.text=''` and you have to dig through metrics to find `iterations=1, total_tokens=N near limit` to understand why.
4. **It cost real debugging time** in [observai/optic AFS DeepAgent integration](https://github.com/oracle/observai) — the symptom (`body_md=0 chars after 5min of work`) looked like a model bug, then a kernel termination bug, then a structured-output bug, before bisecting down to this.

## Proposed fix (breaking — beta SDK, no migrators)

1. Remove `max_tokens=` kwarg.
2. Add `total_token_budget: int | None = None` (new run-level cap, default no TokenLimit term).
3. Keep `max_output_tokens: int | None = None` (per-completion cap — already exists).
4. Reject `max_tokens=` loud with `TypeError` pointing to the new names.
5. Docstring: explicit "naming note" block + "breaking change" block.

PR opening shortly with implementation + 12 unit tests + 5 integration tests (stub + live).

## Related bugs surfaced during this work (NOT fixed by this issue)

These contributed to the same "empty output" UX in the field but are distinct from the token-budget naming bug:

**Bug #2 (separate issue to follow)**: `runtime_loop` drops the final assistant message when the agent terminates via `MaxIterations` / `TokenLimit` without calling `submit_tool`. `AgentResult.text` returns empty even when `metrics.completion_tokens > 0`.

**Bug #3 (separate issue to follow)**: `OCIModel` + Gemini rejects Pydantic-derived structured-output schemas containing `additionalProperties: false`. Need vendor-aware schema munging in the OCI provider.

## Acceptance criteria

- [ ] `max_tokens=` removed from `create_deepagent` signature
- [ ] `total_token_budget` defaults to `None` (no `TokenLimit` term)
- [ ] `max_output_tokens` flows to `AgentConfig.max_tokens` (per-completion)
- [ ] `TypeError` on legacy `max_tokens=` with migration message + version
- [ ] Unit tests covering: default no-TokenLimit, explicit-budget attaches TokenLimit, legacy rejected, `max_output_tokens` independent
- [ ] Stub integration tests for the bug shape end-to-end
- [ ] Live OCI Gemini test (gated by `RUN_LIVE_OCI=1`) reproducing the bug shape with explicit budget + verifying fix happy path

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deepagent: max_tokens parameter is the run-level budget but reads like the per-completion cap — empty-output bug #278

TL;DR

Repro

Why this is bad

Proposed fix (breaking — beta SDK, no migrators)

Related bugs surfaced during this work (NOT fixed by this issue)

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

deepagent: max_tokens parameter is the run-level budget but reads like the per-completion cap — empty-output bug #278

Description

TL;DR

Repro

Why this is bad

Proposed fix (breaking — beta SDK, no migrators)

Related bugs surfaced during this work (NOT fixed by this issue)

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions