v0.8.63: Token-budget governor for Workflows / high-fanout Agent runs #3319
Open
Labels: documentation, enhancement, reliability, subagents, v0.8.63, workflow-runtime
Project status: Backlog
Problem
The concurrency cap's real protective value is pacing token spend, not just counting Agents. A wide or recursive Workflow fan-out can burn a large token budget very quickly: in a live test, 20 trivial one-word Agents still consumed ~174k tokens in ~9s (roughly 8.7k tokens per Agent, or about 19k tokens per second). Count caps are a poor proxy for spend, and there is currently no spend ceiling that paces or halts a fan-out when a token budget is exhausted. A token-budget governor is the safety lever that makes raising fan-out width (companion admission/queue issue) sane.
Current state / evidence
- Existing limits are count-based (`max_concurrent`, `max_depth`) and time-based (`api_timeout_secs`, `heartbeat_timeout_secs`); see the knob list in v0.8.63: Expose editable sub-agent recursion and concurrency controls #3304. None of them governs aggregate token spend across a fan-out.
- […] (`docs/SUBAGENTS.md` worker-record schema), so aggregate accounting across a run is feasible without new plumbing at the leaf.
Proposed direction
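As a rough illustration of this direction, here is a minimal sketch of a run-level budget with total / spent / remaining fields that is checked before each Agent is admitted and charged when an Agent reports its usage. Every name in it (`TokenBudget`, `BudgetExhausted`, `admit`, `charge`, `run_fanout`) is hypothetical and not part of the existing runtime; the per-Agent cost is assumed to come from whatever usage figure the worker record already carries.

```python
import threading
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Raised when a fan-out tries to admit work after the budget is spent."""


@dataclass
class TokenBudget:
    """Hypothetical run-level budget: total / spent / remaining."""

    total: int
    spent: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    @property
    def remaining(self) -> int:
        # Clamp at zero: an in-flight Agent may overshoot the ceiling.
        return max(self.total - self.spent, 0)

    def admit(self) -> None:
        """Gate called before spawning an Agent; halts the fan-out when empty."""
        with self._lock:
            if self.spent >= self.total:
                raise BudgetExhausted(f"spent {self.spent} of {self.total} tokens")

    def charge(self, tokens: int) -> None:
        """Record the usage reported by a finished Agent."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        with self._lock:
            self.spent += tokens


def run_fanout(budget: TokenBudget, agent_costs: list[int]) -> int:
    """Simulate a sequential fan-out; return how many Agents were admitted."""
    admitted = 0
    for cost in agent_costs:
        try:
            budget.admit()
        except BudgetExhausted:
            break  # stop spawning once the ceiling is reached
        budget.charge(cost)
        admitted += 1
    return admitted


if __name__ == "__main__":
    # ~8.7k tokens per Agent, as in the live test (174k tokens / 20 Agents).
    budget = TokenBudget(total=100_000)
    print(run_fanout(budget, [8_700] * 20), budget.spent, budget.remaining)
```

With a 100k ceiling and 20 Agents at ~8.7k tokens each, this admits 12 Agents and ends at 104,400 tokens spent: because the check happens at admission time, the run can overshoot by up to one Agent's spend (more under real concurrency). A stricter variant would reserve an estimated cost at admission and reconcile it on completion, which is a design choice this sketch leaves open.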
- […] `budget` model (total / spent / remaining).
Acceptance criteria
Non-goals