fix(workflow): add token budget regulator for high fan-out agent runs #3321
donglovejava wants to merge 3 commits into
- Add max_tokens field to BudgetSpec in WhaleFlow IR
- Implement token budget enforcement in MockWorkflowExecutor
- Add global and per-leaf token budget checks
- Update all test fixtures with max_tokens field
- Add comprehensive tests for token budget scenarios

Fixes: v0.8.63 workflow runtime token budget regulation for high fan-out sub-agent orchestration
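As a minimal sketch of the shape this commit describes: the struct below is an illustrative stand-in for BudgetSpec, not the crate's actual definition. Only the max_tokens type comes from the PR; the other field types and the token_budget_exhausted helper are assumptions made for the example. The >= comparison mirrors the check quoted in the review, so a cap of zero counts as exhausted before any leaf runs.

```rust
/// Illustrative stand-in for the WhaleFlow `BudgetSpec`.
/// Only `max_tokens: Option<u64>` is taken from the PR; the other
/// field types are assumed for the sake of a compilable example.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct BudgetSpec {
    pub max_steps: Option<u32>,
    pub timeout_secs: Option<u64>,
    pub max_parallel: Option<u32>,
    /// `None` means no token cap is enforced.
    pub max_tokens: Option<u64>,
}

impl BudgetSpec {
    /// Hypothetical helper: true once `used` tokens reach the cap.
    /// With `>=`, `Some(0)` is exhausted immediately.
    pub fn token_budget_exhausted(&self, used: u64) -> bool {
        match self.max_tokens {
            Some(cap) => used >= cap,
            None => false,
        }
    }
}
```

Keeping the field optional means a spec that never sets max_tokens behaves as before.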
Code Review
This pull request introduces token budget limits (max_tokens) to BudgetSpec and MockWorkflowExecutor in whaleflow, implementing checks for both global and per-leaf token limits. A review comment correctly points out that the test mock_executor_stops_when_global_token_budget_is_exhausted will fail because the mock executor will attempt to run the third leaf node before encountering the budget limit, resulting in three leaf results instead of the asserted two.
assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
assert_eq!(execution.leaf_results.len(), 2);
assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.usage.total_tokens(), 1100);
The test mock_executor_stops_when_global_token_budget_is_exhausted will fail with the current implementation.
During execution:
- scan-readme runs and succeeds, using 600 tokens.
- scan-config runs and succeeds, using 500 tokens. The cumulative token usage becomes 1100, which exceeds the 1000 global cap. However, since the leaf itself succeeded, execution.status remains Succeeded.
- Since execution.status is still Succeeded, execution.should_stop_mock_execution() returns false, and the loop in execute_nodes does not break. It proceeds to execute the third leaf scan-tests.
- When scan-tests runs, mock_leaf_outcome detects that self.leaf_tokens_used (1100) >= max_leaf_tokens (1000) and returns BudgetExceeded.
- This adds scan-tests to leaf_results with BudgetExceeded status, making the total length of leaf_results equal to 3.
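The trace above can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the real MockWorkflowExecutor: MiniExecutor, Status, run_leaf, and run_all are hypothetical names, and the 400-token cost for the third leaf is invented because the review does not state it. The model checks the budget only before a leaf starts, which is the behaviour the review describes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Succeeded,
    BudgetExceeded,
}

/// Hypothetical, simplified model of the executor loop.
pub struct MiniExecutor {
    pub max_tokens: Option<u64>,
    pub tokens_used: u64,
}

impl MiniExecutor {
    pub fn new(max_tokens: Option<u64>) -> Self {
        Self { max_tokens, tokens_used: 0 }
    }

    /// The budget is checked before the leaf runs, so a leaf that
    /// pushes usage over the cap still reports `Succeeded`.
    fn run_leaf(&mut self, cost: u64) -> Status {
        if let Some(cap) = self.max_tokens {
            if self.tokens_used >= cap {
                return Status::BudgetExceeded;
            }
        }
        self.tokens_used += cost;
        Status::Succeeded
    }

    /// Stops only after some leaf has reported a non-success status.
    pub fn run_all(&mut self, costs: &[u64]) -> (Status, Vec<Status>) {
        let mut overall = Status::Succeeded;
        let mut results = Vec::new();
        for &cost in costs {
            if overall != Status::Succeeded {
                break;
            }
            overall = self.run_leaf(cost);
            results.push(overall);
        }
        (overall, results)
    }
}
```

Calling run_all(&[600, 500, 400]) on MiniExecutor::new(Some(1000)) records three results, the last one BudgetExceeded, while tokens_used stays at 1100 because the blocked leaf never adds its cost.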
To fix this, the assertions should be updated to expect 3 leaf results, with the third one having BudgetExceeded status. This is consistent with how step budget exhaustion is handled (where the N+1-th step is executed and blocked/failed).
assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
assert_eq!(execution.leaf_results.len(), 3);
assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.leaf_results[2].status, WorkflowRunStatus::BudgetExceeded);
assert_eq!(execution.usage.total_tokens(), 1100);
- Add max_tokens field to AgentWorkerSpec for runtime enforcement
- Map FleetTaskBudget.max_tokens to AgentWorkerSpec in fleet_task_to_worker_spec
- Update CHANGELOG with Unreleased section documenting the fix

This closes the enforcement gap where FleetTaskBudget.max_tokens was defined in the protocol but never consumed by the runtime execution path.
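A rough sketch of the mapping this commit describes. The type and function names come from the commit message, but every field other than max_tokens, the FleetTask wrapper, and the choice to make the task budget optional are assumptions for illustration; the real structs in crates/tui carry more data.

```rust
/// Illustrative subset of the protocol-level budget.
#[derive(Debug, Clone, Default)]
pub struct FleetTaskBudget {
    pub max_tokens: Option<u64>,
}

/// Hypothetical task wrapper; the optional budget is an assumption.
#[derive(Debug, Clone)]
pub struct FleetTask {
    pub id: String,
    pub budget: Option<FleetTaskBudget>,
}

/// Illustrative subset of the runtime worker spec.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentWorkerSpec {
    pub task_id: String,
    pub max_tokens: Option<u64>,
}

/// Carries the protocol-level cap into the runtime spec so the
/// worker loop has something to enforce.
pub fn fleet_task_to_worker_spec(task: &FleetTask) -> AgentWorkerSpec {
    AgentWorkerSpec {
        task_id: task.id.clone(),
        max_tokens: task.budget.as_ref().and_then(|b| b.max_tokens),
    }
}
```

A task with no budget, or a budget without a token cap, maps to max_tokens: None, which leaves existing behaviour unchanged.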
- Add max_tokens field to SubAgentTask struct
- Track cumulative token usage (input + output) across all model turns
- Stop worker with Failed status when token budget exceeded
- Report budget exceeded via mailbox and progress logs
- Update test fixtures to include max_tokens: None field

Closes the runtime enforcement gap where FleetTaskBudget.max_tokens was defined but never consumed by the actual sub-agent execution loop.
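The cumulative tracking described in this commit could look roughly like the following. This is a simplified sketch, not the actual sub-agent loop: TurnUsage, WorkerStatus, and run_turns are hypothetical names, the mailbox and progress-log reporting is reduced to an error string, and the strict > comparison is an assumption based on the word "exceeded".

```rust
/// Hypothetical per-turn usage record.
pub struct TurnUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical worker outcome; the real status type has more variants.
#[derive(Debug, PartialEq)]
pub enum WorkerStatus {
    Completed,
    Failed(String),
}

/// Sums input + output tokens across turns and stops as soon as the
/// running total goes over the cap. Returns the status and the total.
pub fn run_turns(max_tokens: Option<u64>, turns: &[TurnUsage]) -> (WorkerStatus, u64) {
    let mut total: u64 = 0;
    for turn in turns {
        total = total
            .saturating_add(turn.input_tokens)
            .saturating_add(turn.output_tokens);
        if let Some(cap) = max_tokens {
            // Assumption: the budget counts as exceeded only when total > cap.
            if total > cap {
                let reason = format!("token budget exceeded: {total} > {cap}");
                return (WorkerStatus::Failed(reason), total);
            }
        }
    }
    (WorkerStatus::Completed, total)
}
```

Because usage is only known after a turn returns, the worker can overshoot the cap by at most one turn before it stops.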
Summary
This PR adds comprehensive token budget regulation for high fan-out workflow and sub-agent orchestration runs, closing the enforcement gap between the protocol layer and the actual runtime execution path.
Problem
The workflow runtime's BudgetSpec only had max_steps, timeout_secs, and max_parallel fields, but lacked max_tokens for token-based budget enforcement. More critically, FleetTaskBudget.max_tokens was defined in the protocol layer but never consumed by the runtime execution path, allowing high fan-out orchestrations to consume unbounded tokens even when explicit budgets were specified.
Solution
1. WhaleFlow IR Enhancement (crates/whaleflow/src/lib.rs)
- Added max_tokens: Option<u64> to BudgetSpec with #[serde(default)] for backward compatibility
- Implemented token budget enforcement in MockWorkflowExecutor:
  - max_leaf_tokens and leaf_tokens_used tracking fields
  - with_max_leaf_tokens() builder method
  - Global token budget check against budget.max_tokens
  - max_tokens: Some(0) immediately returns BudgetExceeded
- Added comprehensive test coverage:
  - mock_executor_stops_when_global_token_budget_is_exhausted: verifies global cap enforcement
  - mock_executor_honors_zero_token_leaf_budget: verifies zero-token edge case
  - mock_executor_honors_per_leaf_token_cap: verifies per-leaf cap enforcement
  - budget_spec_serializes_max_tokens: verifies serialization round-trip
2. Fleet Runtime Integration (crates/tui/src/fleet/worker_runtime.rs)
- Added max_tokens: Option<u64> field to AgentWorkerSpec
- Mapped FleetTaskBudget.max_tokens in fleet_task_to_worker_spec() so fleet tasks with token budgets actually enforce them at runtime
3. Sub-Agent Execution Loop (crates/tui/src/tools/subagent/mod.rs)
- Added max_tokens: Option<u64> field to SubAgentTask
- Stops the worker with Failed status when token budget exceeded
- Updated test fixtures to include max_tokens: None field
4. Documentation (CHANGELOG.md)
- Added [Unreleased] section documenting the token budget regulator fix
Testing
Impact
This change closes the critical enforcement gap where FleetTaskBudget.max_tokens was defined but never consumed, enabling proper cost control for high fan-out sub-agent orchestration. The implementation is backward compatible: existing workflows without max_tokens continue to work unchanged.
Related
- FleetTaskBudget.max_tokens and GoalBudget.token_budget
🤖 Generated with Claude Code