fix(workflow): add token budget regulator for high fan-out agent runs by donglovejava · Pull Request #3321 · Hmbown/CodeWhale

donglovejava · 2026-06-19T05:24:36Z

Summary

This PR adds comprehensive token budget regulation for high fan-out workflow and sub-agent orchestration runs, closing the enforcement gap between the protocol layer and the actual runtime execution path.

Problem

The workflow runtime's BudgetSpec only had max_steps, timeout_secs, and max_parallel fields, but lacked max_tokens for token-based budget enforcement. More critically, FleetTaskBudget.max_tokens was defined in the protocol layer but never consumed by the runtime execution path, allowing high fan-out orchestrations to consume unbounded tokens even when explicit budgets were specified.

Solution

1. WhaleFlow IR Enhancement (`crates/whaleflow/src/lib.rs`)

Added max_tokens: Option<u64> to BudgetSpec
- Applies at workflow, branch, and leaf levels
- Serialized with #[serde(default)] for backward compatibility
Implemented token budget enforcement in MockWorkflowExecutor
- Added max_leaf_tokens and leaf_tokens_used tracking fields
- Added with_max_leaf_tokens() builder method
- Global token budget check: stops execution when cumulative tokens exceed cap
- Per-leaf token budget check: rejects individual leaves that exceed their budget.max_tokens
- Zero-token budget check: max_tokens: Some(0) immediately returns BudgetExceeded
Added comprehensive test coverage:
- mock_executor_stops_when_global_token_budget_is_exhausted — verifies global cap enforcement
- mock_executor_honors_zero_token_leaf_budget — verifies zero-token edge case
- mock_executor_honors_per_leaf_token_cap — verifies per-leaf cap enforcement
- budget_spec_serializes_max_tokens — verifies serialization round-trip
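
For reviewers who want the three checks in one place, here is a minimal sketch of how they might compose. It is not the code from this PR: `TokenRegulator`, `LeafStatus`, and `admit_leaf` are hypothetical names, the real logic lives in `MockWorkflowExecutor`, and the order of the checks is an assumption. The `>=` comparison for the global cap follows the behavior described in the review discussion.

```rust
/// Outcome of a single mock leaf (hypothetical, simplified status set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeafStatus {
    Succeeded,
    BudgetExceeded,
}

/// Hypothetical stand-in for the executor's token bookkeeping.
#[derive(Debug, Default)]
pub struct TokenRegulator {
    /// Global cap across all leaves (`None` means unlimited).
    pub max_leaf_tokens: Option<u64>,
    /// Tokens consumed by leaves that have already run.
    pub leaf_tokens_used: u64,
}

impl TokenRegulator {
    /// Builder-style setter, named after the method listed in the PR.
    pub fn with_max_leaf_tokens(mut self, cap: u64) -> Self {
        self.max_leaf_tokens = Some(cap);
        self
    }

    /// Decide whether a leaf with the given per-leaf budget may run,
    /// where `cost` is the number of tokens the leaf would consume.
    pub fn admit_leaf(&mut self, leaf_max_tokens: Option<u64>, cost: u64) -> LeafStatus {
        // Zero-token check: a leaf budget of Some(0) is rejected immediately.
        if leaf_max_tokens == Some(0) {
            return LeafStatus::BudgetExceeded;
        }
        // Global check: earlier leaves have already reached the cap.
        if let Some(cap) = self.max_leaf_tokens {
            if self.leaf_tokens_used >= cap {
                return LeafStatus::BudgetExceeded;
            }
        }
        // Per-leaf check: this leaf alone would exceed its own budget.
        if let Some(limit) = leaf_max_tokens {
            if cost > limit {
                return LeafStatus::BudgetExceeded;
            }
        }
        // Admitted: record the usage and report success.
        self.leaf_tokens_used = self.leaf_tokens_used.saturating_add(cost);
        LeafStatus::Succeeded
    }
}
```

Because the global check in this sketch runs before a leaf starts, the leaf that crosses the cap still succeeds and the next leaf is the one rejected.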

2. Fleet Runtime Integration (`crates/tui/src/fleet/worker_runtime.rs`)

Added max_tokens: Option<u64> field to AgentWorkerSpec
- Receives token budget from FleetTaskBudget.max_tokens
- Wired through fleet_task_to_worker_spec() so fleet tasks with token budgets actually enforce them at runtime
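
The wiring can be pictured with the sketch below. The struct shapes are assumptions made for illustration (the real `FleetTaskBudget`, `AgentWorkerSpec`, and fleet task types carry more fields, and `FleetTask`, `id`, `budget`, and `task_id` are hypothetical names); only the `max_tokens` hand-off reflects what this section describes.

```rust
/// Hypothetical, trimmed-down protocol-layer budget.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct FleetTaskBudget {
    pub max_tokens: Option<u64>,
}

/// Hypothetical fleet task; only the fields needed here are shown.
#[derive(Debug, Clone, Default)]
pub struct FleetTask {
    pub id: String,
    pub budget: Option<FleetTaskBudget>,
}

/// Hypothetical, trimmed-down runtime worker spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentWorkerSpec {
    pub task_id: String,
    pub max_tokens: Option<u64>,
}

/// Carry the protocol-level token budget into the runtime spec.
/// A task without a budget, or with a budget that sets no token cap,
/// yields `max_tokens: None`, which the runtime treats as unlimited.
pub fn fleet_task_to_worker_spec(task: &FleetTask) -> AgentWorkerSpec {
    AgentWorkerSpec {
        task_id: task.id.clone(),
        max_tokens: task.budget.as_ref().and_then(|b| b.max_tokens),
    }
}
```

Keeping the field as `Option<u64>` on both sides means tasks that never set a token budget map to `None` and behave as before.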

3. Sub-Agent Execution Loop (`crates/tui/src/tools/subagent/mod.rs`)

Added max_tokens: Option<u64> field to SubAgentTask
- Tracks cumulative token usage (input + output) across all model turns
- Stops worker with Failed status when token budget exceeded
- Reports budget exceeded via mailbox and progress logs
- Updates test fixtures to include max_tokens: None field
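
A rough sketch of the accounting described above, under stated assumptions: `TurnUsage`, `RunReport`, `WorkerStatus`, and `run_turns` are hypothetical names, the usage figures are passed in rather than read from a model response, and the strict `>` comparison is a guess at what "exceeded" means in the real loop. Mailbox and progress-log reporting are reduced to a message string.

```rust
/// Hypothetical terminal status of a sub-agent worker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerStatus {
    Completed,
    Failed,
}

/// Token usage reported for one model turn.
#[derive(Debug, Clone, Copy)]
pub struct TurnUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Summary of a simulated run.
#[derive(Debug, PartialEq, Eq)]
pub struct RunReport {
    pub status: WorkerStatus,
    pub tokens_used: u64,
    pub turns_run: usize,
    /// Stand-in for the mailbox / progress-log message.
    pub message: Option<String>,
}

/// Accumulate input + output tokens per turn and stop with `Failed`
/// as soon as the cumulative total goes over `max_tokens`.
pub fn run_turns(max_tokens: Option<u64>, turns: &[TurnUsage]) -> RunReport {
    let mut tokens_used: u64 = 0;
    for (index, turn) in turns.iter().enumerate() {
        tokens_used = tokens_used
            .saturating_add(turn.input_tokens)
            .saturating_add(turn.output_tokens);
        if let Some(cap) = max_tokens {
            if tokens_used > cap {
                return RunReport {
                    status: WorkerStatus::Failed,
                    tokens_used,
                    turns_run: index + 1,
                    message: Some(format!(
                        "token budget exceeded: used {tokens_used} of {cap}"
                    )),
                };
            }
        }
    }
    RunReport {
        status: WorkerStatus::Completed,
        tokens_used,
        turns_run: turns.len(),
        message: None,
    }
}
```

In this sketch the check runs after each turn's usage is known, so a run can overshoot the cap by at most one turn. Whether the real loop checks before or after a turn is not stated in this description.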

4. Documentation (`CHANGELOG.md`)

Added entries to [Unreleased] section documenting the token budget regulator fix
Describes the enforcement gap closure between IR/protocol layers and runtime

Testing

All existing tests pass with updated fixtures
4 new tests specifically validate token budget behavior in WhaleFlow IR
Test fixtures updated in subagent module to accommodate new field
Serialization tests confirm backward compatibility

Impact

This change closes the critical enforcement gap where FleetTaskBudget.max_tokens was defined but never consumed, enabling proper cost control for high fan-out sub-agent orchestration. The implementation is backward compatible — existing workflows without max_tokens continue to work unchanged.

- Add max_tokens field to BudgetSpec in WhaleFlow IR
- Implement token budget enforcement in MockWorkflowExecutor
- Add global and per-leaf token budget checks
- Update all test fixtures with max_tokens field
- Add comprehensive tests for token budget scenarios

Fixes: v0.8.63 workflow runtime token budget regulation for high fan-out sub-agent orchestration


gemini-code-assist

Code Review

This pull request introduces token budget limits (max_tokens) to BudgetSpec and MockWorkflowExecutor in whaleflow, implementing checks for both global and per-leaf token limits. A review comment correctly points out that the test mock_executor_stops_when_global_token_budget_is_exhausted will fail because the mock executor will attempt to run the third leaf node before encountering the budget limit, resulting in three leaf results instead of the asserted two.


gemini-code-assist · 2026-06-19T05:26:12Z

+        assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
+        assert_eq!(execution.leaf_results.len(), 2);
+        assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
+        assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
+        assert_eq!(execution.usage.total_tokens(), 1100);


The test mock_executor_stops_when_global_token_budget_is_exhausted will fail with the current implementation.

During execution:

scan-readme runs and succeeds, using 600 tokens.

scan-config runs and succeeds, using 500 tokens. The cumulative token usage becomes 1100, which exceeds the 1000 global cap. However, since the leaf itself succeeded, execution.status remains Succeeded.

Since execution.status is still Succeeded, execution.should_stop_mock_execution() returns false, and the loop in execute_nodes does not break. It proceeds to execute the third leaf scan-tests.

When scan-tests runs, mock_leaf_outcome detects that self.leaf_tokens_used (1100) >= max_leaf_tokens (1000) and returns BudgetExceeded.

This adds scan-tests to leaf_results with BudgetExceeded status, making the total length of leaf_results equal to 3.

To fix this, the assertions should be updated to expect 3 leaf results, with the third one having BudgetExceeded status. This is consistent with how step budget exhaustion is handled (where the N+1-th step is executed and blocked/failed).

    assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
    assert_eq!(execution.leaf_results.len(), 3);
    assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
    assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
    assert_eq!(execution.leaf_results[2].status, WorkflowRunStatus::BudgetExceeded);
    assert_eq!(execution.usage.total_tokens(), 1100);
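
The walkthrough above can be replayed with a small stand-alone model. This is a sketch of the control flow as the review describes it, not the actual `execute_nodes` or `mock_leaf_outcome` code: `MockRun`, `RunStatus`, and `replay_leaves` are hypothetical names, and the cost assigned to `scan-tests` in the usage note is made up because that leaf never gets to consume tokens.

```rust
/// Simplified run/leaf status (hypothetical; the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Succeeded,
    BudgetExceeded,
}

/// Minimal stand-in for the execution record inspected by the test.
#[derive(Debug)]
pub struct MockRun {
    pub status: RunStatus,
    pub leaf_results: Vec<(&'static str, RunStatus)>,
    pub total_tokens: u64,
}

/// Replay leaves in order against a global token cap.
/// Each entry is `(leaf name, tokens the leaf would use)`.
pub fn replay_leaves(global_cap: u64, leaves: &[(&'static str, u64)]) -> MockRun {
    let mut run = MockRun {
        status: RunStatus::Succeeded,
        leaf_results: Vec::new(),
        total_tokens: 0,
    };
    for &(name, cost) in leaves {
        // Models should_stop_mock_execution(): the loop only breaks
        // once the run status is no longer Succeeded.
        if run.status != RunStatus::Succeeded {
            break;
        }
        // Models the check at leaf start: tokens already used >= cap.
        if run.total_tokens >= global_cap {
            run.leaf_results.push((name, RunStatus::BudgetExceeded));
            run.status = RunStatus::BudgetExceeded;
            continue;
        }
        // The leaf runs and succeeds even if it pushes usage past the cap.
        run.total_tokens += cost;
        run.leaf_results.push((name, RunStatus::Succeeded));
    }
    run
}
```

Calling `replay_leaves(1000, &[("scan-readme", 600), ("scan-config", 500), ("scan-tests", 400)])` under this model records three leaf results, the third with `BudgetExceeded`, and a total of 1100 tokens, which matches the assertions suggested above.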

- Add max_tokens field to AgentWorkerSpec for runtime enforcement
- Map FleetTaskBudget.max_tokens to AgentWorkerSpec in fleet_task_to_worker_spec
- Update CHANGELOG with Unreleased section documenting the fix

This closes the enforcement gap where FleetTaskBudget.max_tokens was defined in the protocol but never consumed by the runtime execution path.


- Add max_tokens field to SubAgentTask struct
- Track cumulative token usage (input + output) across all model turns
- Stop worker with Failed status when token budget exceeded
- Report budget exceeded via mailbox and progress logs
- Update test fixtures to include max_tokens: None field

Closes the runtime enforcement gap where FleetTaskBudget.max_tokens was defined but never consumed by the actual sub-agent execution loop.


donglovejava requested a review from Hmbown as a code owner June 19, 2026 05:24


donglovejava wants to merge 3 commits into Hmbown:main from donglovejava:fix/workflow-token-budget-regulator
