Skip to content

fix(workflow): add token budget regulator for high fan-out agent runs#3321

Open
donglovejava wants to merge 3 commits into
Hmbown:mainfrom
donglovejava:fix/workflow-token-budget-regulator
Open

fix(workflow): add token budget regulator for high fan-out agent runs#3321
donglovejava wants to merge 3 commits into
Hmbown:mainfrom
donglovejava:fix/workflow-token-budget-regulator

Conversation

@donglovejava

@donglovejava donglovejava commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds comprehensive token budget regulation for high fan-out workflow and sub-agent orchestration runs, closing the enforcement gap between the protocol layer and the actual runtime execution path.

Problem

The workflow runtime's BudgetSpec only had max_steps, timeout_secs, and max_parallel fields, but lacked max_tokens for token-based budget enforcement. More critically, FleetTaskBudget.max_tokens was defined in the protocol layer but never consumed by the runtime execution path, allowing high fan-out orchestrations to consume unbounded tokens even when explicit budgets were specified.

Solution

1. WhaleFlow IR Enhancement (crates/whaleflow/src/lib.rs)

  • Added max_tokens: Option<u64> to BudgetSpec

    • Applies at workflow, branch, and leaf levels
    • Serialized with #[serde(default)] for backward compatibility
  • Implemented token budget enforcement in MockWorkflowExecutor

    • Added max_leaf_tokens and leaf_tokens_used tracking fields
    • Added with_max_leaf_tokens() builder method
    • Global token budget check: stops execution when cumulative tokens exceed cap
    • Per-leaf token budget check: rejects individual leaves that exceed their budget.max_tokens
    • Zero-token budget check: max_tokens: Some(0) immediately returns BudgetExceeded
  • Added comprehensive test coverage:

    • mock_executor_stops_when_global_token_budget_is_exhausted — verifies global cap enforcement
    • mock_executor_honors_zero_token_leaf_budget — verifies zero-token edge case
    • mock_executor_honors_per_leaf_token_cap — verifies per-leaf cap enforcement
    • budget_spec_serializes_max_tokens — verifies serialization round-trip

2. Fleet Runtime Integration (crates/tui/src/fleet/worker_runtime.rs)

  • Added max_tokens: Option<u64> field to AgentWorkerSpec
    • Receives token budget from FleetTaskBudget.max_tokens
    • Wired through fleet_task_to_worker_spec() so fleet tasks with token budgets actually enforce them at runtime

3. Sub-Agent Execution Loop (crates/tui/src/tools/subagent/mod.rs)

  • Added max_tokens: Option<u64> field to SubAgentTask
    • Tracks cumulative token usage (input + output) across all model turns
    • Stops worker with Failed status when token budget exceeded
    • Reports budget exceeded via mailbox and progress logs
    • Updates test fixtures to include max_tokens: None field

4. Documentation (CHANGELOG.md)

  • Added entries to [Unreleased] section documenting the token budget regulator fix
  • Describes the enforcement gap closure between IR/protocol layers and runtime

Testing

  • All existing tests pass with updated fixtures
  • 4 new tests specifically validate token budget behavior in WhaleFlow IR
  • Test fixtures updated in subagent module to accommodate new field
  • Serialization tests confirm backward compatibility

Impact

This change closes the critical enforcement gap where FleetTaskBudget.max_tokens was defined but never consumed, enabling proper cost control for high fan-out sub-agent orchestration. The implementation is backward compatible — existing workflows without max_tokens continue to work unchanged.

Related

  • Addresses v0.8.63 workflow runtime token budget regulation
  • Complements existing FleetTaskBudget.max_tokens and GoalBudget.token_budget
  • Enables proper cost control for high fan-out sub-agent orchestration

🤖 Generated with Claude Code

- Add max_tokens field to BudgetSpec in WhaleFlow IR
- Implement token budget enforcement in MockWorkflowExecutor
- Add global and per-leaf token budget checks
- Update all test fixtures with max_tokens field
- Add comprehensive tests for token budget scenarios

Fixes: v0.8.63 workflow runtime token budget regulation for high fan-out sub-agent orchestration
@donglovejava donglovejava requested a review from Hmbown as a code owner June 19, 2026 05:24

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces token budget limits (max_tokens) to BudgetSpec and MockWorkflowExecutor in whaleflow, implementing checks for both global and per-leaf token limits. A review comment correctly points out that the test mock_executor_stops_when_global_token_budget_is_exhausted will fail because the mock executor will attempt to run the third leaf node before encountering the budget limit, resulting in three leaf results instead of the asserted two.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2701 to +2705
assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
assert_eq!(execution.leaf_results.len(), 2);
assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
assert_eq!(execution.usage.total_tokens(), 1100);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The test mock_executor_stops_when_global_token_budget_is_exhausted will fail with the current implementation.

During execution:

  1. scan-readme runs and succeeds, using 600 tokens.
  2. scan-config runs and succeeds, using 500 tokens. The cumulative token usage becomes 1100, which exceeds the 1000 global cap. However, since the leaf itself succeeded, execution.status remains Succeeded.
  3. Since execution.status is still Succeeded, execution.should_stop_mock_execution() returns false, and the loop in execute_nodes does not break. It proceeds to execute the third leaf scan-tests.
  4. When scan-tests runs, mock_leaf_outcome detects that self.leaf_tokens_used (1100) >= max_leaf_tokens (1000) and returns BudgetExceeded.
  5. This adds scan-tests to leaf_results with BudgetExceeded status, making the total length of leaf_results equal to 3.

To fix this, the assertions should be updated to expect 3 leaf results, with the third one having BudgetExceeded status. This is consistent with how step budget exhaustion is handled (where the N+1-th step is executed and blocked/failed).

        assert_eq!(execution.status, WorkflowRunStatus::BudgetExceeded);
        assert_eq!(execution.leaf_results.len(), 3);
        assert_eq!(execution.leaf_results[0].status, WorkflowRunStatus::Succeeded);
        assert_eq!(execution.leaf_results[1].status, WorkflowRunStatus::Succeeded);
        assert_eq!(execution.leaf_results[2].status, WorkflowRunStatus::BudgetExceeded);
        assert_eq!(execution.usage.total_tokens(), 1100);

- Add max_tokens field to AgentWorkerSpec for runtime enforcement
- Map FleetTaskBudget.max_tokens to AgentWorkerSpec in fleet_task_to_worker_spec
- Update CHANGELOG with Unreleased section documenting the fix

This closes the enforcement gap where FleetTaskBudget.max_tokens was defined
in the protocol but never consumed by the runtime execution path.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

- Add max_tokens field to SubAgentTask struct
- Track cumulative token usage (input + output) across all model turns
- Stop worker with Failed status when token budget exceeded
- Report budget exceeded via mailbox and progress logs
- Update test fixtures to include max_tokens: None field

Closes the runtime enforcement gap where FleetTaskBudget.max_tokens was
defined but never consumed by the actual sub-agent execution loop.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant