
Optimize TaskBroker scanning: larger batches + cursor-based scan #213

Open

airhorns wants to merge 6 commits into main from task-broker-opt

Conversation

@airhorns
Contributor

@airhorns airhorns commented Mar 23, 2026

Summary

  • Increase scan_batch from 1024 → 4096 so a single scan pass can refill the buffer from low-watermark to target, eliminating 3x redundant re-reads per refill cycle
  • Cache the SlateDB scan iterator across passes: the scanner marches forward through the key range without recreating the iterator or re-reading buffered entries, stops at future-timestamped tasks, and resumes from there on the next pass
  • When a future task is encountered, the consumed entry is held back and re-evaluated on the next pass with a fresh now_ms, so it is never lost from the iterator
  • wakeup() invalidates the cached iterator, ensuring newly-created tasks (from enqueue, expedite, restart, concurrency grants, retries) that sort before the old cursor position are picked up via a fresh scan from the beginning
  • Iterator is also refreshed after MAX_ITER_AGE (5s) to pick up scheduled tasks that have become ready
  • Add 9 tests covering every code path that creates tasks "behind the cursor"
  • Add real-shard dequeue benchmark exercising the full TaskBroker pipeline

Test plan

  • All 9 scanner ordering tests pass (enqueue at time=0, expedite, restart, concurrency grants, retries, priority ordering, multi-group, bulk drain)
  • All 21 floating concurrency tests pass
  • Existing dequeue, enqueue, cancel, and metrics tests pass
  • Clippy clean
  • CI green

🤖 Generated with Claude Code


Note

Medium Risk
Changes core dequeue/scanner behavior by introducing a cached iterator and new stop/resume semantics around future tasks and wakeups, which could affect task availability/latency if incorrect. Risk is mitigated by extensive new regression tests covering behind-cursor task creation paths.

Overview
Optimizes TaskBroker scanning to avoid redundant DB re-reads by caching a SlateDB DbIterator across scan passes, advancing a cursor through the task-group key range instead of restarting from the beginning every time.

The scan loop now refreshes the iterator on wakeup(), on iterator exhaustion, or after MAX_ITER_AGE (5s) to pick up newly-ready scheduled work; it also halts at the first future task and carries that consumed entry forward for re-evaluation on the next pass.

Adds a new task_scanner_ordering_tests.rs suite to ensure tasks created “behind the cursor” (e.g., start_at_ms=0, expedite/restart, concurrency grants, retries, priority ordering, multi-group, bulk enqueue) are still eventually dequeued, and updates task_scanner_profile benchmark to measure real sustained dequeue throughput instead of synthetic scan timing.

Written by Cursor Bugbot for commit cdc0501. This will update automatically on new commits. Configure here.

@airhorns airhorns changed the title from "Adjust task scanner batch size up to make sure we can fill the whole buffer in one scan" to "Optimize TaskBroker scanning: larger batches + cursor-based scan" on Mar 23, 2026

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

airhorns and others added 6 commits March 23, 2026 18:14
Tests that all code paths creating task keys (enqueue, expedite, restart,
concurrency grants, retries, priority ordering) result in tasks being
picked up by the TaskBroker scanner even when those tasks sort before
entries already in the buffer. This coverage is important before
optimizing the scanner to use cursor-based scanning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TaskBroker scanner previously always started from the beginning of
the task group key range on every scan pass. When the buffer already
contained N entries, each scan had to read through all N before finding
new entries to insert — O(buffer_size) wasted reads per scan.

Now the scanner tracks a cursor (the last key read) and resumes from
there on subsequent passes within a fill cycle. This eliminates the
redundant re-reads of already-buffered entries.

The cursor is invalidated whenever wakeup() fires, which happens on
every code path that creates new task keys (enqueue, expedite, restart,
concurrency grants, retries). This ensures tasks created behind the
cursor are always picked up on the next scan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the synthetic scan simulation with a benchmark that exercises
the full TaskBroker pipeline end-to-end: enqueue 50K tasks via import,
then drain them via sustained dequeue calls measuring throughput and
per-batch latency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of tracking a cursor key and creating a new SlateDB iterator
each scan pass, hold the actual DbIterator as a local variable in the
scan loop and march it forward. This avoids both the iterator creation
cost (~160us) and the cost of re-reading already-buffered entries.

The iterator stops when it hits future-timestamped tasks (HitFuture)
or exhausts the range (Exhausted). It is dropped and recreated when:
- wakeup() fires (new tasks may have been written behind the cursor)
- The iterator is older than MAX_ITER_AGE (5s)
- The iterator is exhausted

When a future task is encountered, the consumed entry is held back and
re-evaluated on the next pass with a fresh now_ms, so it is never lost
from the iterator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>