Fix scan_requested fast-path defeating scanner backoff by airhorns · Pull Request #215 · gadget-inc/silo

airhorns · 2026-03-24T18:48:36Z

Summary

The scan_requested fast-path in the TaskBroker scan loop unconditionally used a 1ms sleep whenever wakeup() had been called. With continuous enqueues (each calling wakeup()), scan_requested was permanently true, causing the scanner to spin at 100+ scans/s even when every scan found 0 new entries.

Before: scan_requested=true → always 1ms sleep → 100-150 scans/s, buffer never fills past ~40 entries
After: scan_requested=true + found tasks → 1ms sleep (fast-path preserved). scan_requested=true + found nothing → 5ms sleep (caps empty scans at ~200/s, responsive to nudge loop)

Single-line behavioral change in the scan loop — no architectural changes.

Test plan

All 71 existing tests pass (dequeue, enqueue, cancel, floating concurrency, metrics)
Clippy clean

🤖 Generated with Claude Code

Note

Medium Risk
Adjusts TaskBroker scanner timing and adds per-scan info logging, which can impact dequeue latency and production log volume under load. Benchmark changes are low risk, but scanner-loop behavior is performance-critical.

Overview
Fixes the scan_requested fast-path in the TaskBroker background scan loop so wakeups no longer force a 1ms sleep when a scan inserts zero tasks; instead it uses a short (5ms) delay to avoid busy-spinning during continuous enqueues while staying responsive when work is found.

Adds scan_tasks breakdown counters and emits a tracing::info! log summarizing read/skip/insert reasons per scan, and exposes a new shard helper broker_buffer_len() for total broker buffer size.

Reworks the task_scanner_profile bench to measure sustained dequeue throughput via real dequeue() calls (batch sizes 32 and 1) rather than synthetic key-range scan microbenchmarks.

^{Written by Cursor Bugbot for commit 940879a. This will update automatically on new commits. Configure here.}

cursor · 2026-03-24T18:56:30Z

src/task_broker.rs

+                    // (50ms, too slow for the nudge loop's 25ms window).
+                    // 5ms matches the nudge poll interval and caps the
+                    // scan rate at ~200/s during active workloads.
+                    sleep_ms = 5;


Notification permit defeats intended 5ms sleep path

Medium Severity

When scan_requested is true and inserted == 0, sleep_ms is set to 5 and execution falls through to the tokio::select! block. However, the same wakeup() call that set scan_requested = true also called notify.notify_one(), which stores a permit in the Notify. When the scanner enters tokio::select!, notified() resolves immediately by consuming that stored permit, so the 5ms sleep never actually happens. With continuous enqueues, a fresh permit is stored during every scan, making the effective sleep 0ms — potentially spinning faster than the old 1ms continue path this change replaces.

Additional Locations (1)

src/task_broker.rs#L323-L334

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-24T19:58:45Z

src/task_broker.rs

+                buffer_len = self.buffer.len(),
+                tombstone_count = self.ack_tombstones.lock().unwrap().len(),
+                "scan_tasks breakdown"
+            );


Hot-path info-level log causes excessive production log volume

Medium Severity

The new tracing::info! in scan_tasks fires on every scan iteration that reads at least one entry from the DB. During active scanning, this loop runs up to 200+ times per second per task group. All other logging in this module uses debug! level. This info!-level log on the hot path would produce massive log volume in production and is likely a leftover from debugging the backoff fix.

cursor · 2026-03-24T19:58:45Z

src/job_store_shard/mod.rs

+    /// Get the total number of tasks in the broker buffer across all task groups.
+    pub fn broker_buffer_len(&self) -> usize {
+        self.brokers.buffer_len()
+    }


New public method broker_buffer_len is never called

Low Severity

The newly added broker_buffer_len method on JobStoreShard is not called anywhere in the codebase — not in source, tests, or benchmarks. It expands the public API surface without a consumer.

The scan_requested fast-path unconditionally used a 1ms sleep whenever wakeup() had been called, regardless of whether the scan found anything. With continuous enqueues (each calling wakeup), scan_requested was permanently true, causing the scanner to spin at 100+ scans/s even when every scan found 0 new entries to buffer. Now the fast-path only uses 1ms when the scan actually inserted tasks. When scan_requested was set but the scan found nothing new, use a 5ms sleep instead — short enough for the nudge loop's 25ms polling window but long enough to prevent the scanner from spinning at 1000/s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds debug-level logging to scan_tasks showing per-scan breakdown: total entries read, inserted, and counts for each skip reason (future, inflight, tombstone, already_buffered, defunct). Helps diagnose why the buffer isn't filling in production. Also exposes broker_buffer_len() on JobStoreShard for tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor bot reviewed Mar 24, 2026

View reviewed changes

airhorns and others added 2 commits March 25, 2026 21:23

airhorns force-pushed the fix-scan-requested-fastpath branch from ca2beda to 940879a Compare March 26, 2026 01:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix scan_requested fast-path defeating scanner backoff#215

Fix scan_requested fast-path defeating scanner backoff#215
airhorns wants to merge 2 commits intomainfrom
fix-scan-requested-fastpath

airhorns commented Mar 24, 2026 •

edited by cursor bot

Loading

Uh oh!

cursor bot Mar 24, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Mar 24, 2026

Uh oh!

cursor bot Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

airhorns commented Mar 24, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

cursor bot Mar 24, 2026

Choose a reason for hiding this comment

Notification permit defeats intended 5ms sleep path

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 24, 2026

Choose a reason for hiding this comment

Hot-path info-level log causes excessive production log volume

Uh oh!

cursor bot Mar 24, 2026

Choose a reason for hiding this comment

New public method broker_buffer_len is never called

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

airhorns commented Mar 24, 2026 •

edited by cursor bot

Loading

New public method `broker_buffer_len` is never called