Fix scan_requested fast-path defeating scanner backoff#215
Fix scan_requested fast-path defeating scanner backoff#215
Conversation
| // (50ms, too slow for the nudge loop's 25ms window). | ||
| // 5ms matches the nudge poll interval and caps the | ||
| // scan rate at ~200/s during active workloads. | ||
| sleep_ms = 5; |
There was a problem hiding this comment.
Notification permit defeats intended 5ms sleep path
Medium Severity
When scan_requested is true and inserted == 0, sleep_ms is set to 5 and execution falls through to the tokio::select! block. However, the same wakeup() call that set scan_requested = true also called notify.notify_one(), which stores a permit in the Notify. When the scanner enters tokio::select!, notified() resolves immediately by consuming that stored permit, so the 5ms sleep never actually happens. With continuous enqueues, a fresh permit is stored during every scan, making the effective sleep 0ms — potentially spinning faster than the old 1ms continue path this change replaces.
Additional Locations (1)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| buffer_len = self.buffer.len(), | ||
| tombstone_count = self.ack_tombstones.lock().unwrap().len(), | ||
| "scan_tasks breakdown" | ||
| ); |
There was a problem hiding this comment.
Hot-path info-level log causes excessive production log volume
Medium Severity
The new tracing::info! in scan_tasks fires on every scan iteration that reads at least one entry from the DB. During active scanning, this loop runs up to 200+ times per second per task group. All other logging in this module uses debug! level. This info!-level log on the hot path would produce massive log volume in production and is likely a leftover from debugging the backoff fix.
| /// Get the total number of tasks in the broker buffer across all task groups. | ||
| pub fn broker_buffer_len(&self) -> usize { | ||
| self.brokers.buffer_len() | ||
| } |
The scan_requested fast-path unconditionally used a 1ms sleep whenever wakeup() had been called, regardless of whether the scan found anything. With continuous enqueues (each calling wakeup), scan_requested was permanently true, causing the scanner to spin at 100+ scans/s even when every scan found 0 new entries to buffer. Now the fast-path only uses 1ms when the scan actually inserted tasks. When scan_requested was set but the scan found nothing new, use a 5ms sleep instead — short enough for the nudge loop's 25ms polling window but long enough to prevent the scanner from spinning at 1000/s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds debug-level logging to scan_tasks showing per-scan breakdown: total entries read, inserted, and counts for each skip reason (future, inflight, tombstone, already_buffered, defunct). Helps diagnose why the buffer isn't filling in production. Also exposes broker_buffer_len() on JobStoreShard for tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ca2beda to
940879a
Compare


Summary
The
scan_requestedfast-path in the TaskBroker scan loop unconditionally used a 1ms sleep wheneverwakeup()had been called. With continuous enqueues (each callingwakeup()),scan_requestedwas permanently true, causing the scanner to spin at 100+ scans/s even when every scan found 0 new entries.Before:
scan_requested=true→ always 1ms sleep → 100-150 scans/s, buffer never fills past ~40 entriesAfter:
scan_requested=true+ found tasks → 1ms sleep (fast-path preserved).scan_requested=true+ found nothing → 5ms sleep (caps empty scans at ~200/s, responsive to nudge loop)Single-line behavioral change in the scan loop — no architectural changes.
Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Adjusts
TaskBrokerscanner timing and adds per-scan info logging, which can impact dequeue latency and production log volume under load. Benchmark changes are low risk, but scanner-loop behavior is performance-critical.Overview
Fixes the
scan_requestedfast-path in theTaskBrokerbackground scan loop so wakeups no longer force a 1ms sleep when a scan inserts zero tasks; instead it uses a short (5ms) delay to avoid busy-spinning during continuous enqueues while staying responsive when work is found.Adds
scan_tasksbreakdown counters and emits atracing::info!log summarizing read/skip/insert reasons per scan, and exposes a new shard helperbroker_buffer_len()for total broker buffer size.Reworks the
task_scanner_profilebench to measure sustained dequeue throughput via realdequeue()calls (batch sizes 32 and 1) rather than synthetic key-range scan microbenchmarks.Written by Cursor Bugbot for commit 940879a. This will update automatically on new commits. Configure here.