Introduce dispatch cooldown for sustained-failure miners #513

Closed
HudsonGraeme wants to merge 1 commit into testnet from introduce/dispatch-cooldown-for-broken-miners

@HudsonGraeme HudsonGraeme commented May 25, 2026

Summary

Mainnet validators are seeing 80-90% slice failure rates per benchmark run because a small set of broken miners (6 UIDs accounting for ~99% of all retry-exhausting query failures in the validator log) keep absorbing the dispatch budget. The adaptive-cap mechanism in performance.rs already ratchets these miners down via update_adaptive_cap, but the ratchet floors at 1 — they stay in the dispatch pool indefinitely, queueing up just enough work to fail fast and free their capacity for more work.

Scoring weight for those miners is already 0 (the throughput formula is rate × cap, and rate ≈ 0), so this is purely a dispatch-efficiency issue, not a payments one.

Change

Reuse the existing reconnect_blacklist primitive — rename to dispatch_cooldowns to reflect its more general role: any hotkey in the map is excluded from queryable_uids until its block-height value expires, regardless of the reason it landed there.

  • Relax the cap floor: update_adaptive_cap allows the cap to fall to 0 instead of clamping at 1. The 1→0 transition emits CapDirection::Evict and pushes (uid, hotkey) onto a new pending_evictions queue.
  • Drain + cooldown insert: every dispatch refresh, the validator loop drains pending_evictions and inserts cooldown entries with current_block + REHAB_BLOCKS expiry.
  • Rehabilitation on expiry: the maintenance loop's existing cooldown-prune path now calls PerformanceTracker::rehabilitate(uid, hotkey) for each expired entry — restores cap from 0 to 1 and clears the at-cap window so the next probe starts a fresh evaluation. The existing 8-sample ramp-up logic then decides whether the miner climbs back or lands in another Evict.
  • Shared constant: hoists REHAB_BLOCKS = 360 into sn2-types/constants.rs, replacing the per-module DISABLED_SLICE_REHAB_BLOCKS from #509 (Develop validator resilience to transient chain disruption and recover from network-wide slice failures). Operators tune one knob.
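
The cap-floor change and the rehabilitation path listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in performance.rs: `ratchet_down`, `set_cap`, `cap`, and the `Down` variant are hypothetical names, and the real `update_adaptive_cap` presumably works from the at-cap window rather than from a direct call. Only `CapDirection::Evict`, `CapDirection::Rehab`, `pending_evictions`, `drain_pending_evictions`, and `rehabilitate` are taken from the PR.

```rust
use std::collections::HashMap;

/// Direction of a cap change. `Evict` and `Rehab` are named in the PR;
/// `Down` is a hypothetical stand-in for the pre-existing variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapDirection {
    Down,
    Evict,
    Rehab,
}

/// Simplified tracker holding only what the 1 -> 0 transition needs.
#[derive(Default)]
pub struct PerformanceTracker {
    adaptive_caps: HashMap<u16, u32>,
    pending_evictions: Vec<(u16, String)>,
}

impl PerformanceTracker {
    /// Hypothetical helper: seed a miner's cap.
    pub fn set_cap(&mut self, uid: u16, cap: u32) {
        self.adaptive_caps.insert(uid, cap);
    }

    /// Hypothetical helper: read a miner's cap.
    pub fn cap(&self, uid: u16) -> Option<u32> {
        self.adaptive_caps.get(&uid).copied()
    }

    /// Lower the cap by one step. Assumption: untracked miners start at 1.
    pub fn ratchet_down(&mut self, uid: u16, hotkey: &str) -> Option<CapDirection> {
        let cap = self.adaptive_caps.entry(uid).or_insert(1);
        match *cap {
            // Already evicted: nothing to emit, nothing to queue.
            0 => None,
            // The 1 -> 0 transition queues the miner for a cooldown.
            1 => {
                *cap = 0;
                self.pending_evictions.push((uid, hotkey.to_string()));
                Some(CapDirection::Evict)
            }
            _ => {
                *cap -= 1;
                Some(CapDirection::Down)
            }
        }
    }

    /// Hand the queued evictions to the caller and leave the queue empty.
    pub fn drain_pending_evictions(&mut self) -> Vec<(u16, String)> {
        std::mem::take(&mut self.pending_evictions)
    }

    /// Restore cap 0 -> 1; a no-op when the cap is nonzero or unknown.
    pub fn rehabilitate(&mut self, uid: u16, _hotkey: &str) -> Option<CapDirection> {
        match self.adaptive_caps.get_mut(&uid) {
            Some(cap) if *cap == 0 => {
                *cap = 1;
                Some(CapDirection::Rehab)
            }
            _ => None,
        }
    }
}
```

In this sketch a miner seeded at cap 2 yields `Down`, then `Evict`, then nothing further, and `rehabilitate` only acts on the cap-zero case. The sketch omits clearing the at-cap window, which the PR says the real `rehabilitate` also does.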

Properties

  • Symmetric with the disabled-slice rehab from #509 (Develop validator resilience to transient chain disruption and recover from network-wide slice failures): same 360-block (~72min mainnet, ~20min testnet) self-healing window, same primitive.
  • No new state machine. Reuses dispatch_cooldowns (renamed reconnect_blacklist), reuses ramp-up logic, reuses cap event telemetry.
  • Single failed probe at cap=1 won't immediately re-evict because the at-cap window needs 8 samples before re-evaluating. A broken miner gets 8 wasted dispatches per epoch, down from thousands.
  • Stats reporter learns two new cap-event directions: "evict", "rehab". Telemetry remains lossless.
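
For the last property above, one way the direction-to-string mapping might look is shown below. The `"evict"` and `"rehab"` labels are from the PR; the `Up` and `Down` variants, their labels, and the `direction_label` function name are assumptions made only so the example is complete.

```rust
/// Cap-event directions. `Evict` and `Rehab` are from the PR;
/// `Up` and `Down` are hypothetical stand-ins for the existing ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapDirection {
    Up,
    Down,
    Evict,
    Rehab,
}

/// Map a direction to the string used in capacity-event telemetry.
/// The match has no wildcard arm, so adding a variant without a label
/// fails to compile instead of silently dropping the event.
pub fn direction_label(direction: CapDirection) -> &'static str {
    match direction {
        CapDirection::Up => "up",     // assumed label
        CapDirection::Down => "down", // assumed label
        CapDirection::Evict => "evict",
        CapDirection::Rehab => "rehab",
    }
}
```

Leaving out the wildcard arm is what would keep the stream lossless in this sketch: the compiler forces every new direction to be given a label.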

Test plan

  • cargo test -p sn2-validator performance:: — 20 tests pass, 5 new:
    • cap_ratchets_below_one_to_zero_on_sustained_failure
    • cap_drop_to_zero_emits_evict_event_and_pending_eviction
    • drain_pending_evictions_clears
    • rehabilitate_restores_cap_from_zero_to_one
    • rehabilitate_is_noop_when_cap_is_nonzero
  • cargo build -p sn2-validator clean.
  • cargo fmt --check clean.
  • CI for clippy + full workspace.
  • Mainnet smoke after merge: confirm uid=37 and friends drop out of queryable_uids within their first at-cap evaluation window and re-appear ~360 blocks later for a single probe.

Summary by CodeRabbit

  • New Features

    • Added miner rehabilitation system: miners experiencing performance degradation or reaching zero capacity can now be temporarily disabled and automatically restored after a configurable block-height cooldown period.
  • Improvements

    • Enhanced dispatch filtering and cooldown management for improved control over miner availability.

Investigate the mainnet dispatch failure distribution and isolate the
fact that the existing adaptive-cap mechanism floors at 1. A miner that
errors fast — handler error, transport timeout — keeps its at-capacity
window full of failures, but the ratchet cannot ratchet below 1, so
the dispatch loop keeps assigning the queue head to them and they
absorb most of the run's slices. Observed on mainnet: six miners
consuming 99% of all retry-exhausting failures, each at adaptive cap=1
with near-zero success rate. Throughput weight is already 0 via the
existing rate * cap formula, so the issue is not scoring; it is the
validator wasting its dispatch budget on a long tail of useless
peers.

Develop a dispatch-cooldown extension that reuses the existing
reconnect_blacklist primitive instead of introducing a parallel jail
subsystem. Rename reconnect_blacklist to dispatch_cooldowns to reflect
its more general role: any hotkey present in the map is excluded from
the queryable filter for the duration encoded in its block-height
value, regardless of why it was placed there.

* Relax the cap floor: update_adaptive_cap now allows the cap to fall
  to 0 instead of clamping at 1. The 1 -> 0 transition is treated as
  an Evict event and pushes (uid, hotkey) onto a new
  pending_evictions queue.

* PerformanceTracker exposes drain_pending_evictions and
  rehabilitate(uid, hotkey). Evictions are drained on every dispatch
  refresh; the validator loop inserts dispatch_cooldowns entries with
  current_block + REHAB_BLOCKS expiry. Rehabilitation restores cap
  from 0 back to 1 and clears the at-cap window so the next probe
  starts a fresh evaluation, emitting a Rehab cap event for
  telemetry.

* Maintenance loop's existing cooldown pruning path now calls
  rehabilitate on every expired hotkey, so a miner whose cooldown
  window has elapsed re-enters the queryable set at cap=1. The
  existing 8-sample ramp-up logic decides whether they climb back or
  land in another Evict, which re-arms the cooldown for another
  REHAB_BLOCKS.

* Eliminate the per-module DISABLED_SLICE_REHAB_BLOCKS constant and
  hoist REHAB_BLOCKS = 360 into sn2-types/constants.rs so the disabled
  slice rehab and the new dispatch cooldown share a single number.
  Operators tune one knob, not two.
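
The 8-sample evaluation mentioned in the list above is what stops a single failed probe from re-evicting a rehabilitated miner. A minimal sketch of such a window follows. The window size of 8 is from the PR; the `AtCapWindow` and `Verdict` types, the 0.5 success threshold, and the choice to clear the window after each evaluation are all assumptions, since the PR does not state them.

```rust
/// Samples required before the cap is re-evaluated (from the PR).
pub const AT_CAP_WINDOW: usize = 8;

/// Hypothetical threshold: minimum success rate that keeps the cap.
pub const MIN_SUCCESS_RATE: f64 = 0.5;

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Fewer than `AT_CAP_WINDOW` samples so far; no decision yet.
    Pending,
    /// Window full and the success rate met the threshold.
    Keep,
    /// Window full and the success rate fell below the threshold.
    Evict,
}

#[derive(Default)]
pub struct AtCapWindow {
    results: Vec<bool>,
}

impl AtCapWindow {
    /// Record one probe outcome and evaluate only when the window is full.
    pub fn record(&mut self, success: bool) -> Verdict {
        self.results.push(success);
        if self.results.len() < AT_CAP_WINDOW {
            return Verdict::Pending;
        }
        let successes = self.results.iter().filter(|&&ok| ok).count();
        let rate = successes as f64 / self.results.len() as f64;
        // Assumption: start a fresh window after each evaluation.
        self.results.clear();
        if rate >= MIN_SUCCESS_RATE {
            Verdict::Keep
        } else {
            Verdict::Evict
        }
    }
}
```

Under these assumptions a miner that fails every probe costs exactly eight dispatches before it is evicted again, which matches the "8 wasted dispatches" figure in the PR description.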

The dispatch.rs queryable filter is unchanged in shape — the
reconnect_blacklist check is now a dispatch_cooldowns check and
naturally captures both reconnect failures and capacity-zero
evictions. The inner dispatch loop's `active_now >= cap` gate
already short-circuits cap=0 miners, so no new skip is needed.
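
The two gates described in this paragraph can be illustrated with a small sketch. The function names, the `u64` block height, and the `(uid, hotkey)` slice are assumptions for illustration; only the `current_block < until` comparison and the `active_now >= cap` gate are taken from the PR and its review.

```rust
use std::collections::HashMap;

/// True while the hotkey has an unexpired cooldown entry.
pub fn is_cooling_down(
    dispatch_cooldowns: &HashMap<String, u64>,
    hotkey: &str,
    current_block: u64,
) -> bool {
    dispatch_cooldowns
        .get(hotkey)
        .map_or(false, |&until| current_block < until)
}

/// Keep only miners with no active cooldown, whatever put them there.
pub fn queryable_uids(
    miners: &[(u16, String)],
    dispatch_cooldowns: &HashMap<String, u64>,
    current_block: u64,
) -> Vec<u16> {
    miners
        .iter()
        .filter(|(_, hotkey)| !is_cooling_down(dispatch_cooldowns, hotkey, current_block))
        .map(|(uid, _)| *uid)
        .collect()
}

/// Inner-loop gate: `active_now >= cap` blocks dispatch, so cap = 0
/// never has a free slot even with nothing in flight.
pub fn has_free_slot(active_now: u32, cap: u32) -> bool {
    active_now < cap
}
```

Because the filter only looks at the expiry value, a reconnect failure and a capacity-zero eviction are indistinguishable at dispatch time, which is the generalisation the rename is meant to express.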

Stats reporter learns the two new cap-event directions ("evict",
"rehab") so the telemetry stream remains lossless.

Tests:
* 5 new unit tests in performance.rs cover the cap=0 floor, the
  Evict event payload, the pending_evictions drain semantics, and
  the rehabilitate code path (cap=0 -> 1 with cleared window,
  no-op when cap > 0).
* cargo test -p sn2-validator performance:: passes 20/20.
* cargo build -p sn2-validator clean.
* cargo fmt --check clean.

coderabbitai Bot commented May 25, 2026

Walkthrough

This PR implements a miner eviction and rehabilitation system. When a miner's adaptive capacity drops to zero due to sustained failures, it is marked as pending eviction, placed into a dispatch cooldown for a configurable block window (REHAB_BLOCKS), and later rehabilitated to restore its capacity. The reconnect_blacklist is replaced throughout with a dispatch_cooldowns map keyed by hotkey.

Changes

Eviction and Rehabilitation Flow

| Layer | File(s) | Summary |
|---|---|---|
| Shared constants and capacity direction documentation | crates/sn2-types/src/constants.rs, crates/sn2-validator/src/performance.rs | Introduces REHAB_BLOCKS constant (360 blocks) and documents CapDirection::Evict (terminal capacity drop) and CapDirection::Rehab (cap restoration) variants. |
| PerformanceTracker eviction detection and APIs | crates/sn2-validator/src/performance.rs | Adds pending_evictions vec to track (uid, hotkey) pairs; updates update_adaptive_cap to emit CapDirection::Evict and populate pending evictions when cap hits zero; implements drain_pending_evictions() and rehabilitate(uid, hotkey) public APIs. |
| PerformanceTracker eviction and rehabilitation tests | crates/sn2-validator/src/performance.rs | Verifies sustained failures ratchet capacity to zero, Evict events and pending pairs are recorded and cleared, and rehabilitate restores cap from zero to one, clears at-cap results, and emits Rehab events. |
| Capacity event serialization for Evict and Rehab | crates/sn2-validator/src/stats_reporter.rs | Maps CapDirection::Evict and CapDirection::Rehab to "evict" and "rehab" strings in capacity event JSON. |
| ValidatorLoop state refactoring: dispatch_cooldowns | crates/sn2-validator/src/validator_loop/mod.rs | Replaces reconnect_blacklist HashMap with dispatch_cooldowns HashMap, keyed by hotkey. |
| Dispatch: eviction drain and cooldown insertion | crates/sn2-validator/src/validator_loop/dispatch.rs | Updates dispatch_requests to filter by dispatch_cooldowns instead of reconnect_blacklist; refresh_dispatch_cache_if_stale drains pending evictions and inserts cooldown entries for evicted hotkeys until current_block + REHAB_BLOCKS. |
| Maintenance: dispatch_cooldown expiry and rehabilitation | crates/sn2-validator/src/validator_loop/maintenance.rs | In sync_metagraph, expires and removes dispatch_cooldowns entries, then calls performance_tracker.rehabilitate() for each expired miner; in update_weights, skiplist filtering uses dispatch_cooldowns instead of reconnect_blacklist. |
| Results: failure-based cooldown recording | crates/sn2-validator/src/validator_loop/results.rs | In handle_failure, "already in progress" and "in backoff" failures now record cooldowns into dispatch_cooldowns instead of reconnect_blacklist. |
| Dslice: shared REHAB_BLOCKS configuration | crates/sn2-validator/src/validator_loop/dslice.rs | Imports sn2_types::REHAB_BLOCKS and uses it as DISABLED_SLICE_REHAB_BLOCKS, centralizing the rehab window configuration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • inference-labs-inc/subnet-2#503: Generalizes per-hotkey reconnect blocking by replacing reconnect_blacklist with dispatch_cooldowns and adding rehab/evict draining across the same validator_loop components.
  • inference-labs-inc/subnet-2#478: Updates PerformanceTracker::update_adaptive_cap to add CapDirection::Evict and Rehab behaviors and updates stats_reporter::flush_capacity for capacity event direction serialization.
  • inference-labs-inc/subnet-2#509: Modifies validator_loop/dslice.rs disabled-slice rehabilitation logic using the centralized REHAB_BLOCKS window that this PR introduces.

Suggested labels

run-build

Poem

🐰 When miners stumble and fail the test,
Their caps tumble down, needing rest.
Evict them now, then one day soon,
Rehabilitate beneath the blockchain moon—
A cooldown's grace, a second chance to bloom! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main change: introducing a dispatch cooldown mechanism for miners that have experienced sustained failures, which is the core objective across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/sn2-validator/src/validator_loop/dispatch.rs (1)

143-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fresh evictions can still receive work in the current dispatch pass.

dispatch_requests builds queryable_uids before calling this helper. If drain_pending_evictions() returns entries while the cache is still fresh, the hotkey gets added to dispatch_cooldowns, but the already-built queryable_uids and cached caps are unchanged, so this cycle can still dispatch to a miner that just ratcheted to cap 0. Please drain/re-filter before building queryable_uids, or force a capacity recheck when evicted is non-empty.
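
The first remedy suggested here (drain before building `queryable_uids`) can be sketched as below. `DispatchState` and its fields are a hypothetical, flattened stand-in for the validator loop and the tracker; the real code splits this state across `ValidatorLoop`, `PerformanceTracker`, and the dispatch cache, and the `u64` block height is an assumption.

```rust
use std::collections::HashMap;

/// Cooldown length in blocks (value from the PR).
pub const REHAB_BLOCKS: u64 = 360;

/// Hypothetical flattened view of the state involved in one dispatch pass.
pub struct DispatchState {
    pub current_block: u64,
    pub dispatch_cooldowns: HashMap<String, u64>,
    pub pending_evictions: Vec<(u16, String)>,
    pub miners: Vec<(u16, String)>,
}

impl DispatchState {
    /// Step 1: turn queued evictions into cooldown entries.
    fn drain_evictions(&mut self) {
        let until = self.current_block.saturating_add(REHAB_BLOCKS);
        for (_uid, hotkey) in std::mem::take(&mut self.pending_evictions) {
            self.dispatch_cooldowns.insert(hotkey, until);
        }
    }

    /// Step 2: build the queryable list only after the drain, so a miner
    /// evicted since the last pass is excluded from this one.
    pub fn queryable_uids(&mut self) -> Vec<u16> {
        self.drain_evictions();
        self.miners
            .iter()
            .filter(|(_, hotkey)| {
                self.dispatch_cooldowns
                    .get(hotkey)
                    .map_or(true, |&until| self.current_block >= until)
            })
            .map(|(uid, _)| *uid)
            .collect()
    }
}
```

With the drain placed first, a miner that hit cap 0 at block 1000 is already absent from the list built at block 1000 and reappears once the block height reaches 1360.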

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sn2-validator/src/validator_loop/dispatch.rs` around lines 143 - 177,
The current code drains pending evictions inside refresh_dispatch_cache_if_stale
after queryable_uids was already built, so freshly-evicted miners can still
receive work; fix by ensuring evictions affect the current dispatch pass: either
move the drain_pending_evictions() call out to the caller so dispatch_requests
builds queryable_uids after calling refresh_dispatch_cache_if_stale, or (if
keeping the call here) when evicted is non-empty immediately refresh the cache
capacities and recompute api_eligible for the current pass (call
self.dispatch_cache.capacities = self.performance_tracker.miner_capacities() and
recompute self.dispatch_cache.api_eligible while filtering out the evicted
hotkeys, and update self.dispatch_cache.refreshed_at) so the evicted hotkeys in
dispatch_cooldowns are excluded from this dispatch cycle; reference
refresh_dispatch_cache_if_stale, drain_pending_evictions,
compute_api_eligible_from_uids, dispatch_cache.capacities,
dispatch_cache.api_eligible, and dispatch_cooldowns.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 760e77ef-1f38-4ca1-9ffd-238c422d9a93

📥 Commits

Reviewing files that changed from the base of the PR and between d6a2d17 and f8d6cbd.

📒 Files selected for processing (8)
  • crates/sn2-types/src/constants.rs
  • crates/sn2-validator/src/performance.rs
  • crates/sn2-validator/src/stats_reporter.rs
  • crates/sn2-validator/src/validator_loop/dispatch.rs
  • crates/sn2-validator/src/validator_loop/dslice.rs
  • crates/sn2-validator/src/validator_loop/maintenance.rs
  • crates/sn2-validator/src/validator_loop/mod.rs
  • crates/sn2-validator/src/validator_loop/results.rs

Comment on lines +539 to +544
/// Drain the set of (uid, hotkey) pairs whose adaptive cap just hit zero.
/// The validator loop consumes this on every dispatch tick and inserts
/// the hotkeys into `dispatch_cooldowns` with a `REHAB_BLOCKS` expiry.
pub fn drain_pending_evictions(&mut self) -> Vec<(u16, String)> {
    std::mem::take(&mut self.pending_evictions)
}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Eviction state is not restart-safe for cap-zero miners.

adaptive_caps (including 0) is persisted, but Line 542 only drains an in-memory queue. After restart, dispatch_cooldowns starts empty, so previously evicted miners can remain at cap 0 indefinitely with no cooldown-expiry path to call rehabilitate.

Please make eviction/rehab state crash-safe (for example, persist cooldown expiry per hotkey, or add a startup recovery path that safely rehydrates rehab scheduling for cap-zero entries).
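
A startup recovery path of the second kind suggested here might look like the sketch below. It is an illustration only: `rehydrate_cooldowns`, the `hotkey_of` lookup map, and the `u64` block height are assumptions, and the PR does not show how `adaptive_caps` is actually persisted or loaded.

```rust
use std::collections::HashMap;

/// Cooldown length in blocks (value from the PR).
pub const REHAB_BLOCKS: u64 = 360;

/// Re-arm a cooldown for every miner that was persisted at cap 0 but has
/// no cooldown entry after a restart. Returns how many were re-armed.
pub fn rehydrate_cooldowns(
    adaptive_caps: &HashMap<u16, u32>,
    hotkey_of: &HashMap<u16, String>, // hypothetical uid -> hotkey lookup
    dispatch_cooldowns: &mut HashMap<String, u64>,
    current_block: u64,
) -> usize {
    // The original eviction block is not persisted in this sketch, so the
    // full window restarts from the current block.
    let until = current_block.saturating_add(REHAB_BLOCKS);
    let mut rearmed = 0;
    for (uid, &cap) in adaptive_caps {
        if cap != 0 {
            continue;
        }
        if let Some(hotkey) = hotkey_of.get(uid) {
            // Leave any existing entry alone; only fill the gap.
            if !dispatch_cooldowns.contains_key(hotkey) {
                dispatch_cooldowns.insert(hotkey.clone(), until);
                rearmed += 1;
            }
        }
    }
    rearmed
}
```

The trade-off in this variant is that a miner evicted shortly before a restart waits longer than 360 blocks in total. Persisting the expiry per hotkey, the other option raised in the comment, would avoid that at the cost of more stored state.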

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sn2-validator/src/performance.rs` around lines 539 - 544, The current
drain_pending_evictions implementation only clears an in-memory
pending_evictions queue so adaptive_caps entries that reached 0 can stay
permanently disabled after a crash; fix by making eviction/rehab state
persistent or by rehydration at startup: either (A) change the eviction
representation from an ephemeral Vec in pending_evictions to a persisted map of
hotkey -> expiry_block (store expiry timestamps/blocks alongside adaptive_caps)
and load that persisted map into dispatch_cooldowns on startup, or (B) add a
startup recovery routine (e.g., a new method invoked during init) that scans
adaptive_caps for entries where cap == 0 and re-populates dispatch_cooldowns
with correct expiry (use stored eviction timestamp or compute expiry using
REHAB_BLOCKS and current block) so rehabilitate will be invoked; locate and
update drain_pending_evictions, pending_evictions, adaptive_caps,
dispatch_cooldowns, and any rehab/rehabilitate scheduling logic to ensure the
expiry is persisted and reloaded.

Comment on lines +264 to 288
let cooldowns_before = self.dispatch_cooldowns.len();
let expired_hotkeys: Vec<String> = self
    .dispatch_cooldowns
    .iter()
    .filter(|(_, &until)| self.current_block >= until)
    .map(|(hk, _)| hk.clone())
    .collect();
for hk in &expired_hotkeys {
    self.dispatch_cooldowns.remove(hk);
    // Re-arm the miner with one probe slot. If the probe fails,
    // the at-cap window will refill with mostly-zero successes
    // and the next `update_adaptive_cap` will Evict them again.
    if let Some(uid) = self.config.metagraph.get_uid_by_hotkey(hk) {
        self.performance_tracker.rehabilitate(uid, hk);
    }
}
let cooldowns_after = self.dispatch_cooldowns.len();
if cooldowns_before != cooldowns_after {
    info!(
        expired = cooldowns_before - cooldowns_after,
        remaining = cooldowns_after,
        rehab_blocks = sn2_types::REHAB_BLOCKS,
        "dispatch_cooldowns pruned"
    );
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cooldown rehab is gated by the hourly metagraph sync.

This block only runs from sync_metagraph(), and run_periodic_tasks() schedules that path once every 3600 seconds. After a cooldown expires, the hotkey is no longer filtered by current_block < until, but the miner still sits at cap 0 until this code runs, so rehabilitation can lag the intended REHAB_BLOCKS window by up to an hour. Please move cooldown prune/rehab into a maintenance path that runs every loop tick.
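
One way to make the prune cheap enough to call on every loop tick is to factor it into a small helper that returns the expired hotkeys, leaving rehabilitation to the caller. The sketch below assumes a `u64` block height; `take_expired` is a hypothetical name, not a function in the PR.

```rust
use std::collections::HashMap;

/// Remove every cooldown whose expiry has been reached and return the
/// affected hotkeys, so the caller can rehabilitate each of them.
pub fn take_expired(
    dispatch_cooldowns: &mut HashMap<String, u64>,
    current_block: u64,
) -> Vec<String> {
    let mut expired = Vec::new();
    // Single pass: keep unexpired entries, collect the rest.
    dispatch_cooldowns.retain(|hotkey, until| {
        if current_block >= *until {
            expired.push(hotkey.clone());
            false
        } else {
            true
        }
    });
    // HashMap iteration order is unspecified; sort for stable logs and tests.
    expired.sort();
    expired
}
```

Called once per tick, the helper does nothing when no entry has expired, so the miner would be rehabilitated on the first tick after its expiry block rather than at the next hourly sync.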

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sn2-validator/src/validator_loop/maintenance.rs` around lines 264 -
288, The cooldown pruning and rehab logic currently tied to sync_metagraph()
must be moved into the maintenance path that runs every loop tick so expired
entries are rehabilitated immediately; locate the block that iterates
self.dispatch_cooldowns using self.current_block and calling
self.performance_tracker.rehabilitate(uid, hk) (and referencing
sn2_types::REHAB_BLOCKS) and extract it out of
sync_metagraph()/run_periodic_tasks() into the per-iteration maintenance
function/loop so it executes each tick, ensuring you still compute
expired_hotkeys from self.dispatch_cooldowns, remove them, and call
get_uid_by_hotkey on self.config.metagraph before rehabilitating.

Comment on lines 350 to +351
 let until = self.current_block + bpt;
-self.reconnect_blacklist.insert(hotkey.to_string(), until);
+self.dispatch_cooldowns.insert(hotkey.to_string(), until);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the longest existing cooldown when recording failure backoff.

Line 351 unconditionally overwrites dispatch_cooldowns; this can shorten an existing eviction cooldown and allow early re-dispatch. Use a monotonic update (max(existing, until)) instead of blind insert.

Suggested fix
-            let until = self.current_block + bpt;
-            self.dispatch_cooldowns.insert(hotkey.to_string(), until);
+            let until = self.current_block.saturating_add(bpt);
+            self.dispatch_cooldowns
+                .entry(hotkey.to_string())
+                .and_modify(|existing_until| *existing_until = (*existing_until).max(until))
+                .or_insert(until);
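
The effect of the monotonic update can be checked in isolation with a standalone version of the same entry-API pattern. `extend_cooldown` is a hypothetical wrapper written for this example, and the block numbers in the note below are made up for illustration.

```rust
use std::collections::HashMap;

/// Record a cooldown without ever shortening an existing one.
/// Returns the expiry that is stored after the update.
pub fn extend_cooldown(
    dispatch_cooldowns: &mut HashMap<String, u64>,
    hotkey: &str,
    until: u64,
) -> u64 {
    *dispatch_cooldowns
        .entry(hotkey.to_string())
        // Existing entry: keep whichever expiry is later.
        .and_modify(|existing| *existing = (*existing).max(until))
        // No entry yet: store the new expiry as-is.
        .or_insert(until)
}
```

For example, if an eviction stored an expiry of 1360 and a later backoff failure asks for 1020, the stored value stays at 1360, while a request for 1500 extends it.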
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sn2-validator/src/validator_loop/results.rs` around lines 350 - 351,
The code unconditionally inserts `until` into `dispatch_cooldowns`, which can
shorten an existing cooldown; change the update to be monotonic by taking the
max of the existing value and `until` instead of blind insert. Locate the
`dispatch_cooldowns.insert(hotkey.to_string(), until)` call in `results.rs` and
replace it with a max-preserving update (e.g., use the map/entry API on
`dispatch_cooldowns` to compare existing and new values and set the stored value
to `max(existing, until)`), using `self.current_block`, `bpt`, `until`, and
`hotkey` as the referenced symbols.

@HudsonGraeme
Member Author

Superseded by #514 which includes the same dispatch-cooldown commit (093c564) plus inline-finish refactor and idle-cap decay.
