tl;dr: I had my agent look at the Garwood interval data posted by @adammwest in the linked issue to see how it might best be integrated into the current algorithm.
Findings so far:
- The analysis says the existing algo tries to force the converged hashrate into a smaller range of acceptable values than is mathematically justifiable, and as a result subsequent difficulty adjustments can be triggered by noise/error.
- So this change actually increases the variance in where a miner's hashrate will converge but in exchange it should get there with fewer triggered adjustments.
- This change replaces the fixed stairstep function with a dynamically computed value (though we could rewrite it with fixed rungs if desired)
- Also the new function is parameterized by share-rate, so the vardiff triggers get more efficient as share-rate increases.
I'm doing some manual A/B testing now to confirm the reduction in vardiff triggers but in all honesty this accuracy tweak doesn't have much impact on the total convergence time, which is the bigger issue and will require a larger rewrite.
Curious what others think of these changes.
Reconsider #396: vardiff convergence quality, not poll cadence
Refs: #396
#396 proposes shrinking the hardcoded 60s vardiff polling cycle in pool, jd-client, and translator so sv2-ui updates more frequently. Having worked through the algorithm and its statistical behaviour at default settings, my conclusion is that the polling cadence isn't the bottleneck — it isn't even part of the algorithm window. The user-visible problems (slow convergence, "staircase" wobble, dashboard feeling unresponsive) come from a separate place: the ladder's thresholds were calibrated to a different statistical regime than the default 6 shares/min produces. This is a request to reframe the work and pursue different fixes.
Polling cadence is not the algorithm window
The hardcoded 60s in each app is just the tokio::time::interval cadence at which try_vardiff is called:
```rust
// pool-apps/pool/src/lib/channel_manager/mod.rs:611-621 (mirrored in jd-client and translator)
let mut ticker = tokio::time::interval(std::time::Duration::from_secs(60));
loop {
    ticker.tick().await;
    self.run_vardiff().await
}
```
try_vardiff (stratum/sv2/channels-sv2/src/vardiff/classic.rs) computes deviation against shares_since_last_update, which only resets when a fire applies a new target. Calling it more frequently re-checks the same accumulating window against the same thresholds. There's also an internal early-return at delta_time <= 15, so polling below ~15s is a no-op.
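For concreteness, here is a simplified, hypothetical rendering of the window mechanics — the struct and field names are invented for illustration, and the real logic lives in `classic.rs`:

```rust
// Simplified sketch (hypothetical names): the deviation is computed from the
// accumulated window, which only resets when a new target is applied, so extra
// polls between fires just re-evaluate the same data against the same thresholds.
struct VardiffWindow {
    shares_since_last_update: u64,
    secs_since_last_update: u64,
}

impl VardiffWindow {
    fn deviation_pct(&self, shares_per_minute: f64) -> Option<f64> {
        if self.secs_since_last_update <= 15 {
            return None; // mirrors the internal early return: sub-15s polls do nothing
        }
        let expected = shares_per_minute * self.secs_since_last_update as f64 / 60.0;
        let observed = self.shares_since_last_update as f64;
        Some(100.0 * (observed - expected).abs() / expected)
    }
}

fn main() {
    let w = VardiffWindow { shares_since_last_update: 9, secs_since_last_update: 60 };
    // With a 6/min target, 9 shares in 60s is a +50% deviation.
    println!("{:?}", w.deviation_pct(6.0)); // Some(50.0)
}
```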
Shrinking the apps-layer ticker on its own buys nothing on the algorithm side. If UI freshness is the real goal, it's better solved at the display layer (smoother interpolation, separate hashrate buckets independent of vardiff fires).
What operators actually experience
Three properties matter for vardiff in practice:
Convergence latency — time from a fresh connection to a target near the miner's true hashrate. Today: ~7–10 minutes on typical SV1-bridged connections, presenting as a visible staircase climb.
Convergence stability — how smoothly the staircase rises. Today: a few large discrete jumps, because each fire pulls a single-window hashrate estimate that carries a wide confidence interval.
Steady-state behaviour — once near the true rate, no rungs fire and the system settles. This works correctly.
The complaint that motivated #396 is downstream of latency and stability, not poll frequency. A faster ticker on the same algorithm makes the staircase no shorter and no smoother; it just samples it more often.
Why this happens — sampling noise at default share rate
Every example config uses shares_per_minute = 6.0. One 60s window observes ~6 shares. Share counts are Poisson; the exact 95% Garwood CI on a count of 6 is approximately [2.2, 13.1] — that is, −63%, +118% from the expected value. The single-window hashrate estimate that drives every ladder fire inherits that CI directly.
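For reference, the exact Garwood interval for an observed Poisson count $k$ is built from chi-square quantiles, and plugging in $k = 6$ reproduces the numbers above:

$$
\left[\tfrac{1}{2}\chi^2_{2k}(0.025),\ \tfrac{1}{2}\chi^2_{2(k+1)}(0.975)\right]
\;\xrightarrow{\;k=6\;}\;
\left[\tfrac{1}{2}\chi^2_{12}(0.025),\ \tfrac{1}{2}\chi^2_{14}(0.975)\right]
\approx [2.2,\ 13.1]
\;\Rightarrow\; -63\%,\ +118\%,
$$

where $\chi^2_{n}(p)$ denotes the $p$-quantile of the chi-square distribution with $n$ degrees of freedom.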
Cross-referencing the ladder in vardiff/classic.rs:154-162:
```rust
let should_update = match hashrate_delta_percentage {
    pct if pct >= 100.0 => true,
    pct if pct >= 60.0 && delta_time >= 60 => true,
    pct if pct >= 50.0 && delta_time >= 120 => true,
    pct if pct >= 45.0 && delta_time >= 180 => true,
    pct if pct >= 30.0 && delta_time >= 240 => true,
    pct if pct >= 15.0 && delta_time >= 300 => true,
    _ => false,
};
```
against the noise expected at each elapsed time:
| Rung | Threshold | Expected shares | 95% Garwood CI | Verdict |
|---|---|---|---|---|
| ≥100%, any time | 100% | — | always outside | OK |
| ≥60%, ≥60s | 60% | 6 | −63%, +118% | inside noise (upper) |
| ≥50%, ≥120s | 50% | 12 | −48%, +75% | inside noise |
| ≥45%, ≥180s | 45% | 18 | −41%, +58% | inside noise |
| ≥30%, ≥240s | 30% | 24 | −36%, +49% | inside noise |
| ≥15%, ≥300s | 15% | 30 | −33%, +43% | inside noise |
(The retuned thresholds should be calibrated against the 99% CI rather than the 95% shown above, because the algorithm re-evaluates each rung repeatedly during convergence; a 95% bound implies ~5% false-fire probability per check, which is the regime we're trying to leave.)
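To make "thresholds above the noise floor" concrete, here is a hedged sketch of a share-rate-parameterized rung. The function names are invented, and the normal approximation (z/√N) is used only to keep the sketch dependency-free; the actual change would use the exact Garwood bound:

```rust
// Hedged sketch, not the proposed patch: set the fire threshold at a given
// elapsed time just above the Poisson noise expected for that window,
// using a normal approximation to the count distribution.
fn noise_floor_pct(shares_per_minute: f64, delta_time_secs: u64) -> f64 {
    // Expected share count in the accumulated window.
    let expected_shares = shares_per_minute * delta_time_secs as f64 / 60.0;
    // Two-sided ~99% band under the normal approximation.
    let z = 2.576;
    100.0 * z / expected_shares.sqrt()
}

fn should_fire(hashrate_delta_pct: f64, shares_per_minute: f64, delta_time_secs: u64) -> bool {
    // Fire only when the measured deviation clears the noise floor for this window.
    hashrate_delta_pct >= noise_floor_pct(shares_per_minute, delta_time_secs)
}

fn main() {
    for &(spm, dt) in &[(6.0, 60u64), (6.0, 300), (12.0, 60), (12.0, 300)] {
        println!("{spm} shares/min over {dt}s -> noise floor ~{:.0}%", noise_floor_pct(spm, dt));
    }
    assert!(should_fire(120.0, 6.0, 60)); // a 120% swing clears the floor even at 6 shares
}
```

The point of the parameterization is visible in the output: the floor drops as either the share rate or the elapsed window grows, which is what makes the triggers more efficient at higher share rates.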
Bumping the default share rate is a defensible secondary change
shares_per_minute = 6.0 produces the noisy single-window estimates above. Doubling to 12 narrows the post-fire CI by ~30% (Poisson noise scales as 1/√N), which makes the first close-to-truth fire land meaningfully closer.
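The ~30% figure is just the square-root scaling:

$$
\text{relative CI width} \propto \frac{1}{\sqrt{N}}, \qquad
\frac{1/\sqrt{12}}{1/\sqrt{6}} = \frac{1}{\sqrt{2}} \approx 0.71
\;\Rightarrow\; \text{roughly } 30\% \text{ narrower.}
$$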
Tradeoffs to weigh:
Per-share work. Pool, JDC, and translator all do per-share validation; doubling the rate roughly doubles that load. For the largest pools this may shift where validation runs (more partial validation in JDC/translator). Probably fine, but worth confirming with operators of large-deployment pools.
Small miners. A USB-stick or low-end ASIC averaging well below 6 shares/min today won't reach 12 either — at very low miner hashrates, the difficulty floor in the protocol takes over and the configured rate is aspirational. This bump shouldn't change anything for them, but it's worth checking that no path assumes the configured rate is achievable.
Diminishing returns. Past ~20 shares/min the CI tightens slowly; sweet spot is 10–15.
Steady-state precision: a real tradeoff
Worth being honest about this. The current ladder fires periodically once "near" truth — those small lower rungs (15%/300s, 30%/240s) keep nudging the displayed value, which both anchors it within roughly ±15% of truth and produces the visible twitch that operators complain about. Thresholds set above the noise floor will sit, by construction, around the noise-floor magnitude — at 12 shares/min and the proposed parametric thresholds, the steady-state band is approximately ±45%.
That's a real cost. A 30% hardware throttling event (failed PSU rail, thermal limiter kicking in) would land inside the new band and not trigger a re-fire. A vardiff dashboard exists partly to surface that kind of event. The right argument here isn't "stability is more important than precision" — it's "vardiff isn't the right surface for change-detection at this granularity." Operators wanting to spot a 30% throttle should be reading a smoothed observed-share-rate line, not the vardiff target. If we agree on that reframing, the precision regression is acceptable. If we don't, this change shouldn't land on its own and should wait for the EWMA restructuring below.
Expected impact
A fresh connection's hashrate trajectory has three phases. Each fix touches a different one.
Phase 1 — orders-of-magnitude climb. When the configured hashrate is far from truth, the system spends most of convergence walking up by ×3 per minute under the clamp at classic.rs:177-181. Share counts in this phase are huge (thousands per cycle), so noise is irrelevant. Neither fix touches Phase 1. Walking from a default 10 GH/s to within an order of magnitude of truth still takes ~9 minutes. This is the dominant cost for fresh-connection latency, and we're not addressing it here.
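As a sanity check on the ~9-minute figure (taking ~100 TH/s as an illustrative true hashrate; that value is an assumption, not a measurement):

$$
\frac{100\ \text{TH/s}}{10\ \text{GH/s}} = 10^{4}, \qquad
\log_{3} 10^{4} \approx 8.4\ \text{clamped} \times 3\ \text{steps}
\;\approx\; 9\ \text{one-minute cycles.}
$$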
Phase 2 — final approach. Once within 100–1000% of truth, the top rung fires using a measured estimate. Bumping shares_per_minute from 6 to 12 narrows the post-fire CI by ~30%; the first close-to-truth fire lands closer.
Phase 3 — settling. Today's lower rungs sit inside the CI of subsequent measurements, so the system keeps firing on noise as small corrections that walk around the true value. This is the visible wobble. Retuned thresholds sit above the noise floor, so once within band, nothing fires.
The chart below is a simulation, not measured data — it's there to illustrate the qualitative shape, not as evidence:
In summary:
- Time-to-flat improves from ~15 minutes to ~10 minutes (the ~9-minute Phase 1 still dominates).
- Visible jumps at the top of the staircase drop from 2–3 to 1.
- Steady-state band widens from ~±15% to ~±45%, with the reframing above.
Future work
EWMA-smoothed hashrate. Replace the ladder with a continuously-updated exponentially-weighted moving average. Convergence becomes a smooth exponential ramp, post-fire accuracy is tight by construction, and the implementation is simpler than the current ladder. Time constant becomes the only tunable. This also eliminates the steady-state-band tradeoff above and is the right long-term fix.
SPRT / CI-driven retargeting. Fire only when the observed rate is decisively outside the configured target's confidence band. Best statistical guarantees, fastest convergence on large initial gaps, most code complexity.
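For context, a minimal sketch of the EWMA direction mentioned above — the struct, field names, and per-share update rule are illustrative, not a proposed implementation; it assumes the usual convention of ~2^32 expected hashes per difficulty-1 share:

```rust
// Hedged sketch: a per-share exponentially weighted hashrate estimate.
// `tau_secs` is the only tunable (the filter's time constant). Each accepted
// share contributes a single-share hashrate observation, blended into the
// running estimate with a weight that grows with the elapsed time since the
// previous share.
struct EwmaHashrate {
    estimate_hs: f64,   // current smoothed hashrate estimate (H/s)
    last_share_ts: f64, // timestamp of the previous share (seconds)
    tau_secs: f64,      // time constant of the filter
}

impl EwmaHashrate {
    fn on_share(&mut self, difficulty: f64, now: f64) {
        let dt = (now - self.last_share_ts).max(1e-3);
        self.last_share_ts = now;
        // Instantaneous hashrate implied by this single share (~2^32 hashes per difficulty-1 share).
        let instant = difficulty * 4_294_967_296.0 / dt;
        // Exponential blending: newer evidence carries more weight the longer the gap.
        let alpha = 1.0 - (-dt / self.tau_secs).exp();
        self.estimate_hs += alpha * (instant - self.estimate_hs);
    }
}

fn main() {
    let mut hr = EwmaHashrate { estimate_hs: 0.0, last_share_ts: 0.0, tau_secs: 120.0 };
    // Feed difficulty-1000 shares arriving every 10 seconds for 5 minutes.
    for i in 1..=30 {
        hr.on_share(1000.0, i as f64 * 10.0);
    }
    println!("smoothed estimate: {:.3e} H/s", hr.estimate_hs);
}
```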
Implementation surface
Primary algorithm (parametric thresholds):
stratum-mining/stratum: sv2/channels-sv2/src/vardiff/classic.rs:96-196
Polling call sites — no change, keep 60s:
pool-apps/pool/src/lib/channel_manager/mod.rs:611-621
miner-apps/jd-client/src/lib/channel_manager/mod.rs:1257-1260
miner-apps/translator/src/lib/sv1/sv1_server/difficulty_manager.rs:33-43
Default share rate (proposal: 12.0, conditional on agreeing the steady-state reframing above):
pool-apps/pool/config-examples/**/*.toml
miner-apps/jd-client/config-examples/**/*.toml
miner-apps/translator/config-examples/**/*.toml
Methodology note: statistical analysis was developed with LLM assistance against the Garwood interval data linked from #396; code references and config defaults were verified against current sv2-apps HEAD.