
B-2 WAN-latency spike: interactive no-go, batch bandwidth-gated #124

Merged: lai3d merged 1 commit into main from claude/b2-wan-latency-spike on Jun 14, 2026

@lai3d commented Jun 14, 2026

ROADMAP-AI-V3 gates B-2 (multi-miner pipeline inference) on a WAN-latency measurement spike before committing to the 12–16h build. This is that spike: real measurements, a reproducible model, and the scope decision. No production code, only a runnable calculator (cargo run -p qfc-inference --example pipeline_latency) and two docs.

Real measured anchors

| Path | RTT | Bandwidth | How |
|---|---|---|---|
| LAN: AWS us-east-1 intra-VPC (testnet A→B/C/D) | 0.25 ms | ~5 Gbit/s (assumed) | ping, 5×3 samples |
| Intercontinental: Singapore ↔ AWS us-east-1 (Virginia) | 242 ms median | 20 Mbit/s single-stream | TCP-handshake RTT ×8; 20 MB over SSH |

The live testnet is single-region (B/C/D are LAN behind a ProxyJump), so it cannot directly measure cross-region miner latency. These two anchors bracket it, and published AWS inter-region figures fill the middle.
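The intercontinental RTT was taken from timed TCP handshakes. Below is a minimal sketch of that technique. It assumes that the time for `connect()` to return (SYN sent, SYN-ACK received) is an acceptable RTT proxy. The function names `handshake_rtts_ms` and `median_ms` are hypothetical and do not come from `pipeline_latency.rs` or the spike doc. To stay runnable offline, the sketch connects to a loopback listener. Point `addr` at a remote `host:port` to get a real WAN sample.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::{Duration, Instant};

/// Median of the samples (mean of the two middle values for an even count).
fn median_ms(mut samples: Vec<f64>) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by(f64::total_cmp);
    let n = samples.len();
    Some(if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2.0
    })
}

/// Times `count` TCP handshakes to `addr`. connect() returns once the SYN-ACK
/// arrives, so each sample approximates one RTT plus local syscall overhead.
fn handshake_rtts_ms(addr: SocketAddr, count: usize, timeout: Duration) -> io::Result<Vec<f64>> {
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        let start = Instant::now();
        let stream = TcpStream::connect_timeout(&addr, timeout)?;
        out.push(start.elapsed().as_secs_f64() * 1000.0);
        drop(stream); // close before taking the next sample
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    // Loopback listener keeps the sketch self-contained; the kernel completes
    // the handshake from the listen backlog even though we never accept().
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let samples = handshake_rtts_ms(addr, 8, Duration::from_secs(2))?;
    let median = median_ms(samples.clone()).expect("8 samples taken");
    println!("median handshake RTT: {:.3} ms over {} samples", median, samples.len());
    Ok(())
}
```

Using the median rather than the mean keeps a single retransmitted SYN from skewing the figure.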

Findings

1. Interactive autoregressive inference over WAN = NO-GO. Each generated token traverses the full K-stage pipeline (K−1 hops), so for T tokens the network latency is at least T·(K−1)·RTT/2. For qwen2.5-7b with K=4 and 100 tokens, network time alone is 0.77s regional, 6.1s continental, and 37s intercontinental. Only same-datacenter or same-region placement passes a 2s bar, which negates the point of geographically distributed miners.

2. Batch prefill = viable but BANDWIDTH-gated. This is the spike's correction to the roadmap, which framed B-2's risk as RTT only. At the measured 20 Mbit/s, one 7B prefill hop (B8/S512, i.e. batch 8 × sequence 512: 28.7 MB per boundary) takes ~12s, because pushing 28.7 MB through 20 Mbit/s alone costs about 11.5s. The hop is transfer-bound, not RTT-bound. Even regional cross-AZ links exceed the 200ms/hop threshold. Bandwidth is co-equal with RTT.
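Both findings reduce to a small amount of arithmetic. The sketch below assumes each hop costs a one-way latency of RTT/2, and that a prefill hop adds a transfer term of bytes × 8 / bandwidth. The function names are hypothetical, and any further terms the real calculator models are not reproduced here. With the intercontinental anchor (242 ms, 20 Mbit/s), the RTT-only interactive term gives about 36 s for T=100, K=4, close to the reported 37 s. The 28.7 MB prefill hop comes out near 11.6 s, consistent with ~12 s.

```rust
/// Network-only latency for autoregressive decoding: each of `tokens` tokens
/// crosses `stages - 1` boundaries, and each crossing costs RTT/2 one way.
fn interactive_network_secs(tokens: u32, stages: u32, rtt_ms: f64) -> f64 {
    let hops = stages.saturating_sub(1) as f64;
    tokens as f64 * hops * (rtt_ms / 2.0) / 1000.0
}

/// One prefill hop: one-way latency plus time to push the boundary activations.
fn prefill_hop_secs(boundary_bytes: f64, bandwidth_mbit_s: f64, rtt_ms: f64) -> f64 {
    let transfer = boundary_bytes * 8.0 / (bandwidth_mbit_s * 1e6);
    rtt_ms / 2000.0 + transfer
}

fn main() {
    // Intercontinental anchor from the table: 242 ms median RTT, 20 Mbit/s.
    let interactive = interactive_network_secs(100, 4, 242.0);
    // 28.7 MB per boundary for 7B at B8/S512, taken from finding 2.
    let prefill = prefill_hop_secs(28.7e6, 20.0, 242.0);
    println!("interactive (T=100, K=4): {interactive:.1} s");
    println!("prefill hop (28.7 MB): {prefill:.1} s");
}
```

In the prefill case the transfer term is roughly 95× the RTT/2 term, which is why bandwidth rather than RTT dominates.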

Decision — ADR-0011

  • B-2 drops interactive WAN inference (stays single-miner, as today).
  • B-2 targets batch/async single-pass workloads only.
  • B3 assignment must model bandwidth + locality (not just RTT): form network-local pipeline groups, carry activation-transfer cost as an explicit term, reject groups over a per-hop budget, prefer fewer stages.
  • Activation compression (fp16→int8/fp8) becomes a B-2 requirement, folded into B4 prototype + B5 commitments.
  • Hard gate before building B3–B5: re-run the calculator against real cross-region miner-to-miner RTT+BW. This requires ≥2 miners in different regions, which the single-region testnet cannot provide. Build only if a realistic post-compression locality tier clears the per-hop budget.
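The B3 assignment rules above (carry activation-transfer cost explicitly, reject groups over a per-hop budget, prefer fewer stages) and the post-compression gate can be sketched together. Everything here is hypothetical: `Group`, `hop_cost_ms`, `select_group` and `anchor_groups` are illustrative names, not B3's design. A compression ratio of 0.5 models fp16→int8/fp8 as 1 byte instead of 2 per element and ignores scale metadata. The 200 ms budget comes from finding 2. Only the two measured anchors are used, and no regional figures are invented.

```rust
/// A candidate pipeline group: stage count plus its worst consecutive-stage link.
#[derive(Debug, Clone)]
struct Group {
    name: &'static str,
    stages: u32,
    worst_rtt_ms: f64,
    worst_bw_mbit_s: f64,
}

/// Per-hop cost in ms: one-way latency plus activation transfer, with the
/// payload scaled by `compression` (0.5 ~ fp16 -> int8, 1.0 = uncompressed).
fn hop_cost_ms(g: &Group, boundary_bytes: f64, compression: f64) -> f64 {
    let bytes = boundary_bytes * compression;
    let transfer_ms = bytes * 8.0 / (g.worst_bw_mbit_s * 1e6) * 1000.0;
    g.worst_rtt_ms / 2.0 + transfer_ms
}

/// Rejects groups whose per-hop cost exceeds the budget, then prefers fewer
/// stages and, among equals, the cheaper hop.
fn select_group<'a>(groups: &'a [Group], boundary_bytes: f64, compression: f64, budget_ms: f64) -> Option<&'a Group> {
    let cost = |g: &Group| hop_cost_ms(g, boundary_bytes, compression);
    groups
        .iter()
        .filter(|g| cost(*g) <= budget_ms)
        .min_by(|a, b| a.stages.cmp(&b.stages).then(cost(*a).total_cmp(&cost(*b))))
}

/// The two measured anchors from the table (LAN bandwidth is the assumed 5 Gbit/s).
fn anchor_groups() -> Vec<Group> {
    vec![
        Group { name: "lan-us-east-1", stages: 4, worst_rtt_ms: 0.25, worst_bw_mbit_s: 5000.0 },
        Group { name: "sg-virginia", stages: 4, worst_rtt_ms: 242.0, worst_bw_mbit_s: 20.0 },
    ]
}

fn main() {
    let groups = anchor_groups();
    let boundary_bytes = 28.7e6; // 7B B8/S512 fp16 boundary from finding 2
    for g in &groups {
        println!(
            "{}: fp16 {:.0} ms/hop, int8 {:.0} ms/hop",
            g.name,
            hop_cost_ms(g, boundary_bytes, 1.0),
            hop_cost_ms(g, boundary_bytes, 0.5)
        );
    }
    match select_group(&groups, boundary_bytes, 0.5, 200.0) {
        Some(g) => println!("selected: {}", g.name),
        None => println!("no group clears the per-hop budget"),
    }
}
```

Filtering before ranking makes the per-hop budget a hard rejection rule, not a penalty. Even after halving the payload, the intercontinental anchor needs about 5.9 s per hop, so only the LAN group survives.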

Files

  • crates/qfc-inference/examples/pipeline_latency.rs — reproducible calculator (4 regression-guard tests)
  • docs/spikes/B2-wan-pipeline-latency.md — methodology, measured data, full result tables
  • docs/adr/0011-b2-pipeline-scope.md — the decision

🤖 Generated with Claude Code

…th-gated

ROADMAP-AI-V3 gates B-2 (multi-miner pipeline inference) on a WAN-latency
measurement spike. Done: real measurements + a reproducible model + the
scope decision.

Measured anchors:
- LAN (AWS us-east-1 intra-VPC, testnet A->B/C/D): RTT 0.25ms
- Intercontinental (Singapore <-> AWS us-east-1 Virginia): RTT 242ms
  median, 20 Mbit/s single-stream
The live testnet is single-region, so these bracket the geo-distributed
case; published AWS inter-region figures fill the middle.

Findings (crates/qfc-inference/examples/pipeline_latency.rs — re-runnable):
- Interactive autoregressive over WAN = NO-GO. Per-token full-pipeline
  traversal: qwen2.5-7b K=4 100 tokens = 6s continental, 37s
  intercontinental, network alone. Only same-region passes (defeats
  geo-decentralization).
- Batch prefill = viable but BANDWIDTH-gated, not just RTT-gated (the
  dimension the roadmap under-weighted): 7B B8/S512 = 28.7MB/boundary ->
  ~12s/hop intercontinental at 20 Mbit/s; even regional cross-AZ trips
  the 200ms/hop threshold.

Decision (ADR-0011): B-2 drops interactive WAN inference; targets
batch/async only; B3 assignment must model bandwidth+locality (not just
RTT) and form network-local pipeline groups; activation compression
(fp16->int8/fp8) becomes a B-2 requirement. Hard gate before B3-B5: re-run
the calculator against real cross-region miner-to-miner measurements
(needs >=2 miners in different regions; single-region testnet can't
produce them).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@lai3d merged commit ed315eb into main on Jun 14, 2026
4 checks passed
@lai3d deleted the claude/b2-wan-latency-spike branch June 14, 2026 15:44
lai3d added a commit that referenced this pull request Jun 14, 2026
…e shipped (#125)

Update ROADMAP-AI-V3.md (+ zh mirror) to reflect actual status:
- Status line + new Progress section with the shipped milestones and PRs
  (B-1 #102; A0-A2 #107; A3 #110; A4 #114; A5 #120; A6 #122; B-2 spike #124).
- Status column on both milestone tables; A0 shipped as 7 ADRs not 5;
  A5/A6 note the node-side/live-deployment parts that remain gated.
- B-2 section reflects the spike verdict (ADR-0011): interactive WAN =
  no-go, batch-only + bandwidth/locality-aware + activation compression;
  B3-B5 gated on real cross-region miner-to-miner measurement.
- Sequencing note: B-1 -> A -> B-2 order executed as recommended.

Docs only.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>