B-2 WAN-latency spike: interactive no-go, batch bandwidth-gated by lai3d · Pull Request #124 · qfc-network/qfc-core

lai3d · 2026-06-14T15:41:48Z

ROADMAP-AI-V3 gates B-2 (multi-miner pipeline inference) on a WAN-latency measurement spike before committing to the 12–16h build. This is that spike: real measurements, a reproducible model, and the scope decision. No production code: a runnable calculator (`cargo run -p qfc-inference --example pipeline_latency`) plus two docs.

Real measured anchors

Path	RTT	Bandwidth	How
LAN — AWS us-east-1 intra-VPC (testnet A→B/C/D)	0.25 ms	~5 Gbit/s (assumed)	`ping`, 5×3 samples
Intercontinental — Singapore ↔ AWS us-east-1 (Virginia)	242 ms median	20 Mbit/s single-stream	TCP-handshake RTT ×8; 20 MB over SSH

The live testnet is single-region (B/C/D are LAN behind a ProxyJump), so it can't directly measure cross-region miner latency — these two anchors bracket it, with published AWS inter-region figures filling the middle.

Findings

1. Interactive autoregressive inference over WAN = NO-GO. Each generated token traverses the full pipeline (K−1 hops for K stages), and each hop costs one one-way delay, so network latency for T tokens is about T·(K−1)·RTT/2. For qwen2.5-7b, K=4, 100 tokens, network alone: 0.77s regional, 6.1s continental, 37s intercontinental. Only same-datacenter/same-region passes a 2s bar, which negates the point of geographically distributed miners.
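The RTT term in finding 1 can be sketched as a small Rust function. This is a minimal illustration, not the code in `pipeline_latency.rs`: the function name and signature are hypothetical. With the measured 242 ms RTT it yields about 36.3 s for K=4 and 100 tokens, slightly under the 37 s quoted above; the difference is presumably a per-token transfer term in the real calculator, which is an assumption here.

```rust
/// Network-only latency, in seconds, for interactive decoding across a pipeline.
/// Hypothetical helper: each of `tokens` generated tokens crosses `stages - 1`
/// hops, and each hop is charged one one-way delay (RTT / 2).
/// Activation transfer time is deliberately ignored in this sketch.
fn interactive_network_latency_s(tokens: u32, stages: u32, rtt_ms: f64) -> f64 {
    let hops = stages.saturating_sub(1) as f64;
    tokens as f64 * hops * (rtt_ms / 2.0) / 1000.0
}

/// True if the network-only latency fits under the interactive bar (2 s in the PR).
fn passes_interactive_bar(latency_s: f64, bar_s: f64) -> bool {
    latency_s <= bar_s
}
```

Calling `interactive_network_latency_s(100, 4, 0.25)` with the LAN anchor stays far below the 2 s bar, while the intercontinental anchor exceeds it by more than an order of magnitude.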

2. Batch prefill = viable but BANDWIDTH-gated. This is the spike's correction to the roadmap, which framed B-2's risk as RTT only. At the measured 20 Mbit/s, a 7B prefill hop (B8/S512, i.e. batch 8 × sequence length 512, = 28.7 MB/boundary) takes ~12s: transfer-bound, not RTT-bound. Even regional cross-AZ trips the 200ms/hop threshold. Bandwidth is co-equal with RTT.
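To see why the prefill hop is transfer-bound, the per-hop cost can be approximated as serialization time plus one one-way delay. The sketch below is illustrative only: the function name is hypothetical, 1 MB is assumed to mean 10^6 bytes, and protocol overhead and TCP ramp-up are ignored.

```rust
/// Approximate time, in seconds, to ship one activation tensor across a hop:
/// payload serialization at the link bandwidth plus one one-way delay (RTT / 2).
/// Hypothetical helper; ignores protocol overhead and TCP slow start.
fn prefill_hop_time_s(payload_bytes: f64, bandwidth_bit_per_s: f64, rtt_ms: f64) -> f64 {
    let transfer_s = payload_bytes * 8.0 / bandwidth_bit_per_s;
    let one_way_s = rtt_ms / 2.0 / 1000.0;
    transfer_s + one_way_s
}
```

For the intercontinental anchor, `prefill_hop_time_s(28.7e6, 20e6, 242.0)` comes to roughly 11.6 s, of which only about 0.12 s is propagation delay. That is in line with the ~12s figure and far above the 200ms/hop threshold.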

Decision — ADR-0011

B-2 drops interactive WAN inference (stays single-miner, as today).
B-2 targets batch/async single-pass workloads only.
B3 assignment must model bandwidth + locality (not just RTT): form network-local pipeline groups, carry activation-transfer cost as an explicit term, reject groups over a per-hop budget, prefer fewer stages.
Activation compression (fp16→int8/fp8) becomes a B-2 requirement, folded into B4 prototype + B5 commitments.
Hard gate before building B3–B5: re-run the calculator against real cross-region miner-to-miner RTT+BW — needs ≥2 miners in different regions (the single-region testnet can't produce this). Build only if a realistic post-compression locality tier clears the per-hop budget.
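The assignment rules above (explicit transfer cost, a per-hop budget, activation compression) could be combined into an admission check along these lines. This is a sketch under stated assumptions, not the B3 design: `Link`, `compressed_hop_time_s`, and `group_within_budget` are hypothetical names, and compression is modelled as a plain size ratio (0.5 for fp16→int8) with no compute cost.

```rust
/// One pipeline hop between two miners (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
struct Link {
    rtt_ms: f64,
    bandwidth_bit_per_s: f64,
}

/// Hop time in seconds after shrinking the payload by `compression_ratio`
/// (e.g. 0.5 for fp16 -> int8). Compression compute cost is not modelled.
fn compressed_hop_time_s(link: Link, payload_bytes: f64, compression_ratio: f64) -> f64 {
    let bits = payload_bytes * compression_ratio * 8.0;
    bits / link.bandwidth_bit_per_s + link.rtt_ms / 2.0 / 1000.0
}

/// Admit a pipeline group only if every hop fits the per-hop budget.
fn group_within_budget(
    links: &[Link],
    payload_bytes: f64,
    compression_ratio: f64,
    budget_s: f64,
) -> bool {
    links
        .iter()
        .all(|l| compressed_hop_time_s(*l, payload_bytes, compression_ratio) <= budget_s)
}
```

With the two measured anchors, a group made only of LAN links (0.25 ms, ~5 Gbit/s assumed) clears a 200 ms budget for the 28.7 MB boundary, while any group containing the 242 ms / 20 Mbit/s link is rejected even at a 0.5 compression ratio.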

Files

crates/qfc-inference/examples/pipeline_latency.rs — reproducible calculator (4 regression-guard tests)
docs/spikes/B2-wan-pipeline-latency.md — methodology, measured data, full result tables
docs/adr/0011-b2-pipeline-scope.md — the decision

🤖 Generated with Claude Code

…th-gated

ROADMAP-AI-V3 gates B-2 (multi-miner pipeline inference) on a WAN-latency measurement spike. Done: real measurements + a reproducible model + the scope decision.

Measured anchors:
- LAN (AWS us-east-1 intra-VPC, testnet A->B/C/D): RTT 0.25ms
- Intercontinental (Singapore <-> AWS us-east-1 Virginia): RTT 242ms median, 20 Mbit/s single-stream

The live testnet is single-region, so these bracket the geo-distributed case; published AWS inter-region figures fill the middle.

Findings (crates/qfc-inference/examples/pipeline_latency.rs, re-runnable):
- Interactive autoregressive over WAN = NO-GO. Per-token full-pipeline traversal: qwen2.5-7b K=4 100 tokens = 6s continental, 37s intercontinental, network alone. Only same-region passes (defeats geo-decentralization).
- Batch prefill = viable but BANDWIDTH-gated, not just RTT-gated (the dimension the roadmap under-weighted): 7B B8/S512 = 28.7MB/boundary -> ~12s/hop intercontinental at 20 Mbit/s; even regional cross-AZ trips the 200ms/hop threshold.

Decision (ADR-0011): B-2 drops interactive WAN inference; targets batch/async only; B3 assignment must model bandwidth+locality (not just RTT) and form network-local pipeline groups; activation compression (fp16->int8/fp8) becomes a B-2 requirement. Hard gate before B3-B5: re-run the calculator against real cross-region miner-to-miner measurements (needs >=2 miners in different regions; single-region testnet can't produce them).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e shipped (#125)

Update ROADMAP-AI-V3.md (+ zh mirror) to reflect actual status:
- Status line + new Progress section with the shipped milestones and PRs (B-1 #102; A0-A2 #107; A3 #110; A4 #114; A5 #120; A6 #122; B-2 spike #124).
- Status column on both milestone tables; A0 shipped as 7 ADRs not 5; A5/A6 note the node-side/live-deployment parts that remain gated.
- B-2 section reflects the spike verdict (ADR-0011): interactive WAN = no-go, batch-only + bandwidth/locality-aware + activation compression; B3-B5 gated on real cross-region miner-to-miner measurement.
- Sequencing note: B-1 -> A -> B-2 order executed as recommended.

Docs only.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

lai3d merged commit ed315eb into main Jun 14, 2026
4 checks passed

lai3d deleted the claude/b2-wan-latency-spike branch June 14, 2026 15:44

lai3d mentioned this pull request Jun 14, 2026

docs(roadmap): mark AI-V3 progress — B-1, Feature A (A0–A6), B-2 spike shipped #125

Merged

lai3d merged 1 commit into main from claude/b2-wan-latency-spike
