
fix(consensus): make leader election converge across nodes (testnet fork fix) #126

Merged
lai3d merged 1 commit into main from claude/consensus-convergence-fix on Jun 14, 2026

lai3d (Collaborator) commented Jun 14, 2026

Fixes the root cause of the testnet's three-way consensus fork: validators diverged at block #1 into three independent chains that could never reconcile. (The full investigation is in the session; the currently running staging-sha-8cf3cb0 image exhibits the fork live.)

Root cause

Leader election was not a pure, node-agnostic function of (validator set, epoch seed, slot), so different nodes elected different producers and never agreed. Two defects in qfc-consensus:

select_producer — zero-score fallback was order-dependent

`if total_score == 0 { return Some(validators[0].address); }   // before`

validators[0]'s identity depends on each node's internal list order, so every node elected itself and forked at block #1. Now:

  • filter to active validators and sort by address (canonical, node-agnostic),
  • zero total score → deterministic round-robin sorted[slot % len],
  • the weighted path also iterates the sorted set, so it's order-independent even with non-zero scores.
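The selection rules listed above can be sketched in Rust roughly as follows. This is an illustrative sketch under stated assumptions, not the merged implementation: `Validator`, its 20-byte `address`, its `u64` `score`, and the `mix` function are hypothetical, and `mix` is a splitmix64-style stand-in for however qfc-consensus actually derives randomness from the epoch seed and slot.

```rust
/// Hypothetical validator record; the real qfc-consensus type is not shown in this PR.
#[derive(Clone, Debug)]
struct Validator {
    address: [u8; 20], // assumed 20-byte address
    score: u64,        // contribution score (assumed u64)
    active: bool,
}

/// Hypothetical stand-in for seeded randomness: folds the 32-byte epoch seed
/// and the slot through a splitmix64-style mixer. Deterministic, not cryptographic.
fn mix(seed: &[u8; 32], slot: u64) -> u64 {
    let mut x = slot;
    for chunk in seed.chunks(8) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        x ^= u64::from_le_bytes(buf);
        x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
        x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        x ^= x >> 31;
    }
    x
}

/// Pure function of (validator set, seed, slot): the input order never matters.
fn select_producer(validators: &[Validator], seed: &[u8; 32], slot: u64) -> Option<[u8; 20]> {
    // Canonical, node-agnostic view: active validators sorted by address.
    let mut sorted: Vec<&Validator> = validators.iter().filter(|v| v.active).collect();
    if sorted.is_empty() {
        return None;
    }
    sorted.sort_by(|a, b| a.address.cmp(&b.address));

    let total_score: u128 = sorted.iter().map(|v| v.score as u128).sum();
    if total_score == 0 {
        // Deterministic round-robin instead of the old `validators[0]` fallback.
        let idx = (slot % sorted.len() as u64) as usize;
        return Some(sorted[idx].address);
    }

    // Weighted pick over the sorted set: walk cumulative scores until the target falls inside.
    let target = (mix(seed, slot) as u128) % total_score;
    let mut acc: u128 = 0;
    for v in &sorted {
        acc += v.score as u128;
        if target < acc {
            return Some(v.address);
        }
    }
    None // unreachable in practice, since target < total_score
}
```

Sorting before both branches is what turns the result into a function of the set rather than of each node's storage order; the weighted branch still draws on seeded randomness, but every node feeds it the same seed and slot, so every node lands on the same validator.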

maybe_advance_epoch — seed came from the local chain head

`seed.copy_from_slice(head_hash.as_bytes());   // before: per-node, divergent`

Once two nodes diverged, their seeds diverged too and the fork became permanent. Now the seed is a hash chain rooted at the genesis seed (`seed_n = blake3(seed_{n-1} || n)`), walked from the current epoch to the target, so it is identical on every node regardless of chain head. Dropped the now-unused `head_hash` param (updated the producer.rs / miner.rs callers). Because the seed is deterministic, the beacon is predictable in advance; that is an acceptable trade-off for convergence on this chain.
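The genesis-rooted seed chain described above can be sketched as below. This is a hedged sketch, not the PR's code: `advance_seed` and `toy_hash` are hypothetical names, the epoch number is assumed to be appended as a little-endian `u64` (the PR does not say how `n` is serialized), and the hash is injected as a closure so a real build could pass something like `|b| *blake3::hash(b).as_bytes()` from the blake3 crate. `toy_hash` exists only to keep the example dependency-free.

```rust
/// Walk the seed chain from `current_epoch` to `target_epoch`:
/// seed_n = H(seed_{n-1} || n). No chain-head input, so every node that
/// starts from the same genesis seed computes the same result.
/// Assumption: n is encoded as u64 little-endian.
fn advance_seed<H>(mut seed: [u8; 32], current_epoch: u64, target_epoch: u64, hash: H) -> [u8; 32]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    for n in (current_epoch + 1)..=target_epoch {
        let mut input = Vec::with_capacity(40);
        input.extend_from_slice(&seed); // seed_{n-1}
        input.extend_from_slice(&n.to_le_bytes()); // || n
        seed = hash(&input);
    }
    seed
}

/// Toy 32-byte hash built from FNV-1a rounds, for illustration only.
/// NOT cryptographic and NOT blake3.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for (i, byte_out) in out.iter_mut().enumerate() {
        for &b in data {
            state ^= b as u64;
            state = state.wrapping_mul(0x0000_0100_0000_01B3); // FNV prime
        }
        state ^= i as u64;
        *byte_out = (state >> 24) as u8;
    }
    out
}
```

Because the walk is a pure function of the genesis seed and the epoch range, advancing 0→3 and then 3→5 yields the same seed as advancing 0→5 directly, which is the property that lets nodes at different local positions agree.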

Tests (27 qfc-consensus pass; 20 qfc-node)

  • test_producer_selection_is_order_independent — two nodes holding the set in opposite order elect the same producer for 200 slots (weighted path).
  • test_zero_score_round_robin — zero-score path round-robins deterministically and covers every validator, order-independent.
  • test_epoch_seed_is_deterministic_hash_chain — epoch seed is the genesis-rooted hash chain, independent of head.

Scope / follow-ups

  • Not fixed here: Defect ④ (qfc-node sync with MAX_PENDING_BLOCKS=1000 can't bridge a deep fork) and contribution-score scaling on the old deployed image; both are left to separate PRs.
  • A testnet reset is still required: the three existing chains diverge at block #1 and cannot reconcile, so they need a fresh genesis on this fixed binary. This PR makes a reset stick (nodes will converge instead of re-forking).

🤖 Generated with Claude Code

Root cause of the testnet's three-way fork (validators diverged from block
#1, three independent chains): leader election was not a pure, node-agnostic
function of (validator set, epoch seed, slot), so different nodes elected
different producers and never agreed.

Two defects fixed in qfc-consensus:

1. select_producer (Defect ②): the zero-score fallback returned
   `validators[0]` — whose identity depends on each node's internal list
   order, so every node elected ITSELF and forked immediately. Now: filter to
   active validators, sort canonically by address, and on zero total score do
   a deterministic round-robin by slot (`sorted[slot % len]`). The weighted
   path also iterates the address-sorted set, so selection is order-independent
   even with non-zero scores.

2. maybe_advance_epoch (Defect ③): the epoch seed was copied from the local
   chain head hash, so once two nodes diverged their seeds diverged and the
   fork became permanent. Now the seed is a hash chain rooted at the genesis
   seed (`seed_n = blake3(seed_{n-1} || n)`), walked from the current epoch to
   the target — identical on every node regardless of chain head. Dropped the
   now-unused `head_hash` param (callers in producer.rs/miner.rs updated).

Tests: producer selection is identical across two nodes holding the set in
opposite order (weighted path); zero-score path round-robins deterministically
and covers every validator; epoch seed is the deterministic genesis-rooted
hash chain. 27 qfc-consensus tests pass.

NOT in scope (follow-ups): Defect ④ (qfc-node sync MAX_PENDING_BLOCKS can't
bridge a deep fork) and the contribution-score scaling on the deployed image.
A testnet reset is still required — existing chains diverge at block #1 and
cannot reconcile; they need a fresh genesis on this fixed binary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
lai3d merged commit db45f2c into main Jun 14, 2026
4 checks passed
lai3d deleted the claude/consensus-convergence-fix branch June 14, 2026 16:30
lai3d added a commit that referenced this pull request Jun 14, 2026

Labels: None yet
Projects: None yet


1 participant