fix(consensus): wall-clock-anchored epoch/slot scheduling (5th fork defect) #129

Merged: lai3d merged 1 commit into main from claude/consensus-time-slots on Jun 27, 2026

@lai3d (Collaborator) commented Jun 27, 2026

The testnet reset on the #126 image still forked, exposing a 5th defect that is only visible at multi-node runtime. Scheduling was anchored to each node's local start time, so nodes that booted at different times ran on different epoch/slot numbers, elected different producers at the same instant, and produced competing blocks.

Observed live: node-1 was on epoch 25 while node-2/3 were on epoch 15, purely from a start-time gap of roughly one minute; the block #5 hash differed across nodes.

Root cause

  • producer.rs used a per-tick local counter (slot += 1 from boot).
  • maybe_advance_epoch accumulated epochs from each node's start_time.

#126 made selection deterministic given the same (epoch, slot), but nothing made nodes agree on the current epoch/slot.
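
To make the defect concrete, here is a minimal sketch of start-time-anchored epoch counting. The function name `local_epoch` and all numeric values are illustrative assumptions, not the code or configuration from `producer.rs` or `maybe_advance_epoch`.

```rust
/// Simplified pre-fix behaviour: the epoch counts up from the node's own boot time.
/// (Hypothetical helper, written only to illustrate the root cause.)
fn local_epoch(now_ms: u64, start_ms: u64, epoch_duration_ms: u64) -> u64 {
    now_ms.saturating_sub(start_ms) / epoch_duration_ms
}

fn main() {
    // Illustrative values only, not the testnet's configuration.
    let epoch_duration_ms = 10_000;
    let now_ms = 1_000_000;
    let node_a_start = 700_000; // booted first
    let node_b_start = 760_000; // booted 60 s later

    let a = local_epoch(now_ms, node_a_start, epoch_duration_ms);
    let b = local_epoch(now_ms, node_b_start, epoch_duration_ms);

    // Same instant, different epoch numbers: deterministic selection cannot help,
    // because the two nodes feed it different inputs.
    println!("node A epoch = {a}, node B epoch = {b}");
    assert_ne!(a, b);
}
```

Because the result depends on `start_ms`, any boot-time gap longer than one epoch guarantees disagreement.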

Fix — anchor scheduling to wall-clock (standard time-slot PoS)

  • Slot = now_ms / block_interval_ms (global; NTP-synced nodes agree), processed at most once per slot.
  • Epoch = now_ms / epoch_duration_ms (global), replacing local-start accumulation.
  • Seed derived directly: blake3(genesis_seed ‖ epoch). This is O(1), which is required because wall-clock epoch numbers are far too large to walk #126's hash chain to. genesis_seed is captured on the first start_epoch (genesis init) and is identical across nodes.
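
The slot and epoch rules above can be sketched with the standard library alone. The names `now_ms`, `slot_at`, `epoch_at`, and `SlotGuard` are hypothetical, and the interval values are placeholders; this is one possible shape under those assumptions, not the merged implementation.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Milliseconds since the Unix epoch, as reported by the local (ideally NTP-synced) clock.
fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis() as u64
}

/// Global slot number: depends only on wall-clock time, never on boot time.
fn slot_at(now_ms: u64, block_interval_ms: u64) -> u64 {
    now_ms / block_interval_ms
}

/// Global epoch number, computed the same way.
fn epoch_at(now_ms: u64, epoch_duration_ms: u64) -> u64 {
    now_ms / epoch_duration_ms
}

/// Remembers the last slot handled so a fast tick loop acts at most once per slot.
#[derive(Default)]
struct SlotGuard {
    last_processed: Option<u64>,
}

impl SlotGuard {
    /// Returns true only the first time a newer slot is seen.
    fn should_process(&mut self, slot: u64) -> bool {
        if self.last_processed.map_or(true, |last| slot > last) {
            self.last_processed = Some(slot);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Placeholder intervals, not the testnet's configuration.
    let (block_interval_ms, epoch_duration_ms) = (3_000, 60_000);
    let now = now_ms();
    let slot = slot_at(now, block_interval_ms);
    println!("slot {slot}, epoch {}", epoch_at(now, epoch_duration_ms));

    let mut guard = SlotGuard::default();
    assert!(guard.should_process(slot)); // first tick in this slot
    assert!(!guard.should_process(slot)); // later ticks in the same slot are ignored
}
```

The guard only accepts strictly newer slots, so a clock that briefly steps backwards does not cause a slot to be processed twice in this sketch.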

Every node now computes the same slot/epoch/seed/producer at any instant → one elected producer network-wide per slot → convergence.
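
The direct seed derivation can be sketched as a single hash call over `genesis_seed ‖ epoch`. To keep the example free of external crates, the hash is injected as a closure and the demo uses a toy, non-cryptographic stand-in; `epoch_seed`, `toy_hash`, and the little-endian encoding of the epoch are assumptions, not details confirmed by this PR.

```rust
/// Derive the epoch seed in one hash call: H(genesis_seed || epoch).
/// `hash` is injected so the sketch compiles with the standard library only.
fn epoch_seed<H>(genesis_seed: &[u8; 32], epoch: u64, hash: H) -> [u8; 32]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    let mut input = Vec::with_capacity(40);
    input.extend_from_slice(genesis_seed);
    input.extend_from_slice(&epoch.to_le_bytes()); // assumed encoding
    hash(&input)
}

/// Toy stand-in for BLAKE3 using FNV-1a style mixing. Demo only, not cryptographic.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in out.chunks_mut(8) {
        for &byte in data {
            state ^= u64::from(byte);
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
        chunk.copy_from_slice(&state.to_le_bytes());
    }
    out
}

fn main() {
    let genesis_seed = [7u8; 32];
    // A wall-clock-sized epoch number: tens of millions, reached without iterating.
    let epoch = 1_782_550_000_000u64 / 60_000;

    let a = epoch_seed(&genesis_seed, epoch, toy_hash);
    let b = epoch_seed(&genesis_seed, epoch, toy_hash);
    assert_eq!(a, b); // same inputs, same seed on every node
    assert_ne!(a, epoch_seed(&genesis_seed, epoch + 1, toy_hash));
    println!("seed for epoch {epoch} starts with {:02x}{:02x}", a[0], a[1]);
}
```

With the `blake3` crate as a dependency, the closure passed in would be `|bytes| *blake3::hash(bytes).as_bytes()`. The cost is one hash regardless of how large the epoch number is, which is the property a hash chain walked from genesis cannot offer.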

Tests (28 qfc-consensus + 21 qfc-node pass)

  • test_nodes_agree_despite_different_start_times — two engines started ~20ms apart with opposite validator order agree on epoch, seed, and producer for 200 slots (the decisive property).
  • test_epoch_seed_is_deterministic — seed = blake3(genesis ‖ n).
  • Existing order-independence / round-robin tests still pass.
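
The agreement property checked by the first test can be illustrated roughly as follows. This is not the actual test from the PR: the `Engine` struct, the constants, and the producer selection (canonical sort plus an index derived from epoch and slot) are hypothetical stand-ins for the real engine and for #126's selection logic.

```rust
const SLOT_MS: u64 = 3_000; // placeholder block interval
const EPOCH_MS: u64 = 60_000; // placeholder epoch duration

/// Hypothetical engine: remembers when it started and the order it loaded validators in.
struct Engine {
    started_at_ms: u64, // recorded, but deliberately not used for scheduling
    validators: Vec<&'static str>,
}

impl Engine {
    /// Wall-clock scheduling: depends only on `now_ms`.
    fn epoch_and_slot(&self, now_ms: u64) -> (u64, u64) {
        (now_ms / EPOCH_MS, now_ms / SLOT_MS)
    }

    /// Stand-in selection: sort into a canonical order, then index by (epoch, slot).
    fn producer(&self, now_ms: u64) -> &'static str {
        let (epoch, slot) = self.epoch_and_slot(now_ms);
        let mut sorted = self.validators.clone();
        sorted.sort_unstable();
        sorted[(epoch.wrapping_mul(31).wrapping_add(slot) % sorted.len() as u64) as usize]
    }
}

/// Two engines started 20 ms apart, with opposite validator order, compared slot by slot.
fn nodes_agree(slots: u64) -> bool {
    let a = Engine { started_at_ms: 1_000_000, validators: vec!["v1", "v2", "v3"] };
    let b = Engine { started_at_ms: 1_000_020, validators: vec!["v3", "v2", "v1"] };
    let first = a.started_at_ms.max(b.started_at_ms);
    (0..slots).all(|i| {
        let now = first + i * SLOT_MS;
        a.epoch_and_slot(now) == b.epoch_and_slot(now) && a.producer(now) == b.producer(now)
    })
}

fn main() {
    assert!(nodes_agree(200));
    println!("engines agreed on epoch, slot, and producer for 200 slots");
}
```

Agreement holds here by construction, since nothing in the schedule reads `started_at_ms`; that is the same reason the real fix removes start time from the scheduling inputs.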

Final consensus fix for the testnet recovery (after #126 selection determinism, #128 forward sync). Next: rebuild → re-run the reset.

🤖 Generated with Claude Code

…efect)

The testnet reset on the #126 image still forked: nodes that started at
different times ran on different epoch/slot numbers, so they elected different
producers at the same instant and produced competing blocks.

Root cause: scheduling was anchored to each node's LOCAL start time.
- producer.rs used a per-tick local counter (`slot += 1` from boot).
- maybe_advance_epoch accumulated epochs from each node's `start_time`.
#126 made selection deterministic GIVEN the same (epoch, slot), but nothing made
nodes agree on the CURRENT epoch/slot. (Observed live: node-1 epoch 25 vs
node-2/3 epoch 15 purely from a ~minute start-time gap.)

Fix — anchor scheduling to wall-clock (standard time-slot approach):
- producer slot = now_ms / block_interval_ms (global; NTP-synced nodes agree).
  Processed at most once per slot. Replaces the local counter.
- maybe_advance_epoch: epoch = now_ms / epoch_duration_ms (global), replacing
  the local-start-time accumulation.
- Epoch seed derived DIRECTLY: blake3(genesis_seed || epoch) — O(1), required
  because wall-clock epoch numbers are far too large to walk #126's hash chain
  to. genesis_seed is captured on the first start_epoch (genesis init) and is
  identical across nodes.

Result: every node computes the same slot/epoch/seed/producer at any instant →
one elected producer network-wide per slot → convergence.

Tests: nodes started at DIFFERENT times with opposite validator order now agree
on epoch, seed, and producer for 200 slots; epoch seed = blake3(genesis||n).
28 qfc-consensus + 21 qfc-node tests pass.

This is the final consensus fix for the testnet recovery (after #126 selection
determinism, #128 forward sync). Rebuild → re-run reset.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@lai3d lai3d merged commit 5b818ff into main Jun 27, 2026
4 checks passed
@lai3d lai3d deleted the claude/consensus-time-slots branch June 27, 2026 09:07