
perf(consensus): skip fsync for unsigned internal messages (block parts)#28

Open
JayT106 wants to merge 2 commits into crypto-org-chain:v0.38.x from JayT106:perf/wal-selective-fsync-v0.38.x

Conversation


@JayT106 JayT106 commented Mar 13, 2026

Summary

  • Skip fsync (WriteSync) for BlockPartMessage in the consensus WAL's receiveRoutine, using buffered Write instead
  • Only signed messages (VoteMessage, ProposalMessage) retain WriteSync for double-signing prevention
  • Explicit type-switch with default: panic prevents future signed message types from silently bypassing fsync
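The dispatch described above can be sketched as an explicit type switch. This is a simplified stand-in, not the actual CometBFT code: the message types and the `walWriter` interface here are hypothetical placeholders for the real consensus types and WAL.

```go
package main

import "fmt"

// Hypothetical stand-ins for the consensus message types; the real
// definitions live in CometBFT's consensus package.
type VoteMessage struct{}
type ProposalMessage struct{}
type BlockPartMessage struct{}

// walWriter abstracts the two WAL write paths: WriteSync (Write + fsync)
// and buffered Write.
type walWriter interface {
	Write(msg interface{}) error
	WriteSync(msg interface{}) error
}

// writeToWAL mirrors the dispatch: fsync only for signed message types,
// buffered write for unsigned block parts, and a panic for anything
// unknown so future signed types cannot silently skip fsync.
func writeToWAL(wal walWriter, msg interface{}) error {
	switch msg.(type) {
	case *VoteMessage, *ProposalMessage:
		return wal.WriteSync(msg) // signed: must be durable immediately
	case *BlockPartMessage:
		return wal.Write(msg) // unsigned: buffered, flushed later
	default:
		panic(fmt.Sprintf("unknown consensus message type %T", msg))
	}
}

// fakeWAL records which path was taken, for demonstration only.
type fakeWAL struct{ lastCall string }

func (f *fakeWAL) Write(msg interface{}) error     { f.lastCall = "Write"; return nil }
func (f *fakeWAL) WriteSync(msg interface{}) error { f.lastCall = "WriteSync"; return nil }

func main() {
	w := &fakeWAL{}
	writeToWAL(w, &VoteMessage{})
	fmt.Println(w.lastCall) // WriteSync
	writeToWAL(w, &BlockPartMessage{})
	fmt.Println(w.lastCall) // Write
}
```

The `default: panic` branch is the safety net: adding a new signed message type without updating this switch fails loudly instead of quietly losing durability.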

Backport of cometbft#5695 to v0.38.x.

Motivation

In receiveRoutine, every message from internalMsgQueue was written with WriteSync (Write + fsync). For a proposer with N block parts per round, this meant N+2 fsyncs per round (N can be 10–100+). Each fsync costs ~1–10ms on typical hardware, and up to 10–50ms on cloud storage (e.g., AWS EBS).

Block parts are unsigned data — losing them on crash causes a round timeout (liveness), not double-signing (safety). The existing FlushAndSync calls before SignVote and SignProposal already flush all buffered writes to disk before any signature is produced, maintaining the critical safety invariant.

Safety analysis

Why signed messages MUST keep WriteSync:

  • Pre-sign FlushAndSync ensures WAL replay reaches the same deterministic state
  • Post-sign WriteSync ensures the signed message is durable before handleMsg processes/broadcasts it
  • Without this, crash-replay could re-sign a different vote → equivocation

Why BlockPartMessage can safely use Write:

  1. Block parts are unsigned data chunks derived from the proposal block
  2. On crash: proposal exists (fsynced), some block parts may be missing → round times out → consensus proceeds
  3. The periodic 2-second WAL flush (processFlushTicks) eventually flushes them
  4. Any subsequent WriteSync (vote) or FlushAndSync (pre-sign) also flushes buffered block parts
  5. EndHeightMessage uses WriteSync, which flushes ALL buffered writes before the end marker
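Points 4 and 5 rely on sync flushing the entire buffer, not just the latest record. A toy model of that behavior (all names here are hypothetical, not CometBFT's API):

```go
package main

import "fmt"

// toyWAL models a WAL with an in-memory buffer and a durable log.
// FlushAndSync moves everything buffered to durable storage, which is
// why any later sync also covers earlier buffered block parts.
type toyWAL struct {
	buffer  []string
	durable []string
}

// Write buffers a record without forcing it to disk.
func (w *toyWAL) Write(msg string) { w.buffer = append(w.buffer, msg) }

// FlushAndSync flushes ALL buffered records to the durable log.
func (w *toyWAL) FlushAndSync() {
	w.durable = append(w.durable, w.buffer...)
	w.buffer = nil
}

// WriteSync writes one record and then syncs, dragging along
// everything that was still sitting in the buffer.
func (w *toyWAL) WriteSync(msg string) {
	w.Write(msg)
	w.FlushAndSync()
}

func main() {
	w := &toyWAL{}
	w.Write("block-part-1") // buffered only
	w.Write("block-part-2") // buffered only
	w.WriteSync("vote")     // syncs the vote AND both block parts
	fmt.Println(w.durable)  // [block-part-1 block-part-2 vote]
}
```

This is why a crash can only lose block parts that no signature ever depended on: the pre-sign FlushAndSync guarantees everything buffered is durable before signing.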

Benchmark results (Apple M1 Max SSD)

BenchmarkWALRoundSimulation/AllWriteSync-10       ~240ms/round
BenchmarkWALRoundSimulation/SelectiveFsync-10     ~10.6ms/round  (~23x faster)
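The gap comes almost entirely from the per-record fsync. A minimal standalone comparison illustrates the effect (this is not the PR's BenchmarkWALRoundSimulation, and absolute numbers depend heavily on hardware):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

// timeWrites writes n small records to a temp file. If syncEach is true
// it fsyncs after every record (the old AllWriteSync behavior);
// otherwise it buffers and syncs once at the end (roughly the
// SelectiveFsync behavior for a run of block parts).
func timeWrites(n int, syncEach bool) (time.Duration, error) {
	f, err := os.CreateTemp("", "walbench")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	bw := bufio.NewWriter(f)
	start := time.Now()
	for i := 0; i < n; i++ {
		fmt.Fprintf(bw, "block part %d\n", i)
		if syncEach {
			bw.Flush()
			f.Sync() // one fsync per record
		}
	}
	bw.Flush()
	f.Sync() // single final fsync covers everything buffered
	return time.Since(start), nil
}

func main() {
	perRecord, _ := timeWrites(50, true)
	single, _ := timeWrites(50, false)
	fmt.Printf("fsync per write: %v, single fsync: %v\n", perRecord, single)
}
```

On spinning or network-backed storage the per-record variant is slower by roughly the fsync latency times the number of block parts, matching the ~23x figure above for 50 parts per round.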

Test plan

  • go build ./consensus/ — compiles cleanly
  • go vet ./consensus/ — no issues
  • TestWALSelectiveFsync — new test verifying dispatch logic
  • TestWALCrash — existing crash/replay tests pass
  • TestWALTruncate, TestWALEncoderDecoder, TestWALWrite, TestWALSearchForEndHeight, TestWALPeriodicSync — all pass
  • BenchmarkWALWrite, BenchmarkWALWriteSync, BenchmarkWALRoundSimulation — benchmarks pass

🤖 Generated with Claude Code

Only signed messages (votes, proposals) need WriteSync (fsync) in the
WAL for double-signing prevention. Block parts are unsigned data where
losing them on crash just causes a round timeout, not a safety violation.

Replace the blanket WriteSync in receiveRoutine with an explicit type
switch: WriteSync for VoteMessage/ProposalMessage, buffered Write for
BlockPartMessage. Unknown message types panic to prevent future signed
types from silently bypassing fsync.

The pre-sign FlushAndSync calls (before SignVote and SignProposal)
ensure buffered block parts reach disk before any signature is produced.

Benchmark (50 block parts/round, Apple M1 Max):
  AllWriteSync:    ~240ms/round
  SelectiveFsync:  ~10.6ms/round  (~23x faster)

Backport of cometbft#5695.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JayT106 JayT106 self-assigned this Mar 13, 2026
@JayT106 JayT106 requested a review from songgaoye March 14, 2026 00:52