[FREEZE-EXCEPTION] fix(SEV-039): RPC quorum threshold = strict majority (⌊N/2⌋+1)#439
Merged
Merged
Conversation
…N/2⌉
Internal review (Wave 9 follow-up): the ceil(N/2) threshold in
decideTxQuorum / checkFinalizedQuorum collapses to 1 for N=2, which is
the most common cheap multi-RPC setup. A single divergent / lying /
null-returning provider single-handedly fixes the verdict via two
concrete paths:
- Tie-break injection: [tx-A, tx-B] → buckets {A:1, B:1}, both ≥
threshold 1, best decided by Map insertion order. 1 divergent RPC
INJECTS its fingerprint without any real agreement.
- Null censorship: [tx, null] → nulls (1) ≥ threshold (1) fires
first, returning consensus_null (or consensus 'missing' in the
reconciler, which DELETES the event from the canonical table — a
permanent censorship of a real on-chain event by a single lying
RPC).
Both made the Wave 9.1 #432 'single-RPC trust' mitigation effectively a
no-op for N=2 deployments. Asymmetry the original report missed:
reconciler.checkFinalizedQuorum's 'missing' branch is delete-of-record,
strictly more dangerous than the read-side 'consensus_null' in
decideTxQuorum.
Fix:
- Add exported quorumThreshold(n) = ⌊n/2⌋ + 1 in rpcQuorum.ts as the
single source of truth (and document the attack vector derivation
on the function so future reviewers don't relax it back to ceil).
Edge case n≤0 → 0; callers all early-return on empty.
- Use it in all 3 call sites: decideTxQuorum, checkFinalizedQuorum
(with an extra note on the delete-of-record risk), and the
backfill-events.ts startup telemetry log (was a separate
Math.ceil(...) that would have silently drifted otherwise).
- Update the stale test invariant '2P + 1 error → consensus_tx'
(locked in the old behavior) to '… → no_quorum'. Trade-off
documented: a flaky provider causes deferral instead of ingest on
a single vote — worth it, since under the old rule a malicious
provider returning 'error' let 1 (possibly lying) survivor decide.
- Add two regression tests that lock down the exact attack vectors
([tx-A, tx-B] → no_quorum; [tx, null] → no_quorum) so any future
drift to ceil trips here loudly.
- Add a quorumThreshold helper test pinning the table from N=0..7.
Backward compat: N=1 (single-RPC, the default) still has threshold 1,
behavior unchanged. Odd N also unchanged (floor(N/2)+1 == ceil(N/2) for
odd N). Only even N tightens.
Validated in sandbox: tsc clean, prettier clean,
`pnpm test:rpc-quorum` 19/19 (was 17 — +2 regression tests for the
attack vectors, +1 for the helper, 1 existing invariant flipped to
`no_quorum`). `test:indexer-log` 4/4 unaffected. `reconciler-lease`
needs DATABASE_URL — runs in CI's Postgres lane.
https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[FREEZE-EXCEPTION rationale: closes a concrete data-integrity weakness in the multi-RPC quorum primitive (PR #432, Wave 9.1) that ships in the indexer service. Defense-in-depth before the Phase 3 B2B oracle is wired to consume
events. No on-chain program change; 4 files (1 helper + 3 call sites + 1 test file). Internal review finding.]What
The
ceil(N/2)threshold indecideTxQuorum/checkFinalizedQuorumcollapses to 1 for N=2 — the most common cheap multi-RPC setup. A single divergent / lying / null-returning provider single-handedly fixes the verdict via two concrete paths:[tx-A, tx-B](fingerprints diverge): buckets{A:1, B:1}, both≥ threshold 1,bestdecided by Map insertion order. 1 divergent RPC INJECTS its fingerprint without any real agreement.[tx, null]:nulls (1) ≥ threshold (1)fires first, returningconsensus_null. Inreconciler.checkFinalizedQuorumthe symmetricmissing ≥ thresholdbranch DELETES the event from the canonical table — a permanent censorship of a real on-chain event by a single lying RPC.Both made the Wave 9.1 #432 "single-RPC trust" mitigation effectively a no-op for N=2 deployments.
Asymmetry the internal report missed:
reconciler.checkFinalizedQuorum'smissingbranch is delete-of-record, strictly more dangerous than the read-sideconsensus_nullindecideTxQuorum.Fix
Add exported
quorumThreshold(n) = ⌊n/2⌋ + 1inrpcQuorum.tsas the single source of truth (and document the attack-vector derivation on the function so future reviewers don't relax it back toceil). Edge casen ≤ 0 → 0; callers all early-return on empty.Use it in all 3 call sites:
decideTxQuorum,checkFinalizedQuorum(with an extra comment on the delete-of-record risk), and thebackfill-events.tsstartup telemetry log (was a separateMath.ceil(...)that would have silently drifted otherwise — caught bygrep -rE 'Math\.ceil\(.+length / 2\)').Update the stale test invariant
'2P + 1 error → consensus_tx'(locked in the old behavior) to'… → no_quorum'. Trade-off documented in the test: a flaky provider causes deferral instead of ingest on a single vote — worth it, since under the old rule a malicious provider returningerrorlet 1 (possibly lying) survivor decide.Add two regression tests that lock down the exact attack vectors:
[tx-A, tx-B] → divergence (no tie-break injection)[tx, null] → divergence (no null-censorship)Any future drift to
ceiltrips here loudly.Add a
quorumThresholdhelper test pinning the table from N=0..7.Backward compatibility
ceil)floor+1)ceilOnly even N tightens. Single-RPC (the default with just
SOLANA_RPC_URLset) and any odd-count deployment are unchanged.Test plan
Validated in sandbox:
pnpm typecheckcleanpnpm exec prettier --checkclean on the 4 changed filespnpm test:rpc-quorum→ 19/19 passing (was 17; +2 regression tests for the attack vectors, +1 for the helper; 1 existing invariant flipped tono_quorum)pnpm test:indexer-log→ 4/4 unaffectedreconciler-leaseneedsDATABASE_URL— runs in CI's Postgres lane.Files
services/indexer/src/rpcQuorum.ts—+quorumThreshold(n), use indecideTxQuorumservices/indexer/src/reconciler.ts— import + use incheckFinalizedQuorum(+ comment on delete-of-record risk)services/indexer/src/backfill-events.ts— telemetry log uses the helpertests/rpc_quorum.spec.ts— 1 invariant updated + 2 regression tests + 1 helper testhttps://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
Generated by Claude Code