Skip to content

[FREEZE-EXCEPTION] fix(SEV-039): RPC quorum threshold = strict majority (⌊N/2⌋+1)#439

Merged
alrimarleskovar merged 1 commit into
mainfrom
claude/fix-quorum-strict-majority
May 31, 2026
Merged

[FREEZE-EXCEPTION] fix(SEV-039): RPC quorum threshold = strict majority (⌊N/2⌋+1)#439
alrimarleskovar merged 1 commit into
mainfrom
claude/fix-quorum-strict-majority

Conversation

@alrimarleskovar
Copy link
Copy Markdown
Owner

[FREEZE-EXCEPTION rationale: closes a concrete data-integrity weakness in the multi-RPC quorum primitive (PR #432, Wave 9.1) that ships in the indexer service. Defense-in-depth before the Phase 3 B2B oracle is wired to consume events. No on-chain program change; 4 files (1 helper + 3 call sites + 1 test file). Internal review finding.]

What

The ceil(N/2) threshold in decideTxQuorum / checkFinalizedQuorum collapses to 1 for N=2 — the most common cheap multi-RPC setup. A single divergent / lying / null-returning provider single-handedly fixes the verdict via two concrete paths:

  1. Tie-break injection[tx-A, tx-B] (fingerprints diverge): buckets {A:1, B:1}, both ≥ threshold 1, best decided by Map insertion order. 1 divergent RPC INJECTS its fingerprint without any real agreement.
  2. Null censorship[tx, null]: nulls (1) ≥ threshold (1) fires first, returning consensus_null. In reconciler.checkFinalizedQuorum the symmetric missing ≥ threshold branch DELETES the event from the canonical table — a permanent censorship of a real on-chain event by a single lying RPC.

Both made the Wave 9.1 #432 "single-RPC trust" mitigation effectively a no-op for N=2 deployments.

Asymmetry the internal report missed: reconciler.checkFinalizedQuorum's missing branch is delete-of-record, strictly more dangerous than the read-side consensus_null in decideTxQuorum.

Fix

  • Add exported quorumThreshold(n) = ⌊n/2⌋ + 1 in rpcQuorum.ts as the single source of truth (and document the attack-vector derivation on the function so future reviewers don't relax it back to ceil). Edge case n ≤ 0 → 0; callers all early-return on empty.

  • Use it in all 3 call sites: decideTxQuorum, checkFinalizedQuorum (with an extra comment on the delete-of-record risk), and the backfill-events.ts startup telemetry log (was a separate Math.ceil(...) that would have silently drifted otherwise — caught by grep -rE 'Math\.ceil\(.+length / 2\)').

  • Update the stale test invariant '2P + 1 error → consensus_tx' (locked in the old behavior) to '… → no_quorum'. Trade-off documented in the test: a flaky provider causes deferral instead of ingest on a single vote — worth it, since under the old rule a malicious provider returning error let 1 (possibly lying) survivor decide.

  • Add two regression tests that lock down the exact attack vectors:

    • [tx-A, tx-B] → divergence (no tie-break injection)
    • [tx, null] → divergence (no null-censorship)

    Any future drift to ceil trips here loudly.

  • Add a quorumThreshold helper test pinning the table from N=0..7.

Backward compatibility

N old (ceil) new (floor+1) change
1 1 1 none — single-RPC behavior unchanged
2 1 2 strict majority (was the bug)
3 2 2 none — odd N already strict majority under ceil
4 2 3 strict majority
5 3 3 none
6 3 4 strict majority

Only even N tightens. Single-RPC (the default with just SOLANA_RPC_URL set) and any odd-count deployment are unchanged.

Test plan

Validated in sandbox:

  • pnpm typecheck clean
  • pnpm exec prettier --check clean on the 4 changed files
  • pnpm test:rpc-quorum19/19 passing (was 17; +2 regression tests for the attack vectors, +1 for the helper; 1 existing invariant flipped to no_quorum)
  • pnpm test:indexer-log → 4/4 unaffected

reconciler-lease needs DATABASE_URL — runs in CI's Postgres lane.

Files

  • services/indexer/src/rpcQuorum.ts+quorumThreshold(n), use in decideTxQuorum
  • services/indexer/src/reconciler.ts — import + use in checkFinalizedQuorum (+ comment on delete-of-record risk)
  • services/indexer/src/backfill-events.ts — telemetry log uses the helper
  • tests/rpc_quorum.spec.ts — 1 invariant updated + 2 regression tests + 1 helper test

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm


Generated by Claude Code

…N/2⌉

Internal review (Wave 9 follow-up): the ceil(N/2) threshold in
decideTxQuorum / checkFinalizedQuorum collapses to 1 for N=2, which is
the most common cheap multi-RPC setup. A single divergent / lying /
null-returning provider single-handedly fixes the verdict via two
concrete paths:

  - Tie-break injection: [tx-A, tx-B] → buckets {A:1, B:1}, both ≥
    threshold 1, best decided by Map insertion order. 1 divergent RPC
    INJECTS its fingerprint without any real agreement.
  - Null censorship: [tx, null] → nulls (1) ≥ threshold (1) fires
    first, returning consensus_null (or consensus 'missing' in the
    reconciler, which DELETES the event from the canonical table — a
    permanent censorship of a real on-chain event by a single lying
    RPC).

Both made the Wave 9.1 #432 'single-RPC trust' mitigation effectively a
no-op for N=2 deployments. Asymmetry the original report missed:
reconciler.checkFinalizedQuorum's 'missing' branch is delete-of-record,
strictly more dangerous than the read-side 'consensus_null' in
decideTxQuorum.

Fix:

- Add exported quorumThreshold(n) = ⌊n/2⌋ + 1 in rpcQuorum.ts as the
  single source of truth (and document the attack vector derivation
  on the function so future reviewers don't relax it back to ceil).
  Edge case n≤0 → 0; callers all early-return on empty.
- Use it in all 3 call sites: decideTxQuorum, checkFinalizedQuorum
  (with an extra note on the delete-of-record risk), and the
  backfill-events.ts startup telemetry log (was a separate
  Math.ceil(...) that would have silently drifted otherwise).
- Update the stale test invariant '2P + 1 error → consensus_tx'
  (locked in the old behavior) to '… → no_quorum'. Trade-off
  documented: a flaky provider causes deferral instead of ingest on
  a single vote — worth it, since under the old rule a malicious
  provider returning 'error' let 1 (possibly lying) survivor decide.
- Add two regression tests that lock down the exact attack vectors
  ([tx-A, tx-B] → no_quorum; [tx, null] → no_quorum) so any future
  drift to ceil trips here loudly.
- Add a quorumThreshold helper test pinning the table from N=0..7.

Backward compat: N=1 (single-RPC, the default) still has threshold 1,
behavior unchanged. Odd N also unchanged (floor(N/2)+1 == ceil(N/2) for
odd N). Only even N tightens.

Validated in sandbox: tsc clean, prettier clean,
`pnpm test:rpc-quorum` 19/19 (was 17 — +2 regression tests for the
attack vectors, +1 for the helper, 1 existing invariant flipped to
`no_quorum`). `test:indexer-log` 4/4 unaffected. `reconciler-lease`
needs DATABASE_URL — runs in CI's Postgres lane.

https://claude.ai/code/session_01YapZy1Z5gzbV5EammBkSQm
@vercel
Copy link
Copy Markdown

vercel Bot commented May 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
round_financial Ignored Ignored May 31, 2026 1:39pm

@alrimarleskovar alrimarleskovar merged commit 563020e into main May 31, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants