-
Notifications
You must be signed in to change notification settings - Fork 137
Description
Summary
A safety vulnerability exists in the EPaxos recovery logic where dependencies can be lost when processing PrepareReply messages with conflicting PREACCEPTED states. Under asymmetric message loss, the recovering replica may commit an instance with an incomplete dependency set, causing different replicas to execute conflicting commands in different orders and leading to state machine divergence.
Bug Type
| Attribute | Value |
|---|---|
| Type | Safety Violation (Agreement/State Divergence) |
| Severity | 🔴 Critical |
| Threat Model | CFT (Compliant) |
Bug Location
File: src/epaxos/epaxos.go
Problematic Code Path 1: Recovery Initialization
Lines 1377-1379 in startRecoveryForInstance:
```go
} else if inst.Status >= epaxosproto.PREACCEPTED {
//
inst.lb.recoveryInst = &RecoveryInstance{inst.Cmds, inst.Status, inst.Seq, inst.Deps, 1, (r.Id == replica)}
}
```
When the recovering replica has a local PREACCEPTED instance, it initializes `recoveryInst` with its own (potentially incomplete) metadata and sets `preAcceptCount = 1`.
Problematic Code Path 2: PrepareReply Handling
Lines 1471-1488 in `handlePrepareReply`:
```go
if (preply.Status == epaxosproto.PREACCEPTED || preply.Status == epaxosproto.PREACCEPTED_EQ) &&
(inst.lb.recoveryInst == nil || inst.lb.recoveryInst.status < epaxosproto.ACCEPTED) {
if inst.lb.recoveryInst == nil {
inst.lb.recoveryInst = &RecoveryInstance{preply.Command, preply.Status, preply.Seq, preply.Deps, 1, false}
} else if preply.Seq == inst.Seq && equal(&preply.Deps, &inst.Deps) {
//
inst.lb.recoveryInst.preAcceptCount++
} else if preply.Status == epaxosproto.PREACCEPTED_EQ {
inst.lb.recoveryInst = &RecoveryInstance{preply.Command, preply.Status, preply.Seq, preply.Deps, 1, false}
}
//
// the reply with RICHER dependencies is IGNORED!
}
```
When a replica replies with PREACCEPTED and different (richer) Seq/Deps than the local view:
- The condition `preply.Seq == inst.Seq && equal(&preply.Deps, &inst.Deps)` fails
- If the reply is PREACCEPTED (not PREACCEPTED_EQ), the `else if` branch is skipped
- The richer dependency information is silently discarded!
Root Cause
The recovery logic fails to adopt the maximal (Seq, Deps) across all PREACCEPTED responses. Instead, it only counts matching responses and ignores non-matching PREACCEPTED replies, even when they contain more complete dependency information.
According to the EPaxos paper, during recovery, the replica should adopt a superset of all known dependencies to ensure safety.
Attack Scenario
Setup: 3 replicas (R0, R1, R2), N=3, f=1
Timeline
```
T0: I1 = PUT(k, 1) committed on R0 and R1 via fast path
R2 has no knowledge of I1
T1: Client sends I2 = PUT(k, 2) to R2
R2 creates I2 with seq=1, deps=∅ (doesn't know about I1)
R2 sends PreAccept(I2) to R0 and R1
T2: Network behavior:
- PreAccept(I2) to R0: DROPPED
- PreAccept(I2) to R1: DELIVERED
R1 processes PreAccept(I2):
- Detects conflict with I1 (same key k)
- Sets I2: seq=2, deps={I1}
- Sends PreAcceptReply(seq=2, deps={I1}) to R2
Network: PreAcceptReply from R1 to R2: DROPPED
T3: R2 times out on I2, starts recovery
- R2's local view: I2 with seq=1, deps=∅
- startRecoveryForInstance initializes recoveryInst = {seq=1, deps=∅, count=1}
- R2 sends Prepare(I2) to R0 and R1
T4: Prepare replies:
- R0 replies: Status=NONE (no record of I2)
- R1 replies: Status=PREACCEPTED, seq=2, deps={I1}
T5: R2 processes PrepareReply from R1:
- preply.Seq (2) != inst.Seq (1) → condition fails
- preply.Status == PREACCEPTED (not PREACCEPTED_EQ)
-
- recoveryInst remains {seq=1, deps=∅}
T6: R2 has majority (2/3), proceeds with recovery
- Uses recoveryInst: seq=1, deps=∅
- Commits I2 with deps=∅ (missing dependency on I1!)
T7: All replicas eventually see:
- I1 committed: seq=1, deps=∅
- I2 committed: seq=1, deps=∅ (NO dependency on I1!)
```
State Divergence
```
Execution order depends on local arrival/tie-breaking:
Replica R0: Executes I1 then I2 → final k = 2
Replica R1: Executes I1 then I2 → final k = 2
Replica R2: Executes I2 then I1 → final k = 1
❌ SAFETY VIOLATION: R2 has different final state!
```
Test Case
```go
func TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss(t *testing.T)
```
Test Output:
```
=== RUN TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss
lost_dependency_...test.go:269: replica 0: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency...test.go:269: replica 1: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency...test.go:269: replica 2: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency..._test.go:323: execution-order safety violation: final value for key 1 is 1, want 2 (values=[1 1 1])
--- FAIL: TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss (0.09s)
```
Trigger Conditions
- Cluster with N ≥ 3 replicas
- Two conflicting commands I1 and I2 on the same key
- Asymmetric message loss during PreAccept phase for I2
- One replica (R1) has a PREACCEPTED view of I2 with richer deps than the leader (R2)
- R2 initiates recovery and receives conflicting PREACCEPTED states
- R1's richer deps are discarded because they don't match R2's local view
Impact
- Different replicas may execute conflicting commands in different orders
- State machines diverge permanently
- Violates the fundamental EPaxos guarantee of consistent execution order
- This is a data corruption bug in production systems
Notes
This bug is distinct from Issue #27 (handleCommit not protecting EXECUTED instances) and Issue #28 (non-durable PreAccept). It specifically affects the recovery path when processing PrepareReply messages with conflicting PREACCEPTED states.
The EPaxos paper requires that during recovery, the recovering replica must adopt metadata that is a superset of all known information. The current implementation violates this requirement by ignoring PREACCEPTED replies with different (potentially richer) Seq/Deps values.