Skip to content

[Safety Bug] Lost Dependency During Recovery Under Asymmetric PreAccept Message Loss #29

@SherlockShemol

Description

@SherlockShemol

Summary

A safety vulnerability exists in the EPaxos recovery logic where dependencies can be lost when processing PrepareReply messages with conflicting PREACCEPTED states. Under asymmetric message loss, the recovering replica may commit an instance with an incomplete dependency set, causing different replicas to execute conflicting commands in different orders and leading to state machine divergence.

Bug Type

Attribute Value
Type Safety Violation (Agreement/State Divergence)
Severity 🔴 Critical
Threat Model CFT (Compliant)

Bug Location

File: src/epaxos/epaxos.go

Problematic Code Path 1: Recovery Initialization

Lines 1377-1379 in startRecoveryForInstance:

```go
} else if inst.Status >= epaxosproto.PREACCEPTED {
// ⚠️ Initializes recoveryInst with LOCAL (potentially incomplete) data
inst.lb.recoveryInst = &RecoveryInstance{inst.Cmds, inst.Status, inst.Seq, inst.Deps, 1, (r.Id == replica)}
}
```

When the recovering replica has a local PREACCEPTED instance, it initializes `recoveryInst` with its own (potentially incomplete) metadata and sets `preAcceptCount = 1`.

Problematic Code Path 2: PrepareReply Handling

Lines 1471-1488 in `handlePrepareReply`:

```go
if (preply.Status == epaxosproto.PREACCEPTED || preply.Status == epaxosproto.PREACCEPTED_EQ) &&
(inst.lb.recoveryInst == nil || inst.lb.recoveryInst.status < epaxosproto.ACCEPTED) {
if inst.lb.recoveryInst == nil {
inst.lb.recoveryInst = &RecoveryInstance{preply.Command, preply.Status, preply.Seq, preply.Deps, 1, false}
} else if preply.Seq == inst.Seq && equal(&preply.Deps, &inst.Deps) {
// ⚠️ BUG: Only increments count if Seq AND Deps EXACTLY match!
inst.lb.recoveryInst.preAcceptCount++
} else if preply.Status == epaxosproto.PREACCEPTED_EQ {
inst.lb.recoveryInst = &RecoveryInstance{preply.Command, preply.Status, preply.Seq, preply.Deps, 1, false}
}
// ⚠️ BUG: If Seq/Deps differ and status is PREACCEPTED (not PREACCEPTED_EQ),
// the reply with RICHER dependencies is IGNORED!
}
```

When a replica replies with PREACCEPTED and different (richer) Seq/Deps than the local view:

  • The condition `preply.Seq == inst.Seq && equal(&preply.Deps, &inst.Deps)` fails
  • If the reply is PREACCEPTED (not PREACCEPTED_EQ), the `else if` branch is skipped
  • The richer dependency information is silently discarded!

Root Cause

The recovery logic fails to adopt the maximal (Seq, Deps) across all PREACCEPTED responses. Instead, it only counts matching responses and ignores non-matching PREACCEPTED replies, even when they contain more complete dependency information.

According to the EPaxos paper, during recovery, the replica should adopt a superset of all known dependencies to ensure safety.

Attack Scenario

Setup: 3 replicas (R0, R1, R2), N=3, f=1

Timeline

```
T0: I1 = PUT(k, 1) committed on R0 and R1 via fast path
R2 has no knowledge of I1

T1: Client sends I2 = PUT(k, 2) to R2
R2 creates I2 with seq=1, deps=∅ (doesn't know about I1)
R2 sends PreAccept(I2) to R0 and R1

T2: Network behavior:
- PreAccept(I2) to R0: DROPPED
- PreAccept(I2) to R1: DELIVERED

R1 processes PreAccept(I2):
  - Detects conflict with I1 (same key k)
  - Sets I2: seq=2, deps={I1}
  - Sends PreAcceptReply(seq=2, deps={I1}) to R2

Network: PreAcceptReply from R1 to R2: DROPPED

T3: R2 times out on I2, starts recovery
- R2's local view: I2 with seq=1, deps=∅
- startRecoveryForInstance initializes recoveryInst = {seq=1, deps=∅, count=1}
- R2 sends Prepare(I2) to R0 and R1

T4: Prepare replies:
- R0 replies: Status=NONE (no record of I2)
- R1 replies: Status=PREACCEPTED, seq=2, deps={I1}

T5: R2 processes PrepareReply from R1:
- preply.Seq (2) != inst.Seq (1) → condition fails
- preply.Status == PREACCEPTED (not PREACCEPTED_EQ)
- ⚠️ BUG: R1's reply with deps={I1} is IGNORED!
- recoveryInst remains {seq=1, deps=∅}

T6: R2 has majority (2/3), proceeds with recovery
- Uses recoveryInst: seq=1, deps=∅
- Commits I2 with deps=∅ (missing dependency on I1!)

T7: All replicas eventually see:
- I1 committed: seq=1, deps=∅
- I2 committed: seq=1, deps=∅ (NO dependency on I1!)
```

State Divergence

```
Execution order depends on local arrival/tie-breaking:

Replica R0: Executes I1 then I2 → final k = 2
Replica R1: Executes I1 then I2 → final k = 2
Replica R2: Executes I2 then I1 → final k = 1

❌ SAFETY VIOLATION: R2 has different final state!
```

Test Case

```go
func TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss(t *testing.T)
```

Test Output:
```
=== RUN TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss
lost_dependency_...test.go:269: replica 0: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency
...test.go:269: replica 1: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency
...test.go:269: replica 2: lost dependency on I1 for I2; got Deps[I1]=-1, want 0
lost_dependency
..._test.go:323: execution-order safety violation: final value for key 1 is 1, want 2 (values=[1 1 1])
--- FAIL: TestLostDependencyOnRecoveryUnderAsymmetricPreAcceptLoss (0.09s)
```

Trigger Conditions

  1. Cluster with N ≥ 3 replicas
  2. Two conflicting commands I1 and I2 on the same key
  3. Asymmetric message loss during PreAccept phase for I2
  4. One replica (R1) has a PREACCEPTED view of I2 with richer deps than the leader (R2)
  5. R2 initiates recovery and receives conflicting PREACCEPTED states
  6. R1's richer deps are discarded because they don't match R2's local view

Impact

  • Different replicas may execute conflicting commands in different orders
  • State machines diverge permanently
  • Violates the fundamental EPaxos guarantee of consistent execution order
  • This is a data corruption bug in production systems

Notes

This bug is distinct from Issue #27 (handleCommit not protecting EXECUTED instances) and Issue #28 (non-durable PreAccept). It specifically affects the recovery path when processing PrepareReply messages with conflicting PREACCEPTED states.

The EPaxos paper requires that during recovery, the recovering replica must adopt metadata that is a superset of all known information. The current implementation violates this requirement by ignoring PREACCEPTED replies with different (potentially richer) Seq/Deps values.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions