-
Notifications
You must be signed in to change notification settings - Fork 137
Open
Description
Summary
A safety violation can occur when a leader executes a command on the Fast Path before broadcasting the Commit message, then crashes. Recovery may assign different dependencies to the command, causing different replicas to execute commands in different orders, leading to state machine divergence.
Bug Type
| Attribute | Value |
|---|---|
| Type | Safety Violation (State Divergence) |
| Severity | 🔴 Critical |
| Threat Model | CFT (Compliant) |
Problematic Code
Problem 1: Race Condition Between Execution and Commit Broadcast
File: src/epaxos/epaxos.go, Lines 1050-1071
// handlePreAcceptReply function
if inst.lb.preAcceptOKs >= r.N/2 && inst.lb.allEqual && allCommitted && isInitialBallot(inst.ballot) {
// ① Status set to COMMITTED first
r.InstanceSpace[pareply.Replica][pareply.Instance].Status = epaxosproto.COMMITTED // Line 1053
r.updateCommitted(pareply.Replica)
// ... reply to clients ...
r.recordInstanceMetadata(inst)
r.sync()
// ② Commit message sent AFTER status change
r.bcastCommit(pareply.Replica, pareply.Instance, inst.Cmds, inst.Seq, inst.Deps) // Line 1071
}The executeCommands goroutine (line 429) may see COMMITTED status and execute the command before bcastCommit is called.
Problem 2: Recovery Can Recalculate Dependencies
File: src/epaxos/epaxos.go, Lines 1518-1522
// handlePrepareReply function
if conf, q, i := r.findPreAcceptConflicts(ir.cmds, preply.Replica, preply.Instance, ir.seq, ir.deps); conf {
if r.InstanceSpace[q][i].Status >= epaxosproto.COMMITTED {
// Found committed conflict! Restart Phase1 with NEW deps
r.startPhase1(preply.Replica, preply.Instance, inst.ballot, inst.lb.clientProposals, ir.cmds, len(ir.cmds))
return
}
}Problem 3: startPhase1 Recalculates Dependencies
File: src/epaxos/epaxos.go, Lines 831-837
func (r *Replica) startPhase1(...) {
seq := int32(0)
var deps [DS]int32
for q := 0; q < r.N; q++ {
deps[q] = -1
}
// ⚠️ Recalculates deps based on CURRENT conflict map!
seq, deps, _ = r.updateAttributes(cmds, seq, deps, replica, instance) // Line 837
}Problem 4: handleCommit Does Not Protect EXECUTED Instances
File: src/epaxos/epaxos.go, Lines 1279-1281
// handleCommit function
if inst != nil {
// ⚠️ NO CHECK for EXECUTED status!
inst.Seq = commit.Seq // Overwrites Seq
inst.Deps = commit.Deps // Overwrites Deps
inst.Status = epaxosproto.COMMITTED // Downgrades from EXECUTED!
}Trigger Conditions
All of the following must occur:
- Leader (R0) achieves Fast Path for command X with
deps=[] executeCommandsgoroutine executes X beforebcastCommit(race condition)- R0 crashes before or during
bcastCommit - A conflicting command Y is proposed and committed on other replicas
- Recovery for X discovers Y (COMMITTED) via
findPreAcceptConflicts startPhase1recalculates X's deps to include Y:deps=[Y]- X is committed with
deps=[Y] - R0 recovers and receives
Commit(X, deps=[Y]) handleCommitoverwrites X's deps (X was already EXECUTED withdeps=[])
Attack Scenario Timeline
T0: R0 proposes X (key=K, value=100)
Fast Path: all responders return deps=[]
T1: R0 sets X.Status = COMMITTED (line 1053)
T2: executeCommands goroutine sees COMMITTED
Executes X: state[K] = 100
X.Status = EXECUTED
T3: R0 crashes (before bcastCommit at line 1071)
Commit(X, deps=[]) message NOT sent
T4: R1 proposes Y (key=K, value=200)
Y sees X as PREACCEPTED
Y.deps = [X]
Y commits and executes
T5: R2 initiates recovery for X
TryPreAccept finds Y (COMMITTED)
startPhase1 recalculates: deps=[Y]
X committed with deps=[Y]
T6: R0 recovers
Receives Commit(X, deps=[Y])
handleCommit overwrites X.Deps = [Y]
X.Status = COMMITTED (downgraded from EXECUTED!)
State Divergence Result
R0 (Early Executor):
Execution order: X → Y
Final state: K = 200
R1 (Normal Executor):
X.deps = [Y], Y.deps = [X]
Cycle detected: X ↔ Y
Order by Seq: Y.Seq < X.Seq
Execution order: Y → X
Final state: K = 100
❌ SAFETY VIOLATION: State machines diverged!
R0: K = 200
R1: K = 100
Impact
- Different replicas may execute the same commands in different orders
- State machines diverge permanently
- Violates the fundamental EPaxos guarantee of consistent execution order
Metadata
Metadata
Assignees
Labels
No labels