Raft is a consensus algorithm designed as an alternative to Paxos. It was created to be more understandable than Paxos while providing equivalent safety and performance guarantees. In Chorus, Raft ensures that all nodes agree on the same sequence of operations, providing strong consistency guarantees for the distributed key-value store.
Raft maintains a replicated log of operations across all nodes in the cluster. The log is the source of truth for the system state, and all nodes must have identical logs to maintain consistency.
Raft uses a leader-based approach where one node is elected as the leader and handles all client requests. Followers replicate the leader's log and redirect client requests to the leader.
Raft guarantees several safety properties:
- Election Safety: At most one leader can be elected per term
- Leader Append-Only: A leader never overwrites or deletes entries in its log
- Log Matching: If two logs contain an entry with the same index and term, the logs are identical in all entries up through that index
- Leader Completeness: If a log entry is committed in a given term, it will be present in the logs of all future leaders
Each log entry in Chorus contains the following fields (a command-encoding sketch follows the list):
- Term: The term number when the entry was created
- Index: The position in the log
- Command: The actual operation (SET or DELETE)
- Timestamp: When the entry was created
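For illustration only, the application-level command carried in an entry might be encoded like the Go sketch below. The struct, field names, and JSON encoding are assumptions rather than Chorus's actual types; the term and index are stamped on the entry by Raft itself.

```go
package chorus

import "encoding/json"

// Command is a hypothetical application-level payload stored in a log
// entry's Data field; Raft assigns the entry's term and index.
type Command struct {
	Op        string `json:"op"`        // "SET" or "DELETE"
	Key       string `json:"key"`
	Value     string `json:"value,omitempty"`
	Timestamp int64  `json:"timestamp"` // creation time, e.g. Unix nanoseconds
}

// encodeCommand serializes a command so it can be proposed to the Raft log.
func encodeCommand(c Command) ([]byte, error) {
	return json.Marshal(c)
}
```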
The Finite State Machine (FSM) processes the following command types (command application is sketched after the list):
- SET Command: Stores or updates a key-value pair
- DELETE Command: Removes a key-value pair
- Snapshot Command: Creates a snapshot of the current state
  - Triggered when the log size exceeds SnapshotThreshold
  - Occurs at regular intervals defined by SnapshotInterval
  - Creates a point-in-time view of the entire key-value store
  - Used when a new node joins the cluster
  - Applied during node recovery after prolonged downtime
  - Reduces recovery time by avoiding full log replay
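Assuming Chorus builds on the hashicorp/raft library (which the FSM and storage interface names later in this section suggest) and JSON-encoded commands as sketched above, applying SET and DELETE could look roughly like this; the fsm type and its fields are illustrative, not the project's actual code:

```go
package chorus

import (
	"encoding/json"
	"sync"

	"github.com/hashicorp/raft"
)

type fsm struct {
	mu    sync.Mutex
	store map[string]string
}

// Apply is invoked by Raft once an entry is committed; it decodes the
// command and mutates the key-value store accordingly.
func (f *fsm) Apply(entry *raft.Log) interface{} {
	var c struct {
		Op    string `json:"op"`
		Key   string `json:"key"`
		Value string `json:"value"`
	}
	if err := json.Unmarshal(entry.Data, &c); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	switch c.Op {
	case "SET":
		f.store[c.Key] = c.Value
	case "DELETE":
		delete(f.store, c.Key)
	}
	return nil
}
```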
Chorus provides several configurable Raft parameters (a configuration sketch follows the list):
- HeartbeatTimeout: How often the leader sends heartbeats (default: 1s)
- ElectionTimeout: How long followers wait without hearing from a leader before starting an election (default: 1s)
- CommitTimeout: Maximum time to wait for log commit (default: 50ms)
- LeaderLeaseTimeout: Leader lease timeout (default: 500ms)
- MaxAppendEntries: Maximum entries per AppendEntries RPC (default: 64)
- SnapshotInterval: Time between automatic snapshots (default: 30s)
- SnapshotThreshold: Number of log entries accumulated since the last snapshot before a new one is taken (default: 1024)
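If Chorus uses hashicorp/raft, whose Config struct has fields matching these names, the defaults listed above could be applied roughly as follows; treat this as a sketch, not the project's actual setup code:

```go
package chorus

import (
	"time"

	"github.com/hashicorp/raft"
)

// newRaftConfig applies the documented defaults on top of the library defaults.
func newRaftConfig(nodeID string) *raft.Config {
	cfg := raft.DefaultConfig()
	cfg.LocalID = raft.ServerID(nodeID)
	cfg.HeartbeatTimeout = 1 * time.Second
	cfg.ElectionTimeout = 1 * time.Second
	cfg.CommitTimeout = 50 * time.Millisecond
	cfg.LeaderLeaseTimeout = 500 * time.Millisecond
	cfg.MaxAppendEntries = 64
	cfg.SnapshotInterval = 30 * time.Second
	cfg.SnapshotThreshold = 1024
	return cfg
}
```

If that library is in use, note that it requires LeaderLeaseTimeout to be no larger than HeartbeatTimeout and ElectionTimeout to be at least HeartbeatTimeout; the defaults above satisfy both constraints.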
In the current implementation, cluster membership is statically configured; a bootstrap sketch follows the list:
- Nodes are defined in configuration files
- Initial cluster bootstrap uses predefined node list
- All nodes must know about each other
- The first node (node1) bootstraps the cluster
- Subsequent nodes join by contacting existing nodes
- Once joined, nodes participate in consensus
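Under the same hashicorp/raft assumption, bootstrap and join could be expressed roughly as below; the helper names, node IDs, and timeout are illustrative:

```go
package chorus

import (
	"time"

	"github.com/hashicorp/raft"
)

// bootstrap is called once, on the first node, with the predefined node list.
func bootstrap(r *raft.Raft, servers []raft.Server) error {
	return r.BootstrapCluster(raft.Configuration{Servers: servers}).Error()
}

// join is called on the current leader to add a new node as a voting member.
func join(r *raft.Raft, id, addr string) error {
	return r.AddVoter(raft.ServerID(id), raft.ServerAddress(addr), 0, 10*time.Second).Error()
}
```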
Performance is shaped by the following factors:
- Latency: Determined by the time to replicate an entry to a majority of nodes
- Throughput: Limited by the leader's processing capacity and network bandwidth
- Consistency: Strong consistency is guaranteed for all writes
Read behavior depends on which node serves the request (a read-path sketch follows the list):
- Leader Reads: Always return the latest committed data
- Follower Reads: May return slightly stale data but avoid a round trip through the leader
- Availability: Reads can be served by any non-candidate node
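As an illustration of these read paths (again assuming hashicorp/raft), a strongly consistent read can verify leadership before answering, while a relaxed read is served from the node's local copy of the state; the kvStore type and read method are hypothetical:

```go
package chorus

import (
	"errors"
	"sync"

	"github.com/hashicorp/raft"
)

// kvStore is an illustrative local view of the replicated state.
type kvStore struct {
	mu   sync.RWMutex
	data map[string]string
}

// read serves a key. When consistent is true, the node first confirms it is
// still the leader, so the result reflects the latest committed write; when
// false, any node answers from its local copy, which may lag the leader.
func (s *kvStore) read(r *raft.Raft, key string, consistent bool) (string, error) {
	if consistent {
		if err := r.VerifyLeader().Error(); err != nil {
			// Not (or no longer) the leader: the client should retry against the leader.
			return "", err
		}
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	if !ok {
		return "", errors.New("key not found")
	}
	return v, nil
}
```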
Recovery after a node restart can take several forms:
- Full Recovery: Requires replaying the entire log from the beginning
- Snapshot Recovery: Much faster; restores the latest snapshot and replays only the log entries that follow it
- Incremental Catch-up: Followers that are only slightly behind receive the missing entries through AppendEntries rather than a full snapshot
Chorus implements the standard Raft RPCs:
- RequestVote: Sent by candidates during elections
- AppendEntries: Sent by leaders to replicate log entries
- InstallSnapshot: Sent by leaders to transfer snapshots
The implementation uses several storage interfaces (a wiring sketch follows the list):
- LogStore: Persistent storage for Raft log
- StableStore: Storage for stable state (current term, voted for)
- SnapshotStore: Storage for snapshots
- Transport: Network transport for RPC communication
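Under the hashicorp/raft assumption, these four pieces are commonly wired together as shown below, here using raft-boltdb for the log and stable stores; the file paths, connection pool size, and timeouts are illustrative, not Chorus's actual values:

```go
package chorus

import (
	"net"
	"os"
	"path/filepath"
	"time"

	"github.com/hashicorp/raft"
	raftboltdb "github.com/hashicorp/raft-boltdb"
)

func openRaft(cfg *raft.Config, fsm raft.FSM, dataDir, bindAddr string) (*raft.Raft, error) {
	// LogStore and StableStore: one BoltDB file holds the log entries plus
	// stable state (current term, voted-for).
	boltStore, err := raftboltdb.NewBoltStore(filepath.Join(dataDir, "raft.db"))
	if err != nil {
		return nil, err
	}
	// SnapshotStore: keep the two most recent snapshots on disk.
	snaps, err := raft.NewFileSnapshotStore(dataDir, 2, os.Stderr)
	if err != nil {
		return nil, err
	}
	// Transport: TCP transport carrying RequestVote, AppendEntries, and
	// InstallSnapshot RPCs between nodes.
	addr, err := net.ResolveTCPAddr("tcp", bindAddr)
	if err != nil {
		return nil, err
	}
	trans, err := raft.NewTCPTransport(bindAddr, addr, 3, 10*time.Second, os.Stderr)
	if err != nil {
		return nil, err
	}
	return raft.NewRaft(cfg, fsm, boltStore, boltStore, snaps, trans)
}
```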
The FSM in Chorus:
- Applies commands to the key-value store
- Handles snapshot creation and restoration
- Provides thread-safe access to the store (snapshot handling and locking are sketched below)
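A sketch of the snapshot side under the same assumptions: Snapshot copies the store while holding a lock so writers are blocked only briefly, and Restore rebuilds the store from a snapshot stream; the type names and JSON format are illustrative:

```go
package chorus

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

type kvFSM struct {
	mu    sync.RWMutex
	store map[string]string
}

// Snapshot returns a point-in-time copy of the store for persisting.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	clone := make(map[string]string, len(f.store))
	for k, v := range f.store {
		clone[k] = v
	}
	return &kvSnapshot{state: clone}, nil
}

// Restore replaces the store with the contents of a snapshot stream.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	state := make(map[string]string)
	if err := json.NewDecoder(rc).Decode(&state); err != nil {
		return err
	}
	f.mu.Lock()
	f.store = state
	f.mu.Unlock()
	return nil
}

type kvSnapshot struct{ state map[string]string }

// Persist writes the captured state to the snapshot sink.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(s.state); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

func (s *kvSnapshot) Release() {}
```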