
Raft Consensus in Chorus

Introduction to Raft

Raft is a consensus algorithm designed as a more understandable alternative to Paxos, with equivalent safety and performance guarantees. In Chorus, Raft ensures that all nodes agree on the same sequence of operations, giving the distributed key-value store strong consistency.

Raft Fundamentals

Core Concepts

Log Replication

Raft maintains a replicated log of operations across all nodes in the cluster. The log is the source of truth for the system state, and all nodes must have identical logs to maintain consistency.

Leader Election

Raft uses a leader-based approach where one node is elected as the leader and handles all client requests. Followers replicate the leader's log and redirect client requests to the leader.

Safety Properties

Raft guarantees several safety properties:

  • Election Safety: At most one leader can be elected per term
  • Leader Append-Only: A leader never overwrites or deletes entries in its log
  • Log Matching: If two logs contain an entry with the same index and term, the logs are identical in all entries up through that index
  • Leader Completeness: If a log entry is committed in a given term, it will be present in the logs of all future leaders

Raft Implementation in Chorus

Log Structure

Each log entry in Chorus contains:

  • Term: The term number when the entry was created
  • Index: The position in the log
  • Command: The actual operation (SET or DELETE)
  • Timestamp: When the entry was created

Command Application

The Finite State Machine (FSM) applies the following command types:

  1. SET Command: Stores or updates a key-value pair
  2. DELETE Command: Removes a key-value pair
  3. Snapshot Command: Creates a snapshot of the current state

Snapshot Management

Snapshot Creation

  • Triggered when log size exceeds SnapshotThreshold
  • Occurs at regular intervals defined by SnapshotInterval
  • Creates a point-in-time view of the entire key-value store

Snapshot Restoration

  • Used when a new node joins the cluster
  • Applied during node recovery after prolonged downtime
  • Reduces recovery time by avoiding full log replay

Configuration Parameters

Chorus provides several configurable Raft parameters:

Timing Parameters

  • HeartbeatTimeout: How long a follower waits without hearing from the leader before starting an election (default: 1s)
  • ElectionTimeout: How long a candidate waits without winning an election before starting a new one (default: 1s)
  • CommitTimeout: Maximum time to wait for log commit (default: 50ms)
  • LeaderLeaseTimeout: Leader lease timeout (default: 500ms)

Log Parameters

  • MaxAppendEntries: Maximum entries per AppendEntries RPC (default: 64)
  • SnapshotInterval: Time between automatic snapshots (default: 30s)
  • SnapshotThreshold: Log entries before snapshot (default: 1024)

Cluster Membership

Static Configuration

In the current implementation, cluster membership is statically configured:

  • Nodes are defined in configuration files
  • Initial cluster bootstrap uses predefined node list
  • All nodes must know about each other

Bootstrap Process

  1. The first node (node1) bootstraps the cluster
  2. Subsequent nodes join by contacting existing nodes
  3. Once joined, nodes participate in consensus
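With a static peer list, the bootstrap decision reduces to a simple rule; this is a hypothetical helper illustrating the process above, not Chorus's actual code:

```go
package main

import "fmt"

// isBootstrapNode returns true only for the first node in the
// statically configured peer list; every other node joins by
// contacting existing members.
func isBootstrapNode(nodeID string, peers []string) bool {
	return len(peers) > 0 && nodeID == peers[0]
}

func main() {
	peers := []string{"node1", "node2", "node3"}
	fmt.Println(isBootstrapNode("node1", peers)) // → true
	fmt.Println(isBootstrapNode("node3", peers)) // → false
}
```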

Performance Characteristics

Write Performance

  • Latency: Determined by the time to replicate each entry to a majority of nodes
  • Throughput: Limited by leader and network capacity
  • Consistency: Strong consistency guaranteed for all writes
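The majority that gates each write is a strict majority of the cluster, which is why a 3-node cluster commits after 2 acknowledgments and a 5-node cluster after 3:

```go
package main

import "fmt"

// quorumSize returns the strict majority a write must reach before
// it commits; this majority determines write latency.
func quorumSize(clusterSize int) int {
	return clusterSize/2 + 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("cluster of %d commits after %d acks\n", n, quorumSize(n))
	}
}
```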

Read Performance

  • Leader Reads: Always return latest committed data
  • Follower Reads: May return slightly stale data, but avoid a round trip to the leader
  • Availability: Reads available from any non-candidate node

Recovery Performance

  • Full Recovery: Requires replaying the entire log
  • Snapshot Recovery: Much faster, uses latest snapshot
  • Incremental Catch-up: Followers catch up incrementally

Implementation Details

RPC Messages

Chorus implements the standard Raft RPCs:

  • RequestVote: Sent by candidates during elections
  • AppendEntries: Sent by leaders to replicate log entries
  • InstallSnapshot: Sent by leaders to transfer snapshots
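One detail of RequestVote worth showing is the log-comparison rule from the Raft paper: a voter grants its vote only if the candidate's log is at least as up to date as its own, comparing the last entry's term first and its index second. A sketch:

```go
package main

import "fmt"

// logUpToDate reports whether a candidate's log (candTerm, candIdx
// are its last entry's term and index) is at least as up to date as
// the voter's, per the Raft RequestVote comparison rule.
func logUpToDate(candTerm, candIdx, voterTerm, voterIdx uint64) bool {
	if candTerm != voterTerm {
		return candTerm > voterTerm // higher last term wins
	}
	return candIdx >= voterIdx // same term: longer log wins
}

func main() {
	fmt.Println(logUpToDate(5, 10, 4, 99)) // → true (higher last term)
	fmt.Println(logUpToDate(5, 10, 5, 11)) // → false (same term, shorter log)
}
```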

Storage Interfaces

The implementation uses several storage interfaces:

  • LogStore: Persistent storage for Raft log
  • StableStore: Storage for stable state (current term, voted for)
  • SnapshotStore: Storage for snapshots
  • Transport: Network transport for RPC communication
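The interface shapes below are trimmed, illustrative versions of these roles (not Chorus's exact method sets), with an in-memory StableStore as a stand-in; a real deployment needs storage that survives restarts:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// StableStore holds small bits of durable Raft state, such as the
// current term and the vote cast in it.
type StableStore interface {
	Set(key, val []byte) error
	Get(key []byte) ([]byte, error)
}

// memStableStore is an in-memory stand-in for illustration only.
type memStableStore struct {
	mu sync.Mutex
	m  map[string][]byte
}

func newMemStableStore() *memStableStore {
	return &memStableStore{m: make(map[string][]byte)}
}

func (s *memStableStore) Set(key, val []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[string(key)] = append([]byte(nil), val...) // copy to avoid aliasing
	return nil
}

func (s *memStableStore) Get(key []byte) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[string(key)]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

func main() {
	var ss StableStore = newMemStableStore()
	ss.Set([]byte("CurrentTerm"), []byte("7"))
	v, _ := ss.Get([]byte("CurrentTerm"))
	fmt.Println(string(v)) // → 7
}
```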

Finite State Machine

The FSM in Chorus:

  • Applies commands to the key-value store
  • Handles snapshot creation and restoration
  • Provides thread-safe access to the store
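The snapshot and restore duties above amount to serializing the whole store to a point-in-time byte slice and rebuilding it from one; this sketch uses JSON as an illustrative encoding (Chorus may use a different format):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot serializes the key-value store to a point-in-time blob.
func snapshot(store map[string]string) ([]byte, error) {
	return json.Marshal(store)
}

// restore rebuilds the key-value store from a snapshot blob.
func restore(data []byte) (map[string]string, error) {
	store := make(map[string]string)
	if err := json.Unmarshal(data, &store); err != nil {
		return nil, err
	}
	return store, nil
}

func main() {
	orig := map[string]string{"color": "blue"}
	data, _ := snapshot(orig)
	restored, _ := restore(data)
	fmt.Println(restored["color"]) // → blue
}
```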