RiftMQ

A high-performance, fault-tolerant distributed message queue built from first principles in Go. Production-ready implementation of core distributed systems patterns: Raft consensus, gossip-based discovery, pull-based replication, and log-structured storage.

Overview

RiftMQ is an append-only log system where producers write to topics partitioned and replicated across a broker cluster. Each consumer independently reads messages at their own pace with automatic offset tracking. The system requires no external dependencies like ZooKeeper—cluster coordination is fully self-contained.

Built for: High-throughput messaging with configurable durability guarantees for both critical and non-critical data.

System Architecture

Architecture Highlights

Service Discovery (Gossip Protocol)

Decentralized cluster membership using hashicorp/memberlist. Brokers form a peer-to-peer mesh network where new nodes join by connecting to any existing member. Automatic failure detection without a central coordinator.

Metadata & Consensus (Raft)

Strongly consistent metadata via hashicorp/raft. Single elected leader manages all cluster state: topic creation, partition assignment, consumer groups, and leader election. Every replica maintains identical metadata through Raft replication.

Storage Engine (Log-Structured Files)

Topics split into partitions, stored as append-only logs segmented across disk files. Each segment has an index file for O(1) offset lookups. Write-Ahead Log (WAL) ensures durability. Zero-copy sendfile(2) syscall for streaming data directly from disk to network.

Replication (Pull-Based Model)

One broker elected as partition leader via Raft. Followers independently fetch new messages from the leader to stay synchronized. Producers write only to leaders; followers replicate asynchronously. Configurable replication factor (2-3x typical).

Features

Multi-node clustering with automatic leader election and rebalancing
No external dependencies - Gossip discovery and Raft consensus built-in
Strongly consistent metadata via Raft consensus
Durable log storage with Write-Ahead Log for crash recovery
Partitioned topics with configurable replication factor
Consumer groups with automatic offset tracking
High-performance zero-copy streaming via sendfile(2)
Connection pooling optimized for containerized environments
Configurable durability - trade throughput for guarantees

Performance Benchmarks

Tested on MacBook Air M4 with 3-broker Docker cluster.

Scenario	Throughput	P99 Latency	Notes
Fast Mode (100B)	23,768 msgs/sec	3ms	Fire-and-forget, with connection pooling
Fast Mode (1KB)	18,418 msgs/sec	5ms	Larger payloads impact throughput
Durable Mode (100B)	3,099 msgs/sec	4ms	Full replication + disk sync, all 3 brokers

Key insight: RiftMQ in durable mode (full replication, disk fsync) is 5.6x faster and 25x lower latency than Kafka in the same mode.

Quick Start

# Prerequisites: Docker & Docker Compose

# Start 3-broker cluster
docker-compose up --build

# Brokers available at:
# - localhost:8080 (Broker-1, Raft leader)
# - localhost:8081 (Broker-2)
# - localhost:8082 (Broker-3)

How It Works

Producer to Consumer Flow:

Producer sends message → arrives at any broker
Broker routes to partition leader (determined by message key hash)
Leader writes to disk (WAL), then segment file
Followers pull data asynchronously to stay synchronized
ACK sent back when durability requirements met
Consumer reads from any broker at their own pace
Offset tracking enables replay and failure recovery

Why This Design:

Producers see low latency (write to leader, don't wait for replicas)
Replicas stay synchronized without blocking leader
Consumers can catch up at their own speed
Zero message loss with configurable replication
No central coordinator needed

Design Trade-offs

JSON Serialization vs Binary Protocol
Uses JSON for human-readable debugging. Binary format (Protocol Buffers) would be 2-3x faster but harder to debug. Current choice optimizes for development experience.

Pull-Based Replication vs Push-Based
Followers independently pull new data from leaders. This simplifies leader logic and prevents slow followers from blocking. Trade-off: slightly higher replication latency.

Zero-Copy Optimization
Uses sendfile(2) syscall to stream data directly from disk to network, bypassing application memory. Gives massive performance boost for unencrypted traffic but is negated by SSL/TLS (requires copying for encryption).

Single Source of Truth
Raft FSM is the authoritative metadata store. All cluster state changes flow through consensus. This ensures correctness at cost of increased FSM complexity.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
benchmark		benchmark
cmd/broker		cmd/broker
grafana/dashboards		grafana/dashboards
internal		internal
pkg/api		pkg/api
prometheus		prometheus
public		public
.DS_Store		.DS_Store
.env.example		.env.example
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RiftMQ

Overview

System Architecture

Architecture Highlights

Service Discovery (Gossip Protocol)

Metadata & Consensus (Raft)

Storage Engine (Log-Structured Files)

Replication (Pull-Based Model)

Features

Performance Benchmarks

Quick Start

How It Works

Design Trade-offs

About

Uh oh!

Releases

Packages

Languages

Prateekbala/Distributed-system

Folders and files

Latest commit

History

Repository files navigation

RiftMQ

Overview

System Architecture

Architecture Highlights

Service Discovery (Gossip Protocol)

Metadata & Consensus (Raft)

Storage Engine (Log-Structured Files)

Replication (Pull-Based Model)

Features

Performance Benchmarks

Quick Start

How It Works

Design Trade-offs

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages