Skip to content

manfredcml/cdcflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cdcflow

CI License: MIT Rust

A Change Data Capture (CDC) pipeline in Rust. Captures row-level changes from databases and streams them to various destinations.

Features

  • Database sources: PostgreSQL and MySQL
  • Sink destinations: Stdout, Kafka, PostgreSQL, Apache Iceberg
  • Offset stores: SQLite, Memoryfor resumable streaming after restarts
  • Sync modes: CDC (append-only changelog) and Replication (live replica of source tables)
  • Admin server: REST API for job management - not completely implemented yet.

Architecture

┌────────────┐     ┌────────────┐     ┌──────────────┐
│   Source   │────▶│  Pipeline  │────▶│     Sink     │
│ PG / MySQL │     │ (batch +   │     │ Kafka / PG / │
│            │     │  flush)    │     │ Iceberg / Out│
└────────────┘     └─────┬──────┘     └──────────────┘
                         │
                  ┌──────▼──────┐
                  │ OffsetStore │
                  └─────────────┘

The pipeline reads events from a source, batches them by transaction boundaries, writes each batch to the sink, and checkpoints progress in the offset store. On restart, the pipeline resumes from the last checkpoint.

Requirements

  • Platform: Linux and macOS only
  • CMake: Required for building the bundled librdkafka (Kafka dependency)

Quick Start - Standalone Mode

The standalone mode runs a single pipeline without the admin server. This is ideal for local testing and development.

# Start local infrastructure (PostgreSQL, MySQL, Kafka, MinIO, Iceberg, Trino)
# You can optionally start just the components you need for testing a specific source/sink combo
cd example && docker compose up -d && cd ..

# Build
cargo build

# Run a pipeline
RUST_LOG=info ./target/debug/cdcflow job run --standalone -c example/configs/pg-to-stdout.json

Quick Start - Server Mode

The admin server provides centralized job management via a REST API. Workers connect to the admin server, pull their configuration, and report health via heartbeats — instead of reading a local config file.

# 1. Start the admin server
RUST_LOG=info ./target/debug/cdcflow admin start --port 8090

# 2. Create a job (registers config in the admin server)
RUST_LOG=info ./target/debug/cdcflow job create \
  --admin-url http://localhost:8090 \
  --name pg-stdout \
  --config example/configs/pg-to-stdout.json \
  --description "PostgreSQL CDC to stdout"

# 3. Run a worker that pulls config from the admin server
RUST_LOG=info ./target/debug/cdcflow job run \
  --admin-url http://localhost:8090 \
  --job pg-stdout \
  --http-port 9091

# 4. List jobs / workers
RUST_LOG=info ./target/debug/cdcflow job list --admin-url http://localhost:8090

Example Configs

The example/configs/ directory contains example configuration files for various source/sink combinations. You can use these as templates for your own pipelines.

Testing

# Unit tests
cargo test

# Integration tests (requires Docker)
cargo test -- --ignored

License

This project is licensed under the MIT License.

About

Change Data Capture (CDC) pipeline in Rust

Resources

License

Stars

Watchers

Forks

Packages