Iceberg Provider

This is an incubator project for a new Iceberg provider, developed as part of the Transferia project.

Overview

The Iceberg provider lets Transferia read from and write to Apache Iceberg tables. It is designed to plug into the Transferia ecosystem as both a source and a destination.

Prerequisites

Go 1.23 or higher
Docker (for running tests with testcontainers)
Make

Quick Start

Clone the repository:

git clone https://github.com/transferia/iceberg.git
cd iceberg

Install dependencies:

go mod download

Build the project:

make build

Testing

Run the test suite:

make test

For detailed test reports:

make run-tests

Test reports will be generated in the reports/ directory.

Development

The project uses standard Go tooling and Make for common tasks:

make clean - Remove build artifacts
make build - Build the project
make test - Run tests
make run-tests - Run tests with detailed reporting

Project Structure

cmd/ - Main application entry point: a custom main file, the same as in transfer but with this plugin added
reports/ - Test reports
binaries/ - Compiled binaries
doc/ - Documentation, including design documents
Everything else - plugin code base

Key Features

Iceberg Table Reading

The Iceberg Provider implements a robust Table Reading mechanism that:

Provides efficient data access through optimized manifest processing
Ensures data consistency through snapshot-based reading
Implements advanced optimization techniques like partition pruning and column projection

For more details, see the Iceberg Table Reading Design Document.
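The two optimizations named above can be illustrated with a minimal sketch. It is not the provider's actual code: `DataFile`, `PruneFiles`, and `ProjectColumns` are hypothetical names, and the manifest entry is reduced to min/max bounds for a single integer partition column.

```go
package main

import "fmt"

// DataFile is a hypothetical, simplified stand-in for a manifest entry:
// a file path plus the min/max bounds of one integer partition column.
type DataFile struct {
	Path     string
	MinValue int64
	MaxValue int64
}

// PruneFiles keeps only the files whose [MinValue, MaxValue] range overlaps
// the query range [lo, hi]. Files outside the range are never opened.
func PruneFiles(files []DataFile, lo, hi int64) []DataFile {
	var kept []DataFile
	for _, f := range files {
		if f.MaxValue >= lo && f.MinValue <= hi {
			kept = append(kept, f)
		}
	}
	return kept
}

// ProjectColumns returns a copy of the row that contains only the
// requested columns; columns missing from the row are skipped.
func ProjectColumns(row map[string]any, columns []string) map[string]any {
	out := make(map[string]any, len(columns))
	for _, c := range columns {
		if v, ok := row[c]; ok {
			out[c] = v
		}
	}
	return out
}

func main() {
	files := []DataFile{
		{Path: "a.parquet", MinValue: 1, MaxValue: 10},
		{Path: "b.parquet", MinValue: 11, MaxValue: 20},
		{Path: "c.parquet", MinValue: 21, MaxValue: 30},
	}
	// Only files overlapping the range 12..22 survive pruning.
	for _, f := range PruneFiles(files, 12, 22) {
		fmt.Println("scan", f.Path)
	}

	row := map[string]any{"id": 1, "name": "x", "payload": "large"}
	fmt.Println(ProjectColumns(row, []string{"id", "name"}))
}
```

In a real reader the bounds would come from manifest metadata and the projection would be pushed down to the Parquet reader rather than applied to decoded rows.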

Snapshot Sink

The Iceberg Provider implements a powerful Snapshot Sink mechanism that:

Efficiently transforms incoming data into Parquet files
Tracks files generated by each worker
Coordinates file registration using a central coordinator
Atomically commits all files to the target table in a single transaction

For more details, see the Snapshot Sink Design Document.
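The flow above (workers write files, a coordinator tracks them, one commit publishes them all) can be sketched roughly as follows. The `Coordinator` type and its methods are hypothetical and keep state in memory; the `commit` callback stands in for whatever transaction the catalog provides.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Coordinator is a hypothetical in-memory stand-in for the central
// coordinator: it remembers which files each worker has written.
type Coordinator struct {
	mu      sync.Mutex
	pending map[int][]string // worker ID -> files written by that worker
}

func NewCoordinator() *Coordinator {
	return &Coordinator{pending: make(map[int][]string)}
}

// Register records a file produced by a worker. Safe for concurrent use.
func (c *Coordinator) Register(workerID int, path string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[workerID] = append(c.pending[workerID], path)
}

// CommitAll passes every registered file to commit in a single call.
// Pending state is cleared only if commit succeeds, so a failed commit
// can be retried with the same file set.
func (c *Coordinator) CommitAll(commit func(files []string) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	var all []string
	for _, files := range c.pending {
		all = append(all, files...)
	}
	sort.Strings(all) // deterministic order, since map iteration is random
	if err := commit(all); err != nil {
		return err
	}
	c.pending = make(map[int][]string)
	return nil
}

func main() {
	c := NewCoordinator()
	var wg sync.WaitGroup
	for w := 0; w < 3; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c.Register(id, fmt.Sprintf("worker-%d/part-0.parquet", id))
		}(w)
	}
	wg.Wait()

	err := c.CommitAll(func(files []string) error {
		fmt.Println("committing", len(files), "files in one transaction")
		return nil
	})
	if err != nil {
		fmt.Println("commit failed:", err)
	}
}
```

Keeping the commit in one place is what makes the snapshot all-or-nothing: readers see either none of the new files or all of them.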

Streaming Sink

The Iceberg Provider also implements a Streaming Sink mechanism that:

Processes data in real-time as it arrives
Maintains continuous data ingestion with minimal latency
Provides exactly-once semantics for data delivery
Supports automatic schema evolution and data type mapping

Note: The Streaming Sink is intended for append-only sources, not for CDC.

CDC Replication Sink (NEW)

Full Change Data Capture replication from PostgreSQL to Iceberg v2 tables using iceberg-go — entirely in Go, no JVM required.

What works:

INSERT, UPDATE, DELETE replication via Iceberg v2 equality deletes (merge-on-read)
Snapshot + incremental replication (WAL-based CDC)
Automatic table creation with schema inference from source
PK-based row deduplication within commit batches
Time-based flush with configurable commit interval

Benchmark results (Apple M1 Pro, local MinIO + REST catalog):

Profile	Duration	Rows	Avg Rate	Lag	Data Loss
InsertOnly (1K→10K ramp)	5 min	1.35M	4,500 rows/s	~3s	0

Key numbers:

6,400 rows/sec peak write throughput
3 second steady-state replication lag
Zero data loss — Iceberg row count matches PG row count exactly after drain

For details, see the benchmark README and the equality delete performance analysis.

Contributing

This project is part of the Transferia ecosystem and follows its contribution guidelines. Please refer to the main Transferia repository for more information.
