This is an incubator project for a new provider in the Transferia ecosystem.

Iceberg is a provider implementation that handles data processing and transformation tasks, designed to plug into Transferia as a new data processing provider.
- Go 1.23 or higher
- Docker (for running tests with testcontainers)
- Make
- Clone the repository:

  ```shell
  git clone https://github.com/transferia/iceberg.git
  cd iceberg
  ```

- Install dependencies:

  ```shell
  go mod download
  ```

- Build the project:

  ```shell
  make build
  ```

Run the test suite:

```shell
make test
```

For detailed test reports:

```shell
make run-tests
```

Test reports will be generated in the `reports/` directory.
The project uses standard Go tooling and Make for common tasks:
- `make clean` - Remove build artifacts
- `make build` - Build the project
- `make test` - Run tests
- `make run-tests` - Run tests with detailed reporting
- `cmd/` - Main application entry points; a custom main file, the same as in transfer but with the extra plugin
- `reports/` - Test reports
- `binaries/` - Compiled binaries
- `doc/` - Documentation, including design documents
- ...rest - plugin code base
The Iceberg Provider implements a robust Table Reading mechanism that:
- Provides efficient data access through optimized manifest processing
- Ensures data consistency through snapshot-based reading
- Implements advanced optimization techniques like partition pruning and column projection
For more details, see the Iceberg Table Reading Design Document.
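To make the partition-pruning idea above concrete, here is a minimal sketch in Go. The `ManifestEntry` type and `pruneManifests` function are hypothetical simplifications, not the provider's actual API: each data file records the min/max values of its partition column, and the reader skips any file whose range cannot overlap the query predicate.

```go
package main

import "fmt"

// ManifestEntry is a hypothetical, simplified stand-in for an Iceberg
// manifest entry: each data file carries the min/max values of its
// partition column, letting a reader skip files without opening them.
type ManifestEntry struct {
	Path         string
	PartitionMin int64
	PartitionMax int64
}

// pruneManifests keeps only entries whose [min, max] partition range
// overlaps the predicate lo <= x <= hi; everything else is pruned.
func pruneManifests(entries []ManifestEntry, lo, hi int64) []ManifestEntry {
	var kept []ManifestEntry
	for _, e := range entries {
		if e.PartitionMax >= lo && e.PartitionMin <= hi {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	entries := []ManifestEntry{
		{Path: "a.parquet", PartitionMin: 0, PartitionMax: 99},
		{Path: "b.parquet", PartitionMin: 100, PartitionMax: 199},
		{Path: "c.parquet", PartitionMin: 200, PartitionMax: 299},
	}
	// Predicate: partition value between 150 and 250 — only b and c can match.
	for _, e := range pruneManifests(entries, 150, 250) {
		fmt.Println(e.Path)
	}
}
```

Column projection works analogously at read time: only the requested columns are decoded from each Parquet file, rather than whole rows.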
The Iceberg Provider implements a powerful Snapshot Sink mechanism that:
- Efficiently transforms incoming data into Parquet files
- Tracks files generated by each worker
- Coordinates file registration using a central coordinator
- Atomically commits all files to the target table in a single transaction
For more details, see the Snapshot Sink Design Document.
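The worker/coordinator flow above can be sketched with Go channels. The function names (`gatherFiles`) and file-path shapes here are illustrative assumptions, not the provider's real interfaces: workers write Parquet files in parallel, report the paths over a channel, and the coordinator commits the whole batch at once.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// gatherFiles simulates snapshot workers writing Parquet files in parallel
// and reporting the resulting paths to a central coordinator. The names
// and shapes here are illustrative, not the provider's actual API.
func gatherFiles(workers int) []string {
	filesCh := make(chan string)
	var wg sync.WaitGroup

	// Each worker writes its Parquet output and reports the path.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			filesCh <- fmt.Sprintf("worker-%d/part-0.parquet", id)
		}(w)
	}
	go func() { wg.Wait(); close(filesCh) }()

	// The coordinator gathers every reported file into one batch.
	var batch []string
	for f := range filesCh {
		batch = append(batch, f)
	}
	sort.Strings(batch) // deterministic order for the commit
	return batch
}

func main() {
	// Committing the whole batch in a single transaction means readers
	// never observe a partially loaded table.
	batch := gatherFiles(4)
	fmt.Printf("committing %d files in one transaction\n", len(batch))
}
```

The key design point is that no file becomes visible until the single coordinator-side commit succeeds.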
The Iceberg Provider also implements a Streaming Sink mechanism that:
- Processes data in real-time as it arrives
- Maintains continuous data ingestion with minimal latency
- Provides exactly-once semantics for data delivery
- Supports automatic schema evolution and data type mapping
Note: this sink is intended for append-only sources, not for CDC.
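As an illustration of the data type mapping the streaming sink has to perform, here is a minimal sketch. The specific mappings below are assumptions for the example, not the provider's actual rules: source column types are translated to Iceberg primitive type names, with a fallback for unknown types.

```go
package main

import "fmt"

// mapToIcebergType sketches source-to-Iceberg type mapping; the specific
// mappings are illustrative assumptions, not the provider's actual rules.
func mapToIcebergType(sourceType string) string {
	switch sourceType {
	case "int4", "int8":
		return "long"
	case "float8":
		return "double"
	case "text", "varchar":
		return "string"
	case "timestamptz":
		return "timestamptz"
	default:
		// Fall back to string rather than rejecting the column, so an
		// unknown source type does not stall ingestion.
		return "string"
	}
}

func main() {
	for _, t := range []string{"int4", "text", "timestamptz"} {
		fmt.Printf("%s -> %s\n", t, mapToIcebergType(t))
	}
}
```

Schema evolution then amounts to adding newly observed columns to the Iceberg schema with their mapped types before the next commit.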
Full Change Data Capture replication from PostgreSQL to Iceberg v2 tables using iceberg-go — entirely in Go, no JVM required.
What works:
- INSERT, UPDATE, DELETE replication via Iceberg v2 equality deletes (merge-on-read)
- Snapshot + incremental replication (WAL-based CDC)
- Automatic table creation with schema inference from source
- PK-based row deduplication within commit batches
- Time-based flush with configurable commit interval
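The PK-based deduplication step above can be sketched in a few lines of Go. The `Change` type is a hypothetical simplification of a CDC event: within one commit batch, only the last change per primary key needs to reach the table.

```go
package main

import "fmt"

// Change is a hypothetical CDC event: a primary key and an operation.
// Within one commit batch, only the final change per key matters.
type Change struct {
	PK int64
	Op string // "insert", "update", or "delete"
}

// dedupByPK keeps only the last change for each primary key, preserving
// the order in which keys first appeared — the same idea as PK-based row
// deduplication within a commit batch.
func dedupByPK(changes []Change) []Change {
	last := make(map[int64]Change, len(changes))
	var order []int64
	for _, c := range changes {
		if _, seen := last[c.PK]; !seen {
			order = append(order, c.PK)
		}
		last[c.PK] = c
	}
	out := make([]Change, 0, len(order))
	for _, pk := range order {
		out = append(out, last[pk])
	}
	return out
}

func main() {
	batch := []Change{
		{PK: 1, Op: "insert"},
		{PK: 2, Op: "insert"},
		{PK: 1, Op: "update"}, // supersedes the earlier insert for PK 1
	}
	for _, c := range dedupByPK(batch) {
		fmt.Printf("pk=%d op=%s\n", c.PK, c.Op)
	}
}
```

Deduplicating before the flush shrinks both the data files and the equality-delete files written per commit interval.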
Benchmark results (Apple M1 Pro, local MinIO + REST catalog):
| Profile | Duration | Rows | Avg Rate | Lag | Data Loss |
|---|---|---|---|---|---|
| InsertOnly (1K→10K ramp) | 5 min | 1.35M | 4,500 rows/s | ~3s | 0 |
Key numbers:
- 6,400 rows/sec peak write throughput
- 3 second steady-state replication lag
- Zero data loss — Iceberg row count matches PG row count exactly after drain
For details, see benchmark README and equality delete performance analysis.
This project is part of the Transferia ecosystem and follows its contribution guidelines. Please refer to the main Transferia repository for more information.