Currently the iceberg plugin supports only snapshot mode (full table copies) and append-only streaming. Replication (streaming CDC from the Postgres WAL) also requires handling row-level mutations: INSERTs, UPDATEs, and DELETEs.
What works today
- Snapshot mode: bulk copy all rows into Iceberg tables via SinkSnapshot
- Streaming append-only mode: SinkStreaming for insert-only sources like Kafka
What's missing
Replication from Postgres produces a mix of INSERT, UPDATE, and DELETE operations. Efficiently applying these to Iceberg requires:
- Equality deletes: write small delete files keyed on the PK columns instead of rewriting entire data files
- RowDelta API: commit new data files and delete files in a single atomic snapshot, so that an UPDATE (delete the old row, insert the new row) is applied atomically
- Equality delete reading: apply those delete files at scan time so downstream consumers see correct query results
These depend on upstream work in apache/iceberg-go:
- #602: RowDelta API
- #784: Multi-table commit (done)
Rough plan
Once iceberg-go has RowDelta + equality deletes, the replication sink would look something like:
for _, change := range cdcBatch {
	switch change.Kind {
	case INSERT:
		rd.AddRows(writeDataFile(change.After))
	case DELETE:
		rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
	case UPDATE:
		rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
		rd.AddRows(writeDataFile(change.After))
	}
}
Combined with multi-table commit, all tables in a WAL batch can be committed atomically, giving downstream readers a consistent cross-table point-in-time view.
Blocked on
- apache/iceberg-go RowDelta API (#602)
- apache/iceberg-go equality delete write + read support