Skip to content

CDC replication support (DML for Iceberg with INSERT/UPDATE/DELETE) #5

@laskoviymishka

Description

@laskoviymishka

Currently the iceberg plugin only supports snapshot mode — full table copies. For replication (streaming CDC from Postgres WAL), we need to handle row-level mutations: INSERTs, UPDATEs, and DELETEs.

What works today

  • Snapshot mode: bulk copy all rows into Iceberg tables via SinkSnapshot
  • Streaming append-only mode: SinkStreaming for insert-only sources like Kafka

What's missing

Replication from Postgres produces a mix of INSERT, UPDATE, and DELETE operations. Efficiently applying these to Iceberg requires:

  1. Equality deletes — write small delete files keyed on PK columns instead of rewriting entire data files
  2. RowDelta API — commit new data files + delete files in a single atomic snapshot (so an UPDATE = delete old row + insert new row happens atomically)
  3. Equality delete reading — so downstream consumers see correct query results

These are upstream gaps in apache/iceberg-go:

  • #602 — RowDelta API
  • #784 — Multi-table commit (done)

Rough plan

Once iceberg-go has RowDelta + equality deletes, the replication sink would look something like:

for _, change := range cdcBatch {
    switch change.Kind {
    case INSERT:
        rd.AddRows(writeDataFile(change.After))
    case DELETE:
        rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
    case UPDATE:
        rd.AddDeletes(writeEqDeleteFile(change.Before, pkFields))
        rd.AddRows(writeDataFile(change.After))
    }
}

Combined with multi-table commit, all tables in a WAL batch can be committed atomically — consistent cross-table point-in-time view.

Blocked on

  • apache/iceberg-go RowDelta API (#602)
  • apache/iceberg-go equality delete write + read support

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions