feat(table): add support for merge-on-read delete #721

Draft
alexandre-normand wants to merge 3 commits into apache:main from alexandre-normand:alex.normand/merge-on-read-delete

@alexandre-normand alexandre-normand commented Feb 12, 2026

delete this

This adds support for merge-on-read deletes. As an alternative to copy-on-write, it generates position delete files instead of rewriting existing data files.

I'm not very confident in the elegance of my solution as I'm still new to the internals of iceberg-go, but the high-level approach is:

  • Reuse the classification code from the existing delete implementation to separate data files that can be dropped entirely from files with partial deletes.
  • Reuse the Arrow scanning facilities to filter records from the data files with partial deletes and emit position delete records containing the file path and position.
    • This reuses the existing pipeline code: the first stage of the pipeline enriches each RecordBatch with the file path and position before the original position is lost to filtering.
    • After filtering, the RecordBatch is projected to the position delete schema (i.e. the original schema fields are dropped).
  • Once we have the filtered PositionDelete records to emit, reuse the record-to-file writing code to generate position delete files.
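
To make the ordering of the stages concrete, here is a minimal sketch of the enrich, filter, project sequence described above. It deliberately uses plain Go slices rather than Arrow record batches, and every name in it (`Row`, `enrichedRow`, `PositionDelete`, `PositionDeletesFor`) is hypothetical and not part of iceberg-go. It only illustrates why the position has to be attached before filtering.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one record of a data file.
type Row struct {
	ID   int64
	Name string
}

// enrichedRow carries the original row plus where it came from.
type enrichedRow struct {
	FilePath string
	Pos      int64
	Row      Row
}

// PositionDelete mirrors the two fields of the position delete schema.
type PositionDelete struct {
	FilePath string
	Pos      int64
}

// enrich tags every row with its file path and ordinal position.
// It must run first: after filtering, slice indexes no longer match
// the positions in the original file.
func enrich(filePath string, rows []Row) []enrichedRow {
	out := make([]enrichedRow, len(rows))
	for i, r := range rows {
		out[i] = enrichedRow{FilePath: filePath, Pos: int64(i), Row: r}
	}
	return out
}

// filter keeps only the rows that match the delete predicate.
func filter(rows []enrichedRow, matches func(Row) bool) []enrichedRow {
	var out []enrichedRow
	for _, r := range rows {
		if matches(r.Row) {
			out = append(out, r)
		}
	}
	return out
}

// project drops the original columns, leaving only file path and position.
func project(rows []enrichedRow) []PositionDelete {
	out := make([]PositionDelete, 0, len(rows))
	for _, r := range rows {
		out = append(out, PositionDelete{FilePath: r.FilePath, Pos: r.Pos})
	}
	return out
}

// PositionDeletesFor runs the three stages in order for one data file.
func PositionDeletesFor(filePath string, rows []Row, matches func(Row) bool) []PositionDelete {
	return project(filter(enrich(filePath, rows), matches))
}

func main() {
	rows := []Row{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}}
	deletes := PositionDeletesFor("data/file-a.parquet", rows, func(r Row) bool {
		return r.ID%2 == 0 // hypothetical delete filter: even IDs
	})
	fmt.Println(deletes)
}
```

With the even-ID filter, the rows at positions 1 and 3 are selected. Had the position been computed after filtering, they would have been reported as 0 and 1, which is the loss the first pipeline stage guards against.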

Testing

Integration tests were added to exercise the partitioned and unpartitioned paths. The test data is chosen so that a delete actually produces a position delete file rather than taking the quick path that drops an entire file because all of its records are deleted.

Indirect fixes

While working on this change and adding tests for partitioned table deletions, I realized that the manifest evaluator was not built correctly when the filter referenced a field that is part of a partition spec. It needs to use code similar to what scanning does: build projections and one manifest evaluator per partition spec ID. This is fixed in this PR, but the fix also applies to the copy-on-write and overwrite paths, so it goes beyond the scope of merge-on-read.
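
As a rough illustration of the "one evaluator per partition spec" idea, the sketch below caches an evaluator per spec ID and applies the matching one to each manifest. All types and names here (`Manifest`, `ManifestEvaluator`, `EvaluatorCache`, `SelectManifests`) are hypothetical simplifications, not the iceberg-go API, and the partition summary is reduced to a single min/max pair.

```go
package main

import "fmt"

// Manifest is a hypothetical, heavily simplified manifest summary.
type Manifest struct {
	Path             string
	SpecID           int   // partition spec the manifest was written with
	MinPart, MaxPart int64 // bounds of a single partition field
}

// ManifestEvaluator reports whether a manifest may contain matching rows.
type ManifestEvaluator func(m Manifest) bool

// EvaluatorFactory projects the row filter onto one partition spec.
type EvaluatorFactory func(specID int) (ManifestEvaluator, error)

// EvaluatorCache lazily builds and reuses one evaluator per spec ID.
// This sketch is not safe for concurrent use.
type EvaluatorCache struct {
	build  EvaluatorFactory
	bySpec map[int]ManifestEvaluator
}

func NewEvaluatorCache(build EvaluatorFactory) *EvaluatorCache {
	return &EvaluatorCache{build: build, bySpec: make(map[int]ManifestEvaluator)}
}

// For returns the evaluator for specID, building it on first use.
func (c *EvaluatorCache) For(specID int) (ManifestEvaluator, error) {
	if ev, ok := c.bySpec[specID]; ok {
		return ev, nil
	}
	ev, err := c.build(specID)
	if err != nil {
		return nil, fmt.Errorf("build evaluator for spec %d: %w", specID, err)
	}
	c.bySpec[specID] = ev
	return ev, nil
}

// SelectManifests keeps the manifests whose own spec's evaluator matches.
func SelectManifests(manifests []Manifest, cache *EvaluatorCache) ([]Manifest, error) {
	var out []Manifest
	for _, m := range manifests {
		ev, err := cache.For(m.SpecID)
		if err != nil {
			return nil, err
		}
		if ev(m) {
			out = append(out, m)
		}
	}
	return out, nil
}

func main() {
	const target = 5 // hypothetical row filter: id == 5
	factory := func(specID int) (ManifestEvaluator, error) {
		if specID == 0 { // assumed: spec 0 partitions by identity(id)
			return func(m Manifest) bool { return m.MinPart <= target && target <= m.MaxPart }, nil
		}
		// assumed: other specs cannot prune on id, so keep every manifest
		return func(Manifest) bool { return true }, nil
	}
	manifests := []Manifest{
		{Path: "m0.avro", SpecID: 0, MinPart: 1, MaxPart: 3},
		{Path: "m1.avro", SpecID: 0, MinPart: 4, MaxPart: 9},
		{Path: "m2.avro", SpecID: 1},
	}
	selected, err := SelectManifests(manifests, NewEvaluatorCache(factory))
	if err != nil {
		panic(err)
	}
	for _, m := range selected {
		fmt.Println(m.Path)
	}
}
```

Keying the evaluators by spec ID matters when a table has evolved its partition spec: a filter projected for one spec can give wrong pruning decisions if it is applied to manifests written under another.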

@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch 5 times, most recently from 5079248 to 114fc57 Compare February 12, 2026 23:42
 var PositionalDeleteSchema = NewSchema(0,
 	NestedField{ID: 2147483546, Type: PrimitiveTypes.String, Name: "file_path", Required: true},
-	NestedField{ID: 2147483545, Type: PrimitiveTypes.Int32, Name: "pos", Required: true},
+	NestedField{ID: 2147483545, Type: PrimitiveTypes.Int64, Name: "pos", Required: true},
The spec says that pos is a long so I updated this to be Int64 rather than Int32.
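
For illustration only, this small sketch shows what a 32-bit `pos` would do to a row position just past the int32 range. `NarrowPos` is a hypothetical helper written for this example, not something in iceberg-go.

```go
package main

import (
	"fmt"
	"math"
)

// NarrowPos converts a 64-bit row position to int32 and reports
// whether the value survived the conversion unchanged.
func NarrowPos(pos int64) (int32, bool) {
	n := int32(pos) // non-constant conversion truncates to the low 32 bits
	return n, int64(n) == pos
}

func main() {
	pos := int64(math.MaxInt32) + 1 // first position that no longer fits in int32
	n, ok := NarrowPos(pos)
	fmt.Println(pos, n, ok) // prints "2147483648 -2147483648 false"
}
```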

@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch 3 times, most recently from a1d417f to 38d1aff Compare February 13, 2026 00:26
@alexandre-normand alexandre-normand force-pushed the alex.normand/merge-on-read-delete branch from 38d1aff to c9e30c4 Compare February 13, 2026 00:32