feat(table): add support for merge-on-read delete#721
Draft
alexandre-normand wants to merge 3 commits intoapache:mainfrom
Draft
feat(table): add support for merge-on-read delete#721alexandre-normand wants to merge 3 commits intoapache:mainfrom
alexandre-normand wants to merge 3 commits intoapache:mainfrom
Conversation
5079248 to
114fc57
Compare
| var PositionalDeleteSchema = NewSchema(0, | ||
| NestedField{ID: 2147483546, Type: PrimitiveTypes.String, Name: "file_path", Required: true}, | ||
| NestedField{ID: 2147483545, Type: PrimitiveTypes.Int32, Name: "pos", Required: true}, | ||
| NestedField{ID: 2147483545, Type: PrimitiveTypes.Int64, Name: "pos", Required: true}, |
Contributor
Author
There was a problem hiding this comment.
The spec says that pos is a long so I updated this to be Int64 rather than Int32.
a1d417f to
38d1aff
Compare
38d1aff to
c9e30c4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds support for merge-on-read deletes. It offers an alternative to the copy-on-write to generate position delete files instead of rewriting existing data files.
I'm not very confident in the elegance of my solution as I'm still new to the internals of iceberg-go but the high-level is:
RecordBatchwith the file Path and position before the original position is lost due to filtering.Testing
Integration tests were added to exercise the partitioned and unpartitioned paths and the data is such that it's meant to actually produce a position delete file rather than just go through the quick path that drops an entire file because all records are gone.
Indirect fixes
While working on this change and adding the testing for the partitioned table deletions, I realized that the manifest evaluation when the filter affected a field that was part of a partition spec was not built correctly. It needed to use similar code as what's done during scanning to build projections and build a manifest evaluator per partition id. This is fixed in this PR but this technically also applies to copy-on-write and overwrite paths so the fix goes beyond the scope of the
merge-on-read.