fix: use KeyBasedFileGroupRecordBuffer for non-Parquet base files in MOR merge #18588
yihua wants to merge 1 commit into
3d0299e to 98206d6
hudi-agent left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the fix! This narrows PositionBasedFileGroupRecordBuffer selection to Parquet base files only, matching the existing isParquetBaseFile() gate that HoodieFileGroupReader uses for shouldMergeUseRecordPosition. The Lance MOR delete test is un-skipped accordingly. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
hudi-agent left a comment
Thanks for the fix! This narrows PositionBasedFileGroupRecordBuffer selection to Parquet base files, matching the existing shouldMergeUseRecordPosition gate in HoodieFileGroupReader so non-Parquet formats (Lance, ORC) consistently use KeyBasedFileGroupRecordBuffer. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
a1bcc94 to 2a2516f
hudi-agent left a comment
Thanks for the fix! This narrows PositionBasedFileGroupRecordBuffer selection to Parquet base files, aligning with the merge strategy HoodieFileGroupReader already enforces via shouldMergeUseRecordPosition, and unblocks the previously-skipped MOR DELETE/INSERT cases in TestLanceDataSource. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
@yihua can you look at the test failures on CI?
2a2516f to 9854f52
hudi-agent left a comment
Thanks for the fix! This PR removes the MOR-specific workarounds in the Lance test now that deletes merge correctly via KeyBasedFileGroupRecordBuffer for non-Parquet base files. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
…18558) Remove the COW-only guard on the DELETE test in TestLanceDataSource so that deletes are verified for both COW and MOR table types. Closes apache#18558
9854f52 to 2c413fd
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18588       +/-   ##
=============================================
+ Coverage     45.00%   68.11%   +23.10%
- Complexity     8571    29089    +20518
=============================================
  Files          1202     2518     +1316
  Lines         62816   141221    +78405
  Branches       6811    17531    +10720
=============================================
+ Hits          28271    96188    +67917
- Misses        31428    37139     +5711
- Partials       3117     7894     +4777
Flags with carried forward coverage won't be shown.
hudi-agent left a comment
Thanks for the fix! This PR aligns the buffer selection with HoodieFileGroupReader's merge strategy gating so non-Parquet base files (Lance/ORC) use KeyBasedFileGroupRecordBuffer, and the test changes simply remove the now-unnecessary CoW-only workarounds. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
Describe the issue this Pull Request addresses
Closes #18558
Deletes are not merged correctly for Lance base files on MOR tables.
Summary and Changelog
DefaultFileGroupRecordBufferLoader selects PositionBasedFileGroupRecordBuffer when useRecordPosition is true, regardless of the base file format. However, HoodieFileGroupReader sets shouldMergeUseRecordPosition = false for non-Parquet files via the isParquetBaseFile() gate. This mismatch causes the buffer to enter a fallback hybrid-strategy mode that fails to merge delete blocks for Lance base files.
Added the isParquetBaseFile() check to the buffer selection so non-Parquet formats (Lance, ORC) use KeyBasedFileGroupRecordBuffer directly, matching the merge strategy set by HoodieFileGroupReader.
Impact
Non-Parquet base file formats in MOR tables now correctly merge deletes. No impact on Parquet tables.
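The selection change described in the summary can be illustrated with a minimal, self-contained sketch. It is not the actual Hudi code: BaseFileFormat, BufferKind, BufferSelectionSketch, and both select methods are hypothetical stand-ins, and the fall-through to the key-based buffer when useRecordPosition is false is an assumption made for illustration. Only the names isParquetBaseFile and useRecordPosition come from the PR description.

```java
// Hypothetical, simplified stand-ins for illustration; these are NOT the real Hudi classes.
enum BaseFileFormat { PARQUET, ORC, LANCE }

enum BufferKind { POSITION_BASED, KEY_BASED }

final class BufferSelectionSketch {
  private BufferSelectionSketch() {}

  // Mirrors the isParquetBaseFile() gate named in the PR description.
  static boolean isParquetBaseFile(BaseFileFormat format) {
    return format == BaseFileFormat.PARQUET;
  }

  // Behavior before the fix, as described: the format is ignored,
  // so Lance and ORC can end up with the position-based buffer.
  // Assumption: the key-based buffer is the alternative branch.
  static BufferKind selectBeforeFix(boolean useRecordPosition, BaseFileFormat format) {
    return useRecordPosition ? BufferKind.POSITION_BASED : BufferKind.KEY_BASED;
  }

  // Behavior after the fix, as described: the position-based buffer
  // is chosen only when the base file is Parquet.
  static BufferKind selectAfterFix(boolean useRecordPosition, BaseFileFormat format) {
    return (useRecordPosition && isParquetBaseFile(format))
        ? BufferKind.POSITION_BASED
        : BufferKind.KEY_BASED;
  }

  public static void main(String[] args) {
    // Print the selection for every format with record positions enabled.
    for (BaseFileFormat format : BaseFileFormat.values()) {
      System.out.println(format
          + " before=" + selectBeforeFix(true, format)
          + " after=" + selectAfterFix(true, format));
    }
  }
}
```

In this sketch, Parquet is selected identically before and after the change, which matches the stated "no impact on Parquet tables"; only the Lance and ORC rows differ.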
Risk Level
Low: narrows PositionBasedFileGroupRecordBuffer selection to Parquet only, which is the only format that supports row-index-based positional merging today.
Documentation Update
None.
Contributor's checklist