feat: Add ReverseOrderHoodieRecordPayload and configurable ordering behavior#17928
Conversation
f93de26 to
6c0d83f
Compare
6d7a2ff to
1c86d71
Compare
nsivabalan
left a comment
There was a problem hiding this comment.
can we split this into two patches.
one for ReverseOrderHoodieRecordPayload
and another one for modifying DefualtHoodieRecordPayload
for now, I will review ReverseOrderHoodieRecordPayload in this patch.
Sure, will do that Siva! |
|
is this ready for review again? @suryaprasanna |
|
hey @suryaprasanna : can you raise another patch for the SENTINEL fix in DefaultHoodieRecordPayload |
@nsivabalan sure, will do that. |
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
[Update review] Good progress on removing the SENTINEL indirection from DefaultHoodieRecordPayload. The change is functionally safe — in the merger path, reference equality (updatedRecord == previousAvroData) preserves the same no-rewrite behavior that SENTINEL provided. Just a couple of minor notes inline.
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for addressing the feedback. The revert to returning SENTINEL from combineAndGetUpdateValue is clean and the tests are properly updated (including fixing the assertEquals argument ordering from my prior nit). One concern with the new canProduceSentinel() override — see inline comment.
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Greptile Summary: This PR adds a ReverseOrderHoodieRecordPayload class that keeps the oldest record for a key (by reversing the ordering field comparison), and introduces a new configurable property hoodie.payload.update.on.same.ordering.field (default "true") that controls whether two records with the same ordering value should result in an update. It also refactors DefaultHoodieRecordPayload.combineAndGetUpdateValue to return Option.of(SENTINEL) instead of Option.of(currentValue) when the incoming record should not replace the persisted one — aligning with the SENTINEL pattern already understood by HoodieAvroRecordMerger.
Key changes:
- New class:
ReverseOrderHoodieRecordPayload— overridespreCombineto keep lowest ordering value and overridescompareOrderingValto reverse the comparison direction. DefaultHoodieRecordPayload:combineAndGetUpdateValuenow returnsOption.of(SENTINEL)(instead of the oldcurrentValuereference) when the incoming record is not newer;canProduceSentinel()overridden totrueto signal this to callers;compareOrderingValextracted as a protected, overridable method.HoodiePayloadProps: New property keyhoodie.payload.update.on.same.ordering.fieldand its default"true"(preserves prior<=semantics).- Tests: Existing test assertions updated for the SENTINEL change; new parameterized tests for the same-ordering-field config; new
TestReverseOrderHoodieRecordPayloadsuite.
Greptile Confidence Score: 4/5
Safe to merge with minor cleanup; no logic bugs or breaking changes detected.
The core logic is sound: the SENTINEL return correctly aligns with the existing HoodieAvroRecordMerger sentinel check; the reverse-order comparison is correctly inverted for both preCombine and compareOrderingVal; the new property default preserves prior <= semantics for backward compatibility. All issues found are P2 style/documentation nits.
TestReverseOrderHoodieRecordPayload.java — inconsistent orderingVal vs ts field values in test constructor args could mask future ordering bugs.
Greptile: yihua#36 (review)
88463f1 to
23d726c
Compare
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review — A few naming and clarity suggestions: Javadoc could be more specific about which method is overridden, a test variable name could be clearer, and a test method name is somewhat long.
yihua
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
LGTM — clean implementation of reverse-order payload semantics. The compareOrderingVal override correctly inverts the comparison logic, and preCombine properly selects the record with the lowest ordering value. Traced through all key paths (equal ordering, delete records, null persisted values) and the behavior is consistent with the parent class patterns.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17928 +/- ##
=============================================
+ Coverage 44.88% 68.84% +23.96%
- Complexity 8494 28226 +19732
=============================================
Files 1196 2461 +1265
Lines 62037 135271 +73234
Branches 6682 16398 +9716
=============================================
+ Hits 27844 93131 +65287
- Misses 31155 34766 +3611
- Partials 3038 7374 +4336
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This PR addresses the need for reverse-order payload merging in Hudi, where the oldest record (based on ordering field) should be preserved instead of the latest. It also adds configurability to control behavior when ordering values are equal, and optimizes the default payload to avoid unnecessary rewrites. What users gain: New ReverseOrderHoodieRecordPayload class for use cases requiring oldest-record-wins semantics Configuration option to control update behavior when ordering field values are equal Performance improvement by avoiding unnecessary record rewrites when incoming records are older
Describe the issue this Pull Request addresses
This PR addresses the need for reverse-order payload merging in Hudi, where the oldest record (based on ordering field) should be preserved instead of the latest. It also adds configurability to control behavior when ordering values are equal, and optimizes the default payload to avoid unnecessary rewrites.
Summary and Changelog
What users gain:
ReverseOrderHoodieRecordPayloadclass for use cases requiring oldest-record-wins semanticsChanges:
ReverseOrderHoodieRecordPayloadclass that keeps the oldest record based on ordering fieldhoodie.payload.update.on.same.ordering.fieldconfig property toHoodiePayloadProps(default: true)DefaultHoodieRecordPayload.combineAndGetUpdateValue()to return SENTINEL instead of currentValue when incoming record is oldercompareOrderingVal()method inDefaultHoodieRecordPayloadfor extensibilityOverwriteWithLatestAvroPayload.preCombine()to properly compare ordering values instead of always returningthiscanProduceSentinel()method toDefaultHoodieRecordPayloadImpact
Public API changes:
org.apache.hudi.common.model.ReverseOrderHoodieRecordPayloadhoodie.payload.update.on.same.ordering.field(default: true, maintains backward compatibility)DefaultHoodieRecordPayload.compareOrderingVal()Behavior changes:
DefaultHoodieRecordPayload.combineAndGetUpdateValue()now returns SENTINEL for older records (avoids rewriting with new commit time)OverwriteWithLatestAvroPayload.preCombine()now properly compares ordering valuesPerformance impact:
Risk Level
Low
The changes maintain backward compatibility by:
hoodie.payload.update.on.same.ordering.field=true) preserves existing behaviorReverseOrderHoodieRecordPayloadis a new opt-in classVerification:
Documentation Update
Config documentation:
hoodie.payload.update.on.same.ordering.fieldneeds to be documented in Hudi configuration referenceFeature documentation:
ReverseOrderHoodieRecordPayloadshould be documented as an alternative payload option for reverse-ordering use casesContributor's checklist