feat: Add ReverseOrderHoodieRecordPayload and configurable ordering behavior #17928

Merged
nsivabalan merged 1 commit into apache:master from suryaprasanna:surya-dev-01 on Apr 16, 2026

Conversation

@suryaprasanna
Contributor

@suryaprasanna suryaprasanna commented Jan 17, 2026

Describe the issue this Pull Request addresses

This PR addresses the need for reverse-order payload merging in Hudi, where the oldest record (based on ordering field) should be preserved instead of the latest. It also adds configurability to control behavior when ordering values are equal, and optimizes the default payload to avoid unnecessary rewrites.

Summary and Changelog

What users gain:

  • New ReverseOrderHoodieRecordPayload class for use cases requiring oldest-record-wins semantics
  • Configuration option to control update behavior when ordering field values are equal
  • Performance improvement by avoiding unnecessary record rewrites when incoming records are older

Changes:

  • Added ReverseOrderHoodieRecordPayload class that keeps the oldest record based on ordering field
  • Added hoodie.payload.update.on.same.ordering.field config property to HoodiePayloadProps (default: true)
  • Enhanced DefaultHoodieRecordPayload.combineAndGetUpdateValue() to return SENTINEL instead of currentValue when incoming record is older
  • Extracted compareOrderingVal() method in DefaultHoodieRecordPayload for extensibility
  • Fixed OverwriteWithLatestAvroPayload.preCombine() to properly compare ordering values instead of always returning this
  • Added canProduceSentinel() method to DefaultHoodieRecordPayload
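
The ordering rules in the list above can be sketched roughly as follows. This is a hypothetical stand-in, not the code from the patch: the class `OrderingDecision`, the method `shouldUpdate`, and the `reverse` flag are invented for illustration, and treating a missing persisted ordering value as "always update" is an assumption. Only the config key and its default of `true` come from the description.

```java
import java.util.Properties;

// Hypothetical sketch of the decision described above; not Hudi's actual implementation.
class OrderingDecision {
    // Config key and default ("true") as stated in the PR description.
    static final String UPDATE_ON_SAME_KEY = "hoodie.payload.update.on.same.ordering.field";

    /**
     * Returns true when the incoming record should replace the persisted one.
     * reverse=false models latest-wins; reverse=true models oldest-wins.
     */
    static <T extends Comparable<T>> boolean shouldUpdate(
            T persisted, T incoming, Properties props, boolean reverse) {
        if (persisted == null) {
            return true; // assumption: nothing to compare against, so take the incoming record
        }
        int cmp = persisted.compareTo(incoming);
        if (cmp == 0) {
            // Equal ordering values: defer to the config, which defaults to updating.
            return Boolean.parseBoolean(props.getProperty(UPDATE_ON_SAME_KEY, "true"));
        }
        // Latest-wins updates when persisted < incoming; oldest-wins when persisted > incoming.
        return reverse ? cmp > 0 : cmp < 0;
    }
}
```

With an empty `Properties`, equal ordering values result in an update, which matches the backward-compatible default described in this PR.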

Impact

Public API changes:

  • New payload class: org.apache.hudi.common.model.ReverseOrderHoodieRecordPayload
  • New config: hoodie.payload.update.on.same.ordering.field (default: true, maintains backward compatibility)
  • New protected method: DefaultHoodieRecordPayload.compareOrderingVal()

Behavior changes:

  • DefaultHoodieRecordPayload.combineAndGetUpdateValue() now returns SENTINEL for older records (avoids rewriting with new commit time)
  • OverwriteWithLatestAvroPayload.preCombine() now properly compares ordering values

Performance impact:

  • Positive: Reduced unnecessary rewrites when older records arrive

Risk Level

Low

The changes maintain backward compatibility by:

  • Default config value (hoodie.payload.update.on.same.ordering.field=true) preserves existing behavior
  • ReverseOrderHoodieRecordPayload is a new opt-in class
  • SENTINEL optimization is transparent to users

Verification:

  • Comprehensive unit tests added for new functionality
  • Existing tests updated and passing
  • No breaking changes to existing APIs

Documentation Update

Config documentation:

  • New config hoodie.payload.update.on.same.ordering.field needs to be documented in Hudi configuration reference

Feature documentation:

  • ReverseOrderHoodieRecordPayload should be documented as an alternative payload option for reverse-ordering use cases
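
A documentation example might look something like the sketch below. The payload class name and the new config key are taken from this PR; the writer option `hoodie.datasource.write.payload.class` is assumed to be the relevant key for selecting a payload class and should be checked against the configuration reference for the Hudi version in use. The class `ReversePayloadConfigExample` and the method `writerProps` are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that assembles writer properties for oldest-record-wins merging.
class ReversePayloadConfigExample {
    static Properties writerProps() {
        Properties props = new Properties();
        // Assumed writer option for choosing the payload class; verify for your Hudi version.
        props.setProperty("hoodie.datasource.write.payload.class",
            "org.apache.hudi.common.model.ReverseOrderHoodieRecordPayload");
        // New config from this PR; "false" means equal ordering values do not trigger an update.
        props.setProperty("hoodie.payload.update.on.same.ordering.field", "false");
        return props;
    }
}
```

Leaving the second property unset keeps the default of `true`, which the PR describes as preserving existing behavior.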

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@suryaprasanna suryaprasanna changed the title from "[UBER] Include ReverseOrderHoodieRecordPayload and enhance DefaultHoodieRecordPayload class" to "feat: Add ReverseOrderHoodieRecordPayload and enhance ordering control in DefaultHoodieRecordPayload" on Jan 17, 2026
@suryaprasanna suryaprasanna changed the title from "feat: Add ReverseOrderHoodieRecordPayload and enhance ordering control in DefaultHoodieRecordPayload" to "feat: Add ReverseOrderHoodieRecordPayload and configurable ordering behavior" on Jan 17, 2026
@github-actions github-actions Bot added the size:L PR with lines of changes in (300, 1000] label Jan 17, 2026
Contributor

@nsivabalan nsivabalan left a comment

can we split this into two patches?
one for ReverseOrderHoodieRecordPayload
and another one for modifying DefaultHoodieRecordPayload

for now, I will review ReverseOrderHoodieRecordPayload in this patch.

@suryaprasanna
Contributor Author

> can we split this into two patches?
> one for ReverseOrderHoodieRecordPayload
> and another one for modifying DefaultHoodieRecordPayload
>
> for now, I will review ReverseOrderHoodieRecordPayload in this patch.

Sure, will do that Siva!

@apache apache deleted a comment from hudi-bot Feb 10, 2026
@nsivabalan
Contributor

is this ready for review again? @suryaprasanna

@github-actions github-actions Bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Mar 27, 2026
@nsivabalan
Contributor

hey @suryaprasanna: can you raise another patch for the SENTINEL fix in DefaultHoodieRecordPayload?

@suryaprasanna
Contributor Author

> hey @suryaprasanna: can you raise another patch for the SENTINEL fix in DefaultHoodieRecordPayload?

@nsivabalan sure, will do that.

@github-actions github-actions Bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Mar 27, 2026
Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

[Update review] Good progress on removing the SENTINEL indirection from DefaultHoodieRecordPayload. The change is functionally safe — in the merger path, reference equality (updatedRecord == previousAvroData) preserves the same no-rewrite behavior that SENTINEL provided. Just a couple of minor notes inline.
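
The reference-equality check mentioned in this review can be pictured with a small sketch. Everything here is hypothetical and simplified: `ReferenceEqualitySketch`, `merge`, and `canSkipRewrite` are illustrative names, plain `Object` values stand in for Avro records, and ordering values are reduced to `long`.

```java
// Hypothetical model of skipping a rewrite when the merge hands back the same object.
class ReferenceEqualitySketch {
    /** Returns the previous record object itself when the incoming one is older. */
    static Object merge(Object previous, long previousOrdering, Object incoming, long incomingOrdering) {
        return incomingOrdering < previousOrdering ? previous : incoming;
    }

    /** Identity check: true only when the merge returned the very same object. */
    static boolean canSkipRewrite(Object updated, Object previous) {
        return updated == previous;
    }
}
```

Because the check uses `==` rather than `equals`, an incoming record with identical content but a different identity still counts as an update in this model.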

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for addressing the feedback. The revert to returning SENTINEL from combineAndGetUpdateValue is clean and the tests are properly updated (including fixing the assertEquals argument ordering from my prior nit). One concern with the new canProduceSentinel() override — see inline comment.

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Greptile Summary: This PR adds a ReverseOrderHoodieRecordPayload class that keeps the oldest record for a key (by reversing the ordering field comparison), and introduces a new configurable property hoodie.payload.update.on.same.ordering.field (default "true") that controls whether two records with the same ordering value should result in an update. It also refactors DefaultHoodieRecordPayload.combineAndGetUpdateValue to return Option.of(SENTINEL) instead of Option.of(currentValue) when the incoming record should not replace the persisted one — aligning with the SENTINEL pattern already understood by HoodieAvroRecordMerger.

Key changes:

  • New class: ReverseOrderHoodieRecordPayload — overrides preCombine to keep lowest ordering value and overrides compareOrderingVal to reverse the comparison direction.
  • DefaultHoodieRecordPayload: combineAndGetUpdateValue now returns Option.of(SENTINEL) (instead of the old currentValue reference) when the incoming record is not newer; canProduceSentinel() overridden to true to signal this to callers; compareOrderingVal extracted as a protected, overridable method.
  • HoodiePayloadProps: New property key hoodie.payload.update.on.same.ordering.field and its default "true" (preserves prior <= semantics).
  • Tests: Existing test assertions updated for the SENTINEL change; new parameterized tests for the same-ordering-field config; new TestReverseOrderHoodieRecordPayload suite.
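
As a rough picture of the "keep lowest ordering value" behavior attributed to `preCombine` above, consider the sketch below. It does not reproduce the class from the PR: `OldestWinsSketch`, `Rec`, and `dedupe` are hypothetical, ordering values are plain `long`s, and keeping the current record on a tie is an assumption.

```java
import java.util.List;

// Hypothetical model of oldest-record-wins deduplication within a batch.
class OldestWinsSketch {
    static final class Rec {
        final String key;
        final long orderingVal;
        final String data;

        Rec(String key, long orderingVal, String data) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.data = data;
        }

        /** Keeps whichever record has the lower ordering value; ties keep this record. */
        Rec preCombine(Rec other) {
            return other.orderingVal < this.orderingVal ? other : this;
        }
    }

    /** Folds a non-empty batch of same-key records down to the oldest one. */
    static Rec dedupe(List<Rec> sameKeyBatch) {
        Rec winner = sameKeyBatch.get(0);
        for (Rec r : sameKeyBatch.subList(1, sameKeyBatch.size())) {
            winner = winner.preCombine(r);
        }
        return winner;
    }
}
```

For a batch with ordering values 30, 10, and 20, `dedupe` returns the record with ordering value 10, regardless of arrival order.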

Greptile Confidence Score: 4/5
Safe to merge with minor cleanup; no logic bugs or breaking changes detected.

The core logic is sound: the SENTINEL return correctly aligns with the existing HoodieAvroRecordMerger sentinel check; the reverse-order comparison is correctly inverted for both preCombine and compareOrderingVal; the new property default preserves prior <= semantics for backward compatibility. All issues found are P2 style/documentation nits.

TestReverseOrderHoodieRecordPayload.java — inconsistent orderingVal vs ts field values in test constructor args could mask future ordering bugs.

Greptile: yihua#36 (review)

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Style & Readability Review — A few naming and clarity suggestions: Javadoc could be more specific about which method is overridden, a test variable name could be clearer, and a test method name is somewhat long.

Contributor

@yihua yihua left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

LGTM — clean implementation of reverse-order payload semantics. The compareOrderingVal override correctly inverts the comparison logic, and preCombine properly selects the record with the lowest ordering value. Traced through all key paths (equal ordering, delete records, null persisted values) and the behavior is consistent with the parent class patterns.

@github-actions github-actions Bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Apr 15, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.84%. Comparing base (12b3a06) to head (23d726c).

Files with missing lines | Patch % | Lines
.../common/model/ReverseOrderHoodieRecordPayload.java | 46.66% | 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17928       +/-   ##
=============================================
+ Coverage     44.88%   68.84%   +23.96%     
- Complexity     8494    28226    +19732     
=============================================
  Files          1196     2461     +1265     
  Lines         62037   135271    +73234     
  Branches       6682    16398     +9716     
=============================================
+ Hits          27844    93131    +65287     
- Misses        31155    34766     +3611     
- Partials       3038     7374     +4336     
Flag | Coverage Δ
common-and-other-modules | 44.58% <46.66%> (?)
hadoop-mr-java-client | 44.88% <0.00%> (+<0.01%) ⬆️
spark-client-hadoop-common | 48.44% <0.00%> (?)
spark-java-tests | 48.93% <46.66%> (?)
spark-scala-tests | 45.50% <0.00%> (?)
utilities | 38.22% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
.../common/model/ReverseOrderHoodieRecordPayload.java | 46.66% <46.66%> (ø)

... and 2062 files with indirect coverage changes

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit f144abc into apache:master Apr 16, 2026
56 checks passed
dwshmilyss pushed a commit to dwshmilyss/hudi that referenced this pull request May 21, 2026

Labels

size:M PR with lines of changes in (100, 300]

6 participants