
fix: Skip pre-compaction rollback metadata reads in getValidInstantTimestamps#18544

Open
yihua wants to merge 2 commits into apache:master from yihua:fix-skip-pre-compaction-rollbacks

Conversation

@yihua
Contributor

@yihua yihua commented Apr 22, 2026

Describe the issue this Pull Request addresses

This PR addresses a performance issue in building metadata table-based file system view and constructing metadata table log file reader.

Summary and Changelog

This PR optimizes getValidInstantTimestamps in HoodieTableMetadataUtil to skip reading rollback metadata for rollbacks older than the latest MDT compaction instant. After compaction, rolled-back log blocks have already been merged into base files, so pre-compaction rollback timestamps are no longer needed for log block filtering. Skipping them avoids sequential storage reads (GCS/S3) for old rollback instants. These reads are costly because they happen while the ConcurrentHashMap.computeIfAbsent lock is held during metadata reader opening, which blocks other threads and can throttle the Spark driver CPU, for example when a timeline server runs on the Spark driver with a metadata table-based file system view, since the FSV must be refreshed before the timeline server can serve requests. Without this fix, when there are 100+ rollbacks on the active timeline, metadata reading for file listing can take tens or even hundreds of seconds, causing severe performance degradation. With the fix, the expected latency is sub-second when only a few rollbacks must be read, and effectively zero when all rollbacks are filtered out.

  • HoodieTableMetadataUtil.java: Compute rollbackFilterThreshold as the later of the earliest valid instant and the latest MDT compaction time. Only read rollback metadata for rollbacks newer than this threshold.
  • TestHoodieTableMetadataUtil.java: Two new tests:
    • testGetValidInstantTimestampsSkipsPreCompactionRollbacks — verifies pre-compaction rollbacks are skipped and post-compaction rollbacks are still read
    • testGetValidInstantTimestampsReadsAllRollbacksWithNoCompaction — verifies fallback behavior when no MDT compaction exists (all rollbacks read)
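The threshold computation described above can be sketched in isolation as follows. This is a minimal, self-contained illustration, not the actual Hudi code: the class name, method name, and sentinel value are hypothetical stand-ins.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Optional;

// Minimal sketch (hypothetical names) of the threshold clamping: rollback
// metadata is read only for rollbacks newer than the later of the earliest
// valid instant and the latest MDT compaction time.
public class RollbackThresholdSketch {
  // Stand-in for the SOLO_COMMIT_TIMESTAMP sentinel used as a floor value.
  static final String SOLO_COMMIT_TIMESTAMP = "0000000000000";

  static String rollbackFilterThreshold(Collection<String> validInstantTimestamps,
                                        Optional<String> latestMdtCompactionTime) {
    String earliest = validInstantTimestamps.isEmpty()
        ? SOLO_COMMIT_TIMESTAMP
        : Collections.min(validInstantTimestamps);
    // No MDT compaction: fall back so that all rollbacks are still read,
    // preserving the original behavior.
    String compactionTime = latestMdtCompactionTime.orElse(SOLO_COMMIT_TIMESTAMP);
    // Instant times are zero-padded timestamps, so lexicographic order
    // matches chronological order.
    return compactionTime.compareTo(earliest) > 0 ? compactionTime : earliest;
  }
}
```

Only rollbacks whose timestamps sort after this threshold would then have their rollback metadata read from storage.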

Impact

Significantly improves MDT read performance with log files when there are many rollbacks on the data table timeline.

Risk Level

Low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Contributor

@hudi-agent hudi-agent left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids unnecessary reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime), since rolled-back log blocks before the latest MDT compaction are already merged into base files. The fallback to SOLO_COMMIT_TIMESTAMP correctly preserves original behavior when no MDT compaction exists. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One truncated comment in the test — the rest of the change looks clean.

cc @yihua

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Apr 23, 2026
@yihua yihua force-pushed the fix-skip-pre-compaction-rollbacks branch from e32a618 to 67eac7a on May 9, 2026 02:16
@hudi-bot
Collaborator

hudi-bot commented May 9, 2026

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Contributor

@hudi-agent hudi-agent left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids unnecessary sequential reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime). The fallback to SOLO_COMMIT_TIMESTAMP when no MDT compaction exists correctly preserves the original behavior. One question on the inline comment about consistency with the existing getLatestCompactionTime() helper. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small readability issues in the new test — one truncated comment and one variable name that implies the wrong shape of data.

// reads for old rollback instants that can cause long latency during metadata table reading.
final String earliestInstantTime = validInstantTimestamps.isEmpty() ? SOLO_COMMIT_TIMESTAMP : Collections.min(validInstantTimestamps);
final String latestMdtCompactionTime = metadataMetaClient.getActiveTimeline()
.getCommitTimeline()
Contributor


🤖 The existing HoodieBackedTableMetadata.getLatestCompactionTime() (line 808) uses getCommitAndReplaceTimeline() which also includes REPLACE_COMMIT_ACTION / CLUSTERING_ACTION, while this new code uses only getCommitTimeline() (just COMMIT_ACTION). Was this intentional? It's safe today since MDT only emits compaction commits as COMMIT_ACTION, but the inconsistency is a small future-proofing risk if MDT ever gains clustering/replace semantics — and reusing/sharing the existing helper would also avoid the duplicated lookup logic.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

Member

@voonhous voonhous May 15, 2026


Building on top of this: if anything else ever writes a COMMIT_ACTION to the MDT, this code would silently treat that timestamp as a "compaction." It's worth being defensive and filtering explicitly on the compaction action.

IIRC, COMMIT_ACTION writes to the MDT are exclusively generated by compaction, so this is safe for now. The only problem that may arise in the future is if the API contract changes and this becomes a regression.

As of now, I don't think this should be a blocker; I just want to highlight it.

Feel free to create an issue to track this, or clean it up now if it's worth the effort for new users to pick up.
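The defensive check suggested here could look roughly like the following. This is a hypothetical sketch: the map of instant time to operation type and the "compact" marker are simplified stand-ins for Hudi's HoodieInstant and commit metadata, not real API calls.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of filtering explicitly on the compaction operation
// rather than treating every COMMIT_ACTION on the MDT timeline as a
// compaction. Each entry maps an instant time to the operation type
// recorded in its commit metadata (a simplified stand-in model).
public class CompactionFilterSketch {
  static Optional<String> latestCompactionTime(Map<String, String> instantToOperationType) {
    return instantToOperationType.entrySet().stream()
        // Keep only instants explicitly marked as compactions; a future
        // writer emitting plain COMMIT_ACTIONs would no longer slip through.
        .filter(e -> "compact".equals(e.getValue()))
        .map(Map.Entry::getKey)
        .max(String::compareTo);
  }
}
```

With such a filter, a new non-compaction COMMIT_ACTION writer to the MDT would be ignored instead of silently shifting the rollback threshold.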

mdtTestTable.addDeltaCommit("20260101020101000");

metaClient = HoodieTableMetaClient.reload(metaClient);
mdtMetaClient = HoodieTableMetaClient.reload(mdtMetaClient);
Contributor


🤖 nit: the comment is cut off mid-sentence — "rolled-back commits appear" appears to be missing the end of the thought. Could you complete it, e.g. "rolled-back commits appear in the valid timestamps"?


assertFalse(validTimestamps.contains(commit3), "commit3 should NOT be in valid timestamps (pre-compaction rollback skipped)");
}

private void addCompletedRollback(HoodieTestTable testTable, String rollbackTime, String rolledBackCommit) throws Exception {
Contributor


🤖 nit: emptyPartitionFiles reads as though the map is empty, but it actually contains a partition1 entry. Something like partitionFiles or partitionToFiles might be less surprising.


@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.30%. Comparing base (47bf4e4) to head (67eac7a).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18544      +/-   ##
============================================
- Coverage     68.14%   67.30%   -0.84%     
+ Complexity    29077    28665     -412     
============================================
  Files          2522     2522              
  Lines        141177   141185       +8     
  Branches      17514    17515       +1     
============================================
- Hits          96208    95028    -1180     
- Misses        37061    38148    +1087     
- Partials       7908     8009     +101     
Flag Coverage Δ
common-and-other-modules 44.42% <100.00%> (+<0.01%) ⬆️
hadoop-mr-java-client 45.01% <100.00%> (+<0.01%) ⬆️
spark-client-hadoop-common 21.36% <0.00%> (-26.99%) ⬇️
spark-java-tests 49.00% <100.00%> (+<0.01%) ⬆️
spark-scala-tests 44.91% <100.00%> (+<0.01%) ⬆️
utilities 37.65% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../apache/hudi/metadata/HoodieTableMetadataUtil.java 82.08% <100.00%> (-0.28%) ⬇️

... and 165 files with indirect coverage changes


@yihua yihua added this to the release-1.2.0 milestone May 13, 2026
Member

@voonhous voonhous left a comment


LGTM with a non-blocking concern.

Let's fix the CI errors before we merge this in.

