Conversation
Greptile SummaryThis PR adds an optional configuration ( Key changes:
Notable issues:
Confidence Score: 4/5Safe to merge after fixing the missing pre-ECTR assertion; remaining issues are style/cleanup Core production logic in TimelineArchiverV1 is correct and well-structured. The P1 issue is a test gap (no assertion pre-ECTR commits are archived), but the separate testArchivalMakesProgressWhenECTRIsLaterThanArchivalWindow test does cover this path. Unrelated HoodieWriteConfig changes are safe (no active callers). Minor naming and comment issues in tests bring the score to 4 rather than 5. TestHoodieTimelineArchiver.java — missing assertion and dead-code variable; HoodieWriteConfig.java — unrelated changes should be confirmed or extracted. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[archiveIfRequired called] --> B{shouldBlockArchivalOnCleanECTR?}
B -- No --> F[Build instantToArchiveStream with existing filters]
B -- Yes --> C[Get last completed clean instant]
C --> D{lastCleanInstant present?}
D -- No --> F
D -- Yes --> E[readCleanMetadata from activeTimeline]
E -- IOException --> ERR[throw HoodieIOException - abort archival]
E -- success --> G{ECTR non-null and non-empty?}
G -- No --> F
G -- Yes --> H[oldestInstantToRetainForClean = firstInstant after-or-equals ECTR in commitTimeline]
H --> F
F --> I[Apply filter: s.requestedTime < ECTR orElse true]
I --> J[Apply limit: countInstants - minInstantsToKeep]
J --> K[Archive selected instants]
Reviews (3): Last reviewed commit: "Addressing feedback in tests" | Re-trigger Greptile |
📝 WalkthroughWalkthroughWhen enabled, archival consults the latest CLEAN metadata's ECTR (Earliest Commit To Retain) and blocks archiving of commit instants older than that threshold; failures reading clean metadata under this mode throw a HoodieIOException. Changes
Sequence DiagramsequenceDiagram
participant Client
participant TimelineArchiver
participant Config
participant Timeline
participant CleanMetadata
participant CommitFilter
Client->>TimelineArchiver: trigger archiveIfRequired()
TimelineArchiver->>Config: shouldBlockArchivalOnCleanECTR()
alt ECTR Blocking Enabled
TimelineArchiver->>Timeline: get latest CLEAN instant
TimelineArchiver->>CleanMetadata: read/parse HoodieCleanMetadata
CleanMetadata-->>TimelineArchiver: earliestCommitToRetain (ECTR)
alt ECTR present
TimelineArchiver->>CommitFilter: apply ECTR-based lower bound filter
CommitFilter-->>TimelineArchiver: commits >= ECTR retained
else ECTR missing/blank
TimelineArchiver->>CommitFilter: skip ECTR filter
CommitFilter-->>TimelineArchiver: normal filtered commits
end
else ECTR Blocking Disabled
TimelineArchiver->>CommitFilter: apply standard archival filters
CommitFilter-->>TimelineArchiver: filtered commits
end
TimelineArchiver->>Timeline: archive selected commit instants
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java (1)
2620-2628: Rename this V2 test to match what it actually verifies.The Javadoc and method name say V2 blocks on clean ECTR, but the assertions below verify the opposite: V2 archives the ECTR instant and does not block on it. That mismatch will mislead the next person debugging a failure here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java` around lines 2620 - 2628, The test's name and Javadoc claim that TimelineArchiverV2 "blocks" on a clean ECTR, but the assertions actually verify that V2 archives the ECTR instant (i.e., does not block); rename the test method testArchivalBlocksOnCleanECTRWithTimelineArchiverV2AndVersion9 and update its Javadoc to reflect that it verifies TimelineArchiverV2 archives (does not block on) clean ECTR for table version 9, and adjust any internal references/comments to mention TimelineArchiverV2 archives the ECTR instant rather than blocking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v1/TimelineArchiverV1.java`:
- Around line 291-294: The catch block in TimelineArchiverV1 currently logs and
throws with a message that says "skipping ECTR check" which is inaccurate
because the code immediately throws and aborts archival; update both the
log.warn call and the HoodieIOException message in the catch(IOException e)
handler so the text reflects that the archival is being aborted (e.g., "aborting
archival due to failure reading clean metadata for <instant>") instead of saying
it will skip the ECTR check, leaving the exception chaining intact.
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`:
- Around line 2596-2617: The test only asserts "some progress" and checks a few
specific commits but doesn't verify the full pre-ECTR retention window; update
the assertions after archival (using commitsBeforeArchival/commitsAfterArchival,
activeCommits and activeCommitTimes) to assert that the set of active commit
times is exactly the expected window ["00000008","00000009","00000010"] (e.g.
check size == 3 and that it contains those three) so any leftover commits like
"00000003"-"00000007" would fail the test.
- Around line 2380-2384: Replace the weak inequality check using
commitsAfterArchival/commitsBeforeArchival with an assertion that the ECTR-bound
commit is actually removed: locate the test in TestHoodieTimelineArchiver.java
where commitsAfterArchival and commitsBeforeArchival are compared and instead
assert that the specific commit id "00000003" (the ECTR-bound instant) is not
present in the active timeline or that it appears in the archived timeline;
e.g., check activeCommits.contains("00000003") is false or
archivedTimeline.containsInstant(Instant.ofEpochMilli(...)/or the API used in
this test) is true so the test verifies the ECTR commit was archived when ECTR
blocking is disabled.
---
Nitpick comments:
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`:
- Around line 2620-2628: The test's name and Javadoc claim that
TimelineArchiverV2 "blocks" on a clean ECTR, but the assertions actually verify
that V2 archives the ECTR instant (i.e., does not block); rename the test method
testArchivalBlocksOnCleanECTRWithTimelineArchiverV2AndVersion9 and update its
Javadoc to reflect that it verifies TimelineArchiverV2 archives (does not block
on) clean ECTR for table version 9, and adjust any internal references/comments
to mention TimelineArchiverV2 archives the ECTR instant rather than blocking.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4aab404-17c6-4dad-880f-c87514d98e14
📒 Files selected for processing (4)
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v1/TimelineArchiverV1.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieArchivalConfig.javahudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.javahudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java (1)
2294-2306:⚠️ Potential issue | 🟠 MajorAssert the exact retained commit window here.
This still passes if archival is skipped entirely, because
00000003/00000004/00000005would all remain active in that broken case too. Please also assert that00000001and00000002are gone, or compare the active commit list against the exact expected set.Suggested assertion shape
- assertTrue(activeCommitTimes.contains("00000003"), "Commit 00000003 (ECTR) should not be archived"); - assertTrue(activeCommitTimes.contains("00000004"), "Commit 00000004 (after ECTR) should not be archived"); - assertTrue(activeCommitTimes.contains("00000005"), "Commit 00000005 (after ECTR) should not be archived"); + assertEquals(Arrays.asList("00000003", "00000004", "00000005"), activeCommitTimes, + "ECTR should retain exactly commits 00000003-00000005 in the active commit timeline");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java` around lines 2294 - 2306, The test only checks that commits >= ECTR remain active which still passes when archival is entirely skipped; update the assertions around activeCommits/activeCommitTimes (the variable activeCommits and mapping via HoodieInstant::requestedTime) to assert the exact expected retained window: assert that "00000001" and "00000002" are not present (e.g., assertFalse on activeCommitTimes.contains for those) and assert that the activeCommitTimes exactly equals the expected list ["00000003","00000004","00000005"] (or use an assertEquals between a sorted expected list and activeCommitTimes) so the test fails if archival was not applied.
🧹 Nitpick comments (1)
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java (1)
2629-2637: Rename this test to match what it actually verifies.The method/Javadoc says V2 “blocks” on clean ECTR, but the assertions verify the opposite behavior by expecting
00000003to be archived. A name/description that says V2 does not block on ECTR would be much less misleading.Also applies to: 2722-2728
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java` around lines 2629 - 2637, Rename the test method testArchivalBlocksOnCleanECTRWithTimelineArchiverV2AndVersion9 and its Javadoc to reflect the actual expectation (that TimelineArchiverV2 does not block archival on clean ECTR) — e.g., testArchivalDoesNotBlockOnCleanECTRWithTimelineArchiverV2AndVersion9 — and update the Javadoc description accordingly; also apply the same rename/update to the corresponding test referenced around the other block (the test at the 2722-2728 region) so names/comments consistently state "DoesNotBlock" instead of "Blocks" to match the assertions expecting 00000003 to be archived.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`:
- Around line 2396-2451: Add a new test (or extend
testArchivalContinuesWhenCleanMetadataIsMissing) that creates a completed clean
instant and then corrupts/removes its metadata file before invoking
TimelineArchiverV1.archiveIfRequired so the code exercises the new failure path:
create commits and a completed clean instant using HoodieTestTable/metaClient,
identify the clean instant from
metaClient.getActiveTimeline().getCleanerTimeline().filterCompletedInstants(),
delete or corrupt that instant's .clean metadata file via the filesystem
obtained from metaClient, then call new TimelineArchiverV1(writeConfig,
table).archiveIfRequired(context) and assertThrows HoodieIOException to validate
the abort behavior when clean metadata exists but cannot be read.
---
Duplicate comments:
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`:
- Around line 2294-2306: The test only checks that commits >= ECTR remain active
which still passes when archival is entirely skipped; update the assertions
around activeCommits/activeCommitTimes (the variable activeCommits and mapping
via HoodieInstant::requestedTime) to assert the exact expected retained window:
assert that "00000001" and "00000002" are not present (e.g., assertFalse on
activeCommitTimes.contains for those) and assert that the activeCommitTimes
exactly equals the expected list ["00000003","00000004","00000005"] (or use an
assertEquals between a sorted expected list and activeCommitTimes) so the test
fails if archival was not applied.
---
Nitpick comments:
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`:
- Around line 2629-2637: Rename the test method
testArchivalBlocksOnCleanECTRWithTimelineArchiverV2AndVersion9 and its Javadoc
to reflect the actual expectation (that TimelineArchiverV2 does not block
archival on clean ECTR) — e.g.,
testArchivalDoesNotBlockOnCleanECTRWithTimelineArchiverV2AndVersion9 — and
update the Javadoc description accordingly; also apply the same rename/update to
the corresponding test referenced around the other block (the test at the
2722-2728 region) so names/comments consistently state "DoesNotBlock" instead of
"Blocks" to match the assertions expecting 00000003 to be archived.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a43ad2c3-a3ea-40e5-814c-d7ad67bc8aa5
📒 Files selected for processing (2)
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v1/TimelineArchiverV1.javahudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java
| /** | ||
| * Tests graceful handling when clean metadata is missing or cannot be read. | ||
| * Verifies archival continues normally without blocking when metadata read fails. | ||
| */ | ||
| @Test | ||
| public void testArchivalContinuesWhenCleanMetadataIsMissing() throws Exception { | ||
| initTableToTestECTRBlock(); | ||
|
|
||
| // Create config with ECTR blocking enabled | ||
| HoodieWriteConfig writeConfig = HoodieWriteConfig.newBuilder() | ||
| .withPath(basePath) | ||
| .withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA) | ||
| .withParallelism(2, 2) | ||
| .withArchivalConfig(HoodieArchivalConfig.newBuilder() | ||
| .archiveCommitsWith(2, 3) | ||
| .withBlockArchivalOnCleanECTR(true) | ||
| .build()) | ||
| .withCleanConfig(HoodieCleanConfig.newBuilder() | ||
| .retainCommits(1) | ||
| .build()) | ||
| .withFileSystemViewConfig(FileSystemViewStorageConfig.newBuilder() | ||
| .withRemoteServerPort(timelineServicePort) | ||
| .build()) | ||
| .withMetadataConfig(HoodieMetadataConfig.newBuilder() | ||
| .enable(false) | ||
| .build()) | ||
| .forTable("test-trip-table") | ||
| .build(); | ||
|
|
||
| HoodieTestTable testTable = HoodieTestTable.of(metaClient); | ||
|
|
||
| // Create 6 commits | ||
| for (int i = 1; i <= 6; i++) { | ||
| testTable.addCommit(String.format("%08d", i)); | ||
| } | ||
|
|
||
| // DO NOT create a clean commit - this simulates missing clean metadata | ||
|
|
||
| // Reload metaClient | ||
| metaClient = HoodieTableMetaClient.reload(metaClient); | ||
|
|
||
| // Trigger archival - should not fail even though no clean metadata exists | ||
| HoodieTable table = HoodieSparkTable.create(writeConfig, context, metaClient); | ||
| TimelineArchiverV1 archiver = new TimelineArchiverV1(writeConfig, table); | ||
|
|
||
| // Should not throw exception | ||
| assertDoesNotThrow(() -> archiver.archiveIfRequired(context), | ||
| "Archival should continue gracefully when clean metadata is missing"); | ||
|
|
||
| // Verify some archival happened (based on min/max archival commits) | ||
| metaClient = HoodieTableMetaClient.reload(metaClient); | ||
| int commitsAfterArchival = metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants(); | ||
|
|
||
| // With 6 commits and max_archival=3, some archival should occur | ||
| assertTrue(commitsAfterArchival <= 3, "Archival should proceed when clean metadata is missing"); | ||
| } |
There was a problem hiding this comment.
This does not exercise the new metadata-read failure path.
This case only covers the absence of a clean instant. The new V1 behavior in hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v1/TimelineArchiverV1.java throws HoodieIOException when a completed clean instant exists but its metadata cannot be read, and that abort path is still untested here. Please add a case with a completed clean instant whose metadata file is missing/corrupt and assert the exception.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java`
around lines 2396 - 2451, Add a new test (or extend
testArchivalContinuesWhenCleanMetadataIsMissing) that creates a completed clean
instant and then corrupts/removes its metadata file before invoking
TimelineArchiverV1.archiveIfRequired so the code exercises the new failure path:
create commits and a completed clean instant using HoodieTestTable/metaClient,
identify the clean instant from
metaClient.getActiveTimeline().getCleanerTimeline().filterCompletedInstants(),
delete or corrupt that instant's .clean metadata file via the filesystem
obtained from metaClient, then call new TimelineArchiverV1(writeConfig,
table).archiveIfRequired(context) and assertThrows HoodieIOException to validate
the abort behavior when clean metadata exists but cannot be read.
Mirror of apache#18380 for automated bot review.
Original author: @nsivabalan
Base branch: master
Summary by CodeRabbit
New Features
Bug Fixes
Tests