Skip to content

[OSS PR #18295] feat(metadata): Allow users to safely execute compaction plans on metadata table concurrently through a table service platform (rather than only inline during write)#27

Open
yihua wants to merge 22 commits into
masterfrom
oss-18295

Conversation

@yihua
Copy link
Copy Markdown
Owner

@yihua yihua commented Apr 9, 2026

Mirror of apache#18295 for automated bot review.

Original author: @kbuci
Base branch: master

Summary by CodeRabbit

  • New Features

    • Clustering expiration with automatic rollback; table-service-manager delegation for metadata actions; Flink read record limit push‑down; vector search table-valued functions (single & batch) with distance metrics.
  • Improvements

    • Better metadata-table concurrency controls and delegation; smarter compaction/log‑compaction gating; distributed metrics propagation to executors; stricter VECTOR schema validation and clearer rejection of unsupported file format on non‑Spark engines; improved partition discovery options.
  • Tests

    • Expanded end-to-end and unit coverage for clustering expiration, locking, metrics, CDC, record limiting, compaction, and vector search.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Multiple features and refactors across modules: clustering-expiration/heartbeat rollback, metadata-table service delegation, distributed metric registries, Flink reader refactors with global record-limit push-down, VECTOR nesting validation, compaction/log-compaction helpers, lock-config derivation, catalog-backed partition listing, CDC iterator/image refactor, many handler/factory abstractions, and extensive tests.

Changes

Cohort / File(s) Summary
Clustering expiration & heartbeat
hudi-client/.../BaseHoodieTableServiceClient.java, hudi-client/.../BaseHoodieClient.java, hudi-client/.../HoodieClusteringConfig.java, hudi-client/.../HoodieWriteConfig.java, hudi-utilities/.../HoodieClusteringJob.java, hudi-client/.../TestBaseHoodieTableServiceClient.java, hudi-utilities/.../TestHoodieClusteringJob.java
Add clustering expiration config and threshold, heartbeat start/stop on clustering instants, eligibility API isClusteringInstantEligibleForRollback(...), partition-scoped pending-instant discovery and rollback; tests added/updated.
MDT delegation & metadata concurrency
hudi-client/.../HoodieBackedTableMetadataWriter*.java, hudi-client/.../HoodieMetadataWriteUtils.java, hudi-common/.../HoodieMetadataConfig.java, hudi-client/.../TestHoodieBackedTableMetadataWriter.java, hudi-client/.../TestHoodieMetadataWriteUtils.java
Introduce table-service-manager delegation flags/actions for MDT compaction/log-compaction; add metadata write concurrency mode, validation for multi-writer MDT, and TSM propagation into MDT configs; tests updated.
Lock config derivation
hudi-client/.../HoodieLockConfig.java, hudi-client/.../TestHoodieLockConfig.java
Add built-in lock provider constants and deriveLockConfigForDifferentTable(...) with provider-specific identity validation and property copying; unit tests added.
Distributed registries & wrapper FS metrics
hudi-client-spark/.../DistributedRegistryUtil.java, hudi-client-spark/.../HoodieSparkEngineContext.java, hudi-common/.../HoodieEngineContext.java, hudi-io/.../Registry.java, hudi-io/.../LocalRegistry.java, hudi-client-spark/.../DistributedRegistry.java, hudi-hadoop-common/.../HoodieWrapperFileSystem.java, hudi-client-spark/.../TestDistributedRegistry.java
Extend Registry for table-scoped keys, getRegistryOfClass, setRegistries, getName; add engine/context support to create/distribute registries and executor propagation; centralize wrapper-FS registry wiring; tests added.
Spark write client metric wiring
hudi-client-spark/.../SparkRDDWriteClient.java
Move wrapper filesystem registry initialization into DistributedRegistryUtil and remove class-local metrics init override.
Compaction / log-compaction helpers
hudi-client/.../ScheduleCompactionActionExecutor.java, hudi-common/.../CompactionUtils.java, hudi-hadoop-common/.../TestCompactionUtils.java
Add helpers to locate last log compaction and delta-commit ranges; refine scheduling logic for log-compaction threshold and gating; tests added.
VECTOR schema nesting & conversions
hudi-common/.../HoodieSchema.java, hudi-spark/.../HoodieSparkSchemaConverters.scala, hudi-common/.../TestHoodieSchema.java, hudi-spark/.../TestSchemaConverters.scala, hudi-spark/.../TestVectorDataSource.scala
Enforce VECTOR only as top-level record field via depth-tracked validation in schema builders/converters; add failure messages and tests for nested VECTOR rejection.
Flink reader refactor & record-limit push-down
hudi-flink/.../HoodieScanContext.java, HoodieSource.java, HoodieSourceReader.java, HoodieSourceSplitReader.java, RecordLimiter.java, AbstractSplitReaderFunction.java, HoodieSplitReaderFunction.java, HoodieCdcSplitReaderFunction.java, tests under hudi-flink/.../source/reader/*
Add limit to scan context; implement RecordLimiter to enforce global row limits across splits and drain remaining splits as finished; propagate limiter through split readers; factor shared reader state into AbstractSplitReaderFunction; many tests added/updated.
Flink split assigner & bucket assigner
hudi-flink/.../HoodieSplitAssigners.java, hudi-flink/.../HoodieSplitBucketAssigner.java, tests
Change bucket assignment to use bucket-index partition configuration (constructor now accepts Configuration), derive task id via partition-index func instead of simple modulo; tests updated.
Flink table/source adjustments
hudi-flink/.../HoodieTableSource.java, HoodieSourceReader.java, HoodieSourceSplitReader.java, tests
Pass full HoodieScanContext into readers; readers accept optional RecordLimiter; tests adjusted.
Incremental query pruning & instant normalization
hudi-spark-common/.../IncrementalRelationV1.scala, IncrementalRelationV2.scala, IncrementalRelationUtil.scala, DataSourceOptions.scala, HoodieStreamSourceV1.scala, HoodieStreamSourceV2.scala, hudi-spark/.../HoodieSqlCommonUtils.scala, HoodieTableChanges.scala, tests
Switch incremental relations to PrunedScan, add utilities to compute pruned schemas and filter DataFrames; add formatIncrementalInstant(...) to normalize START/END commit expressions (ISO/epoch/legacy sentinels); tests added.
Parquet iterator lifecycle
hudi-hadoop-common/.../ParquetReaderIterator.java, test update
Make Parquet reader iterator close idempotent and close on exhaustion; test expectation updated.
Catalog-backed partition listing & partition filters
hudi-spark/.../CatalogBackedTableMetadata.scala, SparkHoodieTableFileIndex.scala, HoodieFileIndex.scala, BaseHoodieTableFileIndex.java, HoodieTableMetadata.java, PartitionPathFilterUtil.java, tests
Add CatalogBackedTableMetadata to obtain partitions via Spark catalog when enabled; thread partition-predicate expressions through listing APIs and factor prefix predicate via PartitionPathFilterUtil; tests added.
Flink table-service handler abstraction & factory
many files under hudi-flink/.../sink/compact/handler/*, TableServiceHandlerFactory.java, operator classes (CompactOperator, CompactionPlanOperator, CompactionCommitSink, etc.), composites and tests
Abstract handler types introduced (CompactHandler, CompactionPlanHandler, CompactionCommitHandler, CleanHandler), add concrete data/metadata implementations, composite wrappers and a factory to produce appropriate single/composite handlers; operators refactored to use factory instances; extensive tests added.
CDC reader refactor: image manager and iterators
hudi-flink/.../CdcImageManager.java, CdcIterators.java, CdcInputFormat.java, HoodieCdcSplitReaderFunction.java, tests
Extract CDC image caching and per-split iterator implementations into CdcImageManager and CdcIterators; simplify CdcInputFormat and CDC split readers to use new utilities; large test updates.
Registry/metrics utilities & small API changes
hudi-io/.../Registry.java, LocalRegistry.java, DistributedRegistry.java, HoodieWrapperFileSystem.java, DistributedRegistryUtil.java, HoodieEngineContext.java
Registry enhanced with key separator, computeIfAbsent creation, table-scoped registry support, ability to pre-register registries, and getName(); Local/Distributed registries expose getName; wrapper FS registry constants added.
I/O / varint & HFile tests
hudi-io/.../IOUtils.java, HFileRootIndexBlock.java, HFileMetaIndexBlock.java, hudi-io/.../TestHFileWriter.java, TestIOUtils.java
Add IOUtils.writeVarInt for Hadoop-compatible varint encoding; use it in HFile index block serialization; add tests for long key varint behavior.
LANCE format guards
many Flink readers/factory/catalog files (HoodieTableFactory, FlinkRowDataReaderContext, HoodieHiveCatalog, HoodieFileReaderFactory, HoodieFileWriterFactory), tests
Introduce explicit validation and UnsupportedOperationException messages forbidding LANCE base format for non-Spark engines, using HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG; tests added.
Numerous tests & small fixes
many test files across modules
Add/adjust extensive unit/integration/functional tests reflecting changes: lock derivation, registry distribution, MDT delegation, compaction plan/commit changes, record-limiter concurrency, instant parsing, VECTOR validation, vector-search TVF, LANCE guards, HFile varint tests, and many Flink harness tests.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🐇 I hopped through timelines, heartbeats in tow,
Wrapped registries scattered where metrics now flow.
Limits that cut when the last carrot's been found,
Schemas that guard vectors from nesting around.
A rabbitly nibble — the patch humbly chewed. 🥕✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch oss-18295

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java (1)

1352-1359: ⚠️ Potential issue | 🟠 Major

Validate table.service.manager.actions for property-based builders too.

Only withTableServiceManagerActions() calls validateTableServiceManagerActions(...). Values supplied through fromFile(), fromProperties(), or withProperties() bypass validation entirely, so build() currently accepts invalid action lists and pushes the failure downstream.

Fail fast during build
     public HoodieMetadataConfig build() {
       metadataConfig.setDefaultValue(ENABLE, getDefaultMetadataEnable(engineType));
       metadataConfig.setDefaultValue(ENABLE_METADATA_INDEX_COLUMN_STATS, getDefaultColStatsEnable(engineType));
       metadataConfig.setDefaultValue(SECONDARY_INDEX_ENABLE_PROP, getDefaultSecondaryIndexEnable(engineType));
       metadataConfig.setDefaultValue(STREAMING_WRITE_ENABLED, getDefaultForStreamingWriteEnabled(engineType));
       // fix me: disable when schema on read is enabled.
       metadataConfig.setDefaults(HoodieMetadataConfig.class.getName());
+      String actions = metadataConfig.getString(TABLE_SERVICE_MANAGER_ACTIONS);
+      if (StringUtils.nonEmpty(actions)) {
+        validateTableServiceManagerActions(actions);
+      }
       return metadataConfig;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java`
around lines 1352 - 1359, build() lets property-based builders bypass validation
of table.service.manager.actions because only withTableServiceManagerActions()
invokes validateTableServiceManagerActions(...); fix by invoking
validateTableServiceManagerActions(...) in HoodieMetadataConfig.build() for any
actions coming from properties/file defaults as well—retrieve the configured
table.service.manager.actions value (the same key used by
withTableServiceManagerActions), call
validateTableServiceManagerActions(actions) before returning metadataConfig, and
ensure the validation path covers values supplied via fromFile(),
fromProperties(), and withProperties(), not just the fluent
withTableServiceManagerActions() setter.
🟡 Minor comments (7)
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala-308-309 (1)

308-309: ⚠️ Potential issue | 🟡 Minor

Duplicate log statement.

The "Additional Filters to be applied" message is logged twice - once at line 244 and again at line 309. Remove the duplicate to avoid log noise.

Suggested fix
       scanDf = IncrementalRelationUtil.filterRequiredColumnsFromDF(scanDf, requiredColumns, metaClient)
-      log.info("Additional Filters to be applied to incremental source are :" + filters.mkString("Array(", ", ", ")"))
       filters.foldLeft(scanDf)((e, f) => e.filter(f)).rdd
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala`
around lines 308 - 309, Remove the duplicate log.info call that prints
"Additional Filters to be applied..." after calling
IncrementalRelationUtil.filterRequiredColumnsFromDF; keep the earlier log
emitted at the original location (around the previous log at line ~244).
Specifically, delete the log.info(...) invocation that uses
filters.mkString("Array(", ", ", ")") in IncrementalRelationV1.scala (the one
immediately following the scanDf =
IncrementalRelationUtil.filterRequiredColumnsFromDF(...) line) so the message is
logged only once.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestDefaultHoodieSplitProvider.java-567-568 (1)

567-568: ⚠️ Potential issue | 🟡 Minor

Pin the bucket config in this test.

These assertions now depend on the default value of FlinkOptions.BUCKET_INDEX_NUM_BUCKETS. If that default changes, this test can fail for the wrong reason. Set the option explicitly here so the bucket→task mapping stays stable.

Suggested test hardening
+import org.apache.hudi.configuration.FlinkOptions;
...
-    HoodieSplitBucketAssigner assigner =
-        new HoodieSplitBucketAssigner(4, new Configuration());
+    Configuration conf = new Configuration();
+    conf.set(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS, 4);
+    HoodieSplitBucketAssigner assigner =
+        new HoodieSplitBucketAssigner(4, conf);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestDefaultHoodieSplitProvider.java`
around lines 567 - 568, The test relies on the default for
FlinkOptions.BUCKET_INDEX_NUM_BUCKETS which makes it brittle; instead create a
Configuration, set FlinkOptions.BUCKET_INDEX_NUM_BUCKETS explicitly (e.g. to 4)
and pass that Configuration into the HoodieSplitBucketAssigner constructor
(HoodieSplitBucketAssigner) so the bucket→task mapping is stable regardless of
global defaults.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/assign/TestHoodieSplitBucketAssigner.java-177-194 (1)

177-194: ⚠️ Potential issue | 🟡 Minor

This test never proves the assignments actually differ.

"country=DE" only guarantees a non-zero hashCode, not a non-zero hashCode() % parallelism. If that modulo is 0, taskWith4 and taskWith8 can be identical and this still passes, so the behavior promised by the test name is not covered. Pick a partition with a known modulo and assert the two task IDs are different, or rename the test.

Suggested assertion fix
+import static org.junit.jupiter.api.Assertions.assertNotEquals;
...
-    String partition = "country=DE"; // non-zero hashCode guaranteed
+    String partition = "p1"; // hashCode = 3521, so hashCode % 5 == 1
...
     assertEquals(expectedTaskId(partition, bucket, 4, parallelism), taskWith4);
     assertEquals(expectedTaskId(partition, bucket, 8, parallelism), taskWith8);
+    assertNotEquals(taskWith4, taskWith8,
+        "Changing numBuckets should change the assignment for this partition");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/assign/TestHoodieSplitBucketAssigner.java`
around lines 177 - 194, The test
testDifferentNumBucketsProduceDifferentAssignments currently can pass even when
assignments are identical because "country=DE" may produce the same task id
after modulo; update the test so it actually proves different assignments by
choosing a partition value with a known hashCode behavior or by explicitly
asserting the two assignments differ. Concretely, keep the existing calls to new
HoodieSplitBucketAssigner(parallelism, conf4).assign(split(...)) and conf8, then
add an assertion like assertNotEquals(taskWith4, taskWith8) (or replace the
partition with a literal whose hashCode % parallelism yields different
expectedTaskId values) in addition to the existing expectedTaskId checks;
reference the test method testDifferentNumBucketsProduceDifferentAssignments,
the HoodieSplitBucketAssigner constructor, the assign(...) call, the split(...)
helper, and expectedTaskId(...) when making the change.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java-3244-3247 (1)

3244-3247: ⚠️ Potential issue | 🟡 Minor

Assert the filtered row, not just the row count.

This passes even if the engine applies LIMIT 1 before the partition filter, because any single row satisfies the assertion. Please also assert that the returned row belongs to par1 so the test actually protects filter+limit push-down behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java`
around lines 3244 - 3247, Update the assertion for the query result stored in
result1 to verify the filtered row's partition value instead of only its count:
after executing tableEnv.sqlQuery("select * from t1 where `partition` = 'par1'
limit 1").execute().collect() and converting to result1, assert that result1 has
size 1 and that the single Row in result1 has its `partition` field equal to
"par1" (or assert the expected id value for that partition), so the test
verifies filter+limit push-down rather than just row count.
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java-163-183 (1)

163-183: ⚠️ Potential issue | 🟡 Minor

Assert the streaming-write flag in this test.

testCreateMetadataWriteConfigForcesStreamingWritesOffWithMultiWriter currently only re-validates the failed-write policy, concurrency mode, and lock provider. If STREAMING_WRITE_ENABLED regresses back to true, this test still passes. Please add an explicit assertion for the MDT streaming-write setting here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java`
around lines 163 - 183, The test
testCreateMetadataWriteConfigForcesStreamingWritesOffWithMultiWriter is missing
an explicit assertion that streaming writes are disabled on the metadata config;
after obtaining metadataWriteConfig from
HoodieMetadataWriteUtils.createMetadataWriteConfig, add an assertion that the
metadataWriteConfig's metadata config streaming flag is false (e.g.,
assertFalse(metadataWriteConfig.getMetadataConfig().isStreamingWriteEnabled())
or equivalent) so the test fails if STREAMING_WRITE_ENABLED regresses to true.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java-279-285 (1)

279-285: ⚠️ Potential issue | 🟡 Minor

Reject negative clustering expiration thresholds.

EXPIRATION_THRESHOLD_MINS currently accepts any long. A negative value effectively removes the age guardrail and makes heartbeat-expired clustering instants immediately rollback-eligible. Please validate this as non-negative in Builder.validate().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java`
around lines 279 - 285, The EXPIRATION_THRESHOLD_MINS ConfigProperty currently
allows negative values; update HoodieClusteringConfig.Builder.validate() to
ensure the value retrieved from EXPIRATION_THRESHOLD_MINS is non-negative and
throw a clear IllegalArgumentException (or similar) if it is negative. Locate
the check in Builder.validate(), read the long via
EXPIRATION_THRESHOLD_MINS.get(config) (or equivalent accessor), and add a guard
like if (value < 0) throw new
IllegalArgumentException("hoodie.clustering.expiration.threshold.mins must be
non-negative: " + value) so invalid configs are rejected early.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.java-213-245 (1)

213-245: ⚠️ Potential issue | 🟡 Minor

Assert cross-instance non-identity in the “independence” tests.

These tests only prove that each instance is stable across repeated calls. A broken implementation that shares one static Configuration/HoodieWriteConfig across every reader would still pass.

Suggested test tightening
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertNotSame;
 import static org.junit.jupiter.api.Assertions.assertSame;
 import static org.junit.jupiter.api.Assertions.assertTrue;
@@
     assertNotNull(hadoopConf1);
     assertNotNull(hadoopConf2);
+    assertNotSame(hadoopConf1, hadoopConf2,
+        "distinct reader instances should not share the same hadoopConf singleton");
     // Each instance must keep returning its own stable singleton.
     assertSame(hadoopConf1, fn1.hadoopConfForTest(),
         "fn1's hadoopConf must remain the same singleton across calls");
@@
     assertNotNull(wc1);
     assertNotNull(wc2);
+    assertNotSame(wc1, wc2,
+        "distinct reader instances should not share the same writeConfig singleton");
     // Each instance must keep returning its own stable singleton.
     assertSame(wc1, fn1.writeConfigForTest(),
         "fn1's writeConfig must remain the same singleton across calls");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.java`
around lines 213 - 245, The tests
testTwoInstancesHaveStableIndependentHadoopConf and
testTwoInstancesHaveStableIndependentWriteConfig currently only assert
per-instance stability; update each to also assert the two instances do not
share the same object by adding an assertNotSame between fn1.hadoopConfForTest()
and fn2.hadoopConfForTest() in the Hadoop config test, and an assertNotSame
between fn1.writeConfigForTest() and fn2.writeConfigForTest() in the write
config test so that hadoopConfForTest and writeConfigForTest prove
cross-instance independence as well as stability.
🧹 Nitpick comments (7)
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala (1)

2605-2617: Strengthen this regression test by asserting _hoodie_partition_path values explicitly.

Line 2606 selects _hoodie_partition_path, but current assertions only check _row_key and data. Adding value checks here will validate the duplicate-meta-field behavior end-to-end.

Proposed test assertion enhancement
     assertEquals("row1", results(0).getAs[String]("_row_key"))
+    assertEquals("src_partition1", results(0).getAs[String]("_hoodie_partition_path"))
     assertEquals("value1", results(0).getAs[String]("data"))
 
     assertEquals("row2", results(1).getAs[String]("_row_key"))
+    assertEquals("src_partition1", results(1).getAs[String]("_hoodie_partition_path"))
     assertEquals("value2", results(1).getAs[String]("data"))
 
     assertEquals("row3", results(2).getAs[String]("_row_key"))
+    assertEquals("src_partition2", results(2).getAs[String]("_hoodie_partition_path"))
     assertEquals("value3", results(2).getAs[String]("data"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala`
around lines 2605 - 2617, Add explicit assertions for the _hoodie_partition_path
column in the TestCOWDataSource regression check: after collecting incrementalDf
into results and before or alongside the existing asserts on "_row_key" and
"data", assert that results(0).getAs[String]("_hoodie_partition_path"),
results(1).getAs[String]("_hoodie_partition_path"), and
results(2).getAs[String]("_hoodie_partition_path") equal the expected partition
values for row1, row2, and row3 respectively; update the test near the
incrementalDf.select(...).collect() block (references: incrementalDf, results)
to include these checks so duplicate-meta-field behavior is validated
end-to-end.
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala (3)

28-75: Consider using explicit return type annotation.

The getPrunedSchema method lacks an explicit return type. While Scala can infer it, adding : StructType improves code readability and API clarity.

Suggested fix
   def getPrunedSchema(requiredColumns: Array[String],
                       usedSchema: StructType,
-                      metaClient: HoodieTableMetaClient) = {
+                      metaClient: HoodieTableMetaClient): StructType = {
     var prunedSchema = StructType(Seq())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala`
around lines 28 - 75, Add an explicit return type to the getPrunedSchema method
signature: declare it as returning org.apache.spark.sql.types.StructType (or
simply StructType if the import is present) so the signature becomes def
getPrunedSchema(...): StructType = { ... }; leave the implementation
unchanged—only update the method declaration to include the return type for
clarity.

77-109: Consider using explicit return type annotation.

Similarly, filterRequiredColumnsFromDF would benefit from an explicit return type for consistency and clarity.

Suggested fix
   def filterRequiredColumnsFromDF(df: DataFrame,
                                   requiredColumns: Array[String],
-                                  metaClient: HoodieTableMetaClient) = {
+                                  metaClient: HoodieTableMetaClient): DataFrame = {
     var updatedDF = df
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala`
around lines 77 - 109, The method filterRequiredColumnsFromDF lacks an explicit
return type; update its signature to declare the return type (DataFrame) — e.g.,
change def filterRequiredColumnsFromDF(df: DataFrame, requiredColumns:
Array[String], metaClient: HoodieTableMetaClient): DataFrame = { ... } — so the
compiler and readers see the intended return type clearly while leaving the
method body unchanged.

82-84: Redundant toDF() calls before drop().

The toDF() calls are unnecessary since drop() already returns a DataFrame. This applies to lines 83, 92, and 103.

Suggested simplification
     // If _hoodie_commit_time is not part of the required columns remove it from the resultant Dataframe
     if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {
-      updatedDF = updatedDF.toDF().drop(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
+      updatedDF = updatedDF.drop(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
     }

Apply similar changes to lines 92 and 103.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala`
around lines 82 - 84, Remove the redundant toDF() calls before drop on updatedDF
in IncrementalRelationUtil.scala: instead of updatedDF =
updatedDF.toDF().drop(HoodieRecord.COMMIT_TIME_METADATA_FIELD) (and the similar
lines for HoodieRecord.COMMIT_SEQNO_METADATA_FIELD and
HoodieRecord.RECORD_KEY_METADATA_FIELD), call drop directly on updatedDF
(updatedDF = updatedDF.drop(...)) because drop already returns a DataFrame;
update the three occurrences referencing updatedDF and those HoodieRecord.*
metadata fields.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala (1)

1022-1025: Make nested VECTOR error matching less brittle.

Matching the full literal message is fragile; prefer checking stable tokens (for example, "Nested VECTOR columns" and "top-level field").

💡 Suggested refactor
-  private def nestedVectorMessageInCauseChain(ex: Throwable): Boolean =
-    ex != null && (Option(ex.getMessage).exists(_.contains(
-      "VECTOR column 'embedding' must be a top-level field. Nested VECTOR columns (inside STRUCT, ARRAY, or MAP) are not supported."))
-      || nestedVectorMessageInCauseChain(ex.getCause))
+  private def nestedVectorMessageInCauseChain(ex: Throwable): Boolean = {
+    var cause = ex
+    while (cause != null) {
+      val msg = Option(cause.getMessage).getOrElse("")
+      if (msg.contains("Nested VECTOR columns") && msg.contains("top-level field")) {
+        return true
+      }
+      cause = cause.getCause
+    }
+    false
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala`
around lines 1022 - 1025, The current nestedVectorMessageInCauseChain method
matches the full error string which is brittle; update it to look for stable
tokens instead (e.g., check that Option(ex.getMessage).exists(msg =>
msg.contains("Nested VECTOR columns") && msg.contains("top-level field")) )
while preserving the recursive cause traversal
(nestedVectorMessageInCauseChain(ex.getCause)); keep the function name and
behavior but replace the exact-literal contains check with the two-token check
to make matching resilient to minor wording changes.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieSplitReaderFunction.java (1)

389-445: Drop or repurpose these duplicated constructor smoke tests.

testConstructorWithEmitDelete on Line 394 still passes false, and the null-schema/default-constructor cases are already covered earlier in this file. This block adds maintenance noise without increasing coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieSplitReaderFunction.java`
around lines 389 - 445, These tests duplicate existing constructor coverage for
HoodieSplitReaderFunction (testConstructorWithNullTableSchemaThrows,
testConstructorWithNullRequiredSchemaThrows, testDefaultConstructor) and add
noise; remove the redundant methods testConstructorWithEmitDelete,
testConstructorWithNullTableSchemaThrows,
testConstructorWithNullRequiredSchemaThrows, and testDefaultConstructor from
this block OR repurpose testConstructorWithEmitDelete to actually exercise
emit-delete behavior by passing true for the emitDelete parameter and asserting
expected behavior/state for emit-delete so the test provides unique coverage.
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieBackedTableMetadataWriter.java (1)

130-185: Please add coverage for the delegated compactIfNecessary(...) branch too.

These tests exercise delegation for pending services, but the production change also added a second branch inside compactIfNecessary(...) where scheduling proceeds and inline compact() / logCompact() are skipped. A regression there would still pass this suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieBackedTableMetadataWriter.java`
around lines 130 - 185, Add a unit test that targets the delegated branch inside
HoodieBackedTableMetadataWriter.compactIfNecessary(...) by mocking
BaseHoodieWriteClient and HoodieTableMetaClient so that
shouldDelegateToTableServiceManager(...) returns true for compaction (and/or
logcompaction) while compactIfNecessary is triggered (e.g., set pending instants
via
metaClient.getActiveTimeline().filterPendingCompactionTimeline()/filterPendingLogCompactionTimeline()).
Verify that when delegation is enabled the inline methods
writeClient.compact(...) and writeClient.logCompact(...) are NOT invoked and
that metaClient.reloadActiveTimeline() is called (or not) as appropriate; use
the same pattern as existing tests (mock timelines, set HoodieWriteConfig
TABLE_SERVICE_MANAGER_* props, call compactIfNecessary via
HoodieBackedTableMetadataWriter) to assert the skip behavior and timeline reload
for the delegated branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java`:
- Around line 635-637: Rework the clustering heartbeat lifecycle in
BaseHoodieTableServiceClient so the executor (the code path that actually starts
cluster execution) claims/starts or refreshes the heartbeat when execution
begins (e.g., at the start of cluster()) rather than when the plan is merely
scheduled; ensure heartbeatClient.stop(...) is executed in a failure-safe path
(finally/failure handling) so stopped on both success and error to avoid orphan
heartbeats blocking expiration; update the symmetric heartbeat handling location
in this class as well (the other clustering heartbeat block) to follow the same
claim/start-at-execution and guaranteed-stop pattern, and keep using
config.isExpirationOfClusteringEnabled() to gate the behavior.
- Around line 1210-1215: The age calculation in hasInstantExpired() uses the
table timeline timezone for currentTimeMs but parses instantTime with a
different/default timezone; update hasInstantExpired() to parse/convert
instantTime using the same ZoneId from
metaClient.getTableConfig().getTimelineTimezone() so currentTimeMs and
instantTimeMs are computed in the same timezone. Locate hasInstantExpired(),
replace the call to
HoodieInstantTimeGenerator.parseDateFromInstantTime(instantTime) with the
overload or conversion that accepts/uses the ZoneId
(metaClient.getTableConfig().getTimelineTimezone().getZoneId()), and ensure both
timestamps are derived using that zone before computing ageMs.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java`:
- Around line 278-317: getLockConfigForBuiltInLockProvider builds lockProps only
from lock-prefixed keys, causing inferred values like ZK_LOCK_KEY (which
defaults from HoodieWriteConfig.TBL_NAME) to be lost; fix by copying any needed
non-prefixed inferred entries into lockProps before building the
HoodieLockConfig. Specifically, in getLockConfigForBuiltInLockProvider (use
dataProps and lockProps), after the provider-specific copyPropsWithPrefix(...)
call but before newBuilder().fromProperties(...).build(), check for keys that
are inferred by providers (e.g., ZK_LOCK_KEY and any other provider-specific
inferred keys) and if absent in lockProps, populate them from
writeConfig.getProps()/HoodieWriteConfig.TBL_NAME or compute the same default
the provider expects so the rebuilt HoodieLockConfig preserves inferred inputs.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java`:
- Around line 39-54: The helper createWrapperFileSystemRegistries only calls
HoodieWrapperFileSystem.setMetricsRegistry(...) when config.isMetricsOn() is
true, leaving process-wide registries from a previous table intact when metrics
are off; modify createWrapperFileSystemRegistries so that in the metrics-off
branch you explicitly clear or replace the registries (e.g., by setting
neutral/no-op or null registries) via
HoodieWrapperFileSystem.setMetricsRegistry(...) so the global state is reset for
the current context; update the branch that currently calls
Registry.getRegistryOfClass(...) (or the else path) to call
HoodieWrapperFileSystem.setMetricsRegistry(...) with the appropriate
cleared/empty Registry instances to ensure metrics isolation across
tables/executors.

In
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java`:
- Around line 96-97: DISTRIBUTED_REGISTRY_MAP is currently keyed only by
tableName + "." + registryName which is JVM-global and causes
DistributedRegistry instances to be reused across different Spark applications;
change the cache key to include the Spark application identifier (e.g.,
SparkContext.applicationId or SparkEnv.get.conf / SparkContext.hashCode when
applicationId is unavailable) so each Spark application gets its own
DistributedRegistry; update code paths that construct keys (references:
DISTRIBUTED_REGISTRY_MAP, DistributedRegistry, and register(javaSparkContext))
to build and use the new composite key (applicationId + "." + tableName + "." +
registryName) and ensure entries are looked up/inserted with that composite key
in all places including the other usages around lines 282-289.

In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java`:
- Around line 424-427: The current checks only inspect the immediate non-null
type and validateNoVectorInNestedRecord() only traverses RECORDs, so VECTOR can
still be hidden inside UNION/ARRAY/MAP (e.g., ARRAY<UNION<VECTOR,INT>>). Add a
recursive type-walking check (e.g., containsVectorInType or
containsVectorInNestedType) and call it from the places that currently only
check elementSchema.getNonNullType() (the ARRAY element check around
elementSchema, the MAP value/key checks, and the RECORD child validation
locations referenced) so any VECTOR nested under UNION, ARRAY, MAP, or RECORD
triggers a HoodieSchemaException with the same "VECTOR columns must be top-level
fields" message; update validateNoVectorInNestedRecord() to use this helper
instead of only walking child RECORDs and reuse the helper in the other two
spots (lines around 445-448 and 484-515).

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieSourceReader.java`:
- Around line 50-52: The current code creates a RecordLimiter per
HoodieSourceReader from scanContext.getLimit(), which applies the full limit per
subtask and overproduces when parallel; fix by computing a per-subtask quota and
passing that to HoodieSourceSplitReader instead of the full limit: if
scanContext.getLimit() == RecordLimiter.NO_LIMIT keep Option.empty(), otherwise
compute int parallelism = context.getNumberOfParallelSubtasks() (or equivalent
method on context), compute perSubtaskLimit = Math.max(1,
(scanContext.getLimit() + parallelism - 1) / parallelism) (or use floor/ceil
policy you prefer), and create Option.of(new RecordLimiter(perSubtaskLimit))
when constructing HoodieSourceSplitReader; update the constructor call in
HoodieSourceReader accordingly so split readers get the divided limit rather
than the global limit.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/RecordLimiter.java`:
- Around line 53-69: The current wrap() implementation in
RecordsWithSplitIds.nextRecordFromSplit uses separate totalReadCount.get() >=
limit and totalReadCount.incrementAndGet() calls which can race and overshoot
the limit and also treat NO_LIMIT (-1) as exhausted; change the logic to treat
negative limit as unlimited and use an atomic combine-and-check (e.g., call
totalReadCount.incrementAndGet() or getAndIncrement() and base the decision on
the returned value) so a thread only returns a record when the atomic result
stays within the limit, otherwise discard the record and return null; update the
wrapped method in wrap() (reference: RecordsWithSplitIds.nextRecordFromSplit,
totalReadCount, limit, wrap) accordingly.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/split/assign/HoodieSplitBucketAssigner.java`:
- Around line 44-50: Add a legacy overload public HoodieSplitBucketAssigner(int
parallelism) that preserves the old public API and delegates to the new
constructor; implement it to build a legacy-compatible Configuration (e.g., a
default Configuration populated with legacy/default
BUCKET_INDEX_PARTITION_EXPRESSIONS, BUCKET_INDEX_PARTITION_RULE, and
BUCKET_INDEX_NUM_BUCKETS) and call this(parallelism, legacyConf). Update
references to the new fields (numBucketsFunction and partitionIndexFunc) remain
unchanged; ensure the legacy ctor still validates parallelism the same way
before delegation.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java`:
- Around line 168-219: The tests currently only call the same 6-arg constructor
and never exercise the limit logic; update the tests to invoke the 7-arg
constructor of HoodieCdcSplitReaderFunction with explicit limit values
(positive, zero, and -1 sentinel) and assert the limit behavior: for example,
construct HoodieCdcSplitReaderFunction(..., limit=5),
HoodieCdcSplitReaderFunction(..., limit=0) and HoodieCdcSplitReaderFunction(...,
limit=-1) and verify the class uses the provided limit by either asserting a
public/package-private getter (add getLimit() if missing) or, if no getter
exists, read the private field 'limit' via reflection or assert observable
behavior (e.g., that record iteration is wrapped/truncated when limit>0 and not
wrapped for limit<=0); update the test methods (testConstructorWithLimit,
testConstructorWithLimitAndEmptyFieldTypes,
testDefaultConstructorUsesNoLimitSentinel,
testConstructorWithLimitZeroIsAccepted) to perform these concrete assertions
instead of only assertNotNull.

In
`@hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java`:
- Around line 48-65: Ensure the iterator's closed flag is set on every teardown
path before attempting to close the underlying reader: in
ParquetReaderIterator.hasNext() and any exception handlers set closed = true
prior to calling parquetReader.close() (or
FileIOUtils.closeQuietly(parquetReader)), so that failures during read() or
close() cannot leave closed=false; update all places referencing
parquetReader.close(), FileIOUtils.closeQuietly(parquetReader), or throwing
HoodieException to flip the closed field first (use the existing closed field,
parquetReader, hasNext(), next and close() symbols) and then attempt to close
the reader.

In `@hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java`:
- Around line 105-109: When an existing registry's implementation class doesn't
match the requested clazz, fail fast instead of logging and returning the wrong
instance: replace the LOG.error(...) branch in the Registry lookup (the code
that inspects registry, registryName and clazz) with a thrown exception (e.g.,
throw new IllegalStateException or IllegalArgumentException) that includes
registryName, registry.getClass().getName(), and the requested clazz so callers
cannot continue with an incompatible registry implementation.
- Around line 152-155: setRegistries is re-keying table-scoped registries
incorrectly: when iterating registries it uses makeKey("", registry.getName())
which produces "::"+registry.getName(), but getRegistryOfClass expects keys
created by makeKey(tableName, registryName) (tableName+"::"+registryName) and
registry.getName() already contains "tableName.registryName"; fix setRegistries
by parsing registry.getName() into its tableName and registryName parts (or by
calling the same key-construction logic used by getRegistryOfClass) and insert
into REGISTRY_MAP with makeKey(tableName, registryName) so lookups by
getRegistryOfClass find the propagated instance (update code in setRegistries to
derive tableName and registryName from registry.getName() or use registry
methods that return them).

In
`@hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java`:
- Around line 332-350: The current rollbackFailedClusteringForPartitions method
aborts on the first ineligible instant by throwing from the
BaseHoodieTableServiceClient.isClusteringInstantEligibleForRollback check,
causing partial rollback; change this to skip ineligible instants instead of
throwing: for each instant from getPendingClusteringInstantsForPartitions, call
isClusteringInstantEligibleForRollback and if it returns false, log a
warning/info referencing the instant.requestedTime() and continue to the next
instant (optionally collect skipped instants for metrics), otherwise proceed
with metaClient.reloadActiveTimeline(), re-check containsInstant and call
client.rollback(instant.requestedTime()) with LOG.info around successful
rollbacks; do not throw during per-instant processing so other eligible instants
can still be cleaned up.

---

Outside diff comments:
In
`@hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java`:
- Around line 1352-1359: build() lets property-based builders bypass validation
of table.service.manager.actions because only withTableServiceManagerActions()
invokes validateTableServiceManagerActions(...); fix by invoking
validateTableServiceManagerActions(...) in HoodieMetadataConfig.build() for any
actions coming from properties/file defaults as well—retrieve the configured
table.service.manager.actions value (the same key used by
withTableServiceManagerActions), call
validateTableServiceManagerActions(actions) before returning metadataConfig, and
ensure the validation path covers values supplied via fromFile(),
fromProperties(), and withProperties(), not just the fluent
withTableServiceManagerActions() setter.

---

Minor comments:
In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java`:
- Around line 279-285: The EXPIRATION_THRESHOLD_MINS ConfigProperty currently
allows negative values; update HoodieClusteringConfig.Builder.validate() to
ensure the value retrieved from EXPIRATION_THRESHOLD_MINS is non-negative and
throw a clear IllegalArgumentException (or similar) if it is negative. Locate
the check in Builder.validate(), read the long via
EXPIRATION_THRESHOLD_MINS.get(config) (or equivalent accessor), and add a guard
like if (value < 0) throw new
IllegalArgumentException("hoodie.clustering.expiration.threshold.mins must be
non-negative: " + value) so invalid configs are rejected early.

In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java`:
- Around line 163-183: The test
testCreateMetadataWriteConfigForcesStreamingWritesOffWithMultiWriter is missing
an explicit assertion that streaming writes are disabled on the metadata config;
after obtaining metadataWriteConfig from
HoodieMetadataWriteUtils.createMetadataWriteConfig, add an assertion that the
metadataWriteConfig's metadata config streaming flag is false (e.g.,
assertFalse(metadataWriteConfig.getMetadataConfig().isStreamingWriteEnabled())
or equivalent) so the test fails if STREAMING_WRITE_ENABLED regresses to true.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.java`:
- Around line 213-245: The tests testTwoInstancesHaveStableIndependentHadoopConf
and testTwoInstancesHaveStableIndependentWriteConfig currently only assert
per-instance stability; update each to also assert the two instances do not
share the same object by adding an assertNotSame between fn1.hadoopConfForTest()
and fn2.hadoopConfForTest() in the Hadoop config test, and an assertNotSame
between fn1.writeConfigForTest() and fn2.writeConfigForTest() in the write
config test so that hadoopConfForTest and writeConfigForTest prove
cross-instance independence as well as stability.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/assign/TestHoodieSplitBucketAssigner.java`:
- Around line 177-194: The test
testDifferentNumBucketsProduceDifferentAssignments currently can pass even when
assignments are identical because "country=DE" may produce the same task id
after modulo; update the test so it actually proves different assignments by
choosing a partition value with a known hashCode behavior or by explicitly
asserting the two assignments differ. Concretely, keep the existing calls to new
HoodieSplitBucketAssigner(parallelism, conf4).assign(split(...)) and conf8, then
add an assertion like assertNotEquals(taskWith4, taskWith8) (or replace the
partition with a literal whose hashCode % parallelism yields different
expectedTaskId values) in addition to the existing expectedTaskId checks;
reference the test method testDifferentNumBucketsProduceDifferentAssignments,
the HoodieSplitBucketAssigner constructor, the assign(...) call, the split(...)
helper, and expectedTaskId(...) when making the change.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestDefaultHoodieSplitProvider.java`:
- Around line 567-568: The test relies on the default for
FlinkOptions.BUCKET_INDEX_NUM_BUCKETS which makes it brittle; instead create a
Configuration, set FlinkOptions.BUCKET_INDEX_NUM_BUCKETS explicitly (e.g. to 4)
and pass that Configuration into the HoodieSplitBucketAssigner constructor
(HoodieSplitBucketAssigner) so the bucket→task mapping is stable regardless of
global defaults.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java`:
- Around line 3244-3247: Update the assertion for the query result stored in
result1 to verify the filtered row's partition value instead of only its count:
after executing tableEnv.sqlQuery("select * from t1 where `partition` = 'par1'
limit 1").execute().collect() and converting to result1, assert that result1 has
size 1 and that the single Row in result1 has its `partition` field equal to
"par1" (or assert the expected id value for that partition), so the test
verifies filter+limit push-down rather than just row count.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala`:
- Around line 308-309: Remove the duplicate log.info call that prints
"Additional Filters to be applied..." after calling
IncrementalRelationUtil.filterRequiredColumnsFromDF; keep the earlier log
emitted at the original location (around the previous log at line ~244).
Specifically, delete the log.info(...) invocation that uses
filters.mkString("Array(", ", ", ")") in IncrementalRelationV1.scala (the one
immediately following the scanDf =
IncrementalRelationUtil.filterRequiredColumnsFromDF(...) line) so the message is
logged only once.

---

Nitpick comments:
In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieBackedTableMetadataWriter.java`:
- Around line 130-185: Add a unit test that targets the delegated branch inside
HoodieBackedTableMetadataWriter.compactIfNecessary(...) by mocking
BaseHoodieWriteClient and HoodieTableMetaClient so that
shouldDelegateToTableServiceManager(...) returns true for compaction (and/or
logcompaction) while compactIfNecessary is triggered (e.g., set pending instants
via
metaClient.getActiveTimeline().filterPendingCompactionTimeline()/filterPendingLogCompactionTimeline()).
Verify that when delegation is enabled the inline methods
writeClient.compact(...) and writeClient.logCompact(...) are NOT invoked and
that metaClient.reloadActiveTimeline() is called (or not) as appropriate; use
the same pattern as existing tests (mock timelines, set HoodieWriteConfig
TABLE_SERVICE_MANAGER_* props, call compactIfNecessary via
HoodieBackedTableMetadataWriter) to assert the skip behavior and timeline reload
for the delegated branch.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieSplitReaderFunction.java`:
- Around line 389-445: These tests duplicate existing constructor coverage for
HoodieSplitReaderFunction (testConstructorWithNullTableSchemaThrows,
testConstructorWithNullRequiredSchemaThrows, testDefaultConstructor) and add
noise; remove the redundant methods testConstructorWithEmitDelete,
testConstructorWithNullTableSchemaThrows,
testConstructorWithNullRequiredSchemaThrows, and testDefaultConstructor from
this block OR repurpose testConstructorWithEmitDelete to actually exercise
emit-delete behavior by passing true for the emitDelete parameter and asserting
expected behavior/state for emit-delete so the test provides unique coverage.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala`:
- Around line 28-75: Add an explicit return type to the getPrunedSchema method
signature: declare it as returning org.apache.spark.sql.types.StructType (or
simply StructType if the import is present) so the signature becomes def
getPrunedSchema(...): StructType = { ... }; leave the implementation
unchanged—only update the method declaration to include the return type for
clarity.
- Around line 77-109: The method filterRequiredColumnsFromDF lacks an explicit
return type; update its signature to declare the return type (DataFrame) — e.g.,
change def filterRequiredColumnsFromDF(df: DataFrame, requiredColumns:
Array[String], metaClient: HoodieTableMetaClient): DataFrame = { ... } — so the
compiler and readers see the intended return type clearly while leaving the
method body unchanged.
- Around line 82-84: Remove the redundant toDF() calls before drop on updatedDF
in IncrementalRelationUtil.scala: instead of updatedDF =
updatedDF.toDF().drop(HoodieRecord.COMMIT_TIME_METADATA_FIELD) (and the similar
lines for HoodieRecord.COMMIT_SEQNO_METADATA_FIELD and
HoodieRecord.RECORD_KEY_METADATA_FIELD), call drop directly on updatedDF
(updatedDF = updatedDF.drop(...)) because drop already returns a DataFrame;
update the three occurrences referencing updatedDF and those HoodieRecord.*
metadata fields.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala`:
- Around line 2605-2617: Add explicit assertions for the _hoodie_partition_path
column in the TestCOWDataSource regression check: after collecting incrementalDf
into results and before or alongside the existing asserts on "_row_key" and
"data", assert that results(0).getAs[String]("_hoodie_partition_path"),
results(1).getAs[String]("_hoodie_partition_path"), and
results(2).getAs[String]("_hoodie_partition_path") equal the expected partition
values for row1, row2, and row3 respectively; update the test near the
incrementalDf.select(...).collect() block (references: incrementalDf, results)
to include these checks so duplicate-meta-field behavior is validated
end-to-end.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala`:
- Around line 1022-1025: The current nestedVectorMessageInCauseChain method
matches the full error string which is brittle; update it to look for stable
tokens instead (e.g., check that Option(ex.getMessage).exists(msg =>
msg.contains("Nested VECTOR columns") && msg.contains("top-level field")) )
while preserving the recursive cause traversal
(nestedVectorMessageInCauseChain(ex.getCause)); keep the function name and
behavior but replace the exact-literal contains check with the two-token check
to make matching resilient to minor wording changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77107fc0-3ad3-4e3d-be84-bb7c0d616f5e

📥 Commits

Reviewing files that changed from the base of the PR and between 35e2bbf and 85f11d9.

📒 Files selected for processing (68)
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriterTableVersionSix.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/TestBaseHoodieTableServiceClient.java
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieLockConfig.java
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieBackedTableMetadataWriter.java
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java
  • hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
  • hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java
  • hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metrics/DistributedRegistry.java
  • hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/avro/HoodieSparkSchemaConverters.scala
  • hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/metrics/TestDistributedRegistry.java
  • hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
  • hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java
  • hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java
  • hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java
  • hudi-common/src/test/java/org/apache/hudi/common/config/TestHoodieMetadataConfig.java
  • hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchema.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/HoodieScanContext.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/HoodieSource.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieSourceReader.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieSourceSplitReader.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/RecordLimiter.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/AbstractSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieCdcSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/split/assign/HoodieSplitAssigners.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/split/assign/HoodieSplitBucketAssigner.java
  • hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/TestHoodieSourceSplitReader.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/TestRecordLimiter.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestAbstractSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieSplitReaderFunction.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestDefaultHoodieSplitProvider.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/assign/TestHoodieSplitBucketAssigner.java
  • hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java
  • hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java
  • hudi-hadoop-common/src/main/java/org/apache/hudi/hadoop/fs/HoodieWrapperFileSystem.java
  • hudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestCompactionUtils.java
  • hudi-hadoop-common/src/test/java/org/apache/hudi/common/util/TestParquetReaderIterator.java
  • hudi-io/src/main/java/org/apache/hudi/common/metrics/LocalRegistry.java
  • hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV1.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelationV2.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieTableChanges.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSourceV1.scala
  • hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSourceV2.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalQueryColumnPruning.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/avro/TestSchemaConverters.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestInstantTimeValidation.scala
  • hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
  • hudi-utilities/src/test/java/org/apache/hudi/utilities/offlinejob/TestHoodieClusteringJob.java
💤 Files with no reviewable changes (1)
  • hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java

Comment on lines +635 to +637
if (config.isExpirationOfClusteringEnabled()) {
heartbeatClient.stop(clusteringCommitTime);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rework the clustering heartbeat lifecycle so the executor owns it.

The heartbeat now starts when the plan is scheduled, but this class never reclaims/refreshes it when cluster() actually begins, and stop() only runs on the success path. That makes queued/delegated plans look abandoned before a healthy worker starts them, while failed executions can leave the old heartbeat alive indefinitely and block expiration-based rollback. Mirror the compaction flow: claim/start the heartbeat when execution begins and stop it in a failure-safe path as well.

Also applies to: 739-742

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java`
around lines 635 - 637, Rework the clustering heartbeat lifecycle in
BaseHoodieTableServiceClient so the executor (the code path that actually starts
cluster execution) claims/starts or refreshes the heartbeat when execution
begins (e.g., at the start of cluster()) rather than when the plan is merely
scheduled; ensure heartbeatClient.stop(...) is executed in a failure-safe path
(finally/failure handling) so stopped on both success and error to avoid orphan
heartbeats blocking expiration; update the symmetric heartbeat handling location
in this class as well (the other clustering heartbeat block) to follow the same
claim/start-at-execution and guaranteed-stop pattern, and keep using
config.isExpirationOfClusteringEnabled() to gate the behavior.

Comment on lines +1210 to +1215
private static boolean hasInstantExpired(HoodieTableMetaClient metaClient, String instantTime, long expirationMins) throws ParseException {
ZoneId zoneId = metaClient.getTableConfig().getTimelineTimezone().getZoneId();
long currentTimeMs = ZonedDateTime.ofInstant(java.time.Instant.now(), zoneId).toInstant().toEpochMilli();
long instantTimeMs = HoodieInstantTimeGenerator.parseDateFromInstantTime(instantTime).toInstant().toEpochMilli();
long ageMs = currentTimeMs - instantTimeMs;
return ageMs >= TimeUnit.MINUTES.toMillis(expirationMins);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e

echo "== HoodieInstantTimeGenerator definitions =="
fd -i 'HoodieInstantTimeGenerator.java' --exec sed -n '1,260p' {}

echo
echo "== parseDateFromInstantTime call sites =="
rg -n --type=java 'parseDateFromInstantTime\s*\('

echo
echo "== timeline timezone usage =="
rg -n --type=java 'getTimelineTimezone\s*\('

Repository: yihua/hudi

Length of output: 21136


Fix timezone mismatch in hasInstantExpired() instant age calculation.

The current implementation uses the table's configured timeline timezone to compute currentTimeMs but hardcodes ZoneId.systemDefault() when parsing the instant time via parseDateFromInstantTime(). If these timezones differ, the age calculation produces incorrect results. Align the instant time parsing with the table's timeline timezone:

Original code snippet
private static boolean hasInstantExpired(HoodieTableMetaClient metaClient, String instantTime, long expirationMins) throws ParseException {
  ZoneId zoneId = metaClient.getTableConfig().getTimelineTimezone().getZoneId();
  long currentTimeMs = ZonedDateTime.ofInstant(java.time.Instant.now(), zoneId).toInstant().toEpochMilli();
  long instantTimeMs = HoodieInstantTimeGenerator.parseDateFromInstantTime(instantTime).toInstant().toEpochMilli();
  long ageMs = currentTimeMs - instantTimeMs;
  return ageMs >= TimeUnit.MINUTES.toMillis(expirationMins);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java`
around lines 1210 - 1215, The age calculation in hasInstantExpired() uses the
table timeline timezone for currentTimeMs but parses instantTime with a
different/default timezone; update hasInstantExpired() to parse/convert
instantTime using the same ZoneId from
metaClient.getTableConfig().getTimelineTimezone() so currentTimeMs and
instantTimeMs are computed in the same timezone. Locate hasInstantExpired(),
replace the call to
HoodieInstantTimeGenerator.parseDateFromInstantTime(instantTime) with the
overload or conversion that accepts/uses the ZoneId
(metaClient.getTableConfig().getTimelineTimezone().getZoneId()), and ensure both
timestamps are derived using that zone before computing ageMs.

Comment on lines +39 to +54
public static void createWrapperFileSystemRegistries(HoodieEngineContext context, HoodieWriteConfig config) {
if (config.isMetricsOn()) {
Registry registry;
Registry registryMeta;
if (config.isExecutorMetricsEnabled()) {
// Create and set distributed registry for HoodieWrapperFileSystem
registry = context.getMetricRegistry(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_NAME);
registryMeta = context.getMetricRegistry(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_META_NAME);
} else {
registry = Registry.getRegistryOfClass(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_NAME,
Registry.getRegistry(HoodieWrapperFileSystem.REGISTRY_NAME).getClass().getName());
registryMeta = Registry.getRegistryOfClass(config.getTableName(), HoodieWrapperFileSystem.REGISTRY_META_NAME,
Registry.getRegistry(HoodieWrapperFileSystem.REGISTRY_META_NAME).getClass().getName());
}
HoodieWrapperFileSystem.setMetricsRegistry(registry, registryMeta);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reset wrapper-FS registries in the metrics-off path.

HoodieWrapperFileSystem.setMetricsRegistry(...) updates process-wide state, but this helper only touches it when metrics are enabled. A client initialized after a metrics-enabled table will otherwise keep using the previous table's registries, which breaks metrics isolation across tables/executors. Please clear or replace the registries explicitly when config.isMetricsOn() is false.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/util/DistributedRegistryUtil.java`
around lines 39 - 54, The helper createWrapperFileSystemRegistries only calls
HoodieWrapperFileSystem.setMetricsRegistry(...) when config.isMetricsOn() is
true, leaving process-wide registries from a previous table intact when metrics
are off; modify createWrapperFileSystemRegistries so that in the metrics-off
branch you explicitly clear or replace the registries (e.g., by setting
neutral/no-op or null registries) via
HoodieWrapperFileSystem.setMetricsRegistry(...) so the global state is reset for
the current context; update the branch that currently calls
Registry.getRegistryOfClass(...) (or the else path) to call
HoodieWrapperFileSystem.setMetricsRegistry(...) with the appropriate
cleared/empty Registry instances to ensure metrics isolation across
tables/executors.

Comment on lines +96 to +97
private static final Map<String, Registry> DISTRIBUTED_REGISTRY_MAP = new ConcurrentHashMap<>();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Scope the distributed-registry cache to the Spark application, not just the table name.

DISTRIBUTED_REGISTRY_MAP is JVM-global, and the cache key is only tableName + "." + registryName. If two Spark contexts in the same JVM use the same logical table name, the second one reuses the first DistributedRegistry and never runs register(javaSparkContext) for its own context. That can leak metrics across tenants/apps and bind later jobs to a stale registry instance.

💡 Suggested direction
-  private static final Map<String, Registry> DISTRIBUTED_REGISTRY_MAP = new ConcurrentHashMap<>();
+  private final Map<String, Registry> distributedRegistryMap = new ConcurrentHashMap<>();
...
   `@Override`
   public Registry getMetricRegistry(String tableName, String registryName) {
-    final String prefixedName = tableName.isEmpty() ? registryName : tableName + "." + registryName;
-    return DISTRIBUTED_REGISTRY_MAP.computeIfAbsent(prefixedName, key -> {
+    final String key = String.join("::",
+        javaSparkContext.sc().applicationId(),
+        tableName,
+        registryName);
+    return distributedRegistryMap.computeIfAbsent(key, unused -> {
       Registry registry = Registry.getRegistryOfClass(tableName, registryName, DistributedRegistry.class.getName());
       ((DistributedRegistry) registry).register(javaSparkContext);
       return registry;
     });
   }

Also applies to: 282-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java`
around lines 96 - 97, DISTRIBUTED_REGISTRY_MAP is currently keyed only by
tableName + "." + registryName which is JVM-global and causes
DistributedRegistry instances to be reused across different Spark applications;
change the cache key to include the Spark application identifier (e.g.,
SparkContext.applicationId or SparkEnv.get.conf / SparkContext.hashCode when
applicationId is unavailable) so each Spark application gets its own
DistributedRegistry; update code paths that construct keys (references:
DISTRIBUTED_REGISTRY_MAP, DistributedRegistry, and register(javaSparkContext))
to build and use the new composite key (applicationId + "." + tableName + "." +
registryName) and ensure entries are looked up/inserted with that composite key
in all places including the other usages around lines 282-289.

Comment on lines +168 to +219
// -------------------------------------------------------------------------
// Limit push-down constructor tests
// -------------------------------------------------------------------------

@Test
public void testConstructorWithLimit() {
HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
conf,
tableState,
internalSchemaManager,
ROW_DATA_TYPE.getChildren(),
Collections.emptyList(),
false);

assertNotNull(function);
}

@Test
public void testConstructorWithLimitAndEmptyFieldTypes() {
HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
conf,
tableState,
internalSchemaManager,
Collections.emptyList(),
Collections.emptyList(),
false);

assertNotNull(function);
}

@Test
public void testDefaultConstructorUsesNoLimitSentinel() {
// 6-arg constructor must delegate to 7-arg with limit=-1, both should succeed.
HoodieCdcSplitReaderFunction defaultLimit = new HoodieCdcSplitReaderFunction(
conf, tableState, internalSchemaManager,
ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false);
HoodieCdcSplitReaderFunction explicitNoLimit = new HoodieCdcSplitReaderFunction(
conf, tableState, internalSchemaManager,
ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false);

assertNotNull(defaultLimit);
assertNotNull(explicitNoLimit);
}

@Test
public void testConstructorWithLimitZeroIsAccepted() {
// limit=0 is a valid constructor argument (limitIterator won't wrap since limit <= 0).
HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
conf, tableState, internalSchemaManager,
ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false);
assertNotNull(function);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These “limit” tests never exercise a limit.

Every new test in this block still calls the same constructor shape and only asserts non-null. That means Lines 173-218 do not verify the limit value, the zero-limit case, or the default-sentinel delegation described in the names/comments, so this suite will not catch regressions in the new limit-pushdown behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java`
around lines 168 - 219, The tests currently only call the same 6-arg constructor
and never exercise the limit logic; update the tests to invoke the 7-arg
constructor of HoodieCdcSplitReaderFunction with explicit limit values
(positive, zero, and -1 sentinel) and assert the limit behavior: for example,
construct HoodieCdcSplitReaderFunction(..., limit=5),
HoodieCdcSplitReaderFunction(..., limit=0) and HoodieCdcSplitReaderFunction(...,
limit=-1) and verify the class uses the provided limit by either asserting a
public/package-private getter (add getLimit() if missing) or, if no getter
exists, read the private field 'limit' via reflection or assert observable
behavior (e.g., that record iteration is wrapped/truncated when limit>0 and not
wrapped for limit<=0); update the test methods (testConstructorWithLimit,
testConstructorWithLimitAndEmptyFieldTypes,
testDefaultConstructorUsesNoLimitSentinel,
testConstructorWithLimitZeroIsAccepted) to perform these concrete assertions
instead of only assertNotNull.

Comment on lines 48 to 65
public boolean hasNext() {
try {
// Short-circuit if already closed
if (closed) {
return false;
}
// To handle when hasNext() is called multiple times for idempotency and/or the first time
if (this.next == null) {
this.next = parquetReader.read();
}
return this.next != null;
boolean hasNextRecord = this.next != null;
if (!hasNextRecord) {
close();
}
return hasNextRecord;
} catch (Exception e) {
FileIOUtils.closeQuietly(parquetReader);
throw new HoodieException("unable to read next record from parquet file ", e);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep closed consistent on all teardown paths.

Lines 64-65 and 82-83 still close the raw ParquetReader directly, while Line 93 flips closed only after parquetReader.close() succeeds. If read() or close() throws, the iterator state still says “open”, so later hasNext() / next() calls can retry against a reader that is already being torn down.

♻️ Suggested fix
   `@Override`
   public boolean hasNext() {
+    if (closed) {
+      return false;
+    }
     try {
-      // Short-circuit if already closed
-      if (closed) {
-        return false;
-      }
       // To handle when hasNext() is called multiple times for idempotency and/or the first time
       if (this.next == null) {
         this.next = parquetReader.read();
@@
       }
       return hasNextRecord;
     } catch (Exception e) {
+      closed = true;
       FileIOUtils.closeQuietly(parquetReader);
       throw new HoodieException("unable to read next record from parquet file ", e);
     }
   }
@@
     } catch (Exception e) {
+      closed = true;
       FileIOUtils.closeQuietly(parquetReader);
       throw new HoodieException("unable to read next record from parquet file ", e);
     }
   }

   public void close() {
     if (closed) {
       return;
     }
+    closed = true;
     try {
       parquetReader.close();
-      closed = true;
     } catch (IOException e) {
       throw new HoodieException("Exception while closing the parquet reader", e);
     }
   }

Also applies to: 69-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java`
around lines 48 - 65, Ensure the iterator's closed flag is set on every teardown
path before attempting to close the underlying reader: in
ParquetReaderIterator.hasNext() and any exception handlers set closed = true
prior to calling parquetReader.close() (or
FileIOUtils.closeQuietly(parquetReader)), so that failures during read() or
close() cannot leave closed=false; update all places referencing
parquetReader.close(), FileIOUtils.closeQuietly(parquetReader), or throwing
HoodieException to flip the closed field first (use the existing closed field,
parquetReader, hasNext(), next and close() symbols) and then attempt to close
the reader.

Comment on lines +105 to +109
if (!registry.getClass().getName().equals(clazz)) {
LOG.error("Registry with name " + registryName + " already exists with a different class " + registry.getClass().getName()
+ " than the requested class " + clazz);
}
return registry;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently return the wrong registry implementation.

If a name is already registered under one class, logging and then returning that instance for a different requested class means callers keep running with incompatible semantics and the first registrant wins forever. This should fail fast instead of degrading into a hidden runtime mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java` around
lines 105 - 109, When an existing registry's implementation class doesn't match
the requested clazz, fail fast instead of logging and returning the wrong
instance: replace the LOG.error(...) branch in the Registry lookup (the code
that inspects registry, registryName and clazz) with a thrown exception (e.g.,
throw new IllegalStateException or IllegalArgumentException) that includes
registryName, registry.getClass().getName(), and the requested clazz so callers
cannot continue with an incompatible registry implementation.

Comment on lines +152 to +155
static void setRegistries(Collection<Registry> registries) {
for (Registry registry : registries) {
REGISTRY_MAP.putIfAbsent(makeKey("", registry.getName()), registry);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

setRegistries() re-keys propagated table-scoped registries under the wrong map key.

getRegistryOfClass(tableName, registryName, ...) stores entries under tableName + "::" + registryName, but setRegistries() reinserts them as "::" + registry.getName(), where getName() is already "tableName.registryName". A later lookup by (tableName, registryName) misses the propagated instance and creates a second registry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java` around
lines 152 - 155, setRegistries is re-keying table-scoped registries incorrectly:
when iterating registries it uses makeKey("", registry.getName()) which produces
"::"+registry.getName(), but getRegistryOfClass expects keys created by
makeKey(tableName, registryName) (tableName+"::"+registryName) and
registry.getName() already contains "tableName.registryName"; fix setRegistries
by parsing registry.getName() into its tableName and registryName parts (or by
calling the same key-construction logic used by getRegistryOfClass) and insert
into REGISTRY_MAP with makeKey(tableName, registryName) so lookups by
getRegistryOfClass find the propagated instance (update code in setRegistries to
derive tableName and registryName from registry.getName() or use registry
methods that return them).

Comment on lines +332 to +350
public static void rollbackFailedClusteringForPartitions(
SparkRDDWriteClient<?> client,
HoodieTableMetaClient metaClient,
List<String> partitions) {
for (HoodieInstant instant : getPendingClusteringInstantsForPartitions(metaClient, partitions)) {
if (!BaseHoodieTableServiceClient.isClusteringInstantEligibleForRollback(
metaClient, instant, client.getConfig(), client.getHeartbeatClient())) {
throw new HoodieException("Clustering instant " + instant.requestedTime()
+ " targeting requested partitions is not eligible for rollback "
+ "(heartbeat still active or instant too recent)");
}
// Reload timeline to handle the case where the instant committed and cleaned up
// its heartbeat after the timeline was first loaded
metaClient.reloadActiveTimeline();
if (metaClient.getActiveTimeline().filterInflightsAndRequested()
.containsInstant(instant.requestedTime())) {
LOG.info("Rolling back expired clustering instant {}", instant.requestedTime());
client.rollback(instant.requestedTime());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid all-or-nothing rollback in this cleanup helper.

rollbackFailedClusteringForPartitions is documented as rolling back expired/eligible instants, but Lines 337-342 throw on the first live/recent instant. If the partition set contains both stale and active clustering plans, this can rollback some instants and then fail, leaving partial state and preventing cleanup of the rest.

Possible direction
     for (HoodieInstant instant : getPendingClusteringInstantsForPartitions(metaClient, partitions)) {
       if (!BaseHoodieTableServiceClient.isClusteringInstantEligibleForRollback(
           metaClient, instant, client.getConfig(), client.getHeartbeatClient())) {
-        throw new HoodieException("Clustering instant " + instant.requestedTime()
-            + " targeting requested partitions is not eligible for rollback "
-            + "(heartbeat still active or instant too recent)");
+        LOG.info("Skipping pending clustering instant {} because it is not rollback-eligible yet",
+            instant.requestedTime());
+        continue;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java`
around lines 332 - 350, The current rollbackFailedClusteringForPartitions method
aborts on the first ineligible instant by throwing from the
BaseHoodieTableServiceClient.isClusteringInstantEligibleForRollback check,
causing partial rollback; change this to skip ineligible instants instead of
throwing: for each instant from getPendingClusteringInstantsForPartitions, call
isClusteringInstantEligibleForRollback and if it returns false, log a
warning/info referencing the instant.requestedTime() and continue to the next
instant (optionally collect skipped instants for metrics), otherwise proceed
with metaClient.reloadActiveTimeline(), re-check containsInstant and call
client.rollback(instant.requestedTime()) with LOG.info around successful
rollbacks; do not throw during per-instant processing so other eligible instants
can still be cleaned up.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 9, 2026

Greptile Summary

This PR adds support for safely executing metadata-table (MDT) compaction plans from an external table service platform (rather than inline during writes), along with clustering expiration/rollback via heartbeat-based detection, column pruning for incremental reads, distributed metrics support, log-compaction scheduling improvements, and VECTOR schema validation for nested types.

Key changes:

  • MDT table service delegation: New METADATA_WRITE_CONCURRENCY_MODE, TABLE_SERVICE_MANAGER_ENABLED, and TABLE_SERVICE_MANAGER_ACTIONS configs allow MDT compaction/log-compaction to be scheduled but not executed inline, delegating execution to an external table service platform. HoodieBackedTableMetadataWriter now implements RunsTableService.
  • Clustering expiration: New ENABLE_EXPIRATIONS / EXPIRATION_THRESHOLD_MINS configs enable LAZY-clean-policy writers to roll back stale pending clustering instants whose heartbeats have expired, using HoodieClusteringJob.rollbackFailedClusteringForPartitions.
  • MDT multi-writer lock config: HoodieLockConfig.deriveLockConfigForDifferentTable derives a lock config for the MDT from the data table's lock provider, with explicit validation and rejection of implicit-path providers.
  • Column pruning for incremental reads: IncrementalRelationV1 and V2 now implement PrunedScan instead of TableScan, with IncrementalRelationUtil providing schema pruning and column-dropping helpers.
  • Log-compaction gating: ScheduleCompactionActionExecutor.needLogCompact gates scheduling on delta commits since both the last compaction and the last log compaction.
  • Distributed metrics: HoodieEngineContext.getMetricRegistry() overridden in HoodieSparkEngineContext to return DistributedRegistry instances.
  • VECTOR schema validation: HoodieSchema and HoodieSparkSchemaConverters now reject VECTOR types in nested positions.
  • ParquetReaderIterator: Auto-closes on exhaustion with idempotent close guard.

Confidence Score: 3/5

PR introduces several independently useful features but has multiple open structural issues that need resolution before merge

Several previous-thread issues remain unresolved (Registry key mismatch, static DISTRIBUTED_REGISTRY_MAP, fail-fast in rollbackFailedClusteringForPartitions, inflightInstantsStream duplication). Two new issues are identified: the hasInstantExpired timezone dead-code affecting non-UTC tables, and a potential PUSH_DOWN_INCR_FILTERS regression from the TableScan→PrunedScan upgrade. The core MDT delegation and lock-config derivation logic are sound, and tests are substantial.

BaseHoodieTableServiceClient.java (timezone bug), IncrementalRelationUtil.scala (PUSH_DOWN_INCR_FILTERS regression), Registry.java (key mismatch), HoodieSparkEngineContext.java (static registry map)

Important Files Changed

Filename Overview
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java Adds clustering expiration/rollback under LAZY policy and clustering heartbeat lifecycle; introduces hasInstantExpired with a dead zoneId variable that may produce incorrect expiration for non-UTC tables
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java New multi-writer MDT lock and concurrency mode derivation; correctly validates provider and propagates table-service-manager settings to MDT write config
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java New deriveLockConfigForDifferentTable correctly validates built-in providers and rejects implicit-path providers; minor misleading error message in the catch-all else clause
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java Adds getMetricRegistry override with DistributedRegistry support; static DISTRIBUTED_REGISTRY_MAP causes cross-table registry collisions (flagged in prior thread)
hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java Adds compound key support and setRegistries; key mismatch between driver and executor storage paths (flagged in prior thread)
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/util/IncrementalRelationUtil.scala New utility for pruned-schema construction and post-scan column dropping; PUSH_DOWN_INCR_FILTERS may reference pruned columns causing AnalysisException
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java Adds partition-aware rollback of expired clustering instants; fail-fast on first ineligible instant in rollbackFailedClusteringForPartitions previously flagged
hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java Adds getDeltaCommitsSinceLatestCompletedLogCompaction and getLastLogCompaction; handles pending log compaction and no-prior-log-compaction cases correctly
hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java Adds idempotent close guard and auto-close on exhaustion; straightforward and correct improvement

Sequence Diagram

sequenceDiagram
    participant DT as Data Table Writer
    participant MDT_W as HoodieBackedTableMetadataWriter
    participant MDT_C as MDT Write Client
    participant TSM as Table Service Manager (External)

    DT->>MDT_W: syncDataTableCommit()
    MDT_W->>MDT_W: Check pending compactions on MDT timeline

    alt shouldDelegateToTableServiceManager(compaction)?
        MDT_W->>MDT_C: scheduleCompactionAtInstant()
        MDT_C-->>MDT_W: plan scheduled (REQUESTED state)
        MDT_W-->>DT: skip inline execution
        note over TSM: TSM independently picks up and executes MDT compaction plan
        TSM->>MDT_C: compact(instantTime)
    else inline execution
        MDT_W->>MDT_C: scheduleCompactionAtInstant()
        MDT_W->>MDT_C: compact(instantTime)
    end

    Note over DT,MDT_C: Clustering Expiration Flow
    DT->>DT: rollFailedWrites (LAZY policy)
    DT->>DT: filterPendingClusteringTimeline()
    DT->>DT: isClusteringInstantEligibleForRollback()
    alt eligible for rollback
        DT->>DT: rollback(clusteringInstant)
    end
Loading

Reviews (3): Last reviewed commit: "change name" | Re-trigger Greptile

Comment on lines +92 to +95
/**
* Map of all distributed registries created via getMetricRegistry().
* This map is passed to Spark executors to make the registries available there.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Static DISTRIBUTED_REGISTRY_MAP causes cross-table registry collisions

DISTRIBUTED_REGISTRY_MAP is static final, so it is shared across every HoodieSparkEngineContext (and therefore every SparkRDDWriteClient) in the same JVM. In a Spark application running both a data-table writer and a metadata-table writer concurrently, setRegistries pushes all registries into Registry.REGISTRY_MAP using Registry.makeKey("", registry.getName()), which can silently merge metrics from two different tables into one accumulator.

Consider making DISTRIBUTED_REGISTRY_MAP an instance field, or scoping the map key to include a unique client/session identifier.

Comment on lines 305 to 306
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicate log message

The "Additional Filters to be applied to incremental source are :" message is printed twice: once inside the else branch above and again here after filterRequiredColumnsFromDF. The second log call is a copy-paste artifact and should be removed.

Suggested change
}
}
scanDf = IncrementalRelationUtil.filterRequiredColumnsFromDF(scanDf, requiredColumns, metaClient)

Comment on lines +30 to +31
metaClient: HoodieTableMetaClient) = {
var prunedSchema = StructType(Seq())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Public methods lack explicit return type annotations

Both getPrunedSchema and filterRequiredColumnsFromDF are public and infer their return types (StructType and DataFrame respectively). In Scala, public API methods should declare explicit return types for clarity and binary-compatibility guarantees:

def getPrunedSchema(requiredColumns: Array[String],
                    usedSchema: StructType,
                    metaClient: HoodieTableMetaClient): StructType = {

def filterRequiredColumnsFromDF(df: DataFrame,
                                requiredColumns: Array[String],
                                metaClient: HoodieTableMetaClient): DataFrame = {

Comment thread hudi-io/src/main/java/org/apache/hudi/common/metrics/Registry.java
Comment on lines +149 to +155
/**
* Set all registries if they are not already registered.
*/
static void setRegistries(Collection<Registry> registries) {
for (Registry registry : registries) {
REGISTRY_MAP.putIfAbsent(makeKey("", registry.getName()), registry);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 setRegistries stores registries under a mismatched key

registry.getName() returns the full name as set in getRegistryOfClass — e.g. "myTable.HoodieWrapperFileSystem". So makeKey("", registry.getName()) produces "::myTable.HoodieWrapperFileSystem".

But the driver stored the same registry under makeKey("myTable", "HoodieWrapperFileSystem") = "myTable::HoodieWrapperFileSystem". Any executor-side code that calls Registry.getRegistry("HoodieWrapperFileSystem") (key "::HoodieWrapperFileSystem") will find neither and silently create a new LocalRegistry, losing those metrics.

The intent appears to be forwarding the distributed accumulators so that executors accumulate into the same object. The key used here should be consistent with how REGISTRY_MAP is keyed elsewhere, for example:

static void setRegistries(Collection<Registry> registries) {
    for (Registry registry : registries) {
        // Re-derive the compound key the same way getRegistryOfClass() stored it.
        // getName() returns "tableName.registryName", so we need to split on the first ".".
        String fullName = registry.getName();
        int dotIdx = fullName.indexOf('.');
        String tableName = dotIdx >= 0 ? fullName.substring(0, dotIdx) : "";
        String registryName = dotIdx >= 0 ? fullName.substring(dotIdx + 1) : fullName;
        REGISTRY_MAP.putIfAbsent(makeKey(tableName, registryName), registry);
    }
}

Comment on lines +250 to +262

static final String SPARK_REJECT_UPDATE_STRATEGY_CLASS_NAME =
"org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy";

public static final ConfigProperty<String> UPDATES_STRATEGY = ConfigProperty
.key("hoodie.clustering.updates.strategy")
.defaultValue("org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy")
.noDefaultValue()
.withInferFunction(cfg -> {
String strategy = cfg.getStringOrDefault(HoodieLockConfig.WRITE_CONFLICT_RESOLUTION_STRATEGY_CLASS_NAME, "");
if (PreferWriterConflictResolutionStrategy.class.getName().equals(strategy)) {
return Option.of(SPARK_ALLOW_UPDATE_STRATEGY_CLASS_NAME);
}
return Option.of(SPARK_REJECT_UPDATE_STRATEGY_CLASS_NAME);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 noDefaultValue() makes UPDATES_STRATEGY.defaultValue() return null

Replacing the hard-coded .defaultValue("org.apache.hudi...SparkRejectUpdateStrategy") with .noDefaultValue() means UPDATES_STRATEGY.defaultValue() now returns null. The infer function computes the right value at config-resolution time, but any call-site that uses the raw ConfigProperty.defaultValue() accessor (e.g. for documentation generation, validation, or legacy code paths that haven't been updated) will now receive null and may NPE.

The deprecated constant DEFAULT_CLUSTERING_UPDATES_STRATEGY was correctly updated to point at SPARK_REJECT_UPDATE_STRATEGY_CLASS_NAME directly, so that one is safe. But review all direct usages of UPDATES_STRATEGY.defaultValue() across the codebase to confirm none depend on a non-null return.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java (1)

509-528: ⚠️ Potential issue | 🟡 Minor

Reorder defaultWriteOptions close after rocksDB close.

Per RocksDB Java best practices, resources should be closed in reverse order of construction: RocksDB instance first, then Options/WriteOptions, then Statistics. Currently, defaultWriteOptions is closed before rocksDB.close() at line 516. Move defaultWriteOptions.close() to after getRocksDB().close() (line 518) to follow the recommended resource cleanup order.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java`
around lines 509 - 528, The close() method currently closes defaultWriteOptions
before closing the RocksDB instance; change the teardown order to follow
reverse-construction best practices by closing the RocksDB instance first (call
getRocksDB().close()), then close defaultWriteOptions
(defaultWriteOptions.close()), and finally close statistics
(statistics.close()). Update the sequence in the synchronized close() method so
managedHandlesMap/managedDescriptorMap are cleared, then getRocksDB().close(),
then defaultWriteOptions.close() if non-null, and then statistics cleanup and
filesystem deletion.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java (1)

101-109: ⚠️ Potential issue | 🟠 Major

Fix validation order: move sanityCheck() after setupConfOptions().

The sanityCheck() method (line 107) calls checkIndexType(), which asserts that INDEX_GLOBAL_ENABLED must be true when GLOBAL_RECORD_LEVEL_INDEX is configured. However, this assertion runs before setupConfOptions() (line 108), which internally calls setupWriteOptions() (lines 428-442) to auto-enable INDEX_GLOBAL_ENABLED=true for GLOBAL_RECORD_LEVEL_INDEX configurations. This ordering prevents users from setting only index.type=GLOBAL_RECORD_LEVEL_INDEX without also explicitly setting index.global.enabled=true, even though the factory would normalize the config successfully if allowed to proceed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java`
around lines 101 - 109, In createDynamicTableSink, the call order causes
sanityCheck() (which invokes checkIndexType() and asserts INDEX_GLOBAL_ENABLED
when index.type == GLOBAL_RECORD_LEVEL_INDEX) to run before setupConfOptions()
(which calls setupWriteOptions() and may auto-enable INDEX_GLOBAL_ENABLED); move
the sanityCheck(conf, schema) call to after setupConfOptions(conf,
context.getObjectIdentifier(), context.getCatalogTable(), schema) so
configuration normalization in setupWriteOptions() occurs before validation,
ensuring GLOBAL_RECORD_LEVEL_INDEX is accepted when setupConfOptions enables
INDEX_GLOBAL_ENABLED.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/RLIBootstrapOperator.java (1)

63-72: ⚠️ Potential issue | 🟠 Major

Guard the bootstrap preload on restore.

initializeState() is called during both startup and recovery. The unconditional preLoadRLIRecords() will re-scan the metadata table and re-emit the full bootstrap payload after failover. The sibling BootstrapOperator guards this with context.isRestored() and operator state persistence, but RLIBootstrapOperator lacks both mechanisms, making recovery O(size of RLI) and risking duplicate index initialization downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/RLIBootstrapOperator.java`
around lines 63 - 72, initializeState currently always calls preLoadRLIRecords
which causes a full re-scan on recovery; wrap the preload so it only runs on
fresh start by checking the StateInitializationContext restore flag (use
context.isRestored()) and skip preLoadRLIRecords when restored, and
persist/restore a small operator state (e.g., loadedCnt) so the operator knows
whether RLI was already loaded across restarts; update initializeState,
preLoadRLIRecords usage, and add operator state handling for loadedCnt to match
BootstrapOperator's approach.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactHandler.java (1)

96-116: ⚠️ Potential issue | 🟡 Minor

Stop metadata compaction metrics in a finally block.

If the log-compaction path throws, endCompaction() is never reached and the metric stays stuck in the running state.

Proposed fix
     } else {
       compactionMetrics.startCompaction();
-      // Create a write client specifically for the metadata table
-      HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
-      HoodieTableMetaClient metaClient = table.getMetaClient();
-      if (needReloadMetaClient) {
-        // reload the timeline
-        metaClient.reload();
-      }
-      Option<InstantRange> instantRange = CompactHelpers.getInstance().getInstantRange(metaClient);
-      List<WriteStatus> writeStatuses = compactor.logCompact(
-          writeClient.getConfig(),
-          event.getOperation(),
-          event.getCompactionInstantTime(),
-          instantRange,
-          table,
-          table.getTaskContextSupplier());
-      compactionMetrics.endCompaction();
-      collector.collect(createCommitEvent(event, writeStatuses));
+      try {
+        HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
+        HoodieTableMetaClient metaClient = table.getMetaClient();
+        if (needReloadMetaClient) {
+          metaClient.reload();
+        }
+        Option<InstantRange> instantRange = CompactHelpers.getInstance().getInstantRange(metaClient);
+        List<WriteStatus> writeStatuses = compactor.logCompact(
+            writeClient.getConfig(),
+            event.getOperation(),
+            event.getCompactionInstantTime(),
+            instantRange,
+            table,
+            table.getTaskContextSupplier());
+        collector.collect(createCommitEvent(event, writeStatuses));
+      } finally {
+        compactionMetrics.endCompaction();
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactHandler.java`
around lines 96 - 116, The log-compaction path calls
compactionMetrics.startCompaction() but currently only calls
compactionMetrics.endCompaction() on the success path after
HoodieFlinkMergeOnReadTableCompactor.logCompact(...), so if an exception is
thrown the metric remains running; wrap the compaction work (the call to
compactor.logCompact(...) and subsequent
collector.collect(createCommitEvent(...))) in a try block and move
compactionMetrics.endCompaction() into a finally block so it always runs,
rethrowing the exception after the finally to preserve error propagation; ensure
you keep use of HoodieFlinkMergeOnReadTableCompactor, writeClient.getConfig(),
event.getCompactionInstantTime(), instantRange, table,
table.getTaskContextSupplier(), createCommitEvent(...) and
collector.collect(...) in the try so successful results are still collected.
♻️ Duplicate comments (2)
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java (1)

172-219: ⚠️ Potential issue | 🟠 Major

Limit-pushdown tests still don’t validate any limit behavior.

On Lines 174-180, 187-193, 201-207, and 215-217, all tests still use the 6-arg constructor and only assert non-null, so none of the named limit cases (limit>0, limit=0, default sentinel) are actually exercised.

Proposed test fix
@@
   `@Test`
   public void testConstructorWithLimit() {
     HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
         conf,
         tableState,
         internalSchemaManager,
         ROW_DATA_TYPE.getChildren(),
         Collections.emptyList(),
-        false);
+        false,
+        5L);
 
     assertNotNull(function);
+    assertEquals(5L, extractLimit(function));
   }
@@
   public void testConstructorWithLimitAndEmptyFieldTypes() {
     HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
         conf,
         tableState,
         internalSchemaManager,
         Collections.emptyList(),
         Collections.emptyList(),
-        false);
+        false,
+        5L);
 
     assertNotNull(function);
+    assertEquals(5L, extractLimit(function));
   }
@@
   public void testDefaultConstructorUsesNoLimitSentinel() {
@@
     HoodieCdcSplitReaderFunction explicitNoLimit = new HoodieCdcSplitReaderFunction(
         conf, tableState, internalSchemaManager,
-        ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false);
+        ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false, -1L);
@@
     assertNotNull(defaultLimit);
     assertNotNull(explicitNoLimit);
+    assertEquals(-1L, extractLimit(defaultLimit));
+    assertEquals(-1L, extractLimit(explicitNoLimit));
   }
@@
   public void testConstructorWithLimitZeroIsAccepted() {
@@
     HoodieCdcSplitReaderFunction function = new HoodieCdcSplitReaderFunction(
         conf, tableState, internalSchemaManager,
-        ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false);
+        ROW_DATA_TYPE.getChildren(), Collections.emptyList(), false, 0L);
     assertNotNull(function);
+    assertEquals(0L, extractLimit(function));
   }
+
+  private long extractLimit(HoodieCdcSplitReaderFunction function) {
+    try {
+      java.lang.reflect.Field f = HoodieCdcSplitReaderFunction.class.getDeclaredField("limit");
+      f.setAccessible(true);
+      return (long) f.get(function);
+    } catch (ReflectiveOperationException e) {
+      throw new AssertionError("Unable to inspect limit field", e);
+    }
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java`
around lines 172 - 219, Tests for HoodieCdcSplitReaderFunction are currently
only calling the 6-arg constructor and asserting non-null, so the limit cases
are not exercised; update each test (testConstructorWithLimit,
testConstructorWithLimitAndEmptyFieldTypes,
testDefaultConstructorUsesNoLimitSentinel,
testConstructorWithLimitZeroIsAccepted) to call the 7-argument
HoodieCdcSplitReaderFunction constructor that includes the limit parameter with
explicit values (e.g., limit>0 for positive case, 0 for zero case, -1 for
sentinel/default) and replace the bare assertNotNull with assertions that verify
the limit was applied (either by calling a public accessor if available, or by
using reflection to read the private limit-related field, or by asserting
observable behavior that depends on the limit such as wrapping of iterators);
ensure you reference the constructor signature HoodieCdcSplitReaderFunction(...,
List<RowType> fieldTypes, List<String> projectedFields, boolean someFlag, long
limit) when making the changes.
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java (1)

424-427: ⚠️ Potential issue | 🟠 Major

The new VECTOR guard still misses nested UNION/ARRAY/MAP shapes.

This still only rejects the immediate non-null child in createArray() / createMap(), and validateNoVectorInNestedRecord() only walks into child RECORDs. Shapes like ARRAY<UNION<VECTOR,INT>> or MAP<STRING,UNION<RECORD<...VECTOR...>>> will still pass even though the error text says nested VECTORs are unsupported.

Also applies to: 445-448, 484-515

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java`
around lines 424 - 427, The current VECTOR guard only checks the immediate
non-null child and RECORDs, so nested shapes like ARRAY<UNION<VECTOR,INT>> or
MAP<STRING,UNION<RECORD<...VECTOR...>>> slip through; update createArray() and
createMap() and the validateNoVectorInNestedRecord() flow to recursively walk
all possible nested type constructors (UNION, ARRAY, MAP, RECORD) by following
getNonNullType() and inspecting getType() (HoodieSchemaType) until you either
hit a non-VECTOR leaf or find a VECTOR; extract this into a helper (e.g.,
containsVectorRecursively(Type t)) that returns true if any nested/non-null
child is VECTOR and call it from the places that currently only check the
immediate child, then throw the existing HoodieSchemaException with the same
message when the helper returns true. Ensure the helper traverses UNION members,
ARRAY element types, MAP value/key types as applicable and reuses
getNonNullType() on each visited node.
🟡 Minor comments (6)
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcInputFormat.java-71-77 (1)

71-77: ⚠️ Potential issue | 🟡 Minor

Potential resource leak if exception occurs before iterator takes ownership.

If an exception is thrown between creating CdcImageManager (line 71-74) and constructing CdcFileSplitsIterator (line 77), the manager won't be closed.

🛡️ Proposed fix to ensure cleanup on exception
       HoodieCDCSupplementalLoggingMode mode = OptionsResolver.getCDCSupplementalLoggingMode(conf);
       CdcImageManager manager = new CdcImageManager(
           tableState.getRowType(),
           FlinkWriteClients.getHoodieClientConfig(conf),
           this::getFileSliceIterator);
-      Function<HoodieCDCFileSplit, ClosableIterator<RowData>> recordIteratorFunc =
-          cdcFileSplit -> getRecordIteratorSafe(split.getTablePath(), split.getMaxCompactionMemoryInBytes(), cdcFileSplit, mode, manager);
-      return new CdcIterators.CdcFileSplitsIterator(((CdcInputSplit) split).getChanges(), manager, recordIteratorFunc);
+      try {
+        Function<HoodieCDCFileSplit, ClosableIterator<RowData>> recordIteratorFunc =
+            cdcFileSplit -> getRecordIteratorSafe(split.getTablePath(), split.getMaxCompactionMemoryInBytes(), cdcFileSplit, mode, manager);
+        return new CdcIterators.CdcFileSplitsIterator(((CdcInputSplit) split).getChanges(), manager, recordIteratorFunc);
+      } catch (Exception e) {
+        manager.close();
+        throw e;
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcInputFormat.java`
around lines 71 - 77, Creating CdcImageManager and then calling
getRecordIteratorSafe may throw before the CdcIterators.CdcFileSplitsIterator
takes ownership, leaking resources; wrap the creation and subsequent setup in a
try/catch (or try/finally) so that if any exception occurs before you return the
new CdcFileSplitsIterator you call manager.close() (or manager.closeQuietly())
and rethrow the exception; ensure the returned CdcFileSplitsIterator still
assumes ownership of the manager when successfully constructed (i.e., only close
on the error path), referencing CdcImageManager, getRecordIteratorSafe, and
CdcIterators.CdcFileSplitsIterator to locate the code.
docker/build_docker_images.sh-108-112 (1)

108-112: ⚠️ Potential issue | 🟡 Minor

Quote --build-arg assignments to prevent shell splitting if version strings contain spaces.

At lines 109–111, unquoted variable expansions can be split by the shell if the values passed via command-line arguments contain spaces or special characters (e.g., ./build_docker_images.sh --hadoop-version "2.8.4 test"). While default version strings are safe, the script accepts arbitrary values from users without validation, making this a defensive improvement.

Suggested fix
   if ! docker build \
-    --build-arg HADOOP_VERSION=${HADOOP_VERSION} \
-    --build-arg SPARK_VERSION=${SPARK_VERSION} \
-    --build-arg HIVE_VERSION=${HIVE_VERSION} \
+    --build-arg "HADOOP_VERSION=${HADOOP_VERSION}" \
+    --build-arg "SPARK_VERSION=${SPARK_VERSION}" \
+    --build-arg "HIVE_VERSION=${HIVE_VERSION}" \
     "$IMAGE_CONTEXT" -t "$TAG_LATEST" -t "$TAG_VERSIONED"; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build_docker_images.sh` around lines 108 - 112, The docker build
command is using unquoted variable expansions for the --build-arg assignments
(HADOOP_VERSION, SPARK_VERSION, HIVE_VERSION) which can be split if a version
string contains spaces; update the invocation so each build-arg assignment is
quoted (e.g. --build-arg "HADOOP_VERSION=${HADOOP_VERSION}") to prevent shell
word-splitting and preserve special characters when calling docker build in this
block.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.java-24-48 (1)

24-48: ⚠️ Potential issue | 🟡 Minor

Add exception handling to ensure both handlers are always invoked.

The forEachHandler method in CompositeTableServiceHandler lacks exception handling. If the first handler throws an exception during clean(), waitForCleaningFinish(), startAsyncCleaning(), or close(), the second handler's method is never invoked. This could leave the metadata table in an inconsistent state relative to the data table.

For close() specifically, if the data table handler fails to close, the metadata table handler's resources may leak.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/CompositeCleanHandler.java`
around lines 24 - 48, The composite currently delegates via forEachHandler which
stops on the first exception; change each method in CompositeCleanHandler
(clean, waitForCleaningFinish, startAsyncCleaning, close) to invoke the two
underlying handlers individually (use the same handler references used by
CompositeTableServiceHandler or call the existing accessors), wrap each handler
call in its own try/catch, and ensure the second handler is always invoked even
if the first throws; if one or both throw, capture the first Throwable, add any
subsequent Throwable as suppressed to the first, and after both calls rethrow
the captured exception so callers still see failures while preventing resource
leaks (apply this pattern for close as well).
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.java-97-101 (1)

97-101: ⚠️ Potential issue | 🟡 Minor

The NPE concern is mitigated by Flink's lifecycle contract, but defensive null checks would improve robustness.

While compactionMetrics could theoretically be null if registerMetrics() is not called before commitIfNecessary() (lines 133, 143, 180, 221), this is prevented in practice by Flink's standard SinkFunction lifecycle: open() (which calls registerMetrics() at CompactionCommitSink line 60) is always invoked before invoke() (which calls commitIfNecessary()).

However, the interface lacks explicit documentation of this required call order, and the implementation has no defensive null checks. Adding either a null check or a Javadoc contract to CompactionCommitHandler would make the dependency explicit and improve defensiveness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactionCommitHandler.java`
around lines 97 - 101, The implementation assumes registerMetrics() is always
called before commitIfNecessary(), but lacks defensive checks; update
CompactionCommitHandler (or DataTableCompactionCommitHandler) so the contract is
explicit and robust by either adding a short Javadoc on
CompactionCommitHandler.registerMetrics() stating it must be called prior to
commitIfNecessary(), and/or adding null checks in
DataTableCompactionCommitHandler.commitIfNecessary() (and any other methods that
reference compactionMetrics) to handle a missing compactionMetrics gracefully
(log an error/warning and skip metrics operations) so NPEs cannot occur if
registerMetrics() was not invoked.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.java-105-106 (1)

105-106: ⚠️ Potential issue | 🟡 Minor

Guard close() against partial initialization.

If createCompactionPlanHandler(...) fails in open(), close() now dereferences a null handler and can mask the real startup error.

💡 Suggested fix
   `@Override`
   public void close() throws Exception {
-    this.compactionPlanHandler.close();
+    if (this.compactionPlanHandler != null) {
+      this.compactionPlanHandler.close();
+    }
     super.close();
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.java`
around lines 105 - 106, The close() method currently calls
this.compactionPlanHandler.close() without null-checking, which can NPE if
createCompactionPlanHandler(...) failed during open(); update close() in
CompactionPlanOperator to guard against partial initialization by checking
compactionPlanHandler != null before invoking its close (or wrap the call in a
null-safe try/catch), ensuring any exception from open() is not masked by a
secondary NPE from close().
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.java-127-147 (1)

127-147: ⚠️ Potential issue | 🟡 Minor

Always end compaction metrics on failure.

endCompaction() only runs on the success path here. If the compactor or collector.collect(...) throws, the metric remains open and subsequent measurements are skewed.

Proposed fix
   protected void doCompaction(CompactionPlanEvent event,
                               Collector<CompactionCommitEvent> collector,
                               boolean needReloadMetaClient) throws Exception {
     compactionMetrics.startCompaction();
-    HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
-    HoodieTableMetaClient metaClient = table.getMetaClient();
-    if (needReloadMetaClient) {
-      // reload the timeline
-      metaClient.reload();
-    }
-    // schema evolution
-    CompactionUtil.setAvroSchema(writeClient.getConfig(), metaClient);
-    List<WriteStatus> writeStatuses = compactor.compact(
-        writeClient.getConfig(),
-        event.getOperation(),
-        event.getCompactionInstantTime(),
-        table.getTaskContextSupplier(),
-        createReaderContext(needReloadMetaClient),
-        table);
-    compactionMetrics.endCompaction();
-    collector.collect(createCommitEvent(event, writeStatuses));
+    try {
+      HoodieFlinkMergeOnReadTableCompactor<?> compactor = new HoodieFlinkMergeOnReadTableCompactor<>();
+      HoodieTableMetaClient metaClient = table.getMetaClient();
+      if (needReloadMetaClient) {
+        metaClient.reload();
+      }
+      CompactionUtil.setAvroSchema(writeClient.getConfig(), metaClient);
+      List<WriteStatus> writeStatuses = compactor.compact(
+          writeClient.getConfig(),
+          event.getOperation(),
+          event.getCompactionInstantTime(),
+          table.getTaskContextSupplier(),
+          createReaderContext(needReloadMetaClient),
+          table);
+      collector.collect(createCommitEvent(event, writeStatuses));
+    } finally {
+      compactionMetrics.endCompaction();
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/DataTableCompactHandler.java`
around lines 127 - 147, The compaction metrics are only ended on the success
path in doCompaction; wrap the compaction work (the call to
HoodieFlinkMergeOnReadTableCompactor.compact(...) and collector.collect(...)) in
a try/finally so compactionMetrics.endCompaction() is always invoked (and
rethrow the exception after the finally block if one occurs). Specifically, in
DataTableCompactHandler.doCompaction(), call compactionMetrics.startCompaction()
as now, then perform compactor.compact(...) and collector.collect(...) inside a
try block and move compactionMetrics.endCompaction() into the finally to
guarantee it runs on both success and failure.
🧹 Nitpick comments (20)
hudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.java (1)

95-127: Add negative-value encode coverage for writeVarInt.

Current new tests are good for positive/boundary values, but adding signed cases would better lock down compatibility for the full API contract.

✅ Suggested test extension
   `@Test`
   public void testWriteVarIntRoundTrip() {
@@
-    int[] testValues = {0, 1, 127, 128, 146, 200, 255, 256, 300, 1000, 32080, 65535, 100000,
-        2034958, 632492350, Integer.MAX_VALUE};
+    int[] testValues = {
+        Integer.MIN_VALUE, -203, -128, -113, -112, -1,
+        0, 1, 127, 128, 146, 200, 255, 256, 300, 1000, 32080, 65535, 100000,
+        2034958, 632492350, Integer.MAX_VALUE
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-io/src/test/java/org/apache/hudi/io/util/TestIOUtils.java` around lines
95 - 127, Add negative-value coverage to the writeVarInt tests by feeding a few
negative ints (e.g., -1, -128, Integer.MIN_VALUE and a couple mid-range
negatives) into IOUtils.writeVarInt and asserting round-trip correctness: use
IOUtils.decodeVarLongSizeOnDisk(encoded, 0) to verify size equals encoded.length
and IOUtils.readVarLong(encoded, 0, size) (or readVarLong(encoded, 0) where
applicable) to assert the decoded value equals the original negative int; place
these checks into testWriteVarIntRoundTrip (or a new test) alongside the
existing positive cases to ensure writeVarInt/readVarLong handle signed inputs
correctly.
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java (3)

279-280: Prefer ifPresent() over map() for side-effect-only operations.

Using Optional.map() when you're only interested in the side effect (and discarding the return value) is non-idiomatic. The ifPresent() method better expresses intent.

♻️ Suggested fix
-        logBlock.getBlockContentLocation()
-            .map(contentLocation -> totalLogBlocksSize.addAndGet(contentLocation.getBlockSize()));
+        logBlock.getBlockContentLocation()
+            .ifPresent(contentLocation -> totalLogBlocksSize.addAndGet(contentLocation.getBlockSize()));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java`
around lines 279 - 280, In BaseHoodieLogRecordReader replace the use of
Optional.map when invoking logBlock.getBlockContentLocation() for a side-effect
with Optional.ifPresent to express intent: call
logBlock.getBlockContentLocation().ifPresent(contentLocation ->
totalLogBlocksSize.addAndGet(contentLocation.getBlockSize())) so the Optional
return value is not misused and the side-effect
(totalLogBlocksSize.addAndGet(...)) is clearly represented.

546-548: Consider returning an unmodifiable view to prevent external mutation.

getBlocksStats() returns a direct reference to the internal mutable list. While current callers only read from it, returning an unmodifiable view would provide better encapsulation.

♻️ Suggested fix
 public List<Map<String, Object>> getBlocksStats() {
-  return blocksStats;
+  return Collections.unmodifiableList(blocksStats);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java`
around lines 546 - 548, getBlocksStats() currently returns the internal mutable
list blocksStats allowing external mutation; change its implementation to return
an unmodifiable view by returning Collections.unmodifiableList(blocksStats) from
getBlocksStats(), adding an import for java.util.Collections; optionally, if you
want to also protect contained maps, wrap each map with
Collections.unmodifiableMap before returning or return a defensive copy of the
list of maps.

495-511: Inconsistent null-check for recordBuffer; prefer ifPresent() for side-effect-only Optional usage.

  1. Line 500 calls recordBuffer.processDataBlock() without a null check, while lines 498 and 501 defensively check recordBuffer != null. If recordBuffer can be null when this method is called, line 500 will NPE.

  2. Line 506-507 uses Optional.map() for a side effect (putting into the map), which is non-idiomatic.

♻️ Suggested fix
 private void processDataBlock(HoodieDataBlock dataBlock, Option<KeySpec> keySpecOpt) throws IOException {
   String blockInstantTime = dataBlock.getLogBlockHeader().get(INSTANT_TIME);
   LOG.debug("Processing log block with instant time {}", blockInstantTime);
   long totalLogRecordsBefore = recordBuffer != null ? recordBuffer.getTotalLogRecords() : 0L;
   HoodieTimer blockReadTimer = HoodieTimer.start();
-  recordBuffer.processDataBlock(dataBlock, keySpecOpt);
+  if (recordBuffer != null) {
+    recordBuffer.processDataBlock(dataBlock, keySpecOpt);
+  }
   long recordsInBlock = recordBuffer != null ? recordBuffer.getTotalLogRecords() - totalLogRecordsBefore : 0L;

   Map<String, Object> blockReadMetrics = new HashMap<>();
   blockReadMetrics.put(LOG_BLOCK_FULL_READ_DURATION_IN_MILLIS, blockReadTimer.endTimer());
   blockReadMetrics.put(TOTAL_RECORDS_PRESENT_IN_LOG_BLOCK, recordsInBlock);
   dataBlock.getBlockContentLocation()
-      .map(contentLocation -> blockReadMetrics.put(BLOCK_SIZE_IN_BYTES, contentLocation.getBlockSize()));
+      .ifPresent(contentLocation -> blockReadMetrics.put(BLOCK_SIZE_IN_BYTES, contentLocation.getBlockSize()));
   blockReadMetrics.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME.toString(), blockInstantTime);
   blocksStats.add(blockReadMetrics);
   LOG.debug("For log block, scan metrics are {}", blockReadMetrics);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java`
around lines 495 - 511, The method processDataBlock currently calls
recordBuffer.processDataBlock(...) without guarding against a null recordBuffer
while other lines use recordBuffer != null; fix by checking recordBuffer is
present (e.g., if (recordBuffer != null) or recordBuffer.ifPresent(...)
depending on its type) before invoking processDataBlock and when computing
recordsInBlock use the same presence check to avoid NPEs; also replace
dataBlock.getBlockContentLocation().map(...) which uses map for a side effect
with a proper ifPresent/ifPresentOrElse so you call
blocksStats.put(BLOCK_SIZE_IN_BYTES, ...) (or blockReadMetrics.put(...)) only
when the content location Optional is present.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java (3)

587-593: Redundant setRowKind call.

Same issue as BeforeImageIterator.getAfterImage() - the row kind is already set by imageManager.getImageRecord().

♻️ Remove redundant setRowKind
     `@Override`
     protected RowData getBeforeImage(RowKind rowKind, GenericRecord cdcRecord) {
       String recordKey = cdcRecord.get(1).toString();
       RowData row = imageManager.getImageRecord(recordKey, beforeImages, rowKind);
-      row.setRowKind(rowKind);
       return projection.project(row);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 587 - 593, In getBeforeImage (class CdcIterators), remove the
redundant row.setRowKind(rowKind) call since imageManager.getImageRecord(...)
already sets the RowKind (same fix as applied to
BeforeImageIterator.getAfterImage()); simply return projection.project(row)
after obtaining the row from imageManager.getImageRecord(recordKey,
beforeImages, rowKind).

541-547: Redundant setRowKind call.

imageManager.getImageRecord() already sets the row kind (line 544), but it's set again on line 545. Remove the redundant call.

♻️ Remove redundant setRowKind
     `@Override`
     protected RowData getAfterImage(RowKind rowKind, GenericRecord cdcRecord) {
       String recordKey = cdcRecord.get(1).toString();
       RowData row = imageManager.getImageRecord(recordKey, afterImages, rowKind);
-      row.setRowKind(rowKind);
       return projection.project(row);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 541 - 547, In getAfterImage (method getAfterImage in CdcIterators),
remove the redundant call to row.setRowKind(rowKind) since
imageManager.getImageRecord(recordKey, afterImages, rowKind) already sets the
RowKind; leave the retrieval via imageManager.getImageRecord and the final
return projection.project(row) intact.

334-338: Double-close of shared CdcImageManager.

DataLogFileIterator.close() closes the imageManager, but CdcFileSplitsIterator (which owns and created the manager) also closes it in its close() method. While CdcImageManager.close() is idempotent, this design is confusing and error-prone.

Consider having only the owner (CdcFileSplitsIterator) responsible for closing the manager, or add a flag to track closure state.

♻️ Proposed fix: Remove imageManager.close() from DataLogFileIterator
     `@Override`
     public void close() {
       logRecordIterator.close();
-      imageManager.close();
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 334 - 338, DataLogFileIterator.close() currently calls
imageManager.close() even though CdcFileSplitsIterator (the owner/creator of
CdcImageManager) also closes it, causing a double-close pattern; remove the
imageManager.close() call from DataLogFileIterator.close() so only
CdcFileSplitsIterator is responsible for closing CdcImageManager, or
alternatively add an ownership/closed flag on CdcImageManager and only close
when the owner (CdcFileSplitsIterator) invokes close; update
DataLogFileIterator.close() to stop closing imageManager and ensure
CdcFileSplitsIterator.close() remains the single point that calls
CdcImageManager.close().
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scala (1)

90-108: Potential integer overflow when parsing k from non-Int expressions.

The parseK method converts rawValue.toString.toInt, which could silently truncate if the expression evaluates to a Long larger than Int.MaxValue. While unlikely in practice (k values that large would be impractical), explicit type handling would be safer.

Suggested improvement for type-safe parsing
   private[logical] def parseK(funcName: String, expr: Expression): Int = {
     val rawValue = expr.eval()
     if (rawValue == null) {
       throw new HoodieAnalysisException(
         s"Function '$funcName': k must be a positive integer, got null")
     }
-    val kValue = try {
-      rawValue.toString.toInt
-    } catch {
-      case _: NumberFormatException =>
-        throw new HoodieAnalysisException(
-          s"Function '$funcName': k must be a positive integer, got '$rawValue'")
+    val kValue: Int = rawValue match {
+      case i: Int => i
+      case l: Long if l >= Int.MinValue && l <= Int.MaxValue => l.toInt
+      case l: Long =>
+        throw new HoodieAnalysisException(
+          s"Function '$funcName': k value $l exceeds maximum allowed (${Int.MaxValue})")
+      case other =>
+        try { other.toString.toInt }
+        catch { case _: NumberFormatException =>
+          throw new HoodieAnalysisException(
+            s"Function '$funcName': k must be a positive integer, got '$other'")
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/HoodieVectorSearchTableValuedFunction.scala`
around lines 90 - 108, parseK currently calls rawValue.toString.toInt which can
silently truncate large numeric types (e.g., Long/BigInt); change parseK (method
parseK in HoodieVectorSearchTableValuedFunction) to handle numeric types
explicitly: if rawValue is an Int use it, if it's a
Long/BigInt/Short/Byte/Number convert to a Long/BigInt and validate it fits in
Int range and is > 0, if it's a String try parsing it as a BigInt/Long then
validate range, and otherwise throw HoodieAnalysisException; ensure all invalid
or out-of-range values produce the same user-friendly error messages instead of
silent truncation.
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/VectorDistanceUtils.scala (1)

86-97: Floating-point equality comparison may cause edge-case issues.

Line 92 uses exact equality (denom == 0.0) to check for zero denominator. Due to floating-point precision, vectors with extremely small norms might produce a non-zero but near-zero denominator, leading to potential division issues or numerical instability.

Consider using a small epsilon threshold
       case DistanceMetric.COSINE => (a, b, bNorm) =>
         val aNorm = Vectors.norm(a, 2.0)
         val denom = aNorm * bNorm
-        if (denom == 0.0) 1.0 else math.min(2.0, math.max(0.0, 1.0 - (a.dot(b) / denom)))
+        if (denom < 1e-15) 1.0 else math.min(2.0, math.max(0.0, 1.0 - (a.dot(b) / denom)))

This provides a safety margin for near-zero denominators while maintaining the same behavior for actual zero vectors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/VectorDistanceUtils.scala`
around lines 86 - 97, The COSINE branch in resolveDistanceFn uses exact
floating-point equality (denom == 0.0); change this to a safe near-zero check by
introducing a small epsilon and treating denom as zero when |denom| <= eps (for
example eps = 1e-12) to avoid division by a near-zero value; update the
DistanceMetric.COSINE case (variables aNorm, bNorm, denom) to compute denom,
compare against eps, and return the same fallback (1.0) when denom is
effectively zero, otherwise perform the division as before.
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSparkBaseAnalysis.scala (1)

341-344: Consider preserving the cause chain for debugging.

The exception wrapper only includes the message but not the original exception as the cause. This can make debugging harder when unexpected errors occur.

Suggested improvement
     } catch {
-      case e: Exception => throw new HoodieAnalysisException(
-        s"$funcName: unable to resolve table '$table': ${e.getMessage}")
+      case e: Exception => throw new HoodieAnalysisException(
+        s"$funcName: unable to resolve table '$table': ${e.getMessage}", e)
     }

Note: This requires HoodieAnalysisException to have a constructor that accepts a cause. If it doesn't, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieSparkBaseAnalysis.scala`
around lines 341 - 344, The catch block that currently throws new
HoodieAnalysisException(s"$funcName: unable to resolve table '$table':
${e.getMessage}") should preserve the original exception as the cause; change
the throw to construct HoodieAnalysisException with the message and pass the
caught exception `e` as the cause (i.e., use the HoodieAnalysisException
constructor that accepts a Throwable), referencing the catch in
HoodieSparkBaseAnalysis's resolver logic (the catch handling `e: Exception`,
using `funcName` and `table`) so the original stacktrace is retained for
debugging.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java (1)

102-158: Consider adding upper bound assertions for hit ratios.

Lines 147-150 assert that hit ratios are >= 0D, but Line 151 only adds the upper bound check for ROCKSDB_BLOCK_CACHE_HIT_RATIO. For completeness, consider adding <= 1D assertions for the other hit ratio metrics as well (data, index, filter).

Proposed fix
       assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_HIT_RATIO).getValue()).doubleValue() >= 0D);
       assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_DATA_HIT_RATIO).getValue()).doubleValue() >= 0D);
       assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_INDEX_HIT_RATIO).getValue()).doubleValue() >= 0D);
       assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_FILTER_HIT_RATIO).getValue()).doubleValue() >= 0D);
       assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_HIT_RATIO).getValue()).doubleValue() <= 1D);
+      assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_DATA_HIT_RATIO).getValue()).doubleValue() <= 1D);
+      assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_INDEX_HIT_RATIO).getValue()).doubleValue() <= 1D);
+      assertTrue(((Number) gauges.get(FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_FILTER_HIT_RATIO).getValue()).doubleValue() <= 1D);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java`
around lines 102 - 158, The test testMetricsReflectWritesReadsAndAutoFlush in
TestRocksDBIndexBackend currently only asserts an upper bound (<= 1D) for
FlinkRocksDBIndexMetrics.ROCKSDB_BLOCK_CACHE_HIT_RATIO; add corresponding
upper-bound assertions for ROCKSDB_BLOCK_CACHE_DATA_HIT_RATIO,
ROCKSDB_BLOCK_CACHE_INDEX_HIT_RATIO, and ROCKSDB_BLOCK_CACHE_FILTER_HIT_RATIO so
each hit-ratio gauge is asserted to be >= 0D and <= 1D. Locate the assertions
using the test method name and the FlinkRocksDBIndexMetrics constant names and
add the missing <= 1D checks immediately after their existing >= 0D assertions.
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.java (1)

119-121: Tighten the boundary assertion to the actual cutoff commit.

This test calls the helper with secondCommit as max, so asserting against thirdCommit is looser than the contract being validated.

Proposed refinement
-          assertTrue(commitTime.compareTo(thirdCommit) < 0,
-              "Commit time should be before third commit: " + commitTime);
+          assertTrue(commitTime.compareTo(secondCommit) <= 0,
+              "Commit time should be <= second commit: " + commitTime);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/common/table/log/TestLogReaderUtils.java`
around lines 119 - 121, The assertion in TestLogReaderUtils compares commitTime
against thirdCommit but the helper was invoked with secondCommit as the max
cutoff; tighten the boundary by asserting commitTime.compareTo(secondCommit) < 0
(or the equivalent strict comparison) so the test verifies the actual cutoff
passed to the helper (update the assertion using the commitTime and secondCommit
variables in the test).
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala (1)

1148-1156: Consider scoping instant normalization to incremental query mode.

Line 1150–1153 normalizes begin/end instants regardless of QUERY_TYPE. Limiting this to incremental reads would reduce surprising behavior for snapshot/read-optimized calls that happen to carry these keys.

Suggested refinement
     val startCommitKey = DataSourceReadOptions.START_COMMIT.key
     val endCommitKey = DataSourceReadOptions.END_COMMIT.key
-    val normalizedStartCommit = translatedParams.get(startCommitKey)
-      .map(v => startCommitKey -> HoodieSqlCommonUtils.formatIncrementalInstant(v))
-    val normalizedEndCommit = translatedParams.get(endCommitKey)
-      .map(v => endCommitKey -> HoodieSqlCommonUtils.formatIncrementalInstant(v))
+    val normalizeInstants = queryType == DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL
+    val normalizedStartCommit = if (normalizeInstants) {
+      translatedParams.get(startCommitKey)
+        .map(v => startCommitKey -> HoodieSqlCommonUtils.formatIncrementalInstant(v))
+    } else None
+    val normalizedEndCommit = if (normalizeInstants) {
+      translatedParams.get(endCommitKey)
+        .map(v => endCommitKey -> HoodieSqlCommonUtils.formatIncrementalInstant(v))
+    } else None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala`
around lines 1148 - 1156, The start/end instant normalization currently always
runs; restrict it to only run for incremental queries by checking the query type
(compare queryType against DataSourceReadOptions.QUERY_TYPE possible incremental
value) before calling HoodieSqlCommonUtils.formatIncrementalInstant. Concretely,
wrap the normalizedStartCommit/normalizedEndCommit creation so they are computed
only when QUERY_TYPE.key -> queryType indicates an incremental read (otherwise
leave them as None), and then merge those option pairs into the final Map as
before.
hudi-common/src/main/java/org/apache/hudi/util/PartitionPathFilterUtil.java (1)

26-37: Harden null handling in shared predicate utility.

At Line 29 and Line 36, null relativePathPrefixes or null path can trigger runtime failures. Since this is a shared utility, add explicit guards/fail-fast handling to keep call-sites safer.

Suggested hardening
 public class PartitionPathFilterUtil {
 
   public static Predicate<String> relativePathPrefixPredicate(List<String> relativePathPrefixes) {
+    if (relativePathPrefixes == null) {
+      return path -> false;
+    }
     return path -> relativePathPrefixes.stream().anyMatch(relativePathPrefix ->
         // Partition paths stored in metadata table do not have the slash at the end.
         // If the relativePathPrefix is empty, return all partition paths;
         // else if the relative path prefix is the same as the path, this is an exact match;
         // else, we need to make sure the path is a subdirectory of relativePathPrefix, by
         // checking if the path starts with relativePathPrefix appended by a slash ("/").
         StringUtils.isNullOrEmpty(relativePathPrefix)
-            || path.equals(relativePathPrefix) || path.startsWith(relativePathPrefix + "/"));
+            || (path != null && (path.equals(relativePathPrefix) || path.startsWith(relativePathPrefix + "/"))));
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-common/src/main/java/org/apache/hudi/util/PartitionPathFilterUtil.java`
around lines 26 - 37, Ensure relativePathPrefixes is non-null and make the
returned predicate null-safe: in
PartitionPathFilterUtil.relativePathPrefixPredicate call
Objects.requireNonNull(relativePathPrefixes, "relativePathPrefixes must not be
null") to fail fast if the list is null, and change the lambda to first check
path != null (return false for null path) before evaluating the stream anyMatch
over relativePathPrefixes (the existing
StringUtils.isNullOrEmpty(relativePathPrefix) already handles null prefixes),
referencing the method relativePathPrefixPredicate, the parameter
relativePathPrefixes, the lambda variable path, and the stream variable
relativePathPrefix.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java (1)

374-380: Minor typo in comment.

"recommiting" should be "recommitting".

📝 Fix typo
   `@Override`
   public void subtaskReset(int i, long resetCkpId) {
-    // There exists pending instants waiting for recommiting.
+    // There exists pending instants waiting for recommitting.
     if (tableState.isRLIWithBootstrap && !eventBuffers.getPendingInstantsBefore(resetCkpId).isEmpty()) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java`
around lines 374 - 380, In subtaskReset (use symbols: subtaskReset,
tableState.isRLIWithBootstrap, eventBuffers.getPendingInstantsBefore,
executor.executeSync, commitInstants) fix the comment typo by changing
"recommiting" to "recommitting" so the inline comment reads "use sync execution
here to make sure the recommitting finishes before RLI bootstrapping".
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/StreamWriteFunctionWrapper.java (1)

428-447: Exception type CompletionException may mask root cause during debugging.

Wrapping exceptions in CompletionException at line 445 could make debugging harder since CompletionException is typically used for async operations. Consider using HoodieException or rethrowing as a RuntimeException to maintain consistency with other exception handling in this test utility.

♻️ Suggested alternative exception handling
     } catch (Exception e) {
-      throw new CompletionException(e);
+      throw new RuntimeException("Failed to setup index bootstrap function", e);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/StreamWriteFunctionWrapper.java`
around lines 428 - 447, The test helper method setupIndexBootstrapFunction
currently catches Exception and rethrows it as a CompletionException which can
obscure the real root cause; change the catch to rethrow a more appropriate
unchecked exception used across the codebase (e.g., HoodieException) or at
minimum a RuntimeException to preserve stack traces and consistency—update the
catch block that now does "throw new CompletionException(e)" to instead throw
new HoodieException(e) (or new RuntimeException(e)) so failures from
bootstrapOperator.initializeState or bucketAssignerFunction.processElement
surface clearly.
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java (1)

231-238: Consider cleaning up config after test scenario.

The test adds DEFER_RLI_INIT_FOR_FRESH_TABLE configuration but doesn't clean it up. While this test method ends after the assertion, leaving the configuration set could affect subsequent tests if the test order changes or if additional scenarios are added to this method.

♻️ Add config cleanup
     final MockContext sourceContext6 = MockContext.getInstance(this.conf, schema, "f2");
     assertThrows(IllegalArgumentException.class, () -> new HoodieTableFactory().createDynamicTableSink(sourceContext6));
+    this.conf.removeConfig(FlinkOptions.INDEX_TYPE);
+    this.conf.removeConfig(FlinkOptions.METADATA_ENABLED);
+    this.conf.removeConfig(FlinkOptions.INDEX_GLOBAL_ENABLED);
+    this.conf.setString(HoodieMetadataConfig.DEFER_RLI_INIT_FOR_FRESH_TABLE.key(), "false");
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java`
around lines 231 - 238, The test sets
HoodieMetadataConfig.DEFER_RLI_INIT_FOR_FRESH_TABLE in this.conf but does not
restore or clear it, which can leak state to other tests; wrap the
creation/assertion that uses the modified config (the block creating
MockContext.getInstance(this.conf, schema, "f2") and calling new
HoodieTableFactory().createDynamicTableSink(...)) in a try/finally (or reset the
conf afterwards) and in the finally remove or reset the
DEFER_RLI_INIT_FOR_FRESH_TABLE entry from this.conf so the config is cleaned up
for subsequent tests.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java (1)

76-80: Consider null-check for compactCommitHandler in close().

If open() fails after the factory call or before completing, compactCommitHandler could potentially be null when close() is invoked (though Flink typically won't call close() if open() throws). A defensive null check would be safer.

🛡️ Optional defensive fix
   `@Override`
   public void close() throws Exception {
+    if (compactCommitHandler != null) {
       compactCommitHandler.close();
+    }
     super.close();
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java`
around lines 76 - 80, The close() method may call compactCommitHandler.close()
when compactCommitHandler is null if open() failed, so modify
CompactionCommitSink.close() to defensively check compactCommitHandler != null
before invoking compactCommitHandler.close(); ensure you still call
super.close() afterward (and preserve the method signature throws Exception) so
resources are cleaned even when the handler was never initialized.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactionCommitHandler.java (1)

82-100: Minor: Stray blank line between @Override and method signature.

Line 83 has a blank line between @Override and protected void completeCompaction(...). This is unconventional and may cause style checker warnings.

🔧 Suggested fix
   `@Override`
-
   protected void completeCompaction(String instant,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/handler/MetadataTableCompactionCommitHandler.java`
around lines 82 - 100, Remove the stray blank line between the `@Override`
annotation and the method signature for completeCompaction in
MetadataTableCompactionCommitHandler; ensure the `@Override` directly precedes the
protected void completeCompaction(String instant, boolean isLogCompaction,
List<WriteStatus> statuses) throws IOException declaration so the annotation
attaches correctly and satisfies style checks.
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactOperator.java (1)

125-131: Consider null-check for compactHandler in close().

Similar to CompactionCommitSink, if open() fails before the handler is assigned, compactHandler.close() at line 130 could throw NPE.

🛡️ Optional defensive fix
   `@Override`
   public void close() throws Exception {
     if (null != this.executor) {
       this.executor.close();
     }
+    if (compactHandler != null) {
       compactHandler.close();
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactOperator.java`
around lines 125 - 131, The close() implementation in CompactOperator calls
compactHandler.close() without a null-check which can NPE if open() failed
before compactHandler was assigned; update CompactOperator.close() to only call
compactHandler.close() when compactHandler != null (mirroring the defensive
pattern used for executor and in CompactionCommitSink), and ensure any
exceptions during close are propagated or logged consistently with existing
resource cleanup semantics.

Comment on lines +39 to +49
# Parse command-line arguments
while [[ "$#" -gt 0 ]]; do
case $1 in
--hadoop-version) HADOOP_VERSION="$2"; shift ;;
--spark-version) SPARK_VERSION="$2"; shift ;;
--hive-version) HIVE_VERSION="$2"; shift ;;
--version-tag) VERSION_TAG_ARG="$2"; shift ;;
*) echo "Unknown parameter passed: $1"; exit 1 ;;
esac
shift
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the file
find . -name "build_docker_images.sh" -type f

Repository: yihua/hudi

Length of output: 85


🏁 Script executed:

cd /repo && cat -n docker/build_docker_images.sh | head -60

Repository: yihua/hudi

Length of output: 109


🏁 Script executed:

cat -n docker/build_docker_images.sh | head -60

Repository: yihua/hudi

Length of output: 2720


🏁 Script executed:

# Test the vulnerability claim by simulating the argument parsing
bash -x docker/build_docker_images.sh --hadoop-version 2>&1 | head -50

Repository: yihua/hudi

Length of output: 1235


🏁 Script executed:

# Test with the fix to verify it would work correctly
cat > /tmp/test_fix.sh << 'EOF'
#!/bin/bash

HADOOP_VERSION="2.8.4"
SPARK_VERSION="3.5.3"
HIVE_VERSION="2.3.10"
VERSION_TAG_ARG=""

# Parse with validation (the suggested fix)
while [[ "$#" -gt 0 ]]; do
    case "$1" in
        --hadoop-version)
            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hadoop-version requires a value" >&2; exit 1; }
            HADOOP_VERSION="$2"; shift 2 ;;
        --spark-version)
            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --spark-version requires a value" >&2; exit 1; }
            SPARK_VERSION="$2"; shift 2 ;;
        --hive-version)
            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hive-version requires a value" >&2; exit 1; }
            HIVE_VERSION="$2"; shift 2 ;;
        --version-tag)
            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --version-tag requires a value" >&2; exit 1; }
            VERSION_TAG_ARG="$2"; shift 2 ;;
        *) echo "Unknown parameter passed: $1" >&2; exit 1 ;;
    esac
done

echo "HADOOP_VERSION=$HADOOP_VERSION"
echo "SPARK_VERSION=$SPARK_VERSION"
echo "HIVE_VERSION=$HIVE_VERSION"
echo "VERSION_TAG_ARG=$VERSION_TAG_ARG"
EOF

# Test the original issue
echo "=== Testing original code issue ==="
bash -x docker/build_docker_images.sh --hadoop-version 2>&1 | grep -A2 "HADOOP_VERSION="

# Test fixed version with missing value
echo -e "\n=== Testing fixed version (should fail with missing value) ==="
bash /tmp/test_fix.sh --hadoop-version 2>&1 || echo "Exit code: $?"

# Test fixed version with valid value
echo -e "\n=== Testing fixed version (should succeed with valid value) ==="
bash /tmp/test_fix.sh --hadoop-version 2.10.0 --spark-version 3.2.0

Repository: yihua/hudi

Length of output: 455


Missing value validation for CLI flags allows silent failures with empty variables.

At lines 42–45, the script assumes $2 exists without validation. Passing --hadoop-version (or similar) without a value silently sets the variable to an empty string instead of reporting an error.

Test case: Running ./build_docker_images.sh --hadoop-version sets HADOOP_VERSION="" instead of failing.

Add validation to ensure each flag receives a non-empty, non-flag argument:

Suggested fix
 while [[ "$#" -gt 0 ]]; do
-    case $1 in
-        --hadoop-version) HADOOP_VERSION="$2"; shift ;;
-        --spark-version) SPARK_VERSION="$2"; shift ;;
-        --hive-version) HIVE_VERSION="$2"; shift ;;
-        --version-tag) VERSION_TAG_ARG="$2"; shift ;;
-        *) echo "Unknown parameter passed: $1"; exit 1 ;;
-    esac
-    shift
+    case "$1" in
+        --hadoop-version)
+            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hadoop-version requires a value" >&2; exit 1; }
+            HADOOP_VERSION="$2"; shift 2 ;;
+        --spark-version)
+            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --spark-version requires a value" >&2; exit 1; }
+            SPARK_VERSION="$2"; shift 2 ;;
+        --hive-version)
+            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hive-version requires a value" >&2; exit 1; }
+            HIVE_VERSION="$2"; shift 2 ;;
+        --version-tag)
+            [[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --version-tag requires a value" >&2; exit 1; }
+            VERSION_TAG_ARG="$2"; shift 2 ;;
+        *) echo "Unknown parameter passed: $1" >&2; exit 1 ;;
+    esac
 done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Parse command-line arguments
while [[ "$#" -gt 0 ]]; do
case $1 in
--hadoop-version) HADOOP_VERSION="$2"; shift ;;
--spark-version) SPARK_VERSION="$2"; shift ;;
--hive-version) HIVE_VERSION="$2"; shift ;;
--version-tag) VERSION_TAG_ARG="$2"; shift ;;
*) echo "Unknown parameter passed: $1"; exit 1 ;;
esac
shift
done
# Parse command-line arguments
while [[ "$#" -gt 0 ]]; do
case "$1" in
--hadoop-version)
[[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hadoop-version requires a value" >&2; exit 1; }
HADOOP_VERSION="$2"; shift 2 ;;
--spark-version)
[[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --spark-version requires a value" >&2; exit 1; }
SPARK_VERSION="$2"; shift 2 ;;
--hive-version)
[[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --hive-version requires a value" >&2; exit 1; }
HIVE_VERSION="$2"; shift 2 ;;
--version-tag)
[[ -n "${2:-}" && "$2" != --* ]] || { echo "Error: --version-tag requires a value" >&2; exit 1; }
VERSION_TAG_ARG="$2"; shift 2 ;;
*) echo "Unknown parameter passed: $1" >&2; exit 1 ;;
esac
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/build_docker_images.sh` around lines 39 - 49, The argument-parsing
loop (while [[ "$#" -gt 0 ]], case entries for --hadoop-version,
--spark-version, --hive-version, --version-tag) currently assigns $2 without
validation, so flags with missing values result in empty variables; update each
flag branch to validate that $2 exists and is not another flag (e.g., test -n
"$2" and [[ "$2" != -* ]] or ! -z "$2" && "${2:0:1}" != "-"), and if the check
fails print a clear error mentioning the specific flag (e.g., "--hadoop-version
requires a value") and exit non-zero, otherwise assign the value to
HADOOP_VERSION/SPARK_VERSION/HIVE_VERSION/VERSION_TAG_ARG and shift as before.

Comment on lines +308 to +345
if (!lockProps.containsKey(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY)) {
throw new IllegalArgumentException(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY
+ " must be explicitly set when using " + lockProviderClass
+ ". Without it, the lock path is derived from the table's base path,"
+ " which would resolve to a different location when building a lock config for a different table.");
}
} else if (ZookeeperBasedLockProvider.class.getCanonicalName().equals(lockProviderClass)) {
copyPropsWithPrefix(dataProps, lockProps, LockConfiguration.ZOOKEEPER_BASED_LOCK_PROPERTY_PREFIX);
if (!lockProps.containsKey(LockConfiguration.ZK_LOCK_KEY_PROP_KEY)) {
throw new IllegalArgumentException(LockConfiguration.ZK_LOCK_KEY_PROP_KEY
+ " must be explicitly set when using " + lockProviderClass
+ ". The inferred default from table name is not propagated"
+ " when building a lock config for a different table.");
}
} else if (ZookeeperBasedImplicitBasePathLockProvider.class.getCanonicalName().equals(lockProviderClass)) {
throw new IllegalArgumentException(lockProviderClass
+ " is not supported because it derives its lock identity from the table's base path,"
+ " which would resolve to a different lock when building a lock config for a different table."
+ " Use " + ZookeeperBasedLockProvider.class.getCanonicalName() + " with an explicit "
+ LockConfiguration.ZK_LOCK_KEY_PROP_KEY + " instead.");
} else if (HIVE_METASTORE_BASED_LOCK_PROVIDER_CLASS.equals(lockProviderClass)) {
copyPropsWithPrefix(dataProps, lockProps, LockConfiguration.HIVE_METASTORE_LOCK_PROPERTY_PREFIX);
if (!lockProps.containsKey(LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY)) {
throw new IllegalArgumentException(LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY
+ " must be explicitly set when using " + lockProviderClass);
}
if (!lockProps.containsKey(LockConfiguration.HIVE_TABLE_NAME_PROP_KEY)) {
throw new IllegalArgumentException(LockConfiguration.HIVE_TABLE_NAME_PROP_KEY
+ " must be explicitly set when using " + lockProviderClass);
}
} else if (DYNAMODB_BASED_LOCK_PROVIDER_CLASS.equals(lockProviderClass)) {
copyPropsWithPrefix(dataProps, lockProps, DYNAMODB_BASED_LOCK_PROPERTY_PREFIX);
if (!lockProps.containsKey(DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY)) {
throw new IllegalArgumentException(DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY
+ " must be explicitly set when using " + lockProviderClass
+ ". The inferred default from table name is not propagated"
+ " when building a lock config for a different table.");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate required lock-identity properties as non-blank, not just present.

Current checks use containsKey(...), so empty values pass validation and can still produce invalid lock identities (for example an empty ZK lock key or filesystem lock path).

Proposed fix
-      if (!lockProps.containsKey(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY)) {
+      if (isBlank(lockProps.getProperty(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY))) {
         throw new IllegalArgumentException(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY
             + " must be explicitly set when using " + lockProviderClass
             + ". Without it, the lock path is derived from the table's base path,"
             + " which would resolve to a different location when building a lock config for a different table.");
       }
@@
-      if (!lockProps.containsKey(LockConfiguration.ZK_LOCK_KEY_PROP_KEY)) {
+      if (isBlank(lockProps.getProperty(LockConfiguration.ZK_LOCK_KEY_PROP_KEY))) {
         throw new IllegalArgumentException(LockConfiguration.ZK_LOCK_KEY_PROP_KEY
             + " must be explicitly set when using " + lockProviderClass
             + ". The inferred default from table name is not propagated"
             + " when building a lock config for a different table.");
       }
@@
-      if (!lockProps.containsKey(LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY)) {
+      if (isBlank(lockProps.getProperty(LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY))) {
         throw new IllegalArgumentException(LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY
             + " must be explicitly set when using " + lockProviderClass);
       }
-      if (!lockProps.containsKey(LockConfiguration.HIVE_TABLE_NAME_PROP_KEY)) {
+      if (isBlank(lockProps.getProperty(LockConfiguration.HIVE_TABLE_NAME_PROP_KEY))) {
         throw new IllegalArgumentException(LockConfiguration.HIVE_TABLE_NAME_PROP_KEY
             + " must be explicitly set when using " + lockProviderClass);
       }
@@
-      if (!lockProps.containsKey(DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY)) {
+      if (isBlank(lockProps.getProperty(DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY))) {
         throw new IllegalArgumentException(DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY
             + " must be explicitly set when using " + lockProviderClass
             + ". The inferred default from table name is not propagated"
             + " when building a lock config for a different table.");
       }
@@
   private static void copyPropsWithPrefix(TypedProperties source, Properties target, String prefix) {
     for (String key : source.stringPropertyNames()) {
       if (key.startsWith(prefix)) {
         target.setProperty(key, source.getProperty(key));
       }
     }
   }
+
+  private static boolean isBlank(String value) {
+    return value == null || value.trim().isEmpty();
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLockConfig.java`
around lines 308 - 345, The current validations in HoodieLockConfig use
lockProps.containsKey(...) which allows empty values; update each validation
after copyPropsWithPrefix(...) to fetch the value (e.g.,
lockProps.get(LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY)) and throw
IllegalArgumentException when the value is null or blank
(value.trim().isEmpty()), not just when the key is missing. Apply this non-blank
check for the keys LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY,
LockConfiguration.ZK_LOCK_KEY_PROP_KEY,
LockConfiguration.HIVE_DATABASE_NAME_PROP_KEY,
LockConfiguration.HIVE_TABLE_NAME_PROP_KEY and
DYNAMODB_LOCK_PARTITION_KEY_PROP_KEY in the same conditional branches (those
that reference ZookeeperBasedLockProvider,
ZookeeperBasedImplicitBasePathLockProvider,
HIVE_METASTORE_BASED_LOCK_PROVIDER_CLASS and
DYNAMODB_BASED_LOCK_PROVIDER_CLASS), leaving copyPropsWithPrefix(...) usage
intact.

Comment on lines +158 to +168
/**
* Generates the header for a rollback command block.
*/
private Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String commit) {
Map<HoodieLogBlock.HeaderMetadataType, String> header = new HashMap<>(3);
header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().requestedTime());
header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));
return header;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential NoSuchElementException if timeline has no instants.

Line 163 calls .get() on metaClient.getActiveTimeline().lastInstant() without checking if the Option is present. If the active timeline is empty, this will throw NoSuchElementException.

Consider adding a presence check or using orElseThrow with a descriptive exception:

🛡️ Proposed fix
 private Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String commit) {
   Map<HoodieLogBlock.HeaderMetadataType, String> header = new HashMap<>(3);
-  header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().requestedTime());
+  header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME,
+      metaClient.getActiveTimeline().lastInstant()
+          .orElseThrow(() -> new HoodieRollbackException("Cannot generate rollback header: active timeline has no instants"))
+          .requestedTime());
   header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
   header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
       String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));
   return header;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/**
* Generates the header for a rollback command block.
*/
private Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String commit) {
Map<HoodieLogBlock.HeaderMetadataType, String> header = new HashMap<>(3);
header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().requestedTime());
header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));
return header;
}
/**
* Generates the header for a rollback command block.
*/
private Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String commit) {
Map<HoodieLogBlock.HeaderMetadataType, String> header = new HashMap<>(3);
header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME,
metaClient.getActiveTimeline().lastInstant()
.orElseThrow(() -> new HoodieRollbackException("Cannot generate rollback header: active timeline has no instants"))
.requestedTime());
header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));
return header;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java`
around lines 158 - 168, The generateHeader method calls
metaClient.getActiveTimeline().lastInstant().get() which can throw
NoSuchElementException if the active timeline is empty; update generateHeader to
check presence of lastInstant() (e.g., use lastInstant().isPresent() or
lastInstant().orElseThrow(...)) and handle the empty-case by either throwing a
clear descriptive exception (with context including commit) or providing a
sensible default; ensure you reference
HoodieLogBlock.HeaderMetadataType.INSTANT_TIME and set its value only when the
instant is present (or use the explicit orElseThrow) so generateHeader never
blindly calls .get().

Comment on lines +551 to +553
} else if (dataType instanceof StructType
&& SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType) dataType)) {
return makeShreddedVariantWriter((StructType) dataType);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Pass the original HoodieSchema into shredded child writers.

makeShreddedVariantWriter() rebuilds every nested writer with null schema, so logical-type handling inside typed_value silently falls back to defaults. That is unsafe for types whose Parquet encoding depends on HoodieSchema rather than Spark DataType alone — most notably VECTOR and millisecond TIMESTAMP. In those cases convertField() still emits the Parquet type from the original schema, but the writer serializes a different physical shape. Example: a shredded VECTOR typed value will get a FIXED_LEN_BYTE_ARRAY schema and a LIST writer.

💡 Proposed fix
-    } else if (dataType instanceof StructType
-        && SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType) dataType)) {
-      return makeShreddedVariantWriter((StructType) dataType);
+    } else if (dataType instanceof StructType
+        && SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantShreddingStruct((StructType) dataType)) {
+      return makeShreddedVariantWriter((StructType) dataType, resolvedSchema);
     } else if (dataType instanceof StructType) {
       StructType structType = (StructType) dataType;
       ValueWriter[] fieldWriters = getFieldWriters(structType, resolvedSchema);
@@
-  private ValueWriter makeShreddedVariantWriter(StructType shreddedStructType) {
+  private ValueWriter makeShreddedVariantWriter(StructType shreddedStructType, HoodieSchema sourceSchema) {
     // Create writers for the shredded struct fields
     // The shreddedStructType contains: metadata (binary), value (binary), typed_value (optional)
     ValueWriter[] shreddedFieldWriters = Arrays.stream(shreddedStructType.fields())
-        .map(field -> makeWriter(null, field.dataType()))
+        .map(field -> {
+          HoodieSchema fieldSchema = Option.ofNullable(sourceSchema)
+              .map(HoodieSchema::getNonNullType)
+              .flatMap(s -> s.hasFields() ? s.getField(field.name()) : Option.empty())
+              .map(HoodieSchemaField::schema)
+              .orElse(null);
+          return makeWriter(fieldSchema, field.dataType());
+        })
         .toArray(ValueWriter[]::new);

Also applies to: 573-578

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java`
around lines 551 - 553, The shredded-variant writer is being built with null
schema which causes nested writers to lose HoodieSchema-driven logical-type info
(affecting VECTOR, millisecond TIMESTAMP, etc.); update
makeShreddedVariantWriter in HoodieRowParquetWriteSupport to accept the original
HoodieSchema (pass the HoodieSchema from the caller where you currently call
makeShreddedVariantWriter) and propagate that HoodieSchema into all child-writer
construction paths so convertField/typed_value logic uses the provided
HoodieSchema rather than null; also update the other call sites that build
shredded writers (the similar calls around the 573-578 area) to pass the same
HoodieSchema through.

Comment on lines 361 to +368
protected List<PartitionPath> listPartitionPaths(List<String> relativePartitionPaths,
Types.RecordType partitionFields,
Expression partitionColumnPredicates) {
Expression partitionColumnPredicates,
List<Object> partitionPredicateExpressions) {
List<String> matchedPartitionPaths;
try {
matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
partitionFields, partitionColumnPredicates);
partitionFields, partitionColumnPredicates, partitionPredicateExpressions);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Preserve incremental partition scoping in this new overload.

Line 361 adds a second listPartitionPaths(...) path, but unlike the existing overload it never applies the incremental-read restriction from TimelineUtils.getWrittenPartitions(findInstantsInRange()). That means partition-filtered incremental queries can list partitions that match the predicate but were not written in the requested range, and then enumerate the wrong file slices. Please keep the incremental branch here too, or intersect the filtered result with the written partitions before converting to PartitionPath.

🐛 Suggested fix
 protected List<PartitionPath> listPartitionPaths(List<String> relativePartitionPaths,
                                                  Types.RecordType partitionFields,
                                                  Expression partitionColumnPredicates,
                                                  List<Object> partitionPredicateExpressions) {
   List<String> matchedPartitionPaths;
   try {
     matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
         partitionFields, partitionColumnPredicates, partitionPredicateExpressions);
+    if (queryType == HoodieTableQueryType.INCREMENTAL
+        && incrementalQueryStartTime.isPresent()
+        && !isBeforeTimelineStarts()) {
+      Set<String> writtenPartitions = TimelineUtils.getWrittenPartitions(findInstantsInRange()).stream()
+          .collect(Collectors.toSet());
+      matchedPartitionPaths = matchedPartitionPaths.stream()
+          .filter(writtenPartitions::contains)
+          .collect(Collectors.toList());
+    }
   } catch (IOException e) {
     throw new HoodieIOException("Error fetching partition paths", e);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java`
around lines 361 - 368, The new overload of listPartitionPaths in
BaseHoodieTableFileIndex currently calls
tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(...) but omits
the incremental-read scoping; update this method to apply the same incremental
partition restriction used by the other overload by obtaining the written
partitions via TimelineUtils.getWrittenPartitions(findInstantsInRange()) (or by
intersecting the matchedPartitionPaths result with that written-partitions set)
before converting to List<PartitionPath>, ensuring only partitions written in
the requested instant range are returned.

Comment on lines +169 to +175
@Override
public void skipBytesToRead(int numBytes) throws IOException {
while (numBytes > 0) {
int skipped = skipBytes(numBytes);
numBytes -= skipped;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential infinite loop when end-of-stream is reached.

DataInputStream.skipBytes(int) can return 0 when the end of stream is reached or when no bytes are available. This would cause an infinite loop since numBytes would never decrease.

🐛 Proposed fix to add end-of-stream check
     `@Override`
     public void skipBytesToRead(int numBytes) throws IOException {
       while (numBytes > 0) {
         int skipped = skipBytes(numBytes);
+        if (skipped <= 0) {
+          throw new IOException("Unable to skip remaining " + numBytes + " bytes; end of stream reached");
+        }
         numBytes -= skipped;
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcImageManager.java`
around lines 169 - 175, The skipBytesToRead method can loop forever if
skipBytes(numBytes) returns 0 at end-of-stream; modify
CdcImageManager.skipBytesToRead to detect skipped == 0 and throw an EOFException
(or IOException) so the caller fails instead of spinning, or alternatively call
readFully to consume remaining bytes; specifically update skipBytesToRead to
check the return value of skipBytes and break/throw on zero, referencing the
skipBytesToRead and skipBytes methods.

Comment on lines +111 to +126
@Override
public boolean hasNext() {
if (recordIterator != null) {
if (recordIterator.hasNext()) {
return true;
} else {
recordIterator.close();
recordIterator = null;
}
}
if (fileSplitIterator.hasNext()) {
recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
return recordIterator.hasNext();
}
return false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Empty splits cause premature termination and potential resource leak.

If recordIteratorFunc returns an iterator with no elements, hasNext() returns false immediately on line 123 without:

  1. Closing the empty recordIterator
  2. Continuing to try the next file split

This can cause the iterator to incorrectly report no more data when subsequent splits may have records.

🐛 Proposed fix using a loop to skip empty splits
     `@Override`
     public boolean hasNext() {
-      if (recordIterator != null) {
-        if (recordIterator.hasNext()) {
-          return true;
-        } else {
-          recordIterator.close();
-          recordIterator = null;
+      while (true) {
+        if (recordIterator != null) {
+          if (recordIterator.hasNext()) {
+            return true;
+          } else {
+            recordIterator.close();
+            recordIterator = null;
+          }
         }
-      }
-      if (fileSplitIterator.hasNext()) {
-        recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
-        return recordIterator.hasNext();
+        if (fileSplitIterator.hasNext()) {
+          recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
+        } else {
+          return false;
+        }
       }
-      return false;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public boolean hasNext() {
if (recordIterator != null) {
if (recordIterator.hasNext()) {
return true;
} else {
recordIterator.close();
recordIterator = null;
}
}
if (fileSplitIterator.hasNext()) {
recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
return recordIterator.hasNext();
}
return false;
}
`@Override`
public boolean hasNext() {
while (true) {
if (recordIterator != null) {
if (recordIterator.hasNext()) {
return true;
} else {
recordIterator.close();
recordIterator = null;
}
}
if (fileSplitIterator.hasNext()) {
recordIterator = recordIteratorFunc.apply(fileSplitIterator.next());
} else {
return false;
}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cdc/CdcIterators.java`
around lines 111 - 126, hasNext() currently returns false if recordIteratorFunc
produces an empty iterator, leaking the empty recordIterator and stopping
iteration prematurely; change hasNext() so that when fileSplitIterator.hasNext()
yields a candidate, you create recordIterator via recordIteratorFunc.apply(...),
and if recordIterator.hasNext() is false you must call recordIterator.close(),
set recordIterator to null, and continue looping to the next file split (use a
while loop over fileSplitIterator) until you either find a non-empty
recordIterator or exhaust splits; ensure any closed/empty recordIterator is
nulled to avoid leaks.

Comment on lines 398 to 403
public boolean isAlreadyBootstrap() throws Exception {
return this.bootstrapOperator.isAlreadyBootstrap();
if (this.bootstrapOperator instanceof BootstrapOperator) {
return ((BootstrapOperator) this.bootstrapOperator).isAlreadyBootstrap();
}
return this.bootstrapOperator != null;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if RLIBootstrapOperator has an isAlreadyBootstrap method or inherits one
ast-grep --pattern 'class RLIBootstrapOperator {
  $$$
  isAlreadyBootstrap($$$) {
    $$$
  }
  $$$
}'

# Also check AbstractBootstrapOperator
rg -n "isAlreadyBootstrap" --type java -C 3

Repository: yihua/hudi

Length of output: 4008


🏁 Script executed:

# Search for RLIBootstrapOperator class definition
rg -n "class RLIBootstrapOperator" --type java -A 20

# Search for AbstractBootstrapOperator definition
rg -n "class AbstractBootstrapOperator\|interface AbstractBootstrapOperator" --type java -A 15

Repository: yihua/hudi

Length of output: 3054


🏁 Script executed:

# Search for AbstractBootstrapOperator definition
rg -n "abstract class AbstractBootstrapOperator\|class AbstractBootstrapOperator" --type java -A 50

# Also check complete RLIBootstrapOperator for isAlreadyBootstrap
rg -n "isAlreadyBootstrap" hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/RLIBootstrapOperator.java

Repository: yihua/hudi

Length of output: 36


🏁 Script executed:

# Find AbstractBootstrapOperator file
fd "AbstractBootstrapOperator" --type f

# Search for AbstractBootstrapOperator more broadly
rg -n "AbstractBootstrapOperator" --type java | head -20

Repository: yihua/hudi

Length of output: 1124


🏁 Script executed:

# Read AbstractBootstrapOperator class definition
cat -n hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/AbstractBootstrapOperator.java

Repository: yihua/hudi

Length of output: 4037


🏁 Script executed:

# Check BootstrapOperator to understand the full picture
cat -n hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java | grep -A 5 -B 5 "isAlreadyBootstrap"

Repository: yihua/hudi

Length of output: 377


Consider exposing bootstrap completion state for RLIBootstrapOperator.

The fallback logic return this.bootstrapOperator != null assumes any non-BootstrapOperator AbstractBootstrapOperator is fully bootstrapped when instantiated. Since RLIBootstrapOperator does not override or expose isAlreadyBootstrap(), the method cannot accurately reflect whether its bootstrap process has completed. Either add an isAlreadyBootstrap() method to RLIBootstrapOperator or define a default implementation in AbstractBootstrapOperator that all subclasses inherit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/StreamWriteFunctionWrapper.java`
around lines 398 - 403, The method
StreamWriteFunctionWrapper.isAlreadyBootstrap() incorrectly treats any
non-BootstrapOperator AbstractBootstrapOperator instance as already bootstrapped
by returning bootstrapOperator != null; update the design so the bootstrap
completion is explicitly queryable: add an isAlreadyBootstrap() implementation
to RLIBootstrapOperator (or add a default isAlreadyBootstrap() in
AbstractBootstrapOperator) and have StreamWriteFunctionWrapper call that method
instead of using the null check; ensure the symbol names involved are
AbstractBootstrapOperator.isAlreadyBootstrap(),
RLIBootstrapOperator.isAlreadyBootstrap(), and
StreamWriteFunctionWrapper.isAlreadyBootstrap() so the wrapper can accurately
reflect bootstrap completion.

Comment on lines +240 to +241
// Should not throw exception
function.read(cdcSplit);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Direct read(cdcSplit) call makes this test unstable and contradicts test intent.

Line 241 now performs real read without guarding expected downstream I/O failure. In this temp setup, that can fail for reasons unrelated to type acceptance and turn the test flaky/failing.

Proposed fix
-    // Should not throw exception
-    function.read(cdcSplit);
+    Exception ex = assertThrows(Exception.class, () -> function.read(cdcSplit));
+    if (ex instanceof IllegalArgumentException) {
+      throw new AssertionError("read() should accept HoodieCdcSourceSplit; IllegalArgumentException indicates type-guard rejection", ex);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java`
around lines 240 - 241, The direct call to function.read(cdcSplit) can surface
unrelated downstream I/O failures and make the test flaky; replace the plain
call with a guarded invocation that catches IOException from the read and
ignores or treats it as non-fatal for this test (while keeping any assertions
about type acceptance intact). Specifically, modify the test around
function.read(cdcSplit) to wrap the call in a try-catch for IOException
(catching exceptions thrown by function.read) and suppress/ignore those
IOExceptions so the test only validates the intended type-acceptance behavior of
the reader.

Comment on lines +100 to +104
val partitionPredicateExpressionSeq = partitionPredicateExpressions.asScala.map(_.asInstanceOf[Expression]).toSeq
filterPartitionsBasedOnRelativePathPrefixes(relativePathPrefix,
sparkSession.sessionState.catalog.externalCatalog
.listPartitionsByFilter(catalogDatabaseName, catalogTableName, partitionPredicateExpressionSeq,
SQLConf.get.sessionLocalTimeZone))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard nullable/heterogeneous predicate list before Spark-catalyst casting.

At Line 100, direct .asScala.map(_.asInstanceOf[Expression]) can fail with NPE (null list) or ClassCastException (unexpected object type), causing partition listing failures in catalog mode.

Defensive fix
-      val partitionPredicateExpressionSeq = partitionPredicateExpressions.asScala.map(_.asInstanceOf[Expression]).toSeq
+      val partitionPredicateExpressionSeq: Seq[Expression] =
+        Option(partitionPredicateExpressions)
+          .map(_.asScala.map {
+            case e: Expression => e
+            case other => throw new IllegalArgumentException(
+              s"Unsupported partition predicate expression type: ${other.getClass.getName}")
+          }.toSeq)
+          .getOrElse(Seq.empty)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/metadata/CatalogBackedTableMetadata.scala`
around lines 100 - 104, The partition predicate list is being cast unsafely in
CatalogBackedTableMetadata, risking NPE or ClassCastException; before calling
sparkSession.sessionState.catalog.externalCatalog.listPartitionsByFilter,
defensively handle partitionPredicateExpressions by treating null as empty and
only converting/casting values that are actually catalyst Expressions (e.g., use
an Option or null check and a safe collect/flatMap to keep only instances of
org.apache.spark.sql.catalyst.expressions.Expression), producing a
Seq[Expression] to pass into filterPartitionsBasedOnRelativePathPrefixes so no
nulls or wrong-typed elements reach the catalog call.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java (1)

1420-1436: Consider filtering empty strings before validation for clearer error messages.

The validation doesn't explicitly filter empty/whitespace-only entries. If a user provides "compaction," (trailing comma), the split produces ["compaction", ""], and ActionType.valueOf("") throws with a less descriptive error: "Unknown metadata table service manager action: ".

This is consistent with the strict validation philosophy (reject malformed input), but the error message could be clearer. The runtime check in HoodieTableServiceManagerConfig.isEnabledAndActionSupported() filters empty strings, so this is only a validation UX concern.

💡 Optional improvement for clearer error messages
 private static void validateTableServiceManagerActions(String actions) {
   for (String action : actions.split(",")) {
     String trimmed = action.trim();
+    if (trimmed.isEmpty()) {
+      throw new IllegalArgumentException("Empty action found in metadata table service manager actions. "
+          + "Supported actions are: " + SUPPORTED_TABLE_SERVICE_MANAGER_ACTIONS);
+    }
     ActionType actionType;
     try {
       actionType = ActionType.valueOf(trimmed);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java`
around lines 1420 - 1436, The validateTableServiceManagerActions method
currently passes empty/whitespace tokens to ActionType.valueOf causing unclear
errors; update validateTableServiceManagerActions to trim and skip any empty
tokens (e.g., after split, if trimmed.isEmpty() continue) or explicitly throw a
clearer IllegalArgumentException for empty entries, and keep the existing checks
against SUPPORTED_TABLE_SERVICE_MANAGER_ACTIONS for non-empty tokens; reference
validateTableServiceManagerActions and hoodie config helper
HoodieTableServiceManagerConfig.isEnabledAndActionSupported to ensure consistent
behavior with the runtime filter.
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java (1)

329-351: Minor duplication in custom lock provider rejection tests.

testCreateMetadataWriteConfigRejectsCustomLockProvider and testCreateMetadataWriteConfigRejectsCustomLockProviders test the same behavior with different class names. Consider consolidating into a parameterized test if desired, but this is a nitpick.

Also applies to: 480-501

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java`
around lines 329 - 351, Two nearly identical tests
(testCreateMetadataWriteConfigRejectsCustomLockProvider and
testCreateMetadataWriteConfigRejectsCustomLockProviders) duplicate coverage of
rejecting non-built-in lock providers; consolidate them into a single
parameterized test that iterates over the different custom lock provider class
names to assert HoodieMetadataWriteUtils.createMetadataWriteConfig throws
HoodieException and the message contains "only supported for built-in lock
providers" and the class name. Update or replace the two existing test methods
in TestHoodieMetadataWriteUtils to use a parameterized JUnit construct (e.g.,
`@ParameterizedTest` with `@ValueSource` or `@MethodSource`) to supply the custom
class name strings and perform the same assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java`:
- Around line 329-351: Two nearly identical tests
(testCreateMetadataWriteConfigRejectsCustomLockProvider and
testCreateMetadataWriteConfigRejectsCustomLockProviders) duplicate coverage of
rejecting non-built-in lock providers; consolidate them into a single
parameterized test that iterates over the different custom lock provider class
names to assert HoodieMetadataWriteUtils.createMetadataWriteConfig throws
HoodieException and the message contains "only supported for built-in lock
providers" and the class name. Update or replace the two existing test methods
in TestHoodieMetadataWriteUtils to use a parameterized JUnit construct (e.g.,
`@ParameterizedTest` with `@ValueSource` or `@MethodSource`) to supply the custom
class name strings and perform the same assertions.

In
`@hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java`:
- Around line 1420-1436: The validateTableServiceManagerActions method currently
passes empty/whitespace tokens to ActionType.valueOf causing unclear errors;
update validateTableServiceManagerActions to trim and skip any empty tokens
(e.g., after split, if trimmed.isEmpty() continue) or explicitly throw a clearer
IllegalArgumentException for empty entries, and keep the existing checks against
SUPPORTED_TABLE_SERVICE_MANAGER_ACTIONS for non-empty tokens; reference
validateTableServiceManagerActions and hoodie config helper
HoodieTableServiceManagerConfig.isEnabledAndActionSupported to ensure consistent
behavior with the runtime filter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0085b203-3b22-4dd7-8590-7f1d8f84950e

📥 Commits

Reviewing files that changed from the base of the PR and between 1487955 and 7a92049.

📒 Files selected for processing (4)
  • hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieMetadataWriteUtils.java
  • hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
  • hudi-common/src/main/java/org/apache/hudi/common/config/HoodieTableServiceManagerConfig.java
  • hudi-common/src/test/java/org/apache/hudi/common/config/TestHoodieMetadataConfig.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant