Skip to content

[OSS PR #18455] [ENG-38911] Test hudi rs with hudi functional tests#39

Open
yihua wants to merge 1 commit into
masterfrom
oss-18455
Open

[OSS PR #18455] [ENG-38911] Test hudi rs with hudi functional tests#39
yihua wants to merge 1 commit into
masterfrom
oss-18455

Conversation

@yihua
Copy link
Copy Markdown
Owner

@yihua yihua commented Apr 14, 2026

Mirror of apache#18455 for automated bot review.

Original author: @Davis-Zhang-Onehouse
Base branch: master

Summary by CodeRabbit

  • Documentation

    • Added root cause analysis documenting known Gluten/Velox integration issues with Merge-On-Read operations (MAP/Decimal/Array handling).
  • Tests

    • Added comprehensive Merge-On-Read test coverage for complex data types.
    • Disabled test scenarios unsupported with new engine integrations (CUSTOM merge mode, pre-v9 tables, custom payloads).
    • Updated default log format to AVRO; restricted file format support to PARQUET.
  • Chores

    • Added test infrastructure supporting Gluten/Velox native execution validation.
    • Updated Spark dependency version; added internal artifact repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

This PR introduces Gluten/Velox native execution support to Hudi test infrastructure, adds comprehensive merge-on-read (MOR) test suites with complex data types, documents three failing MOR test cases, disables unsupported test scenarios (custom merge modes, custom payloads), and updates Maven configuration to pull specialized Spark dependencies.

Changes

Cohort / File(s) Summary
Gluten/Velox Test Infrastructure
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java, HoodieClientTestUtils.java, SparkProvider.java, hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/HoodieSparkSqlTestBase.scala
Added new GlutenTestUtils utility class with applyGlutenConf() method to conditionally enable Gluten/Velox execution. Integrated into test SparkConf setup paths to detect and load Quanton or Gluten plugins, configure off-heap memory, set columnar shuffle manager, and apply backend-specific settings.
Diagnostic Logging
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala, hudi-tests-common/src/main/resources/log4j2-surefire*.properties
Added warning log of Spark DataFrame execution plan during RDD creation. Added Log4j2 logger configurations for Gluten and Quanton libraries at info level.
RCA Documentation
RCA_Gluten_Velox_MOR_Failures.md
Documented three failing MOR tests when Gluten/Velox execution is enabled: testMORWithMap (UnsafeRow size assertion due to MAP byte layout/alignment), testMORWithDecimal (unsupported Arrow Decimal128), testMORWithArray (Arrow concatenation failure from schema mismatch).
Test Coverage Reduction
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupReaderSchemaHandler.java, TestHoodieFileGroupReaderBase.java, TestKeyBasedFileGroupRecordBuffer.java, hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestPositionBasedFileGroupRecordBuffer.java, TestMergeHandle.java, TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java, TestHoodieSparkMergeOnReadTableRollback.java
Disabled multiple test cases: custom payload/merger tests, CUSTOM merge mode tests, table version 6 rollback tests. Removed parameter combinations for CUSTOM modes and parquet log formats. Changed default log block format from parquet to avro.
MOR Test Enhancements
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala, TestMORDataSource.scala
Extended MOR test coverage with new comprehensive test methods: testReadMORTableWithAllDataTypes, testMORWithDecimal, testMORWithBinary, testMORWithArray, testMORWithMap, testMORWithStruct. Added testGlutenVeloxNativeExecution smoke test. Applied Gluten configuration during Spark setup. Changed supported file formats to Parquet only.
New MOR File Slice Test Suite
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala
Introduced comprehensive test suite validating MOR file-slice layouts and block contents across five scenarios: compacted log blocks, delete blocks, column projection, all data types, and fixed-width primitive alignment stress cases.
Test Parameter Reduction
TestMergeModeCommitTimeOrdering.scala, TestMergeModeEventTimeOrdering.scala, TestPartialUpdateForMergeInto.scala
Removed COW test cases, reduced table version coverage from versions 6/8 to 9 only, introduced hard-coded paths with timestamp-based counter instead of temp directories, converted test to ignore mode.
Maven Configuration
pom.xml
Added gluten-velox Maven build profile activated by gluten.bundle.jar system property. Configured surefire/scalatest plugins with Gluten bundle classpath, JVM options for reflection access, and module exports. Added CodeArtifact repository. Updated Spark 3.5 version to 3.5.2-INTERNAL-0.16.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 The Gluten tests now hop with glee,
With Velox speed, fast as can be!
MOR tables dance through maps and arrays,
Decimals and structs in complex ways!
A newer Spark, fresh from the source—
Our Hudi tests run their native course! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references testing Hudi RS (Hudi Rust) with functional tests, which aligns with the core changes: adding Gluten/Velox native execution configuration and MOR test infrastructure for complex data types.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch oss-18455

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR introduces Gluten/Velox native-execution support into Hudi's functional test suite (MOR read path), driven by an investigation into three failing sub-tests (testMORWithMap, testMORWithDecimal, testMORWithArray). It adds GlutenTestUtils for opt-in Gluten SparkConf injection, a gluten-velox Maven profile, new TestMORFileSliceLayouts and testReadMORTableWithAllDataTypes test coverage, and a number of @Disabled annotations to skip currently-unsupported test variants.

However, the PR contains several OSS-blocking artifacts from the investigation process:

  • Production debug logging: A logWarning that prints the full executed query plan at WARN level was left in HoodieSparkUtils.scala and will appear in every Hudi job's logs.
  • Private build infrastructure: pom.xml replaces the public Apache Spark 3.5.5 with 3.5.2-INTERNAL-0.16 (an Onehouse-internal build) and adds a private AWS CodeArtifact Maven repository — breaking the build for all external contributors and CI environments.
  • Hardcoded developer paths: TestMORFileSliceLayouts.scala and TestMergeModeCommitTimeOrdering.scala both write tables to hardcoded absolute paths (/home/ubuntu/ws3/hudi-rs/tables/...) instead of withTempDir, making these tests fail on any machine other than the original author's. TestMergeModeCommitTimeOrdering also introduces a global mutable cnt counter in a companion object to differentiate paths.
  • Internal investigation document: RCA_Gluten_Velox_MOR_Failures.md is committed to the repository root, an internal RCA artifact that does not belong in the OSS codebase.
  • Commented-out test parameters: TestMORDataSource.scala has large blocks of @CsvSource entries commented out (rather than cleanly removed), alongside a leftover debug path comment.
  • Significant test coverage reduction: Across multiple files, test parameters are narrowed to version 9 only and many CUSTOM/pre-v9 test cases are disabled without corresponding tracking tickets.

Confidence Score: 1/5

Not safe to merge — the PR contains private build infrastructure that breaks the OSS build, a production debug log statement, and hardcoded developer paths that will fail tests on any other machine.

Score reflects three blocking P1 issues: (1) the internal Onehouse Spark version and private CodeArtifact Maven repository in root pom.xml make the project unbuildable for all external contributors, (2) the logWarning in production HoodieSparkUtils.scala prints the full query plan at WARN level on every MOR read, and (3) two new test classes write to hardcoded absolute paths that only exist on one developer's machine.

pom.xml (private repo + internal Spark version), HoodieSparkUtils.scala (debug log), TestMORFileSliceLayouts.scala and TestMergeModeCommitTimeOrdering.scala (hardcoded paths), RCA_Gluten_Velox_MOR_Failures.md (internal document)

Important Files Changed

Filename Overview
pom.xml Changed spark35.version to an internal Onehouse build (3.5.2-INTERNAL-0.16) and added a private AWS CodeArtifact repository — both will break the build for all external contributors.
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala Added a debug logWarning that prints the full executed query plan on every createRdd call — a clear investigation artifact that must be removed before merge.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala New test class for MOR file slice layout validation, but hardcodes a developer-local absolute path (/home/ubuntu/ws3/hudi-rs/tables/fdss) that will fail on any other machine.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala Replaced withTempDir with hardcoded developer paths and introduced a global mutable counter in a companion object; also reduced COW test coverage.
RCA_Gluten_Velox_MOR_Failures.md Internal RCA investigation document with stack traces and log evidence that should not be committed to the OSS repository.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala Significantly narrows test coverage, leaves large blocks of commented-out test parameters, disables testMergerStrategySet, and includes a leftover debug path comment.
hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java Disables 7 tests covering custom payload and CUSTOM merge mode — significant coverage reduction that should be tracked with explicit follow-up tickets.
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java New utility class for injecting Gluten/Velox SparkConf settings when gluten.bundle.jar system property is set — correctly no-ops when the property is absent.
hudi-spark-datasource/hudi-spark/pom.xml Adds a gluten-velox Maven profile activated by -Dgluten.bundle.jar that wires the bundle JAR into surefire/scalatest classpaths — structurally sound and only activates when explicitly opted in.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Test JVM starts] --> B{gluten.bundle.jar\nsystem property set?}
    B -- No --> C[Standard Spark execution]
    B -- Yes --> D[GlutenTestUtils.applyGlutenConf]
    D --> E{QuantonPlugin\non classpath?}
    E -- Yes --> F[pluginClass =\nai.onehouse.quanton.QuantonPlugin]
    E -- No --> G[pluginClass =\norg.apache.gluten.GlutenPlugin]
    F --> H[Apply spark.plugins +\noffheap + shuffle manager]
    G --> H
    H --> I[SparkSession with\nGluten/Velox native execution]
    C --> J[SparkSession with\nstandard JVM execution]
    I --> K[MOR Read Path\nHoodieSparkUtils.createRdd]
    J --> K
    K --> L[df.queryExecution.toRdd\n.mapPartitions]
    L --> M{Gluten active?}
    M -- Yes --> N[WholeStageCodegenPartitionEvaluator\nrows.isEmpty triggers processNext]
    M -- No --> O[Standard iterator\nrows.isEmpty safe]
    N --> P[RCA: UnsafeRow sizeInBytes\nmust be multiple of 8]
Loading

Comments Outside Diff (1)

  1. hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala, line 109-130 (link)

    P2 Large block of test parameters commented out rather than removed

    The @CsvSource for testCount now has more than 20 test cases commented out with // instead of being cleanly removed or replaced with @Disabled. This significantly degrades readability and gives the impression these cases are temporarily bypassed rather than intentionally removed.

    If these test cases are genuinely not applicable for hudi-rs, they should either be removed (with a corresponding tracking ticket), or the whole test annotated with @Disabled to keep the file clean. Commented-out code in test parameters is particularly noisy since it mixes active and inactive data in the same annotation.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "all test changes" | Re-trigger Greptile

// Additionally, we have to explicitly wrap around resulting [[RDD]] into the one
// injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
// [[SQLConf]] is required by [[AvroSerializer]]
logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Debug warning log left in production code

This logWarning call prints the entire executed query plan (as a tree string) at WARN level every time createRdd is called. This is clearly a debug statement added during the Gluten/Velox investigation and must not reach production:

  • It will spam the logs for every MOR read in every Hudi job.
  • treeString is non-trivial to compute and allocates for every partition scan.
  • WARN level means it will appear in most production logging configurations.

This line should be removed entirely before merging.

Comment thread pom.xml
@@ -170,7 +170,7 @@
<rocksdbjni.version>7.5.3</rocksdbjni.version>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Internal Onehouse Spark build version in OSS project

The Spark 3.5 version has been changed from the public 3.5.5 to 3.5.2-INTERNAL-0.16, an Onehouse-internal build that is only resolvable via the private CodeArtifact repository added below (line ~1966). This will break the build for every developer and CI environment that does not have access to that repository.

This must be reverted to an official Apache Spark release before merging to the OSS project.

Comment thread pom.xml
Comment on lines 1963 to +1968
<id>confluent</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
<repository>
<id>codeartifact</id>
<name>codeartifact</name>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Private internal Maven repository in OSS project

The repository at https://onehouse-194159489498.d.codeartifact.us-west-2.amazonaws.com/maven/onehouse-internal/ is a private AWS CodeArtifact instance owned by Onehouse. Adding it to the root pom.xml means every Maven build of the project will attempt to resolve artifacts from a private, unauthenticated endpoint — this will fail for all external contributors and CI systems outside of Onehouse infrastructure.

Both this repository entry and the internal Spark version (3.5.2-INTERNAL-0.16) that depends on it must be removed.

val hadoopPath = new Path(path)
val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
if (fs.exists(hadoopPath)) {
fs.delete(hadoopPath, true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Hardcoded developer-local path used as base output directory

val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss" is a hardcoded absolute path on a specific developer's machine. All four tests in this file write their tables to sub-directories of this path. These tests will fail (or worse, silently write data to an unexpected location) on any other machine.

This should be replaced with a temporary directory, consistent with how other test classes in the project handle isolation, e.g.:

Suggested change
fs.delete(hadoopPath, true)
val BASE_OUTPUT_PATH = createTempDir().getAbsolutePath

Or use the withTempDir pattern already used throughout HoodieSparkSqlTestBase.

Comment on lines 104 to +107
) {
withRecordType()(withTempDir { tmp =>
withRecordType()({
val tableName = generateTableName
val currTime = System.currentTimeMillis()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Hardcoded developer-local path and global mutable state

Two related problems here:

  1. Hardcoded path: s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime" is a hardcoded path on a specific developer's machine. Tests will fail on any other CI or developer environment.

  2. Global mutable cnt in companion object: The companion object TestMergeModeCommitTimeOrdering.cnt is a JVM-singleton var incremented across test invocations. This is a form of shared mutable state that makes test execution order-dependent and could cause path collisions or unexpected behavior if tests are run in parallel.

Both issues stem from the same root cause — the code was modified to write tables to a fixed developer path for manual inspection with hudi-rs. This logic should be reverted to use withTempDir { tmp => ... } as in the original code.

The same pattern appears in the second test block in this file (~line 255).

Comment on lines 576 to 578
// First Operation:
// Producing parquet files to three default partitions.
// SNAPSHOT view on MOR table with parquet files only.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Leftover commented-out developer debug path

These two commented-out lines are clearly debugging artifacts from manual testing with hudi-rs:

// val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/testPrunedFiltered"
// val tablePath = s"$BASE_OUTPUT_PATH/table_log_compaction_${System.currentTimeMillis()}"

They should be removed entirely from the file before merging.

Comment on lines +1 to +5
# RCA: Gluten/Velox MOR Test Failures

Three sub-tests fail when Gluten/Velox native execution is enabled:
`testMORWithMap`, `testMORWithDecimal`, `testMORWithArray`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Internal investigation document should not be committed to the repository

RCA_Gluten_Velox_MOR_Failures.md is a root-cause analysis document for a specific Gluten/Velox testing investigation. It references internal stack traces, log evidence, and Onehouse-specific infrastructure. It is not part of the project's documentation and will confuse future readers of the repository.

This file should not be committed to the OSS repository. If it needs to be preserved for reference, it belongs in an internal wiki or issue tracker rather than the source tree.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestPartialUpdateForMergeInto.scala (1)

683-730: ⚠️ Potential issue | 🟠 Major

Don't ignore the regression test for unsupported CUSTOM MERGE INTO.

This was the assertion that CUSTOM payload + CUSTOM merge mode is rejected for MERGE INTO. Converting it to ignore removes that contract entirely. If the skip is only needed for the Gluten/RS path, make it conditional and keep the default suite validating the failure.

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeEventTimeOrdering.scala (1)

34-39: ⚠️ Potential issue | 🟠 Major

Don't remove the v6/v8 compatibility matrix from the shared suite.

The test body still has version-specific expectations for table versions 6 and 8, but those cases are now unreachable. That turns legacy merge-mode behavior into dead, unverified code rather than isolating the Gluten-specific failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeEventTimeOrdering.scala`
around lines 34 - 39, The parametrized Seq in TestMergeModeEventTimeOrdering
currently omits the legacy v6/v8 table-version cases, making the
version-specific expectations in the test body unreachable; update the Seq(...)
used where args is iterated to reintroduce the v6 and v8 compatibility rows (so
the test runs the legacy table-version permutations) ensuring the strings/tuples
you add match the shape consumed by the test loop (the same format used by the
other entries) so the existing version-specific assertions in the test body
execute for v6 and v8.
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupReaderSchemaHandler.java (1)

168-175: ⚠️ Potential issue | 🟡 Minor

Test coverage significantly reduced.

The @CsvSource has been trimmed to remove:

  • All SIX table version test cases (backwards compatibility)
  • CUSTOM merge mode scenarios
  • Several populateMetaFields=false combinations

This reduces coverage for pre-v9 table compatibility. Consider documenting which scenarios are intentionally excluded and why, or tracking re-enablement in a follow-up issue.

Would you like me to open an issue to track re-enabling these test cases once the underlying features are supported?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupReaderSchemaHandler.java`
around lines 168 - 175, The CSV-driven test in TestFileGroupReaderSchemaHandler
lost coverage: restore the removed `@CsvSource` rows that include table version
SIX, merge mode CUSTOM, and the populateMetaFields=false permutations so
pre-v9/backwards-compat scenarios are tested again; if those cases must remain
excluded temporarily, add a concise TODO comment next to the `@CsvSource`
explaining why they are omitted and add/mention a tracking issue ID (create one
if none) so re-enabling (or supporting the underlying features) is tracked.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala (1)

107-122: ⚠️ Potential issue | 🟠 Major

Keep the reduced compatibility matrix behind a Gluten-only gate.

These edits permanently drop older-version/custom-merge cases from the default MOR suite, even though native execution is only turned on when the external bundle is present. That shrinks baseline Spark coverage instead of isolating the Gluten-specific limitations to the Gluten profile.

Also applies to: 568-570, 1142-1143, 1531-1534

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala`
around lines 107 - 122, The test suite in TestMORDataSource.scala has had many
CSV rows removed from the `@CsvSource` for the parameterized test, which
unintentionally shrinks baseline Spark coverage; instead, restore the full
compatibility matrix in the `@CsvSource` and run the reduced Gluten-only matrix
conditionally by gating it behind the Gluten profile/flag (e.g., detect the
Gluten/external bundle flag used in test runtime or use an `@EnabledIf/`@Tag that
checks isGlutenEnabled). Specifically, revert the commented-out rows in the
`@CsvSource` inside the parameterized test and add logic around test execution (or
a second CsvSource variant) so that only when the Gluten flag is true you use
the reduced set; reference the parameterized test in TestMORDataSource.scala and
ensure the same gating is applied to the other affected CsvSource occurrences
noted (around the same test blocks at the other locations).
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala (1)

76-106: ⚠️ Potential issue | 🟠 Major

Restore supportedFileFormats in teardown().

This symbol comes from TestHoodieFileGroupReaderBase and is shared mutable state. Forcing it to PARQUET in setup() without restoring it makes later tests/classes order-dependent.

Suggested isolation fix
 class TestHoodieFileGroupReaderOnSpark extends TestHoodieFileGroupReaderBase[InternalRow] with SparkAdapterSupport {
   var spark: SparkSession = _
+  private var previousSupportedFileFormats: java.util.List[HoodieFileFormat] = _
 
   `@BeforeEach`
   def setup() {
+    previousSupportedFileFormats = supportedFileFormats
     val sparkConf = new SparkConf
     sparkConf.set("spark.app.name", getClass.getName)
     ...
     spark = SparkSession.builder.config(sparkConf).getOrCreate
     supportedFileFormats = util.Arrays.asList(HoodieFileFormat.PARQUET)
   }
 
   `@AfterEach`
   def teardown() {
-    if (spark != null) {
-      spark.stop()
-    }
+    try {
+      if (spark != null) {
+        spark.stop()
+      }
+    } finally {
+      supportedFileFormats = previousSupportedFileFormats
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala`
around lines 76 - 106, The test mutates the shared mutable symbol
supportedFileFormats in setup(), making tests order-dependent; capture the
original value before overwriting (e.g., val previousSupported =
supportedFileFormats) inside setup() and then restore it in teardown()
(supportedFileFormats = previousSupported) so the shared state from
TestHoodieFileGroupReaderBase is returned to its prior value; implement these
changes in the setup and teardown methods that currently set
supportedFileFormats = util.Arrays.asList(HoodieFileFormat.PARQUET).
🧹 Nitpick comments (5)
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java (1)

110-110: Parameterization is now effectively single-version and should be simplified.

At Line 110 and Line 179, the @CsvSource now passes only table version 9, but the test methods still carry a tableVersion parameter and version-dependent setup paths. Consider either restoring a true matrix or collapsing these tests to a fixed v9 path to avoid misleading maintenance surface.

♻️ Suggested simplification (apply similarly to both methods)
-  `@ParameterizedTest`
-  `@CsvSource`({ "true,9", "false,9"})
-  void testCOWToMORConvertedTableRollback(boolean rollbackUsingMarkers, int tableVersion) throws Exception {
+  `@ParameterizedTest`
+  `@ValueSource`(booleans = {true, false})
+  void testCOWToMORConvertedTableRollback(boolean rollbackUsingMarkers) throws Exception {
+    final int tableVersion = 9;

Also applies to: 179-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java`
at line 110, The tests in TestHoodieSparkMergeOnReadTableRollback are still
written to accept a tableVersion parameter even though the CsvSource only
supplies v9; change the parameterization to only vary the boolean and collapse
version handling to a fixed v9: update the two test annotations (the ones
currently using "@CsvSource({ \"true,9\", \"false,9\"})") to supply just the
boolean, remove the tableVersion parameter from the test method signatures, and
replace any use of tableVersion (including conditional path/build logic) with
the fixed v9 value (or a single v9 constant) so the tests no longer carry dead
version branches.
hudi-tests-common/src/main/resources/log4j2-surefire-quiet.properties (1)

27-31: Consider setting log level to warn for consistency with quiet profile.

This is the "quiet" surefire config where the root logger is warn and other loggers like hbase are error. Setting org.apache.gluten and ai.onehouse.quanton to info may produce more verbose output than expected in quiet mode.

Suggested change for quiet profile
 logger.gluten.name = org.apache.gluten
-logger.gluten.level = info
+logger.gluten.level = warn

 logger.quanton.name = ai.onehouse.quanton
-logger.quanton.level = info
+logger.quanton.level = warn
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hudi-tests-common/src/main/resources/log4j2-surefire-quiet.properties` around
lines 27 - 31, Change the log levels for the quiet surefire profile: set
logger.gluten.level and logger.quanton.level from info to warn so
org.apache.gluten and ai.onehouse.quanton follow the quiet profile (root warn)
and avoid producing verbose test output; update the entries for
logger.gluten.name and logger.quanton.name accordingly.
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala (2)

38-39: COW test variants commented out.

All COW test cases are commented out, leaving only MOR variants. If this is intentional for Gluten/Velox compatibility, consider adding a comment explaining why, or use @Disabled annotations with reasons instead of commenting out code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`
around lines 38 - 39, The COW variants in TestMergeModeCommitTimeOrdering have
been commented out (the parameter list containing "cow,current,..." is removed)
leaving only MOR cases; either restore the COW entries in the args list inside
TestMergeModeCommitTimeOrdering so the tests run, or instead mark the COW cases
explicitly disabled with a clear reason: add `@Disabled`("reason: Gluten/Velox
incompatibility") on the specific test or branch and include an inline comment
explaining why COW is excluded; locate the parameter list passed to args.foreach
and the test method in class TestMergeModeCommitTimeOrdering to apply the
change.

31-33: Remove mutable state from companion object.

Using a mutable var in a companion object for test uniqueness is fragile. Consider using UUID.randomUUID() or the test framework's temp directory facilities instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`
around lines 31 - 33, The companion object TestMergeModeCommitTimeOrdering
currently defines a mutable var cnt used for uniqueness; remove this mutable
state and replace all uses of TestMergeModeCommitTimeOrdering.cnt with a
deterministic, thread-safe alternative such as UUID.randomUUID().toString (or
the test framework's temp directory / unique test name helper) wherever unique
names/paths are constructed (search for references to
TestMergeModeCommitTimeOrdering and cnt in the tests); update any setup/teardown
logic to not rely on incrementing cnt and ensure uniqueness is produced inline
at call sites (e.g., generate a UUID or use temporaryFolder/newTempDir) instead.
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java (1)

359-362: CUSTOM merge mode tests commented out.

Consider using @Disabled annotations with reasons instead of commenting out test parameters. This provides better documentation and makes it easier to track what needs re-enabling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`
around lines 359 - 362, Uncomment the two parameter lines that use
RecordMergeMode.CUSTOM (the commented arguments(...) calls) and instead of
leaving them commented, disable the associated parameterized test using
`@Disabled` with a short reason; locate the parameter provider or the test method
in TestHoodieFileGroupReaderBase that consumes these arguments (the method that
returns or uses arguments(RecordMergeMode..., HoodieFileFormat..., ...)) and add
`@Disabled`("CUSTOM merge mode tests temporarily disabled - <reason>") to that
test or its provider so the CUSTOM cases are recorded as intentionally disabled
rather than commented out.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala`:
- Line 106: The current logWarning in HoodieSparkUtils.createRdd eagerly
materializes df.queryExecution.executedPlan.treeString on a hot write path;
change it to avoid WARN-level plan dumps by either moving the message to
logDebug or gating it behind a diagnostic flag. Locate the logWarning call
referencing df.queryExecution.executedPlan.treeString in createRdd and replace
it so the plan string is only generated when logDebug is enabled or when a new
boolean diagnostic flag (e.g., enablePlanDump) is true; ensure you check
logger.isDebugEnabled (or the flag) before calling treeString to prevent
unnecessary work.

In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java`:
- Around line 42-54: The applyGlutenConf method currently treats the
gluten.bundle.jar string as a flag; update it to fail-fast by validating the
path and plugin class before mutating SparkConf: 1) check that the
glutenBundleJar string is non-empty and points to an existing readable file
(e.g., new File(glutenBundleJar).isFile()) and if not, throw an
IllegalArgumentException or return early so SparkConf is not modified; 2)
resolve the plugin class by first trying
Class.forName("ai.onehouse.quanton.QuantonPlugin") and if that fails set
pluginClass = "org.apache.gluten.GlutenPlugin", then also verify the fallback
exists via Class.forName(pluginClass) and return/throw if it does not; 3) only
after both the jar file and the chosen pluginClass are validated, set the
SparkConf entries (the code that writes to sparkConf) so misconfigured paths or
missing classes are detected immediately in applyGlutenConf.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java`:
- Around line 181-183: The `@Disabled` reason on the test method
readWithCommitTimeOrdering is misleading; update the disable annotation or the
test name so the intent is clear: either change the `@Disabled` message to
explicitly state "Uses custom delete payload (not supported)" or rename the
method (e.g., to readWithCommitTimeOrderingAndCustomDeletePayload) to reflect
that the failure is due to custom delete payloads, not commit-time ordering.
Locate the test method readWithCommitTimeOrdering and apply one of these changes
so the disable reason and test identifier align.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`:
- Around line 451-452: The test parameterization was narrowed to only the "avro"
log data block format, removing "parquet" coverage; update the parameter list
used by the test (in TestHoodieFileGroupReaderBase, the method supplying
arguments for logFileOnlyCases) to include both "avro" and "parquet" again
(e.g., add corresponding arguments(RecordMergeMode.COMMIT_TIME_ORDERING,
"parquet") and arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet") so the
test runs for both formats).

In
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/io/TestMergeHandle.java`:
- Around line 280-283: The test removed CUSTOM/CUSTOM_MERGER branch coverage;
restore explicit coverage by including "CUSTOM" and "CUSTOM_MERGER" in the
`@ValueSource` for testFGReaderBasedMergeHandleInsertUpsertDelete or, if those
modes are genuinely unsupported under the Gluten environment, add a guard inside
testFGReaderBasedMergeHandleInsertUpsertDelete (calling
testFGReaderBasedMergeHandleInsertUpsertDeleteInternal) that detects the Gluten
run (e.g., an existing isGluten()/environment check) and asserts the unsupported
behavior for CUSTOM/CUSTOM_MERGER instead of silently skipping them; update
references to testFGReaderBasedMergeHandleInsertUpsertDeleteInternal, the
ValueSource annotation, and any Gluten-run detection logic accordingly.

In
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java`:
- Line 1161: Replace the hard-disabled annotations so the rollback failure-path
tests run conditionally: in TestHoodieSparkMergeOnReadTableRollback.java and
TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java, change each method
annotated with `@Disabled`("Table version 6 only") to `@Tag`("table-version-6") (the
four rollback test methods around the existing markers). Then add/extend a Maven
profile (e.g., functional-tests-v6) in pom.xml to include the table-version-6
tag via the surefire/failsafe groups configuration, and ensure the Azure CI
pipeline triggers that profile in the appropriate validation job so the
table-version-6 tests execute in CI.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala`:
- Around line 677-678: The test is permanently disabled with `@Disabled`("Custom
delete payload not supported"), removing custom-delete coverage for all runs;
instead remove the `@Disabled` and add a conditional skip that only disables the
test when native/Gluten execution is active (e.g., use JUnit Assumptions inside
the test: call Assumptions.assumeFalse(isGlutenNativeExecutionActive()) or
Assumptions.assumeTrue(!isGlutenNativeExecutionActive()), where
isGlutenNativeExecutionActive() checks the presence of the gluten bundle
(gluten.bundle.jar) or existing project helper), so the test in
TestHoodieFileGroupReaderOnSpark runs normally under standard Spark profiles but
is skipped only when native execution is enabled.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala`:
- Around line 1490-1491: The test testRecordTypeCompatibilityWithParquetLog was
changed to force HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT to "avro" (via
readOpts and writeOpts), so it no longer validates parquet log-block behavior;
revert or adjust these assignments so the test actually uses "parquet" (e.g.,
set readOpts and writeOpts to
Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet") or remove
the override), or alternatively split/rename the test into separate cases (one
explicitly setting "avro" and another keeping "parquet") so the parquet-log
reader remains covered.
- Around line 2269-2310: The test testGlutenVeloxNativeExecution currently only
checks rows.length > 0 so it passes without Gluten/Velox; update it to first
assumeTrue that the Gluten native bundle is available (skip otherwise) and then
assert that df.queryExecution.executedPlan.toString contains the expected native
operator identifiers (e.g., "Velox" or "Gluten" or the specific operator names
used by your runtime) after collecting/printing the plan; use the existing df
and df.queryExecution.executedPlan references and keep the existing plan
printing but replace or augment the final assertTrue(rows.length > 0) with an
assumeTrue(...) then an assertion that the plan string contains the native
operator marker.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`:
- Around line 260-262: Remove the hardcoded absolute path and debug println: in
the TestMergeModeCommitTimeOrdering test where tablePath is constructed using
TestMergeModeCommitTimeOrdering.cnt and System.currentTimeMillis(), replace the
absolute "/home/ubuntu/..." path with a temp or relative test directory (or use
the test framework's temporary folder utility) and delete the println(s"fdss:
${tablePath}") call; ensure tablePath is built from a configurable base (or
tempDir) so the test is not environment-specific.
- Around line 107-109: The test currently uses a hardcoded absolute path via
tablePath built from TestMergeModeCommitTimeOrdering.cnt and currTime and prints
it with println (a debug artifact); replace this by using the test framework's
temporary directory helper (e.g., withTempDir) to create tablePath inside the
temp dir and remove the println line; locate the usage where
currTime/tablePath/Println are set in TestMergeModeCommitTimeOrdering and
restore the original withTempDir pattern so the path is portable and no debug
output remains.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala`:
- Around line 48-55: Replace the hard-coded BASE_OUTPUT_PATH and the unsafe
cleanPath usage with a test-scoped temporary directory: change BASE_OUTPUT_PATH
to be created at runtime (e.g., use java.nio.file.Files.createTempDirectory or
the test harness/tempDir utility) and ensure cleanPath operates on that temp dir
only (or remove cleanPath and rely on the temp directory lifecycle). Update
references to BASE_OUTPUT_PATH and the cleanPath helper so all tests use this
generated path, and ensure the temp directory is deleted/cleaned in an
afterEach/afterAll teardown to avoid leaving artifacts.
- Around line 62-68: getQueryFileSlices currently obtains resources from
getMetaClientAndFileSystemView (which returns a HoodieTableFileSystemView) and
iterates log readers but does not guarantee closure on exception paths; update
getQueryFileSlices to ensure the HoodieTableFileSystemView and any
HoodieLogFileReader instances are closed in all exit paths (use try/finally or a
loan/withResource pattern) so that fsView.close() is always called and each log
reader is closed even if an assertion or read fails; locate
getMetaClientAndFileSystemView and any code that constructs HoodieLogFileReader
(also in the 111-131 region) and wrap their usage in try/finally blocks or a
managed-resource helper to release resources reliably.

In `@pom.xml`:
- Line 173: The pom currently sets the Maven property spark35.version to the
internal coordinate "3.5.2-INTERNAL-0.16" which makes the active-by-default
profile spark3.5 depend on private Onehouse CodeArtifact repos; revert the
default <spark35.version> to the public Spark artifact (e.g., a public 3.5.x
coordinate) and move the internal "3.5.2-INTERNAL-0.16" value and the Onehouse
repository declaration into a new opt-in Maven profile (e.g., profile id
"onehouse-internal" or similar); update references to spark35.version to remain
unchanged so external builds use the public coordinate unless the opt-in profile
is activated.

In `@RCA_Gluten_Velox_MOR_Failures.md`:
- Around line 14-22: Several fenced code blocks in the document are missing
language specifiers; update each triple-backtick fence to include an appropriate
language token (e.g., ```java for the Java stacktrace starting with
java.lang.AssertionError and entries like UnsafeRow.setTotalSize, ```scala for
Scala snippets referencing HoodieSparkUtils.createRdd, or ```text/```cpp as
appropriate) so markdownlint stops flagging them; apply this change to the
reported ranges (including the stacktrace block and the other blocks at 32-36,
82-85, 184-203, 206-210, 245-251, 254-264, 320-327, 347-351) ensuring each fence
uses the correct language tag that matches the contained code/sample.

---

Outside diff comments:
In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupReaderSchemaHandler.java`:
- Around line 168-175: The CSV-driven test in TestFileGroupReaderSchemaHandler
lost coverage: restore the removed `@CsvSource` rows that include table version
SIX, merge mode CUSTOM, and the populateMetaFields=false permutations so
pre-v9/backwards-compat scenarios are tested again; if those cases must remain
excluded temporarily, add a concise TODO comment next to the `@CsvSource`
explaining why they are omitted and add/mention a tracking issue ID (create one
if none) so re-enabling (or supporting the underlying features) is tracked.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala`:
- Around line 76-106: The test mutates the shared mutable symbol
supportedFileFormats in setup(), making tests order-dependent; capture the
original value before overwriting (e.g., val previousSupported =
supportedFileFormats) inside setup() and then restore it in teardown()
(supportedFileFormats = previousSupported) so the shared state from
TestHoodieFileGroupReaderBase is returned to its prior value; implement these
changes in the setup and teardown methods that currently set
supportedFileFormats = util.Arrays.asList(HoodieFileFormat.PARQUET).

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala`:
- Around line 107-122: The test suite in TestMORDataSource.scala has had many
CSV rows removed from the `@CsvSource` for the parameterized test, which
unintentionally shrinks baseline Spark coverage; instead, restore the full
compatibility matrix in the `@CsvSource` and run the reduced Gluten-only matrix
conditionally by gating it behind the Gluten profile/flag (e.g., detect the
Gluten/external bundle flag used in test runtime or use an `@EnabledIf/`@Tag that
checks isGlutenEnabled). Specifically, revert the commented-out rows in the
`@CsvSource` inside the parameterized test and add logic around test execution (or
a second CsvSource variant) so that only when the Gluten flag is true you use
the reduced set; reference the parameterized test in TestMORDataSource.scala and
ensure the same gating is applied to the other affected CsvSource occurrences
noted (around the same test blocks at the other locations).

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeEventTimeOrdering.scala`:
- Around line 34-39: The parametrized Seq in TestMergeModeEventTimeOrdering
currently omits the legacy v6/v8 table-version cases, making the
version-specific expectations in the test body unreachable; update the Seq(...)
used where args is iterated to reintroduce the v6 and v8 compatibility rows (so
the test runs the legacy table-version permutations) ensuring the strings/tuples
you add match the shape consumed by the test loop (the same format used by the
other entries) so the existing version-specific assertions in the test body
execute for v6 and v8.

---

Nitpick comments:
In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`:
- Around line 359-362: Uncomment the two parameter lines that use
RecordMergeMode.CUSTOM (the commented arguments(...) calls) and instead of
leaving them commented, disable the associated parameterized test using
`@Disabled` with a short reason; locate the parameter provider or the test method
in TestHoodieFileGroupReaderBase that consumes these arguments (the method that
returns or uses arguments(RecordMergeMode..., HoodieFileFormat..., ...)) and add
`@Disabled`("CUSTOM merge mode tests temporarily disabled - <reason>") to that
test or its provider so the CUSTOM cases are recorded as intentionally disabled
rather than commented out.

In
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java`:
- Line 110: The tests in TestHoodieSparkMergeOnReadTableRollback are still
written to accept a tableVersion parameter even though the CsvSource only
supplies v9; change the parameterization to only vary the boolean and collapse
version handling to a fixed v9: update the two test annotations (the ones
currently using "@CsvSource({ \"true,9\", \"false,9\"})") to supply just the
boolean, remove the tableVersion parameter from the test method signatures, and
replace any use of tableVersion (including conditional path/build logic) with
the fixed v9 value (or a single v9 constant) so the tests no longer carry dead
version branches.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`:
- Around line 38-39: The COW variants in TestMergeModeCommitTimeOrdering have
been commented out (the parameter list containing "cow,current,..." is removed)
leaving only MOR cases; either restore the COW entries in the args list inside
TestMergeModeCommitTimeOrdering so the tests run, or instead mark the COW cases
explicitly disabled with a clear reason: add `@Disabled`("reason: Gluten/Velox
incompatibility") on the specific test or branch and include an inline comment
explaining why COW is excluded; locate the parameter list passed to args.foreach
and the test method in class TestMergeModeCommitTimeOrdering to apply the
change.
- Around line 31-33: The companion object TestMergeModeCommitTimeOrdering
currently defines a mutable var cnt used for uniqueness; remove this mutable
state and replace all uses of TestMergeModeCommitTimeOrdering.cnt with a
deterministic, thread-safe alternative such as UUID.randomUUID().toString (or
the test framework's temp directory / unique test name helper) wherever unique
names/paths are constructed (search for references to
TestMergeModeCommitTimeOrdering and cnt in the tests); update any setup/teardown
logic to not rely on incrementing cnt and ensure uniqueness is produced inline
at call sites (e.g., generate a UUID or use temporaryFolder/newTempDir) instead.

In `@hudi-tests-common/src/main/resources/log4j2-surefire-quiet.properties`:
- Around line 27-31: Change the log levels for the quiet surefire profile: set
logger.gluten.level and logger.quanton.level from info to warn so
org.apache.gluten and ai.onehouse.quanton follow the quiet profile (root warn)
and avoid producing verbose test output; update the entries for
logger.gluten.name and logger.quanton.name accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e1ffad95-377b-4b59-ae9e-c5404df9642b

📥 Commits

Reviewing files that changed from the base of the PR and between 35e2bbf and 3c0e970.

📒 Files selected for processing (23)
  • RCA_Gluten_Velox_MOR_Failures.md
  • hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
  • hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java
  • hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestUtils.java
  • hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/providers/SparkProvider.java
  • hudi-common/src/test/java/org/apache/hudi/common/table/read/TestFileGroupReaderSchemaHandler.java
  • hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
  • hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java
  • hudi-spark-datasource/hudi-spark/pom.xml
  • hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestPositionBasedFileGroupRecordBuffer.java
  • hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/io/TestMergeHandle.java
  • hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java
  • hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/HoodieSparkSqlTestBase.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeEventTimeOrdering.scala
  • hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestPartialUpdateForMergeInto.scala
  • hudi-tests-common/src/main/resources/log4j2-surefire-quiet.properties
  • hudi-tests-common/src/main/resources/log4j2-surefire.properties
  • pom.xml

// Additionally, we have to explicitly wrap around resulting [[RDD]] into the one
// injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
// [[SQLConf]] is required by [[AvroSerializer]]
logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid warning-level plan dumps on the createRdd hot path.

createRdd is on the write path, and executedPlan.treeString eagerly materializes the full plan string. Logging that at WARN for every call will add a lot of noise and overhead in production. Please keep this behind logDebug or a dedicated diagnostic flag instead.

Suggested change
-    logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
+    logDebug(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
logDebug(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala`
at line 106, The current logWarning in HoodieSparkUtils.createRdd eagerly
materializes df.queryExecution.executedPlan.treeString on a hot write path;
change it to avoid WARN-level plan dumps by either moving the message to
logDebug or gating it behind a diagnostic flag. Locate the logWarning call
referencing df.queryExecution.executedPlan.treeString in createRdd and replace
it so the plan string is only generated when logDebug is enabled or when a new
boolean diagnostic flag (e.g., enablePlanDump) is true; ensure you check
logger.isDebugEnabled (or the flag) before calling treeString to prevent
unnecessary work.

Comment on lines +42 to +54
public static void applyGlutenConf(SparkConf sparkConf) {
String glutenBundleJar = System.getProperty("gluten.bundle.jar");
if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
return;
}

String pluginClass;
try {
Class.forName("ai.onehouse.quanton.QuantonPlugin");
pluginClass = "ai.onehouse.quanton.QuantonPlugin";
} catch (ClassNotFoundException e) {
pluginClass = "org.apache.gluten.GlutenPlugin";
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -t f "GlutenTestUtils.java"

Repository: yihua/hudi

Length of output: 144


🏁 Script executed:

cat -n hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java

Repository: yihua/hudi

Length of output: 3358


🏁 Script executed:

# Search for similar validation patterns in the codebase for system properties
rg "System.getProperty" --type java -A 5 -B 2 | head -100

Repository: yihua/hudi

Length of output: 13994


🏁 Script executed:

# Check if there are any tests or usages of applyGlutenConf
rg "applyGlutenConf" --type java -B 2 -A 5

Repository: yihua/hudi

Length of output: 3052


Add fail-fast validation for gluten.bundle.jar property and fallback plugin class.

The method currently treats gluten.bundle.jar as a boolean flag only. A typo in the path or missing classpath wiring still configures SparkConf and defers the failure to Spark startup, making test diagnostics harder. Validate the file path and ensure the fallback GlutenPlugin class exists before enabling the plugin mode.

Suggested fail-fast check
+import java.io.File;
+
   public static void applyGlutenConf(SparkConf sparkConf) {
     String glutenBundleJar = System.getProperty("gluten.bundle.jar");
     if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
       return;
     }
+    if (!new File(glutenBundleJar).isFile()) {
+      throw new IllegalArgumentException("gluten.bundle.jar does not point to an existing file: " + glutenBundleJar);
+    }
 
     String pluginClass;
     try {
       Class.forName("ai.onehouse.quanton.QuantonPlugin");
       pluginClass = "ai.onehouse.quanton.QuantonPlugin";
     } catch (ClassNotFoundException e) {
-      pluginClass = "org.apache.gluten.GlutenPlugin";
+      try {
+        Class.forName("org.apache.gluten.GlutenPlugin");
+        pluginClass = "org.apache.gluten.GlutenPlugin";
+      } catch (ClassNotFoundException e2) {
+        throw new IllegalStateException(
+            "gluten.bundle.jar is set but neither Quanton nor Gluten plugin is on the test classpath",
+            e2);
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java`
around lines 42 - 54, The applyGlutenConf method currently treats the
gluten.bundle.jar string as a flag; update it to fail-fast by validating the
path and plugin class before mutating SparkConf: 1) check that the
glutenBundleJar string is non-empty and points to an existing readable file
(e.g., new File(glutenBundleJar).isFile()) and if not, throw an
IllegalArgumentException or return early so SparkConf is not modified; 2)
resolve the plugin class by first trying
Class.forName("ai.onehouse.quanton.QuantonPlugin") and if that fails set
pluginClass = "org.apache.gluten.GlutenPlugin", then also verify the fallback
exists via Class.forName(pluginClass) and return/throw if it does not; 3) only
after both the jar file and the chosen pluginClass are validated, set the
SparkConf entries (the code that writes to sparkConf) so misconfigured paths or
missing classes are detected immediately in applyGlutenConf.

Comment on lines +181 to 183
@Disabled("Custom delete payload not supported")
@Test
void readWithCommitTimeOrdering() throws IOException {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Disabled reason may be misleading for test name.

The test readWithCommitTimeOrdering is disabled with "Custom delete payload not supported", which is technically correct (the test uses custom delete markers), but the test name suggests it's primarily about commit time ordering behavior. Consider clarifying the disable reason or renaming the test if custom delete payload support is the actual blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java`
around lines 181 - 183, The `@Disabled` reason on the test method
readWithCommitTimeOrdering is misleading; update the disable annotation or the
test name so the intent is clear: either change the `@Disabled` message to
explicitly state "Uses custom delete payload (not supported)" or rename the
method (e.g., to readWithCommitTimeOrderingAndCustomDeletePayload) to reflect
that the failure is due to custom delete payloads, not commit-time ordering.
Locate the test method readWithCommitTimeOrdering and apply one of these changes
so the disable reason and test identifier align.

Comment on lines 451 to +452
arguments(RecordMergeMode.COMMIT_TIME_ORDERING, "avro"),
arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet"),
arguments(RecordMergeMode.CUSTOM, "avro"));
arguments(RecordMergeMode.EVENT_TIME_ORDERING, "avro"));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log file only test cases reduced to avro format only.

logFileOnlyCases() now only tests avro log data block format, removing parquet format coverage. If parquet log blocks are supported in production, this reduces test coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`
around lines 451 - 452, The test parameterization was narrowed to only the
"avro" log data block format, removing "parquet" coverage; update the parameter
list used by the test (in TestHoodieFileGroupReaderBase, the method supplying
arguments for logFileOnlyCases) to include both "avro" and "parquet" again
(e.g., add corresponding arguments(RecordMergeMode.COMMIT_TIME_ORDERING,
"parquet") and arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet") so the
test runs for both formats).

Comment on lines 280 to 283
@ParameterizedTest
@ValueSource(strings = {"EVENT_TIME_ORDERING", "COMMIT_TIME_ORDERING", "CUSTOM", "CUSTOM_MERGER"})
@ValueSource(strings = {"EVENT_TIME_ORDERING", "COMMIT_TIME_ORDERING"})
public void testFGReaderBasedMergeHandleInsertUpsertDelete(String mergeMode) throws IOException {
testFGReaderBasedMergeHandleInsertUpsertDeleteInternal(mergeMode, new Properties(), false);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep explicit coverage for CUSTOM merge modes.

testFGReaderBasedMergeHandleInsertUpsertDeleteInternal(...) still has dedicated CUSTOM / CUSTOM_MERGER branches, so removing those values from @ValueSource just makes those paths untested. If they are unsupported only in the Gluten run, gate them there or replace them with an explicit unsupported assertion instead of dropping the coverage globally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/io/TestMergeHandle.java`
around lines 280 - 283, The test removed CUSTOM/CUSTOM_MERGER branch coverage;
restore explicit coverage by including "CUSTOM" and "CUSTOM_MERGER" in the
`@ValueSource` for testFGReaderBasedMergeHandleInsertUpsertDelete or, if those
modes are genuinely unsupported under the Gluten environment, add a guard inside
testFGReaderBasedMergeHandleInsertUpsertDelete (calling
testFGReaderBasedMergeHandleInsertUpsertDeleteInternal) that detects the Gluten
run (e.g., an existing isGluten()/environment check) and asserts the unsupported
behavior for CUSTOM/CUSTOM_MERGER instead of silently skipping them; update
references to testFGReaderBasedMergeHandleInsertUpsertDeleteInternal, the
ValueSource annotation, and any Gluten-run detection logic accordingly.

Comment on lines +260 to +262
val currTime = System.currentTimeMillis()
val tablePath = s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime"
println(s"fdss: ${tablePath}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Same hardcoded path and debug statement in second test.

This is a duplicate of the issue in the first test - hardcoded absolute path and debug println must be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`
around lines 260 - 262, Remove the hardcoded absolute path and debug println: in
the TestMergeModeCommitTimeOrdering test where tablePath is constructed using
TestMergeModeCommitTimeOrdering.cnt and System.currentTimeMillis(), replace the
absolute "/home/ubuntu/..." path with a temp or relative test directory (or use
the test framework's temporary folder utility) and delete the println(s"fdss:
${tablePath}") call; ensure tablePath is built from a configurable base (or
tempDir) so the test is not environment-specific.

Comment on lines +48 to +55
val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss"

private def cleanPath(path: String): Unit = {
val hadoopPath = new Path(path)
val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
if (fs.exists(hadoopPath)) {
fs.delete(hadoopPath, true)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't hard-code a workstation path into an automated test suite.

These tests assume /home/ubuntu/... exists and cleanPath recursively deletes under it. That will fail on CI/non-Ubuntu hosts and is unsafe on developer machines. Use a harness-provided temp directory instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala`
around lines 48 - 55, Replace the hard-coded BASE_OUTPUT_PATH and the unsafe
cleanPath usage with a test-scoped temporary directory: change BASE_OUTPUT_PATH
to be created at runtime (e.g., use java.nio.file.Files.createTempDirectory or
the test harness/tempDir utility) and ensure cleanPath operates on that temp dir
only (or remove cleanPath and rely on the temp directory lifecycle). Update
references to BASE_OUTPUT_PATH and the cleanPath helper so all tests use this
generated path, and ensure the temp directory is deleted/cleaned in an
afterEach/afterAll teardown to avoid leaving artifacts.

Comment on lines +62 to +68
private def getQueryFileSlices(basePath: String): Seq[FileSlice] = {
val (metaClient, fsView) = getMetaClientAndFileSystemView(basePath)
val queryInstant = metaClient.getActiveTimeline
.filterCompletedInstants().lastInstant().get().requestedTime()
fsView.getLatestMergedFileSlicesBeforeOrOn("", queryInstant)
.iterator().asScala.toSeq
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Close the file-system view and log readers on every exit path.

getMetaClientAndFileSystemView creates a HoodieTableFileSystemView that is never closed, and each HoodieLogFileReader only closes on the happy path. A failing read/assertion here will leak handles across this suite.

Also applies to: 111-131

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala`
around lines 62 - 68, getQueryFileSlices currently obtains resources from
getMetaClientAndFileSystemView (which returns a HoodieTableFileSystemView) and
iterates log readers but does not guarantee closure on exception paths; update
getQueryFileSlices to ensure the HoodieTableFileSystemView and any
HoodieLogFileReader instances are closed in all exit paths (use try/finally or a
loan/withResource pattern) so that fsView.close() is always called and each log
reader is closed even if an assertion or read fails; locate
getMetaClientAndFileSystemView and any code that constructs HoodieLogFileReader
(also in the 111-131 region) and wrap their usage in try/finally blocks or a
managed-resource helper to release resources reliably.

Comment thread pom.xml
<spark33.version>3.3.4</spark33.version>
<spark34.version>3.4.3</spark34.version>
<spark35.version>3.5.5</spark35.version>
<spark35.version>3.5.2-INTERNAL-0.16</spark35.version>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let me examine the pom.xml file structure and the specific lines mentioned
wc -l pom.xml

Repository: yihua/hudi

Length of output: 66


🏁 Script executed:

# Check the area around line 173 for the spark35.version
sed -n '170,180p' pom.xml

Repository: yihua/hudi

Length of output: 631


🏁 Script executed:

# Check the area around lines 1966-1970 for the Onehouse CodeArtifact repository
sed -n '1960,1975p' pom.xml

Repository: yihua/hudi

Length of output: 479


🏁 Script executed:

# Search for profile definitions to verify which is the default profile
rg -n '<profile>' pom.xml | head -20

Repository: yihua/hudi

Length of output: 433


🏁 Script executed:

# Look for activeByDefault or default profile settings
rg -n -A 3 'activeByDefault\|<id>spark3' pom.xml

Repository: yihua/hudi

Length of output: 36


🏁 Script executed:

# Search for spark3.5 profile definition and its activation settings
rg -n -A 15 '<id>spark3\.5' pom.xml

Repository: yihua/hudi

Length of output: 1058


🏁 Script executed:

# Look for all profiles with their IDs and activation settings
rg -n -B 2 -A 8 '<profile>' pom.xml | grep -E '(profile>|<id>|activeByDefault|activation)'

Repository: yihua/hudi

Length of output: 2375


🏁 Script executed:

# Check line 1974 onwards to see the first profile definition after repositories
sed -n '1974,2050p' pom.xml

Repository: yihua/hudi

Length of output: 2346


🏁 Script executed:

# Get the complete spark3.5 profile definition to check for activeByDefault
sed -n '2656,2715p' pom.xml

Repository: yihua/hudi

Length of output: 3074


🏁 Script executed:

# Check if there are any activeByDefault declarations in profiles
rg -n 'activeByDefault' pom.xml

Repository: yihua/hudi

Length of output: 159


🏁 Script executed:

# Check properties section to understand default spark module
sed -n '140,200p' pom.xml

Repository: yihua/hudi

Length of output: 3710


This makes the default Spark 3.5 build depend on private infrastructure.

spark3.5 is the active-by-default profile, and the default properties use ${spark35.version} which now points to 3.5.2-INTERNAL-0.16. With the Onehouse CodeArtifact repository configured in the global repositories section, every default build will attempt to resolve this internal coordinate from private AWS infrastructure. Public CI and external contributors cannot access this repository without Onehouse credentials. Please keep the public Spark coordinate as default and move the internal version and repository behind an explicit opt-in profile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pom.xml` at line 173, The pom currently sets the Maven property
spark35.version to the internal coordinate "3.5.2-INTERNAL-0.16" which makes the
active-by-default profile spark3.5 depend on private Onehouse CodeArtifact
repos; revert the default <spark35.version> to the public Spark artifact (e.g.,
a public 3.5.x coordinate) and move the internal "3.5.2-INTERNAL-0.16" value and
the Onehouse repository declaration into a new opt-in Maven profile (e.g.,
profile id "onehouse-internal" or similar); update references to spark35.version
to remain unchanged so external builds use the public coordinate unless the
opt-in profile is activated.

Comment on lines +14 to +22
```
java.lang.AssertionError: sizeInBytes (276) should be a multiple of 8
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.setTotalSize(UnsafeRow.java:172)
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.getRow(UnsafeRowWriter.java:78)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.codegen.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$...$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at HoodieSparkUtils$.$anonfun$createRdd$2(HoodieSparkUtils.scala:107)
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add languages to the fenced code blocks.

markdownlint is already flagging these fences, so this doc will keep warning until the blocks are tagged (text, scala, java, cpp, etc.).

Also applies to: 32-36, 82-85, 184-203, 206-210, 245-251, 254-264, 320-327, 347-351

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@RCA_Gluten_Velox_MOR_Failures.md` around lines 14 - 22, Several fenced code
blocks in the document are missing language specifiers; update each
triple-backtick fence to include an appropriate language token (e.g., ```java
for the Java stacktrace starting with java.lang.AssertionError and entries like
UnsafeRow.setTotalSize, ```scala for Scala snippets referencing
HoodieSparkUtils.createRdd, or ```text/```cpp as appropriate) so markdownlint
stops flagging them; apply this change to the reported ranges (including the
stacktrace block and the other blocks at 32-36, 82-85, 184-203, 206-210,
245-251, 254-264, 320-327, 347-351) ensuring each fence uses the correct
language tag that matches the contained code/sample.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant