373 changes: 373 additions & 0 deletions RCA_Gluten_Velox_MOR_Failures.md

Large diffs are not rendered by default.

@@ -103,6 +103,7 @@ object HoodieSparkUtils extends SparkAdapterSupport with SparkVersionsSupport wi
// Additionally, we have to explicitly wrap around resulting [[RDD]] into the one
// injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
// [[SQLConf]] is required by [[AvroSerializer]]
logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")

🤖 This logWarning dumps the full executed plan on every createRdd call in production code. This is a debug statement that should be removed before merging — it will produce excessive log output in production workloads and may leak schema details into logs.

- Generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.


P1 Debug warning log left in production code

This logWarning call prints the entire executed query plan (as a tree string) at WARN level every time createRdd is called. This is clearly a debug statement added during the Gluten/Velox investigation and must not reach production:

  • It will spam the logs for every MOR read in every Hudi job.
  • treeString is non-trivial to compute and allocates for every partition scan.
  • WARN level means it will appear in most production logging configurations.

This line should be removed entirely before merging.

Greptile (original) (source:comment#3082852557)


⚠️ Potential issue | 🟠 Major

Avoid warning-level plan dumps on the createRdd hot path.

createRdd is on the write path, and executedPlan.treeString eagerly materializes the full plan string. Logging that at WARN for every call will add a lot of noise and overhead in production. Please keep this behind logDebug or a dedicated diagnostic flag instead.

Suggested change
-    logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
+    logDebug(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala`
at line 106, The current logWarning in HoodieSparkUtils.createRdd eagerly
materializes df.queryExecution.executedPlan.treeString on a hot write path;
change it to avoid WARN-level plan dumps by either moving the message to
logDebug or gating it behind a diagnostic flag. Locate the logWarning call
referencing df.queryExecution.executedPlan.treeString in createRdd and replace
it so the plan string is only generated when logDebug is enabled or when a new
boolean diagnostic flag (e.g., enablePlanDump) is true; ensure you check
logger.isDebugEnabled (or the flag) before calling treeString to prevent
unnecessary work.

CodeRabbit (original) (source:comment#3082895711)
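
The guard suggested above can be illustrated in isolation. The following is a minimal sketch, not the Hudi code: it uses `java.util.logging` so that it runs without extra dependencies, and `renderPlan()` is a hypothetical stand-in for `executedPlan.treeString`. The counter only exists to show that the expensive string is never built while debug-level logging is off.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class PlanDumpGuard {
  private static final Logger LOG = Logger.getLogger(PlanDumpGuard.class.getName());

  // Counts how often the expensive string is actually built.
  static final AtomicInteger RENDER_COUNT = new AtomicInteger();

  private PlanDumpGuard() {}

  // Hypothetical stand-in for df.queryExecution.executedPlan.treeString.
  static String renderPlan() {
    RENDER_COUNT.incrementAndGet();
    return "Project\n+- Scan";
  }

  // Builds the plan string only when FINE (debug-level) logging is enabled.
  static void logPlanIfDebug() {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("createRdd executedPlan:\n" + renderPlan());
    }
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    logPlanIfDebug(); // guard is false, renderPlan() is not called
    System.out.println("renders at INFO: " + RENDER_COUNT.get());

    LOG.setLevel(Level.FINE);
    logPlanIfDebug(); // guard is true, renderPlan() is called once
    System.out.println("renders at FINE: " + RENDER_COUNT.get());
  }
}
```

The same shape applies with SLF4J (`isDebugEnabled()` before building the message) or with a dedicated diagnostic flag; which of the two fits `createRdd` best is a decision for the PR author.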

injectSQLConf(df.queryExecution.toRdd.mapPartitions (rows => {
if (rows.isEmpty) {
Iterator.empty
@@ -0,0 +1,68 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hudi.testutils;

import org.apache.spark.SparkConf;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* Utility for injecting Gluten/Velox native execution into a test {@link SparkConf}.
*
* <p>Activated by passing {@code -Dgluten.bundle.jar=<path>} at test time.
* If {@code ai.onehouse.quanton.QuantonPlugin} is on the classpath it is preferred;
* otherwise {@code org.apache.gluten.GlutenPlugin} is used.
*/
public class GlutenTestUtils {

private static final Logger LOG = LoggerFactory.getLogger(GlutenTestUtils.class);

private GlutenTestUtils() {}

/**
* Applies Gluten/Velox native-execution settings to {@code sparkConf} when the
* {@code gluten.bundle.jar} system property is set. No-op otherwise.
*/
public static void applyGlutenConf(SparkConf sparkConf) {
String glutenBundleJar = System.getProperty("gluten.bundle.jar");
if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
return;
}

String pluginClass;
try {
Class.forName("ai.onehouse.quanton.QuantonPlugin");
pluginClass = "ai.onehouse.quanton.QuantonPlugin";
} catch (ClassNotFoundException e) {
pluginClass = "org.apache.gluten.GlutenPlugin";
}

⚠️ Potential issue | 🟠 Major

Add fail-fast validation for gluten.bundle.jar property and fallback plugin class.

The method currently treats gluten.bundle.jar as a boolean flag only. A typo in the path or missing classpath wiring still configures SparkConf and defers the failure to Spark startup, making test diagnostics harder. Validate the file path and ensure the fallback GlutenPlugin class exists before enabling the plugin mode.

Suggested fail-fast check
+import java.io.File;
+
   public static void applyGlutenConf(SparkConf sparkConf) {
     String glutenBundleJar = System.getProperty("gluten.bundle.jar");
     if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
       return;
     }
+    if (!new File(glutenBundleJar).isFile()) {
+      throw new IllegalArgumentException("gluten.bundle.jar does not point to an existing file: " + glutenBundleJar);
+    }
 
     String pluginClass;
     try {
       Class.forName("ai.onehouse.quanton.QuantonPlugin");
       pluginClass = "ai.onehouse.quanton.QuantonPlugin";
     } catch (ClassNotFoundException e) {
-      pluginClass = "org.apache.gluten.GlutenPlugin";
+      try {
+        Class.forName("org.apache.gluten.GlutenPlugin");
+        pluginClass = "org.apache.gluten.GlutenPlugin";
+      } catch (ClassNotFoundException e2) {
+        throw new IllegalStateException(
+            "gluten.bundle.jar is set but neither Quanton nor Gluten plugin is on the test classpath",
+            e2);
+      }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java`
around lines 42 - 54, The applyGlutenConf method currently treats the
gluten.bundle.jar string as a flag; update it to fail-fast by validating the
path and plugin class before mutating SparkConf: 1) check that the
glutenBundleJar string is non-empty and points to an existing readable file
(e.g., new File(glutenBundleJar).isFile()) and if not, throw an
IllegalArgumentException or return early so SparkConf is not modified; 2)
resolve the plugin class by first trying
Class.forName("ai.onehouse.quanton.QuantonPlugin") and if that fails set
pluginClass = "org.apache.gluten.GlutenPlugin", then also verify the fallback
exists via Class.forName(pluginClass) and return/throw if it does not; 3) only
after both the jar file and the chosen pluginClass are validated, set the
SparkConf entries (the code that writes to sparkConf) so misconfigured paths or
missing classes are detected immediately in applyGlutenConf.

CodeRabbit (original) (source:comment#3082895725)
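
For reference, the fail-fast idea can be tried outside the test harness. This is a minimal sketch under stated assumptions: `PluginResolver`, `requireJar`, and `firstLoadable` are hypothetical names, not part of the PR, and `java.lang.String` stands in for a plugin class that happens to be on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

public final class PluginResolver {
  private PluginResolver() {}

  // Throws if jarPath does not name an existing regular file.
  static void requireJar(String jarPath) {
    if (!new File(jarPath).isFile()) {
      throw new IllegalArgumentException(
          "gluten.bundle.jar does not point to an existing file: " + jarPath);
    }
  }

  // Returns the first class name that can be loaded, in order of preference.
  static String firstLoadable(List<String> candidates) {
    for (String name : candidates) {
      try {
        Class.forName(name);
        return name;
      } catch (ClassNotFoundException e) {
        // Not on the classpath, fall through to the next candidate.
      }
    }
    throw new IllegalStateException(
        "none of the plugin classes is on the classpath: " + candidates);
  }

  public static void main(String[] args) throws IOException {
    // A temporary file plays the role of the bundle JAR.
    File jar = Files.createTempFile("bundle", ".jar").toFile();
    jar.deleteOnExit();
    requireJar(jar.getPath());

    // "no.such.Plugin" is missing, so the second candidate is chosen.
    System.out.println(firstLoadable(Arrays.asList("no.such.Plugin", "java.lang.String")));
  }
}
```

In `applyGlutenConf`, both checks would run before the first `sparkConf.set(...)` call, so a mistyped path or a missing plugin class fails in the helper rather than at Spark startup.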


String confPrefix = pluginClass.contains("quanton") ? "spark.quanton" : "spark.gluten";
String libName = pluginClass.contains("quanton") ? "quanton" : "gluten";

sparkConf.set("spark.plugins", pluginClass);
sparkConf.set("spark.memory.offHeap.enabled", "true");
sparkConf.set("spark.memory.offHeap.size", System.getProperty("gluten.offheap.size", "8g"));
sparkConf.set("spark.shuffle.manager", "org.apache.spark.shuffle.sort.ColumnarShuffleManager");
sparkConf.set(confPrefix + ".sql.columnar.libname", libName);
sparkConf.set(confPrefix + ".sql.columnar.backend.velox.glogSeverityLevel", "0");

LOG.warn("Using Gluten/Velox native execution with plugin={}, lib={}", pluginClass, libName);
}
}
@@ -118,6 +118,9 @@ public static SparkConf getSparkConfForTest(String appName) {
sparkConf.set("spark.ui.enabled", "false");
}
HoodieSparkKryoRegistrar.register(sparkConf);

GlutenTestUtils.applyGlutenConf(sparkConf);

return SparkRDDReadClient.addHoodieSupport(sparkConf);
}

@@ -19,6 +19,8 @@

package org.apache.hudi.testutils.providers;

import org.apache.hudi.testutils.GlutenTestUtils;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
@@ -50,6 +52,9 @@ default SparkConf conf(Map<String, String> overwritingConfigs) {
sparkConf.set("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar");
sparkConf.set("spark.ui.enabled", "false");
overwritingConfigs.forEach(sparkConf::set);

GlutenTestUtils.applyGlutenConf(sparkConf);

return sparkConf;
}

@@ -43,6 +43,7 @@
import org.apache.hudi.internal.schema.convert.InternalSchemaConverter;
import org.apache.hudi.storage.StoragePath;

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
@@ -165,39 +166,12 @@ FileGroupReaderSchemaHandler createSchemaHandler(HoodieReaderContext<String> rea

@ParameterizedTest
@CsvSource({
"true, true, true, EVENT_TIME_ORDERING, false, EIGHT, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"true, false, false, EVENT_TIME_ORDERING, false, EIGHT, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"false, true, false, EVENT_TIME_ORDERING, false, EIGHT, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"false, false, true, EVENT_TIME_ORDERING, false, EIGHT, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"true, true, true, COMMIT_TIME_ORDERING, false, EIGHT, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"true, false, false, COMMIT_TIME_ORDERING, false, EIGHT, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"false, true, false, COMMIT_TIME_ORDERING, false, EIGHT, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"false, false, true, COMMIT_TIME_ORDERING, false, EIGHT, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"true, true, true, CUSTOM, false, EIGHT, 00000000-0000-0000-0000-000000000000",
"true, false, false, CUSTOM, false, EIGHT, 00000000-0000-0000-0000-000000000000",
"false, true, false, CUSTOM, false, EIGHT, 00000000-0000-0000-0000-000000000000",
"false, false, true, CUSTOM, false, EIGHT, 00000000-0000-0000-0000-000000000000",
"true, true, true, , false, EIGHT, 00000000-0000-0000-0000-000000000000",
"true, false, false, , false, EIGHT, 00000000-0000-0000-0000-000000000000",
"false, true, false, , false, EIGHT, 00000000-0000-0000-0000-000000000000",
"false, false, true, , false, EIGHT, 00000000-0000-0000-0000-000000000000",
"true, true, true, EVENT_TIME_ORDERING, false, SIX, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"true, false, false, EVENT_TIME_ORDERING, false, SIX, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"false, true, false, EVENT_TIME_ORDERING, false, SIX, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"false, false, true, EVENT_TIME_ORDERING, false, SIX, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5",
"true, true, true, COMMIT_TIME_ORDERING, false, SIX, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"true, false, false, COMMIT_TIME_ORDERING, false, SIX, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"false, true, false, COMMIT_TIME_ORDERING, false, SIX, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"false, false, true, COMMIT_TIME_ORDERING, false, SIX, ce9acb64-bde0-424c-9b91-f6ebba25356d",
"true, true, true, CUSTOM, false, SIX, 00000000-0000-0000-0000-000000000000",
"true, false, false, CUSTOM, false, SIX, 00000000-0000-0000-0000-000000000000",
"false, true, false, CUSTOM, false, SIX, 00000000-0000-0000-0000-000000000000",
"false, false, true, CUSTOM, false, SIX, 00000000-0000-0000-0000-000000000000",
"true, true, true, , false, SIX, 00000000-0000-0000-0000-000000000000",
"true, false, false, , false, SIX, 00000000-0000-0000-0000-000000000000",
"false, true, false, , false, SIX, 00000000-0000-0000-0000-000000000000",
"false, false, true, , false, SIX, 00000000-0000-0000-0000-000000000000",
"true, true, true, COMMIT_TIME_ORDERING, true, SIX, eeb8d96f-b1e4-49fd-bbf8-28ac514178e5", /// with table version 6, commit time based merge mode can have event time based merge strategy id.
})
public void testSchemaForMandatoryFields(boolean setPrecombine,
boolean addHoodieIsDeleted,
@@ -311,6 +285,7 @@ private static Stream<Arguments> testGenerateRequiredSchemaPreV9CustomPayloadPar
* (because the property didn't exist), generateRequiredSchema correctly infers the merge mode
* from the payload class and returns the appropriate schema.
*/
@Disabled("Custom payload / pre-v9 table not supported")
@ParameterizedTest
@MethodSource("testGenerateRequiredSchemaPreV9CustomPayloadParams")
public void testGenerateRequiredSchemaPreV9CustomPayload(String payloadClass,
@@ -113,7 +113,7 @@
* Tests {@link HoodieFileGroupReader} with different engines
*/
public abstract class TestHoodieFileGroupReaderBase<T> {
-private static final List<HoodieFileFormat> DEFAULT_SUPPORTED_FILE_FORMATS = Arrays.asList(HoodieFileFormat.PARQUET, HoodieFileFormat.ORC);
+private static final List<HoodieFileFormat> DEFAULT_SUPPORTED_FILE_FORMATS = Arrays.asList(HoodieFileFormat.PARQUET);

🤖 Removing ORC from DEFAULT_SUPPORTED_FILE_FORMATS and commenting out CUSTOM merge mode tests significantly reduces test coverage for all users, not just Gluten runs. These changes affect the base test class used by multiple engines. Could you gate these reductions behind the gluten.bundle.jar property instead of removing them unconditionally?

- Generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
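
One way to gate the reduction, sketched under assumptions: `FormatGate` and its nested `FileFormat` enum are hypothetical stand-ins for the test class and `HoodieFileFormat`, and the property check mirrors the null-or-empty test used in `applyGlutenConf`. The default list stays unchanged unless `gluten.bundle.jar` is set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class FormatGate {
  // Hypothetical stand-in for HoodieFileFormat.
  enum FileFormat { PARQUET, ORC }

  private FormatGate() {}

  // True when the property carries a non-empty value.
  static boolean glutenEnabled(String bundleJar) {
    return bundleJar != null && !bundleJar.isEmpty();
  }

  // Full coverage by default; Parquet only for Gluten/Velox runs.
  static List<FileFormat> defaultFormats(String bundleJar) {
    return glutenEnabled(bundleJar)
        ? Collections.singletonList(FileFormat.PARQUET)
        : Arrays.asList(FileFormat.PARQUET, FileFormat.ORC);
  }

  public static void main(String[] args) {
    System.out.println(defaultFormats(System.getProperty("gluten.bundle.jar")));
  }
}
```

The same predicate could decide whether the `CUSTOM` merge mode arguments are added, so non-Gluten runs keep their existing coverage.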

protected static List<HoodieFileFormat> supportedFileFormats;
private static final String KEY_FIELD_NAME = "_row_key";
protected static final String ORDERING_FIELD_NAME = "timestamp";
@@ -356,10 +356,10 @@ private static Stream<Arguments> testArguments() {
args.add(arguments(RecordMergeMode.COMMIT_TIME_ORDERING, HoodieFileFormat.PARQUET, "avro", false));
args.add(arguments(RecordMergeMode.EVENT_TIME_ORDERING, HoodieFileFormat.PARQUET, "avro", true));
}
-args.add(arguments(RecordMergeMode.COMMIT_TIME_ORDERING, HoodieFileFormat.PARQUET, "parquet", true));
-args.add(arguments(RecordMergeMode.EVENT_TIME_ORDERING, HoodieFileFormat.PARQUET, "parquet", true));
-args.add(arguments(RecordMergeMode.CUSTOM, HoodieFileFormat.PARQUET, "avro", false));
-args.add(arguments(RecordMergeMode.CUSTOM, HoodieFileFormat.PARQUET, "parquet", true));
+args.add(arguments(RecordMergeMode.COMMIT_TIME_ORDERING, HoodieFileFormat.PARQUET, "avro", true));
+args.add(arguments(RecordMergeMode.EVENT_TIME_ORDERING, HoodieFileFormat.PARQUET, "avro", true));
+// args.add(arguments(RecordMergeMode.CUSTOM, HoodieFileFormat.PARQUET, "avro", false));
+// args.add(arguments(RecordMergeMode.CUSTOM, HoodieFileFormat.PARQUET, "parquet", true));

return args.stream();
}
@@ -449,8 +449,7 @@ public void testReadFileGroupWithMultipleOrderingFields() throws Exception {
private static Stream<Arguments> logFileOnlyCases() {
return Stream.of(
arguments(RecordMergeMode.COMMIT_TIME_ORDERING, "avro"),
-arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet"),
-arguments(RecordMergeMode.CUSTOM, "avro"));
+arguments(RecordMergeMode.EVENT_TIME_ORDERING, "avro"));

⚠️ Potential issue | 🟡 Minor

Log file only test cases reduced to avro format only.

logFileOnlyCases() now only tests avro log data block format, removing parquet format coverage. If parquet log blocks are supported in production, this reduces test coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`
around lines 451 - 452, The test parameterization was narrowed to only the
"avro" log data block format, removing "parquet" coverage; update the parameter
list used by the test (in TestHoodieFileGroupReaderBase, the method supplying
arguments for logFileOnlyCases) to include both "avro" and "parquet" again
(e.g., add corresponding arguments(RecordMergeMode.COMMIT_TIME_ORDERING,
"parquet") and arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet") so the
test runs for both formats).

CodeRabbit (original) (source:comment#3082895741)

}

@ParameterizedTest
@@ -549,10 +548,8 @@ private static Stream<Arguments> testArgsForDifferentBaseAndLogFormats() {

if (supportsLance) {
args.add(arguments(HoodieFileFormat.LANCE, "avro"));
-args.add(arguments(HoodieFileFormat.LANCE, "parquet"));
}
args.add(arguments(HoodieFileFormat.PARQUET, "avro"));
-args.add(arguments(HoodieFileFormat.PARQUET, "parquet"));

return args.stream();
}
@@ -40,6 +40,7 @@
import org.apache.hudi.storage.StorageConfiguration;

import org.apache.avro.generic.IndexedRecord;
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

import java.io.IOException;
@@ -137,6 +138,7 @@ void readWithEventTimeOrderingAndDeleteBlock() throws IOException {
assertEquals(2, readStats.getNumUpdates());
}

@Disabled("Custom delete payload not supported")
@Test
void readWithEventTimeOrderingWithRecords() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
@@ -176,6 +178,7 @@ void readWithEventTimeOrderingWithRecords() throws IOException {
assertEquals(3, readStats.getNumUpdates());
}

@Disabled("Custom delete payload not supported")
@Test
void readWithCommitTimeOrdering() throws IOException {

⚠️ Potential issue | 🟡 Minor

Disabled reason may be misleading for test name.

The test readWithCommitTimeOrdering is disabled with "Custom delete payload not supported", which is technically correct (the test uses custom delete markers), but the test name suggests it's primarily about commit time ordering behavior. Consider clarifying the disable reason or renaming the test if custom delete payload support is the actual blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java`
around lines 181 - 183, The `@Disabled` reason on the test method
readWithCommitTimeOrdering is misleading; update the disable annotation or the
test name so the intent is clear: either change the `@Disabled` message to
explicitly state "Uses custom delete payload (not supported)" or rename the
method (e.g., to readWithCommitTimeOrderingAndCustomDeletePayload) to reflect
that the failure is due to custom delete payloads, not commit-time ordering.
Locate the test method readWithCommitTimeOrdering and apply one of these changes
so the disable reason and test identifier align.

CodeRabbit (original) (source:comment#3082895734)

HoodieReadStats readStats = new HoodieReadStats();
@@ -206,6 +209,7 @@ void readWithCommitTimeOrdering() throws IOException {
assertEquals(2, readStats.getNumUpdates());
}

@Disabled("Custom delete payload not supported")
@Test
void readWithCommitTimeOrderingWithRecords() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
@@ -242,6 +246,7 @@ void readWithCommitTimeOrderingWithRecords() throws IOException {
assertEquals(4, readStats.getNumUpdates());
}

@Disabled("CUSTOM merge mode not supported")
@Test
void readWithCustomPayload() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
@@ -281,6 +286,7 @@ void readWithCustomPayload() throws IOException {
assertEquals(0, readStats.getNumUpdates());
}

@Disabled("CUSTOM merge mode not supported")
@Test
void readWithCustomPayloadWithRecords() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
@@ -320,6 +326,7 @@ void readWithCustomPayloadWithRecords() throws IOException {
assertEquals(2, readStats.getNumUpdates());
}

@Disabled("CUSTOM merge mode not supported")
@Test
void readWithCustomMerger() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
@@ -357,6 +364,7 @@ void readWithCustomMerger() throws IOException {
assertEquals(0, readStats.getNumUpdates());
}

@Disabled("CUSTOM merge mode not supported")
@Test
void readWithCustomMergerWithRecords() throws IOException {
HoodieReadStats readStats = new HoodieReadStats();
86 changes: 86 additions & 0 deletions hudi-spark-datasource/hudi-spark/pom.xml
@@ -493,4 +493,90 @@
<scope>test</scope>
</dependency>
</dependencies>

<profiles>
<!-- Activated automatically when -Dgluten.bundle.jar=<path> is passed.
Adds the Gluten/Velox bundle JAR to the surefire test JVM classpath so
GlutenPlugin can be instantiated, and appends JVM flags required by
Gluten's native memory layer not already present in the parent argLine. -->
<profile>
<id>gluten-velox</id>
<activation>
<property>
<name>gluten.bundle.jar</name>
</property>
</activation>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<classpathDependencyExcludes>
<!-- lance-core:1.0.2 pulls standard (unshaded) arrow-c-data/arrow-dataset
whose ArrowSchema has the original method signatures that conflict with
Gluten's shaded bundle. The Gluten bundle provides its own shade-processed
copies of these classes, so the standard JARs must not appear first. -->
<classpathDependencyExclude>org.apache.arrow:arrow-c-data</classpathDependencyExclude>
<classpathDependencyExclude>org.apache.arrow:arrow-dataset</classpathDependencyExclude>
</classpathDependencyExcludes>
<additionalClasspathElements>
<additionalClasspathElement>${gluten.bundle.jar}</additionalClasspathElement>
</additionalClasspathElements>
<!-- @{argLine} preserves JaCoCo agent and existing Spark add-opens flags.
Extra flags below cover what JAVA_OPTS in test-hudi-mor.sh adds
that the parent pom Spark 3.5 argLine does not already include. -->
<argLine>@{argLine}
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true
-Dgluten.bundle.jar=${gluten.bundle.jar}
</argLine>
</configuration>
</plugin>
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<configuration>
<systemProperties>
<gluten.bundle.jar>${gluten.bundle.jar}</gluten.bundle.jar>
</systemProperties>
<argLine>${argLine}
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true
</argLine>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>io.glutenproject</groupId>
<artifactId>gluten-velox-bundle</artifactId>
<version>0.0.0</version>
<scope>system</scope>
<systemPath>${gluten.bundle.jar}</systemPath>
</dependency>
<!-- Override lance-core to exclude arrow JARs that conflict with
the shaded copies inside the Gluten bundle. -->
<dependency>
<groupId>org.lance</groupId>
<artifactId>lance-core</artifactId>
<exclusions>
<exclusion>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-c-data</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-dataset</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
@@ -62,6 +62,7 @@
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.execution.datasources.SparkColumnarFileReader;
import org.apache.spark.sql.sources.Filter;
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
@@ -93,7 +94,7 @@ public class TestPositionBasedFileGroupRecordBuffer extends SparkClientFunctiona

private void prepareBuffer(RecordMergeMode mergeMode, String baseFileInstantTime) throws Exception {
Map<String, String> writeConfigs = new HashMap<>();
-writeConfigs.put(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key(), "parquet");
+writeConfigs.put(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key(), "avro");
writeConfigs.put(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), "_row_key");
writeConfigs.put(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), "partition_path");
writeConfigs.put(HoodieTableConfig.ORDERING_FIELDS.key(), mergeMode.equals(RecordMergeMode.COMMIT_TIME_ORDERING) ? "" : "timestamp");
@@ -243,6 +244,7 @@ public void testProcessDeleteBlockWithPositions(boolean sameBaseInstantTime) thr
}
}

@Disabled("CUSTOM merge mode not supported")
@Test
public void testProcessDeleteBlockWithCustomMerger() throws Exception {
String baseFileInstantTime = "090";