[OSS PR #18497] [WIP] feat(lance): round-trip Hudi VECTOR columns as native Lance fixed-size lists by yihua · Pull Request #49 · yihua/hudi

yihua · 2026-04-18T01:20:52Z

Mirror of apache#18497 for automated bot review.

Original author: @rahil-c
Base branch: master

Summary by CodeRabbit

New Features
- Enhanced vector column support with improved metadata preservation across file formats.
- Vector data now properly round-trips when reading and writing storage formats.
Bug Fixes
- Vector metadata is now correctly restored during read operations.
- Vector field metadata is properly enriched before storage conversion.
Tests
- Added comprehensive functional tests for vector data handling and format compatibility.

Translate the Hudi VECTOR logical-type metadata (`hudi_type = "VECTOR(dim[,elem])"`) into the lance-spark metadata key `arrow.fixed-size-list.size` before calling `LanceArrowUtils.toArrowSchema`, so the Lance writer emits a native Arrow FixedSizeList<Float32|Float64, dim> (Lance's vector column encoding) instead of a plain variable-length list. No change needed at `LanceFileWriter.open(...)`; the encoding is driven by the Arrow schema itself.

- New private helper `enrichSparkSchemaForLanceVectors` in `HoodieSparkLanceWriter` reuses `VectorConversionUtils.detectVectorColumnsFromMetadata` to find VECTOR fields and attaches the Lance metadata key; non-vector fields pass through unchanged.
- Fails fast with `HoodieNotSupportedException` for non-ArrayType or non-Float/Double element types (matches lance-spark's `shouldBeFixedSizeList`).
- Tests in `TestLanceDataSource` (COW + MOR):
  - `testFloatVectorRoundTrip`
  - `testDoubleVectorRoundTrip`
  - `testMultipleVectorColumns`

  Each opens the written `.lance` file via `LanceFileReader` and asserts the field is `ArrowType.FixedSizeList` with the expected `listSize`, the direct regression guard that fails pre-fix and passes post-fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
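The metadata translation in this commit can be illustrated without Spark or Lance on the classpath. The following is a minimal sketch, assuming the descriptor grammar is exactly `VECTOR(dim[,elem])` and that a missing element type means `FLOAT`. The class `VectorDescriptorSketch`, its `enrich` method, and the plain `Map<String, String>` standing in for Spark's `Metadata` are hypothetical and are not the PR's code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: field metadata is modelled as a plain string map,
// not Spark's Metadata class, and this is not the PR's implementation.
final class VectorDescriptorSketch {
  static final String HUDI_TYPE_KEY = "hudi_type";
  static final String FIXED_SIZE_LIST_SIZE_KEY = "arrow.fixed-size-list.size";

  // Assumed grammar: VECTOR(dim) or VECTOR(dim,ELEM), e.g. VECTOR(128,DOUBLE).
  private static final Pattern VECTOR =
      Pattern.compile("VECTOR\\(\\s*(\\d+)\\s*(?:,\\s*(\\w+)\\s*)?\\)");

  // Returns a copy of the metadata with the fixed-size-list size added when the
  // field carries a VECTOR descriptor; any other metadata is returned as-is.
  static Map<String, String> enrich(String fieldName, Map<String, String> metadata) {
    String hudiType = metadata.get(HUDI_TYPE_KEY);
    if (hudiType == null) {
      return metadata;
    }
    Matcher m = VECTOR.matcher(hudiType.trim());
    if (!m.matches()) {
      return metadata;
    }
    // Assumption: a descriptor without an element type means FLOAT.
    String elem = m.group(2) == null ? "FLOAT" : m.group(2).toUpperCase(Locale.ROOT);
    if (!elem.equals("FLOAT") && !elem.equals("DOUBLE")) {
      // Fail fast, mirroring the behaviour the commit message describes.
      throw new UnsupportedOperationException(
          "Only FLOAT/DOUBLE vectors are supported; got " + elem
              + " for field '" + fieldName + "'");
    }
    Map<String, String> enriched = new LinkedHashMap<>(metadata);
    enriched.put(FIXED_SIZE_LIST_SIZE_KEY, m.group(1));
    return enriched;
  }

  public static void main(String[] args) {
    Map<String, String> meta = new LinkedHashMap<>();
    meta.put(HUDI_TYPE_KEY, "VECTOR(128)");
    System.out.println(enrich("embedding", meta));
  }
}
```

The sketch copies the existing metadata before adding the size key, which matches the commit's use of `withMetadata(field.metadata())` so that the original `hudi_type` entry is preserved alongside the new one.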

Companion to the Lance writer's native FixedSizeList encoding: on read, rehydrate the Hudi `hudi_type = VECTOR(...)` Spark metadata that `LanceArrowUtils.fromArrowSchema` drops, so the read schema matches the Parquet path. Gate the Parquet-only ArrayType→BinaryType vector rewrite in HoodieFileGroupReaderBasedFileFormat on format == PARQUET; Lance returns vectors natively as ArrayType so the rewrite would trigger a spurious cast and break the read.

- VectorConversionUtils.restoreVectorMetadataFromArrowSchema walks the Arrow schema and re-attaches VECTOR(dim[,DOUBLE]) for FixedSizeList<Float32|Float64, dim> fields.
- HoodieSparkLanceReader.getSchema and SparkLanceReaderBase.read now call it so downstream VECTOR-aware code sees the same schema as on Parquet.
- TestLanceDataSource: assert hudi_type metadata is restored on read for float, double, and multi-vector round-trips.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
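The read-side mapping from an Arrow field back to a `hudi_type` descriptor can be sketched as follows. This is an illustration only: `VectorRestoreSketch`, `ListField`, and `ElementKind` are hypothetical stand-ins for the Arrow field and type classes, and the exact descriptor text is assumed from the `VECTOR(dim[,DOUBLE])` form quoted in the commit message.

```java
import java.util.Optional;

// Hypothetical sketch: ListField is a tiny stand-in for an Arrow field; the
// real code walks an Arrow schema instead.
final class VectorRestoreSketch {
  enum ElementKind { FLOAT32, FLOAT64, OTHER }

  static final class ListField {
    final String name;
    final ElementKind element;
    final int listSize; // 0 or less models a variable-length list

    ListField(String name, ElementKind element, int listSize) {
      this.name = name;
      this.element = element;
      this.listSize = listSize;
    }
  }

  // Returns the hudi_type value to re-attach, or empty when the field is not a
  // fixed-size list of Float32/Float64 and should pass through unchanged.
  static Optional<String> hudiTypeFor(ListField field) {
    if (field.listSize <= 0) {
      return Optional.empty();
    }
    switch (field.element) {
      case FLOAT32:
        return Optional.of("VECTOR(" + field.listSize + ")");
      case FLOAT64:
        return Optional.of("VECTOR(" + field.listSize + ",DOUBLE)");
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    ListField embedding = new ListField("embedding", ElementKind.FLOAT64, 64);
    System.out.println(hudiTypeFor(embedding).orElse("<not a vector>"));
  }
}
```

Because the descriptor is derived purely from the Arrow type, any fixed-size list of floats or doubles would be treated as a VECTOR under this sketch, including one that was not written by Hudi.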

Mirrors the Parquet writer: emit the comma-separated `colName:VECTOR(dim[,elemType])` descriptor list under the existing `hoodie.vector.columns` key in the Lance file-footer key-value metadata. Reader still derives VECTOR identity from the Arrow FixedSizeList type today; this footer entry is insurance for future descriptor fields the Arrow type cannot express (quantization tags, distance metrics, etc.) and keeps Lance files symmetric with Parquet files.

- HoodieBaseLanceWriter: new protected `additionalSchemaMetadata()` hook invoked during close(), so subclasses can contribute footer KV entries alongside bloom-filter metadata.
- HoodieSparkLanceWriter: override `additionalSchemaMetadata()` to emit `hoodie.vector.columns` when the Spark schema has any VECTOR column.
- VectorConversionUtils: add `buildVectorColumnsMetadataValue(StructType)` matching the Parquet-path helper's output format.
- TestLanceDataSource: assert footer carries the expected descriptor list for float, double, and multi-vector round-trips.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
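One detail of the footer value is that a descriptor such as `VECTOR(64,DOUBLE)` itself contains a comma, so a reader cannot simply split the list on commas. The sketch below shows one way to build and parse such a value, assuming the format is exactly `colName:VECTOR(...)` entries joined by commas and that column names contain neither `:` nor `,`. `VectorFooterSketch` and its methods are hypothetical and do not reproduce Hudi's serializer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a footer value such as
// "emb:VECTOR(128),emb2:VECTOR(64,DOUBLE)"; not Hudi's actual serializer.
final class VectorFooterSketch {

  // Joins entries in the map's iteration order, so pass an ordered map.
  static String build(Map<String, String> descriptorsByColumn) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : descriptorsByColumn.entrySet()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(e.getKey()).append(':').append(e.getValue());
    }
    return sb.toString();
  }

  // Splits only on commas outside parentheses, so "VECTOR(64,DOUBLE)" stays whole.
  static Map<String, String> parse(String value) {
    Map<String, String> out = new LinkedHashMap<>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i <= value.length(); i++) {
      // Treat end-of-input as a final separator to flush the last entry.
      char c = i < value.length() ? value.charAt(i) : ',';
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      } else if (c == ',' && depth == 0) {
        String entry = value.substring(start, i).trim();
        start = i + 1;
        if (entry.isEmpty()) {
          continue;
        }
        int colon = entry.indexOf(':');
        if (colon < 0) {
          throw new IllegalArgumentException("Missing ':' in entry: " + entry);
        }
        out.put(entry.substring(0, colon), entry.substring(colon + 1));
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("Unbalanced parentheses in: " + value);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put("emb", "VECTOR(128)");
    cols.put("emb2", "VECTOR(64,DOUBLE)");
    System.out.println(parse(build(cols)));
  }
}
```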

SparkFileFormatInternalRowReaderContext.getFileRecordIterator had a second, unconditional rewrite of VECTOR columns from ArrayType to BinaryType (the earlier withVectorRewrite gate in HoodieFileGroupReaderBasedFileFormat only covered the non-FileGroupReader branch). On the MOR / FileGroupReader path this caused Lance reads to fail with scala.MatchError: ArrayType(FloatType,true) in Cast.castToBinaryCode, because Lance returns vectors natively as ArrayType while the caller-supplied schema had been rewritten to BinaryType; the generated UnsafeProjection then injected an unsupported Cast(ArrayType -> BinaryType).

Gate the detection + rewrite on the file format: skip it for .lance base files. Hudi log files are always parquet-encoded so they still take the Parquet path.

Fixes 14 TestLanceDataSource vector errors (COW + MOR) observed in spark3.5 / spark3.4 CI, including the spark3.4 part2 6h timeout that was the same failure retrying.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
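The gating rule in this commit reduces to a small predicate. The sketch below is illustrative only: `VectorRewriteGateSketch`, its `BaseFileFormat` enum, and the `isLogFile` flag are hypothetical names, and only the two formats discussed in this PR are modelled.

```java
// Hypothetical sketch of the format gate; not the code in
// SparkFileFormatInternalRowReaderContext.
final class VectorRewriteGateSketch {
  enum BaseFileFormat { PARQUET, LANCE }

  // The ArrayType -> BinaryType rewrite is a Parquet-path concern. Log files are
  // described as always parquet-encoded, so they keep the rewrite even when the
  // table's base files are Lance.
  static boolean shouldRewriteVectorsToBinary(BaseFileFormat baseFormat, boolean isLogFile) {
    return isLogFile || baseFormat == BaseFileFormat.PARQUET;
  }

  public static void main(String[] args) {
    System.out.println(shouldRewriteVectorsToBinary(BaseFileFormat.LANCE, false));
  }
}
```

Taking the format as an explicit input, rather than inferring it from a file name, is in line with the later commit in this PR that switches to `tableConfig.getBaseFileFormat`.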

On Scala 2.13, Row.getAs[Seq[Float]] fails at runtime with ClassCastException: scala.collection.mutable.ArraySeq$ofRef cannot be cast to scala.collection.immutable.Seq, because Seq in Scala 2.13 defaults to immutable.Seq while Spark holds array columns as mutable.ArraySeq internally.

Row.getSeq[T] is declared as scala.collection.Seq[T] (general), so it works on both 2.12 (where Seq = scala.collection.Seq) and 2.13 (where Seq = scala.collection.immutable.Seq). Same runtime object, no cast.

Fixes the 14 TestLanceDataSource errors on java17 CI (scala-2.13, spark3.5 / spark4.0). The earlier VECTOR->BinaryType rewrite fix resolved the scala.MatchError in the read path; this change resolves the subsequent 2.13-only test-side ClassCastException.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… size check

- SparkFileFormatInternalRowReaderContext: use tableConfig.getBaseFileFormat instead of filename extension sniff to detect Lance base files
- VectorConversionUtils.restoreVectorMetadataFromArrowSchema: remove confusing arrowFields.size != sparkFields.length defensive guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-18T01:21:13Z

📝 Walkthrough

This PR enhances vector metadata handling across Hudi to support Lance file format persistence and recovery. It introduces utilities to serialize/restore vector descriptors, adds vector schema enrichment during Lance writes, restores vector type information during reads, renames a format-agnostic metadata key constant, and implements conditional vector schema rewriting based on file format. Test coverage for Lance vectors is substantially expanded.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vector Metadata Utilities `hudi-client/.../VectorConversionUtils.java` | Added public methods to build vector column footer values from Spark schemas and restore vector metadata from Arrow schemas to Spark `StructType` fields. Includes private helper to derive vector descriptors from Arrow `FixedSizeList` fields. |
| Lance File Support `hudi-client/.../HoodieSparkLanceReader.java`, `hudi-client/.../HoodieSparkLanceWriter.java`, `hudi-hadoop-common/.../HoodieBaseLanceWriter.java` | Lance reader now restores vector metadata when converting Arrow to Spark schema. Lance writer pre-processes schemas to enrich VECTOR field metadata with dimension information before Arrow conversion. Base writer added hook for subclasses to emit additional schema metadata to Lance footer. |
| Metadata Key Migration `hudi-common/.../HoodieSchema.java`, `hudi-client/.../HoodieRowParquetWriteSupport.java`, `hudi-hadoop-common/.../HoodieAvroWriteSupport.java`, `hudi-spark-datasource/.../TestVectorDataSource.scala` | Renamed `PARQUET_VECTOR_COLUMNS_METADATA_KEY` to `VECTOR_COLUMNS_METADATA_KEY` for format-agnostic metadata storage. Added `serializeVectorColumnsMetadata()` method to convert vector mappings into canonical descriptor strings. Updated all metadata key references accordingly. |
| Format-Conditional Vector Handling `hudi-client/.../SparkFileFormatInternalRowReaderContext.scala`, `hudi-spark-datasource/.../HoodieFileGroupReaderBasedFileFormat.scala`, `hudi-spark-datasource/.../SparkLanceReaderBase.scala` | Vector column detection/rewriting now conditional: skipped for Lance base files, retained for Parquet and all log files. Lance reader applies vector metadata restoration after schema conversion. |
| Vector Testing `hudi-spark-datasource/.../TestLanceDataSource.scala` | Comprehensive new test suite for Lance vector functionality: round-trip validation for float/double vectors, multiple columns, nullable variants, MOR operations, projections, and partitioned tables. Includes utilities to assert vector metadata in Lance footer and field structure. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant SparkSchema as Spark Schema<br/>(with VECTOR metadata)
    participant Writer as HoodieSparkLanceWriter
    participant Enrich as VectorConversionUtils
    participant Arrow as Arrow Schema
    participant Lance as Lance File<br/>(with footer metadata)
    
    User->>Writer: Write DataFrame with VECTOR columns
    Writer->>Enrich: enrichVectorMetadataInSchema()
    Enrich->>Enrich: Detect VECTOR fields<br/>Add dimension to metadata
    Enrich-->>Writer: Enriched StructType
    Writer->>Arrow: Convert to Arrow schema<br/>(with sizing info)
    Arrow->>Lance: Write with metadata
    Writer->>Lance: addSchemaMetadata()<br/>(vector columns footer)
    Lance-->>User: Persisted file

sequenceDiagram
    actor User
    participant Lance as Lance File<br/>(with footer metadata)
    participant Reader as LanceReader
    participant Arrow as Arrow Schema<br/>(FixedSizeList)
    participant Restore as VectorConversionUtils
    participant SparkSchema as Spark Schema<br/>(restored VECTOR metadata)
    
    User->>Reader: Read Lance file
    Reader->>Lance: Load Arrow schema
    Lance-->>Arrow: Return schema<br/>(FixedSizeList fields)
    Reader->>Arrow: Convert to Spark StructType
    Arrow-->>Reader: Basic StructType<br/>(no VECTOR info)
    Reader->>Restore: restoreVectorMetadataFromArrowSchema()
    Restore->>Restore: Detect FixedSizeList of<br/>Float32/Float64
    Restore->>SparkSchema: Attach VECTOR metadata
    SparkSchema-->>User: Return enriched schema

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 With vectors now flowing through Lance's bright streams,
I restore every dimension from Arrow's sweet dreams,
Metadata enriches each schema we write,
And doubles and floats now dance in the light! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature: adding native Lance fixed-size list support for Hudi VECTOR columns with round-trip capability. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestLanceDataSource.scala (1)
1109-1121: Consider adding a memory limit to RootAllocator.

new RootAllocator() creates an allocator without a memory limit. While acceptable for tests, adding a limit (e.g., new RootAllocator(64 * 1024 * 1024)) would prevent unbounded memory allocation if test data unexpectedly grows.
♻️ Suggested improvement
-        val allocator = new RootAllocator()
+        val allocator = new RootAllocator(64 * 1024 * 1024) // 64MB limit for test
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestLanceDataSource.scala`
around lines 1109 - 1121, The RootAllocator is created without a memory limit
which can lead to unbounded allocation in tests; change the instantiation of
RootAllocator in TestLanceDataSource (the allocator variable created with new
RootAllocator()) to include a reasonable limit (for example new RootAllocator(64
* 1024 * 1024)), ensuring the allocator is still closed in the existing finally
block; alternatively make the size a named constant or test-config value if you
prefer configurability.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java`:
- Around line 154-183: In enrichSparkSchemaForLanceVectors, validate that any
StructField with a detected HoodieSchema.Vector (from
VectorConversionUtils.detectVectorColumnsFromMetadata) has a matching Spark type
shape before enriching: assert the field.dataType() is an ArrayType whose
elementType is FloatType when vec.getVectorElementType()==FLOAT or DoubleType
when DOUBLE, and optionally verify the fixed length equals vec.getDimension();
if any check fails throw a HoodieNotSupportedException describing the field
name, expected Spark element type/shape and actual type so we fail fast (do this
check for each vec in HoodieSparkLanceWriter.enrichSparkSchemaForLanceVectors
before building the new Metadata and before later calls such as
LanceArrowUtils.toArrowSchema).

In
`@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/VectorConversionUtils.java`:
- Around line 96-102: The current serialization in VectorConversionUtils builds
"named" by iterating the unordered Map returned from
detectVectorColumnsFromMetadata, causing unstable hoodie.vector.columns order;
instead, iterate over the schema's fields[] in ordinal order (for i =
0..fields.length-1), check detected.get(i) for a non-null HoodieSchema.Vector
and if present put(fields[i].name(), value) into the LinkedHashMap, then call
HoodieSchema.serializeVectorColumnsMetadata(named) so the metadata preserves
schema order.

In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java`:
- Line 219: Add a deprecated alias constant in HoodieSchema for backward
compatibility: retain PARQUET_VECTOR_COLUMNS_METADATA_KEY as a public static
final String that delegates to VECTOR_COLUMNS_METADATA_KEY and mark it
`@Deprecated` with a javadoc pointing to VECTOR_COLUMNS_METADATA_KEY; update any
usages in the class to reference VECTOR_COLUMNS_METADATA_KEY only and keep the
alias for at least one release so downstream code compiling against
PARQUET_VECTOR_COLUMNS_METADATA_KEY continues to work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39324687-0e79-4958-b547-f169bf876e59

📥 Commits

Reviewing files that changed from the base of the PR and between 3d0ab80 and 40bef3c.

📒 Files selected for processing (12)

hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceReader.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/VectorConversionUtils.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java
hudi-hadoop-common/src/main/java/org/apache/hudi/avro/HoodieAvroWriteSupport.java
hudi-hadoop-common/src/main/java/org/apache/hudi/io/lance/HoodieBaseLanceWriter.java
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/lance/SparkLanceReaderBase.scala
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedFileFormat.scala
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestLanceDataSource.scala
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala

coderabbitai · 2026-04-18T01:27:12Z

+  private static StructType enrichSparkSchemaForLanceVectors(StructType sparkSchema) {
+    Map<Integer, HoodieSchema.Vector> vectorColumns =
+        VectorConversionUtils.detectVectorColumnsFromMetadata(sparkSchema);
+    if (vectorColumns.isEmpty()) {
+      return sparkSchema;
+    }
+    StructField[] fields = sparkSchema.fields();
+    StructField[] newFields = new StructField[fields.length];
+    for (int i = 0; i < fields.length; i++) {
+      StructField field = fields[i];
+      HoodieSchema.Vector vec = vectorColumns.get(i);
+      if (vec == null) {
+        newFields[i] = field;
+        continue;
+      }
+      HoodieSchema.Vector.VectorElementType elemType = vec.getVectorElementType();
+      if (elemType != HoodieSchema.Vector.VectorElementType.FLOAT
+          && elemType != HoodieSchema.Vector.VectorElementType.DOUBLE) {
+        throw new HoodieNotSupportedException(
+            "Lance base-file format currently supports FLOAT/DOUBLE VECTOR columns only; "
+                + "got element type " + elemType + " for field '" + field.name() + "'");
+      }
+      Metadata enriched = new MetadataBuilder()
+          .withMetadata(field.metadata())
+          .putLong(LanceArrowUtils.ARROW_FIXED_SIZE_LIST_SIZE_KEY(), vec.getDimension())
+          .build();
+      newFields[i] = new StructField(field.name(), field.dataType(), field.nullable(), enriched);
+    }
+    return new StructType(newFields);
+  }


⚠️ Potential issue | 🟠 Major

Validate the Spark field shape against the VECTOR descriptor before enriching it.

Right now any field carrying hudi_type=VECTOR(...) is accepted here as long as the descriptor says FLOAT/DOUBLE. If the actual Spark type is not ArrayType of the matching primitive element type, Lance will serialize one physical type while Hudi metadata advertises another. Please fail fast on mismatches before calling LanceArrowUtils.toArrowSchema(...).

Suggested guard

     for (int i = 0; i < fields.length; i++) {
       StructField field = fields[i];
       HoodieSchema.Vector vec = vectorColumns.get(i);
       if (vec == null) {
         newFields[i] = field;
@@
       if (elemType != HoodieSchema.Vector.VectorElementType.FLOAT
           && elemType != HoodieSchema.Vector.VectorElementType.DOUBLE) {
         throw new HoodieNotSupportedException(
             "Lance base-file format currently supports FLOAT/DOUBLE VECTOR columns only; "
                 + "got element type " + elemType + " for field '" + field.name() + "'");
       }
+      if (!(field.dataType() instanceof org.apache.spark.sql.types.ArrayType)) {
+        throw new HoodieNotSupportedException(
+            "VECTOR metadata requires ArrayType for field '" + field.name() + "', got " + field.dataType());
+      }
+      org.apache.spark.sql.types.ArrayType arrayType =
+          (org.apache.spark.sql.types.ArrayType) field.dataType();
+      boolean matches =
+          (elemType == HoodieSchema.Vector.VectorElementType.FLOAT
+              && arrayType.elementType().sameType(org.apache.spark.sql.types.DataTypes.FloatType))
+          || (elemType == HoodieSchema.Vector.VectorElementType.DOUBLE
+              && arrayType.elementType().sameType(org.apache.spark.sql.types.DataTypes.DoubleType));
+      if (!matches) {
+        throw new HoodieNotSupportedException(
+            "VECTOR metadata for field '" + field.name() + "' does not match Spark type " + field.dataType());
+      }
       Metadata enriched = new MetadataBuilder()
           .withMetadata(field.metadata())
           .putLong(LanceArrowUtils.ARROW_FIXED_SIZE_LIST_SIZE_KEY(), vec.getDimension())
           .build();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java` around lines 154 - 183, In enrichSparkSchemaForLanceVectors, validate that any StructField with a detected HoodieSchema.Vector (from VectorConversionUtils.detectVectorColumnsFromMetadata) has a matching Spark type shape before enriching: assert the field.dataType() is an ArrayType whose elementType is FloatType when vec.getVectorElementType()==FLOAT or DoubleType when DOUBLE, and optionally verify the fixed length equals vec.getDimension(); if any check fails throw a HoodieNotSupportedException describing the field name, expected Spark element type/shape and actual type so we fail fast (do this check for each vec in HoodieSparkLanceWriter.enrichSparkSchemaForLanceVectors before building the new Metadata and before later calls such as LanceArrowUtils.toArrowSchema).

coderabbitai · 2026-04-18T01:27:12Z

+    StructField[] fields = schema.fields();
+    Map<Integer, HoodieSchema.Vector> detected = detectVectorColumnsFromMetadata(schema);
+    java.util.LinkedHashMap<String, HoodieSchema.Vector> named = new java.util.LinkedHashMap<>();
+    for (Map.Entry<Integer, HoodieSchema.Vector> entry : detected.entrySet()) {
+      named.put(fields[entry.getKey()].name(), entry.getValue());
+    }
+    return HoodieSchema.serializeVectorColumnsMetadata(named);


⚠️ Potential issue | 🟡 Minor

Preserve schema order when serializing vector footer metadata.

detected is a HashMap, so iterating entrySet() makes hoodie.vector.columns order unstable across runs. Since this value is meant to be canonical, build named by walking fields in ordinal order instead.

Suggested fix

     StructField[] fields = schema.fields();
     Map<Integer, HoodieSchema.Vector> detected = detectVectorColumnsFromMetadata(schema);
     java.util.LinkedHashMap<String, HoodieSchema.Vector> named = new java.util.LinkedHashMap<>();
-    for (Map.Entry<Integer, HoodieSchema.Vector> entry : detected.entrySet()) {
-      named.put(fields[entry.getKey()].name(), entry.getValue());
+    for (int i = 0; i < fields.length; i++) {
+      HoodieSchema.Vector vector = detected.get(i);
+      if (vector != null) {
+        named.put(fields[i].name(), vector);
+      }
     }
     return HoodieSchema.serializeVectorColumnsMetadata(named);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/VectorConversionUtils.java` around lines 96 - 102, The current serialization in VectorConversionUtils builds "named" by iterating the unordered Map returned from detectVectorColumnsFromMetadata, causing unstable hoodie.vector.columns order; instead, iterate over the schema's fields[] in ordinal order (for i = 0..fields.length-1), check detected.get(i) for a non-null HoodieSchema.Vector and if present put(fields[i].name(), value) into the LinkedHashMap, then call HoodieSchema.serializeVectorColumnsMetadata(named) so the metadata preserves schema order.
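The ordering concern in this comment can be shown with plain collections. The following is a minimal sketch, assuming descriptors are keyed by field ordinal as in the snippet under review; `OrdinalOrderSketch` and `nameInSchemaOrder` are hypothetical names, and strings stand in for `HoodieSchema.Vector` values.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: strings stand in for HoodieSchema.Vector descriptors.
final class OrdinalOrderSketch {

  // Walks field ordinals 0..n-1, so the result follows schema order and does
  // not depend on the iteration order of the `detected` map.
  static LinkedHashMap<String, String> nameInSchemaOrder(
      String[] fieldNames, Map<Integer, String> detected) {
    LinkedHashMap<String, String> named = new LinkedHashMap<>();
    for (int i = 0; i < fieldNames.length; i++) {
      String descriptor = detected.get(i);
      if (descriptor != null) {
        named.put(fieldNames[i], descriptor);
      }
    }
    return named;
  }

  public static void main(String[] args) {
    String[] fields = {"id", "emb_a", "label", "emb_b"};
    Map<Integer, String> detected = new HashMap<>();
    detected.put(3, "VECTOR(64,DOUBLE)"); // inserted out of order on purpose
    detected.put(1, "VECTOR(128)");
    System.out.println(nameInSchemaOrder(fields, detected).keySet());
  }
}
```

With this shape, the serialized value depends only on the schema and the detected descriptors, whichever `Map` implementation the detection helper returns.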

coderabbitai · 2026-04-18T01:27:12Z

+   * so that any reader can identify vector columns without needing the Hudi schema store.
   */
-  public static final String PARQUET_VECTOR_COLUMNS_METADATA_KEY = "hoodie.vector.columns";
+  public static final String VECTOR_COLUMNS_METADATA_KEY = "hoodie.vector.columns";


⚠️ Potential issue | 🟠 Major

Keep a deprecated alias for the old metadata-key constant.

Replacing a public constant here will break downstream code that still compiles against HoodieSchema.PARQUET_VECTOR_COLUMNS_METADATA_KEY. Please retain the old field as a deprecated alias to VECTOR_COLUMNS_METADATA_KEY for at least one release.

Suggested compatibility shim

   public static final String VECTOR_COLUMNS_METADATA_KEY = "hoodie.vector.columns";
+
+  /**
+   * @deprecated use {@link #VECTOR_COLUMNS_METADATA_KEY}
+   */
+  @Deprecated
+  public static final String PARQUET_VECTOR_COLUMNS_METADATA_KEY = VECTOR_COLUMNS_METADATA_KEY;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java` at line 219, Add a deprecated alias constant in HoodieSchema for backward compatibility: retain PARQUET_VECTOR_COLUMNS_METADATA_KEY as a public static final String that delegates to VECTOR_COLUMNS_METADATA_KEY and mark it `@Deprecated` with a javadoc pointing to VECTOR_COLUMNS_METADATA_KEY; update any usages in the class to reference VECTOR_COLUMNS_METADATA_KEY only and keep the alias for at least one release so downstream code compiling against PARQUET_VECTOR_COLUMNS_METADATA_KEY continues to work.

rahil-c and others added 9 commits April 13, 2026 09:03

address comments

ff5beb3

intial self review

66276af

fix comment to be concise

e703935

greptile-apps Bot reviewed Apr 18, 2026

coderabbitai Bot reviewed Apr 18, 2026

yihua mentioned this pull request Apr 18, 2026

feat(lance): round-trip Hudi VECTOR columns as native Lance fixed-size lists apache/hudi#18497

Merged

3 tasks

yihua wants to merge 9 commits into master from oss-18497
