diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5aa8b734..2875c879 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 
+- `vortex.zstd` segments compressed with a shared (trained) dictionary now decode, via the native `libzstd` dictionary support, instead of being rejected. The upstream `zstd.vortex` compatibility fixture is read end-to-end and matches the Rust reference. ([#104](https://github.com/dfa1/vortex-java/issues/104))
 - Writing a nullable `Utf8`/`Binary` column no longer throws `NullPointerException` (or silently drops nulls): nullable string columns now carry their validity like nullable primitives and round-trip through `vortex.masked`. As a result they decode as `MaskedArray` (validity + values child) rather than a bare `VarBinArray`. ([#168](https://github.com/dfa1/vortex-java/pull/168))
 - CSV export now handles nullable columns (`MaskedArray`): null rows export as an empty field instead of failing with "unsupported array type for CSV export". ([#168](https://github.com/dfa1/vortex-java/pull/168))
 - Zone-map pruning now compares filter values in the *column's* type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. `Integer` on an `I64` column) — or any value on a `U64` column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a `String` against a numeric column) now raises `VortexException` during the scan instead of silently disabling pruning — a behaviour change for callers that relied on the previous silent full scan. ([#159](https://github.com/dfa1/vortex-java/issues/159))
diff --git a/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java b/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java
index 59f79dae..c8194170 100644
--- a/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java
+++ b/integration/src/test/java/io/github/dfa1/vortex/integration/RustJavaReaderComparisonIntegrationTest.java
@@ -356,7 +356,10 @@ private static Double numericSum(Array arr) {
     // ── Java side ─────────────────────────────────────────────────────────────
 
     private static Long stringByteLength(Array arr) {
-        if (!(arr instanceof VarBinArray v)) {
+        // Nullable Utf8 columns decode as a MaskedArray over a VarBin values child (null rows
+        // contribute zero-length entries); unwrap to count the same bytes Rust reports.
+        Array values = arr instanceof MaskedArray m ? m.inner() : arr;
+        if (!(values instanceof VarBinArray v)) {
             return null;
         }
         if (!(v.dtype() instanceof DType.Utf8)) {
@@ -394,7 +397,7 @@ private static Long stringByteLength(Array arr) {
             "varbin.vortex",
             "varbinview.vortex",
             "zigzag.vortex",
-            // zstd.vortex excluded: ZstdEncoding dictionary mode not yet implemented
+            "zstd.vortex",
     })
     void rust_vs_javaReader_statsMatch(String fixture, @TempDir Path tmp) throws Exception {
         // Given