
feat(writer): emit vortex.stats zone-maps (ADR 0017) #121

Merged
dfa1 merged 1 commit into main from feat/writer-zone-maps on Jun 21, 2026

dfa1 (Owner) commented Jun 21, 2026

What

Honours WriteOptions.enableZoneMaps (previously a no-op flag): the writer now emits vortex.stats (zoned) layouts for fixed-width primitive columns, carrying per-zone (per-chunk) MAX/MIN and matching the wire format that the reader and inspector already decode for Rust-written files.

Surfaced while raising VortexInspectorTui coverage: the per-chunk-stats pane looked dead, but it is live for Rust-written files (the TPC-H and ClickBench files use vortex.zoned heavily). The gap was the missing writer capability.

How

  • Per-chunk min/max come from EncodeResult (already computed) and are captured into ChunkRef.
  • flushZoneMaps() runs before the footer is written: it builds a per-zone stats-table struct array with fields [max, max_is_truncated, min, min_is_truncated] (min/max nullable) and encodes it as the zoned layout's second child. The layout metadata is a u32 zone_len followed by the Rust Stat bitset (MAX|MIN).
  • StructEncodingEncoder now routes nullable fields through the masked encoder (a prerequisite for the nullable min/max fields).
  • New vortex.stats layout spec; ineligible columns (varbin/bool/dict/extension/F16) are emitted unchanged.
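A minimal sketch of the stats-table shape described above, using plain Java arrays rather than the project's array types. ChunkStats and ZoneStatsTable are hypothetical names standing in for EncodeResult/ChunkRef; only the field order [max, max_is_truncated, min, min_is_truncated] and the nullable min/max come from this PR. Treating the truncation flags as always false for fixed-width primitives is an assumption.

```java
import java.util.List;

/** HYPOTHETICAL per-chunk stats (stand-in for EncodeResult); null means "no value". */
record ChunkStats(Long min, Long max) {}

/**
 * Columnar sketch of the per-zone stats table, one entry per zone (chunk).
 * Field order follows the PR: [max, max_is_truncated, min, min_is_truncated].
 */
record ZoneStatsTable(Long[] max, boolean[] maxIsTruncated, Long[] min, boolean[] minIsTruncated) {

    static ZoneStatsTable fromChunks(List<ChunkStats> chunks) {
        int zones = chunks.size();
        Long[] max = new Long[zones];
        boolean[] maxTrunc = new boolean[zones];
        Long[] min = new Long[zones];
        boolean[] minTrunc = new boolean[zones];
        for (int i = 0; i < zones; i++) {
            ChunkStats c = chunks.get(i);
            // Nullable fields: a zone without a bound keeps null here.
            max[i] = c.max();
            min[i] = c.min();
            // ASSUMPTION: fixed-width primitive bounds are exact, so never truncated.
            maxTrunc[i] = false;
            minTrunc[i] = false;
        }
        return new ZoneStatsTable(max, maxTrunc, min, minTrunc);
    }

    int zoneCount() {
        return max.length;
    }
}
```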

The wire format was read from the Rust reference vortex-layout/src/layouts/zoned/{mod,schema}.rs; the rationale is captured in the code comments (no ADR: this is a fixed external format, so there is no architecture decision to record).

Verification

  • New WriterZoneMapTest: a Java write → read round-trip decodes per-zone min/max via the same path the inspector uses for Rust files, checks that the flag is honoured both on and off, and confirms that the data round-trips.
  • Existing ZoneMapPruningTest, plus 1573 writer/reader and 156 inspector/CLI tests, stay green.
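For context on what per-zone min/max buy at read time, here is a hedged sketch of equality-predicate pruning over a stats table. It is not the reader's actual pruning code (ZoneMapPruningTest exercises that); ZonePruning is a hypothetical name, and a null bound is conservatively treated as "must scan".

```java
import java.util.stream.IntStream;

/** HYPOTHETICAL zone-map pruning for the predicate `col == value`. */
final class ZonePruning {
    private ZonePruning() {}

    /** A zone can be skipped only if its [min, max] range provably excludes value. */
    static boolean mayContain(Long min, Long max, long value) {
        if (min != null && value < min) return false;
        if (max != null && value > max) return false;
        return true; // unknown (null) bounds: keep the zone
    }

    /** Returns the indices of the zones that still have to be read. */
    static int[] zonesToScan(Long[] mins, Long[] maxs, long value) {
        return IntStream.range(0, mins.length)
                .filter(i -> mayContain(mins[i], maxs[i], value))
                .toArray();
    }
}
```

With zones [1, 5], [10, 20] and one zone without stats, looking up 12 keeps only the second and third zones.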

Follow-ups (not in this PR)

  • JavaWritesRustReads interop test (ground truth that Rust reads Java-written vortex.stats); it needs the JNI/Rust environment.
  • Re-enable the inspector per-chunk-stats coverage assertion now that Java files carry zoned layouts.
  • Broaden stats beyond MIN/MAX (sum, null_count, …) if useful.

🤖 Generated with Claude Code

Honour WriteOptions.enableZoneMaps (previously a no-op): for every
fixed-width primitive column whose chunks all carry min/max stats, wrap
the data layout in a vortex.stats layout with a per-zone (per-chunk)
stats-table child holding MAX/MIN. Metadata is u32 zone_len + the Rust
Stat bitset, matching what the reader/inspector already decode for
Rust-written files.
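A sketch of the metadata layout, a u32 zone_len followed by the Stat bitset. The byte order (little-endian), the one-byte bitset width, and the MAX/MIN bit positions are placeholders, not verified against the Rust code; vortex-layout/src/layouts/zoned/mod.rs is the reference for the real encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** HYPOTHETICAL encoder for zoned-layout metadata: u32 zone_len + stat bitset. */
final class ZonedMetadataSketch {
    private ZonedMetadataSketch() {}

    // PLACEHOLDER bit positions, not checked against the Rust Stat enum.
    static final int STAT_MAX_BIT = 3;
    static final int STAT_MIN_BIT = 4;

    static byte[] encode(int zoneLen, int... statBits) {
        if (zoneLen <= 0) throw new IllegalArgumentException("zone_len must be positive");
        int bitset = 0;
        for (int bit : statBits) {
            if (bit < 0 || bit > 7) throw new IllegalArgumentException("bit does not fit one byte: " + bit);
            bitset |= 1 << bit;
        }
        // ASSUMPTION: little-endian u32 followed by a single bitset byte.
        return ByteBuffer.allocate(Integer.BYTES + 1)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(zoneLen)
                .put((byte) bitset)
                .array();
    }
}
```

With zone_len = 8192 and both placeholder bits set, this produces the five bytes 00 20 00 00 18.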

Per-chunk min/max come from EncodeResult; the stats table is a struct
array (nullable max/min + *_is_truncated flags) encoded via the struct
encoder, which now routes nullable fields through the masked encoder.
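The masked-encoder change is what lets the nullable max/min fields through the struct encoder. The general idea, sketched here with a hypothetical MaskedLongs type rather than the project's encoder, is to split a nullable column into a validity mask and a dense values array with placeholders in the null slots.

```java
/** HYPOTHETICAL validity-mask split of a nullable long column. */
record MaskedLongs(boolean[] validity, long[] values) {

    static MaskedLongs split(Long[] column) {
        boolean[] validity = new boolean[column.length];
        long[] values = new long[column.length];
        for (int i = 0; i < column.length; i++) {
            validity[i] = column[i] != null;
            // Null slots get a placeholder 0; the mask marks them invalid.
            values[i] = validity[i] ? column[i] : 0L;
        }
        return new MaskedLongs(validity, values);
    }

    /** Reassembles the logical value: null where the mask is false. */
    Long get(int i) {
        return validity[i] ? values[i] : null;
    }
}
```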

Java write -> read round-trip verifies per-zone min/max; existing
zone-map pruning and 1573 writer/reader tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dfa1 force-pushed the feat/writer-zone-maps branch from c237a1d to 9ab2176 on June 21, 2026 12:39
dfa1 merged commit 838dba8 into main on Jun 21, 2026
6 checks passed
dfa1 deleted the feat/writer-zone-maps branch on June 21, 2026 12:41

1 participant