
feat(zstd): decode dictionary-compressed segments #166

Merged
dfa1 merged 3 commits into main from feat/zstd-dictionary-decode on Jun 26, 2026


dfa1 (Owner) commented Jun 26, 2026

Summary

Adds dictionary-compressed vortex.zstd decode support, unblocked by the native libzstd bindings (#165). The decoder previously rejected any segment with dictionary_size != 0.

What changed

  • ZstdEncodingDecoder — removed the dictionary rejection. When dictionary_size != 0, buffer[0] is the shared dictionary and frames follow at buffer[1..] (mirrors the Rust reference layout); the dictionary is digested once per segment into a ZstdDecompressDict and each frame is decompressed against it. Frame decompression stays zero-copy (native frame → arena output slice); only the small dictionary takes a single heap copy, off the hot path.
  • Unit test: decode_withDictionary_throws → decode_withDictionary_roundTrips (compress-with-dict → decode → assert values).
  • VortexHttpReaderIT — dropped the rejection test; added zstd.vortex to the decoded published-fixtures list.
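
A minimal sketch of the buffer layout described in the first bullet. `ZstdSegmentLayout` and `split` are hypothetical names used only for illustration, not the decoder's actual API, and plain byte arrays stand in for the real segment buffers. Decompression is left out; the sketch only shows which buffer is treated as the dictionary and where the frames start.

```java
import java.util.List;

// Hypothetical model of the vortex.zstd buffer layout; not the project's decoder.
final class ZstdSegmentLayout {
    final byte[] dictionary;   // null when dictionary_size == 0
    final List<byte[]> frames; // compressed frames, in order

    private ZstdSegmentLayout(byte[] dictionary, List<byte[]> frames) {
        this.dictionary = dictionary;
        this.frames = frames;
    }

    // Splits the segment's buffers according to the dictionary_size metadata.
    static ZstdSegmentLayout split(List<byte[]> buffers, long dictionarySize) {
        if (dictionarySize == 0) {
            // No dictionary: frames start at buffer[0].
            return new ZstdSegmentLayout(null, List.copyOf(buffers));
        }
        if (buffers.isEmpty()) {
            throw new IllegalArgumentException("dictionary_size != 0 but no buffers present");
        }
        // With a dictionary: buffer[0] is the dictionary, frames follow at buffer[1..].
        List<byte[]> frames = List.copyOf(buffers.subList(1, buffers.size()));
        return new ZstdSegmentLayout(buffers.get(0), frames);
    }
}
```

In the real decoder the dictionary would be digested once after this split and reused for every frame in `frames`.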

Notes

  • The dictionary digest takes one byte[] hop because the 0.1 bindings expose no MemorySegment dictionary factory; it wouldn't help anyway (ZSTD_createDDict re-copies into its own native allocation) and it's once-per-segment, not per-frame.
  • Still open for full vortex.zstd parity (encode-side, follow-up): multi-frame encode and nullable encode (TODO.md).

Testing

  • 11/11 zstd unit tests pass (incl. the new dictionary round-trip).
  • Full reactor compiles. The zstd.vortex IT addition is network-gated (real S3 fixture), not run locally.

🤖 Generated with Claude Code

dfa1 and others added 3 commits June 26, 2026 15:38
The decoder rejected `vortex.zstd` segments carrying a shared dictionary
(`dictionary_size != 0`). With the native libzstd bindings this is now
decodable: digest the dictionary once per segment into a `ZstdDecompressDict`
and decompress each frame against it.

Buffer layout follows the Rust reference — with a dictionary, buffer[0] is the
dictionary and frames follow at buffer[1..]; without one, frames start at
buffer[0]. Frame decompression stays zero-copy (segment to arena slice); only
the small dictionary takes one heap copy, off the hot path.

Flips the unit test from asserting rejection to a dictionary round-trip, and
adds zstd.vortex to the decoded published-fixtures list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The dictionary-compressed decode path used dictionary_size only as a
presence flag and never checked it against the actual dictionary buffer
size. Digesting a truncated or oversized dictionary would silently
produce wrong output. Fail fast on mismatch, mirroring the Rust
reference invariant (encodings/zstd/src/array.rs). Adds a unit test and
drops a stray blank line in the IT.
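
The fail-fast check could look roughly like the sketch below. `DictionarySizeCheck` and `requireMatchingSize` are hypothetical names, and the exception type is an assumption; the actual decoder may report the mismatch differently.

```java
// Hypothetical helper illustrating the dictionary_size invariant; not the project's code.
final class DictionarySizeCheck {
    // Rejects a dictionary buffer whose length differs from the declared dictionary_size,
    // instead of digesting it and producing wrong output later.
    static void requireMatchingSize(long declaredSize, byte[] dictionary) {
        if (declaredSize != dictionary.length) {
            throw new IllegalStateException(
                "dictionary_size mismatch: metadata declares " + declaredSize
                    + " bytes, buffer holds " + dictionary.length);
        }
    }
}
```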

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

zstd bindings 0.2 expose ZstdDecompressDict(MemorySegment), which hands
the dictionary buffer straight to ZSTD_createDDict with no intermediate
heap byte[]. Drop the toArray() bounce in digestDictionary; native
dictionary buffers (the mmap'd production path) now flow through
unchanged, and the unit-test heap buffers go through the existing
asNative() copy. Keeps the dictionary_size validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
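
For context, a helper in the spirit of the `asNative()` copy mentioned in the last commit might look like the sketch below. The signature and the `NativeSegments` class are assumptions, not the project's actual helper, and the zstd bindings themselves are not shown. It relies only on the standard `java.lang.foreign` API (final since Java 22).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical helper; the project's real asNative() may differ.
final class NativeSegments {
    // Returns the segment unchanged if it is already off-heap (e.g. an mmap'd buffer);
    // otherwise copies the heap-backed bytes into native memory owned by the arena.
    static MemorySegment asNative(MemorySegment segment, Arena arena) {
        if (segment.isNative()) {
            return segment;
        }
        return arena.allocate(segment.byteSize()).copyFrom(segment);
    }
}
```

Under this assumption, only heap-backed test buffers pay for a copy; native dictionary buffers are passed through as-is.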
dfa1 merged commit 777a6fc into main on Jun 26, 2026
6 checks passed
dfa1 deleted the feat/zstd-dictionary-decode branch on June 26, 2026 14:31