feat(zstd): decode dictionary-compressed segments #166
Merged
The decoder rejected `vortex.zstd` segments carrying a shared dictionary (`dictionary_size != 0`). With the native libzstd bindings this is now decodable: digest the dictionary once per segment into a `ZstdDecompressDict` and decompress each frame against it. Buffer layout follows the Rust reference — with a dictionary, buffer[0] is the dictionary and frames follow at buffer[1..]; without one, frames start at buffer[0]. Frame decompression stays zero-copy (segment to arena slice); only the small dictionary takes one heap copy, off the hot path. Flips the unit test from asserting rejection to a dictionary round-trip, and adds zstd.vortex to the decoded published-fixtures list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
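The buffer layout described in this commit can be sketched in plain Java as follows. This is only an illustration under assumptions: `ZstdBufferLayout`, `Split`, and `split` are hypothetical names, buffers are modelled as `byte[]` rather than the decoder's real segment types, and the actual digest and decompress calls into the zstd bindings are left out.

```java
import java.util.List;

/** Hypothetical sketch of the buffer layout; not the project's actual decoder types. */
final class ZstdBufferLayout {

    /** The dictionary (null when absent) plus the compressed frames, in order. */
    record Split(byte[] dictionary, List<byte[]> frames) {}

    static Split split(List<byte[]> buffers, long dictionarySize) {
        if (dictionarySize == 0) {
            // No dictionary: frames start at buffer[0].
            return new Split(null, List.copyOf(buffers));
        }
        if (buffers.isEmpty()) {
            throw new IllegalArgumentException(
                    "dictionary_size != 0 but the segment has no buffers");
        }
        // Dictionary present: buffer[0] is the dictionary, frames follow at buffer[1..].
        return new Split(buffers.get(0),
                List.copyOf(buffers.subList(1, buffers.size())));
    }
}
```

With this split, a decoder would digest `dictionary()` once and then loop over `frames()`, which is what keeps the dictionary work off the per-frame path.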
The dictionary-compressed decode path used dictionary_size only as a presence flag and never checked it against the actual dictionary buffer size. Digesting a truncated or oversized dictionary would silently produce wrong output. Fail fast on mismatch, mirroring the Rust reference invariant (encodings/zstd/src/array.rs). Adds a unit test and drops a stray blank line in the IT. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
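A minimal sketch of the fail-fast check this commit describes, assuming the declared `dictionary_size` and the actual dictionary buffer length are both available as `long` values. The class name, method name, exception type, and message are illustrative assumptions, not the decoder's real code.

```java
/** Hypothetical sketch of the dictionary_size invariant; names are illustrative. */
final class DictionarySizeCheck {

    /** Throws if the metadata's dictionary_size disagrees with the buffer's real length. */
    static void validate(long declaredSize, long actualSize) {
        if (declaredSize != actualSize) {
            // A truncated or oversized dictionary would otherwise decode to wrong output.
            throw new IllegalStateException(
                    "dictionary_size mismatch: metadata declares " + declaredSize
                            + " bytes, dictionary buffer holds " + actualSize);
        }
    }
}
```

Running the check before the dictionary is digested means a mismatch surfaces as an explicit error rather than as silently corrupted values.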
zstd bindings 0.2 expose ZstdDecompressDict(MemorySegment), which hands the dictionary buffer straight to ZSTD_createDDict with no intermediate heap byte[]. Drop the toArray() bounce in digestDictionary; native dictionary buffers (the mmap'd production path) now flow through unchanged, and the unit-test heap buffers go through the existing asNative() copy. Keeps the dictionary_size validation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
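The native-versus-heap distinction in this commit can be illustrated with the JDK's foreign memory API alone (finalized in Java 22). The helper below is a hypothetical stand-in for the project's `asNative()`, whose real signature is not shown in this PR; the call into `ZstdDecompressDict` is omitted.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

/** Hypothetical stand-in for an asNative()-style helper; requires Java 22+. */
final class NativeSegments {

    /** Returns the segment unchanged if it is already off-heap, else copies it into the arena. */
    static MemorySegment asNative(MemorySegment segment, Arena arena) {
        if (segment.isNative()) {
            // Native (e.g. mmap'd) buffers flow through with no copy.
            return segment;
        }
        // Heap-backed buffers (e.g. unit-test byte[] data) take one copy into native memory.
        return arena.allocate(segment.byteSize()).copyFrom(segment);
    }
}
```

Under this shape only heap-backed inputs pay for a copy, which matches the commit's claim that the production path is unchanged.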
Summary

Adds dictionary-compressed `vortex.zstd` decode support, unblocked by the native libzstd bindings (#165). The decoder previously rejected any segment with `dictionary_size != 0`.

What changed

- `ZstdEncodingDecoder`: removed the dictionary rejection. When `dictionary_size != 0`, `buffer[0]` is the shared dictionary and frames follow at `buffer[1..]` (mirrors the Rust reference layout); the dictionary is digested once per segment into a `ZstdDecompressDict` and each frame is decompressed against it. Frame decompression stays zero-copy (native frame → arena output slice); only the small dictionary takes a single heap copy, off the hot path.
- Unit test: `decode_withDictionary_throws` → `decode_withDictionary_roundTrips` (compress-with-dict → decode → assert values).
- `VortexHttpReaderIT`: dropped the rejection test; added `zstd.vortex` to the decoded published-fixtures list.

Notes

- The dictionary takes a `byte[]` hop because the 0.1 bindings expose no `MemorySegment` dictionary factory; such a factory wouldn't help anyway (`ZSTD_createDDict` re-copies into its own native allocation), and the copy happens once per segment, not per frame.
- `vortex.zstd` parity (encode-side, follow-up): multi-frame encode and nullable encode (`TODO.md`).

Testing

- The `zstd.vortex` IT addition is network-gated (real S3 fixture) and was not run locally.

🤖 Generated with Claude Code