zstd-java exposes two shapes of API:

- `byte[]`: convenient, for callers whose data is already on the heap.
- `MemorySegment`: zero-copy, for callers whose data is already off-heap.

This note explains why the segment shape exists and when it pays off.
FFM downcalls need a stable native pointer. A heap `byte[]` can be relocated by
the GC, so the `byte[]` path copies the input into native memory for the duration
of the call and copies the result back: two copies per call.
A native MemorySegment already is a native address. You hand
ZSTD_compress / ZSTD_decompress the pointer directly. Zero copies.

    byte[] path:   heap byte[] ──copy──▶ native scratch ──ZSTD──▶ native scratch ──copy──▶ heap byte[]
    segment path:  native src ───────────────────────────ZSTD──▶ native dst   (no copy)

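
The two paths can be sketched with the JDK alone. This is a minimal illustration, assuming JDK 22 or later (where `java.lang.foreign` is final); `nativeOp` is a hypothetical stand-in for the native codec call (it only copies bytes) and is not part of zstd-java.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class CopyPaths {
    /** Hypothetical stand-in for a native codec call: accepts native memory only. */
    static long nativeOp(MemorySegment dst, MemorySegment src) {
        if (!dst.isNative() || !src.isNative()) {
            throw new IllegalArgumentException("native segments required");
        }
        MemorySegment.copy(src, 0, dst, 0, src.byteSize());
        return src.byteSize();
    }

    /** byte[] shape: copy in, call, copy out. */
    static byte[] viaHeap(byte[] input) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment src = arena.allocate(input.length);
            MemorySegment dst = arena.allocate(input.length);
            // copy 1: heap -> native scratch
            MemorySegment.copy(MemorySegment.ofArray(input), 0, src, 0, input.length);
            long written = nativeOp(dst, src);
            // copy 2: native scratch -> fresh heap array
            return dst.asSlice(0, written).toArray(ValueLayout.JAVA_BYTE);
        }
    }

    /** segment shape: both ends are already native, so nothing surrounds the call. */
    static long viaSegment(MemorySegment dst, MemorySegment src) {
        return nativeOp(dst, src);
    }
}
```

The heap variant allocates and fills scratch memory on every call, while the segment variant only forwards two addresses.
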
Zero-copy only helps if the data is already native on both ends. The canonical case is a memory-mapped reader (e.g. Vortex):

- Compressed input: the reader `mmap`s the file into one `MemorySegment`; the zstd frame is already a zero-copy slice of it. A `byte[]` API forces `frame.toArray(JAVA_BYTE)` → `new byte[]` just to make the call. The segment API passes the mmap slice straight to `ZSTD_decompress`.
- Decompressed output: allocate the output in your arena (`arena.allocate(n)`) and let `ZSTD_decompress` write directly into it. That segment becomes the materialized backing buffer as-is: no temp `byte[]`, no `MemorySegment.copy`.

The decode path collapses from mmap → `byte[]` → `byte[]` → arena (three copies) to mmap-slice → arena (zero copies).
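
To make "the frame is already a zero-copy slice" concrete, the sketch below maps a temporary file and checks that a slice aliases the mapping instead of copying it. It assumes JDK 22 or later; the class and method names are illustrative, and the file contents are arbitrary bytes, not a real zstd frame.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapSlice {
    /** Returns true if a slice of a mapped file points into the mapping itself. */
    static boolean sliceAliasesMapping(byte[] fileBytes, long frameOffset, long frameLength) {
        Path file = null;
        try {
            file = Files.createTempFile("frames", ".bin");
            Files.write(file, fileBytes);
            try (Arena arena = Arena.ofConfined();
                 FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping lives until the arena is closed.
                MemorySegment mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
                // asSlice creates a view: same memory, narrower bounds.
                MemorySegment frame = mapped.asSlice(frameOffset, frameLength);
                return frame.isNative()
                        && frame.byteSize() == frameLength
                        && frame.address() == mapped.address() + frameOffset;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }
}
```

Because the slice is only a narrower view, a decoder that accepts a `MemorySegment` can read the frame straight out of the page cache.
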

- Zero GC: off-heap, no allocation churn in a scan hot loop.
- No 2 GiB cap: `byte[]` maxes out at `Integer.MAX_VALUE`; segments are `long`-indexed.
- Lifetime safety: bounds-checked and tied to a confined `Arena`; the same ownership model as the rest of an FFM reader, cleaner than raw pointers.
- Typed reads: read `JAVA_LONG` / `JAVA_DOUBLE` straight off the decompressed segment with no re-wrap.

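
The typed-read and lifetime points above can be checked with plain FFM calls. A small sketch, assuming JDK 22 or later; the class and method names are made up for illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class SegmentBenefits {
    /** Writes a long and a double, then reads both back with typed accessors. */
    static double typedRoundTrip() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 8); // 16 bytes, 8-byte aligned
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);
            seg.set(ValueLayout.JAVA_DOUBLE, 8, 1.5);
            return seg.get(ValueLayout.JAVA_LONG, 0) + seg.get(ValueLayout.JAVA_DOUBLE, 8);
        }
    }

    /** A segment cannot be read once its arena has been closed. */
    static boolean accessAfterCloseFails() {
        MemorySegment seg;
        try (Arena arena = Arena.ofConfined()) {
            seg = arena.allocate(8, 8);
        }
        try {
            seg.get(ValueLayout.JAVA_LONG, 0);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Reads past the end are rejected instead of touching foreign memory. */
    static boolean outOfBoundsFails() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16, 8);
            seg.get(ValueLayout.JAVA_LONG, 16);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }
}
```
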
If the caller hands you a heap `byte[]` (the aircompressor fallback path, or
external input), wrapping it with `MemorySegment.ofArray(...)` does not avoid the
copy: the bytes still live on the heap and have to be moved into native memory
before the downcall. No free lunch. So the API is segment-first for the zero-copy
fast path, with a thin `byte[]` overload for the rare heap caller.
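
One way a thin heap overload could funnel into the segment path is a helper that passes native segments through and copies only heap-backed ones. This is a sketch, not the library's code: `ensureNative` is a hypothetical name, and JDK 22 or later is assumed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

final class NativeOrCopy {
    /** Returns seg itself if it is native, otherwise a native copy owned by arena. */
    static MemorySegment ensureNative(Arena arena, MemorySegment seg) {
        if (seg.isNative()) {
            return seg; // fast path: already off-heap, nothing to do
        }
        MemorySegment copy = arena.allocate(seg.byteSize());
        copy.copyFrom(seg); // the unavoidable heap -> native copy
        return copy;
    }
}
```

With a helper like this the copy cost is paid only by callers that actually start on the heap.
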

| Operation | `byte[]` (convenience) | `MemorySegment` (zero-copy) |
|---|---|---|
| compress | `ZstdCompressCtx.compress(byte[])` | `ZstdCompressCtx.compress(dst, src)` |
| compress + dict | `ZstdCompressCtx.compress(byte[], ZstdCompressDict)` | `ZstdCompressCtx.compress(dst, src, ZstdCompressDict)` |
| decompress | `ZstdDecompressCtx.decompress(byte[], int)` | `ZstdDecompressCtx.decompress(dst, src)` |
| decompress + dict | `ZstdDecompressCtx.decompress(byte[], int, ZstdDecompressDict)` | `ZstdDecompressCtx.decompress(dst, src, ZstdDecompressDict)` |
| size output (no copy) | frame header via `Zstd.decompress(byte[])` | `Zstd.decompressedSize(MemorySegment)` |

The explicit-dst methods return the number of bytes written. Size dst with
Zstd.compressBound(srcSize) for compression, or Zstd.decompressedSize(frame)
for decompression.
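
For a feel of how large a compression destination needs to be, the sketch below mirrors the `ZSTD_COMPRESSBOUND` macro from `zstd.h` as best recalled here (it omits the macro's upper-limit check on very large inputs). It is illustrative only; real code should call `Zstd.compressBound`, which defers to the native library. `BoundSketch` is a made-up name.

```java
final class BoundSketch {
    /**
     * Worst-case compressed size for srcSize input bytes, following the shape of
     * ZSTD_COMPRESSBOUND: the input itself, plus 1/256 of it, plus a small margin
     * that shrinks from 64 to 0 as the input approaches 128 KiB.
     */
    static long compressBound(long srcSize) {
        if (srcSize < 0) {
            throw new IllegalArgumentException("srcSize must be non-negative");
        }
        long limit = 128L << 10; // 128 KiB
        long margin = srcSize < limit ? (limit - srcSize) >> 11 : 0;
        return srcSize + (srcSize >> 8) + margin;
    }
}
```

The bound is always at least as large as the input, so a destination sized this way is enough even for incompressible data.
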
If you don't want to size the destination yourself, pass an Arena and the codec
sizes, allocates, and writes the output for you — still zero-copy, since the
output is allocated in your arena and zstd writes into it directly. The
returned segment is owned by that arena.

    MemorySegment frame   = cctx.compress(arena, src);     // bound-sized, trimmed to frame length
    MemorySegment decoded = dctx.decompress(arena, frame); // header-sized, exact length

| Operation | explicit dst (you size) | arena (codec sizes) |
|---|---|---|
| compress | `compress(dst, src)` → bytes written | `compress(arena, src)` → frame segment |
| decompress | `decompress(dst, src)` → bytes written | `decompress(arena, frame)` → output segment |

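
The arena form can be read as a thin wrapper over the explicit-dst form: allocate a worst-case destination in the caller's arena, run the codec, and return a trimmed view. The sketch below shows that shape with a pluggable codec function; `intoArena` and its parameters are hypothetical and not the library's internals. JDK 22 or later is assumed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.function.ToLongBiFunction;

final class ArenaForm {
    /**
     * Allocates bound bytes in arena, lets codec write into them, and returns a
     * slice trimmed to the number of bytes written. codec receives (dst, src)
     * and returns the byte count it produced.
     */
    static MemorySegment intoArena(Arena arena, MemorySegment src, long bound,
                                   ToLongBiFunction<MemorySegment, MemorySegment> codec) {
        MemorySegment dst = arena.allocate(bound);
        long written = codec.applyAsLong(dst, src);
        return dst.asSlice(0, written); // a view over dst, not a copy
    }
}
```

Trimming with `asSlice` keeps the result zero-copy; the unused tail of the allocation is released when the arena closes.
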
The arena form of decompress requires the frame to store its decompressed size
(one-shot compress always stamps it; a streamed frame only does so when you
pledge the size up front — see Pledged size).
For size-less frames, size dst yourself.
Much of the Java ecosystem speaks ByteBuffer, not MemorySegment — NIO
channels, Netty, and FileChannel.map's MappedByteBuffer. We deliberately do
not add a third set of ByteBuffer overloads: the segment API already
bridges both directions of the FFM↔NIO boundary at zero copy, because FFM defines
the conversions.

- `ByteBuffer` in: wrap a direct buffer as a segment with `MemorySegment.ofBuffer(buf)` (zero copy; a heap-backed buffer copies, the same caveat as `byte[]`). Hand the segment to `compress` / `decompress`.
- `MemorySegment` out to `ByteBuffer`: `segment.asByteBuffer()` returns a buffer view over the native bytes, no copy. The decompressed arena segment is consumable by an existing `ByteBuffer` pipeline as-is.


    // an mmap'd frame is already a direct ByteBuffer (FileChannel.map)
    MemorySegment frame  = MemorySegment.ofBuffer(mappedByteBuffer);
    MemorySegment out    = dctx.decompress(arena, frame); // zero-copy decode
    ByteBuffer    result = out.asByteBuffer();            // zero-copy hand-off

Byte order. `asByteBuffer()` on a native segment already returns a direct
buffer aliasing the same off-heap bytes; there is no copy and nothing to convert.
The one wart is byte order: it comes back `BIG_ENDIAN` regardless of platform, so a
caller doing multi-byte reads must restore the native order:

    import java.nio.ByteOrder;

    ByteBuffer result = dctx.decompress(arena, frame)
            .asByteBuffer()
            .order(ByteOrder.nativeOrder()); // direct buffer, native order, zero copy

(For a pure byte payload the order does not matter and even that is unneeded.) The
remaining caveat is lifetime: the buffer borrows the arena's scope, so it must not
outlive the try-with-resources. A thin `toByteBuffer()` convenience on the
arena-returning results could fold the `order(nativeOrder())` call into one place,
but it would be a one-line output adapter, not new capability, because the
conversion already exists. We keep the API segment-first (no parallel `ByteBuffer`
surface to maintain).
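
Both directions of the bridge, and the byte-order wart, can be verified with the JDK alone. A minimal sketch, assuming JDK 22 or later; `BufferBridge` is an illustrative name and no zstd call is involved.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferBridge {
    /** A direct buffer wraps as a native segment: usable for a downcall as-is. */
    static boolean directBufferIsNative() {
        return MemorySegment.ofBuffer(ByteBuffer.allocateDirect(16)).isNative();
    }

    /** A heap buffer wraps as a heap segment: same caveat as byte[]. */
    static boolean heapBufferIsNative() {
        return MemorySegment.ofBuffer(ByteBuffer.allocate(16)).isNative();
    }

    /** The order of a fresh asByteBuffer() view, before any adjustment. */
    static ByteOrder defaultViewOrder() {
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocate(8, 8).asByteBuffer().order();
        }
    }

    /** Writes through the segment, reads through a native-ordered buffer view. */
    static long readThroughView(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(8, 8);
            seg.set(ValueLayout.JAVA_LONG, 0, value); // JAVA_LONG uses native order
            ByteBuffer view = seg.asByteBuffer().order(ByteOrder.nativeOrder());
            return view.getLong(0); // same bytes, same order, no copy
        }
    }
}
```

Without the `order(...)` call the last read would come back byte-swapped on little-endian hardware, which is the failure mode the paragraph above warns about.
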
The one-shot segment methods above need the whole input in one segment. When data
is large or arrives incrementally but both ends are still off-heap, use the
segment stream driver — ZstdCompressStream / ZstdDecompressStream — which
drives ZSTD_compressStream2 / ZSTD_decompressStream directly over native
buffers, in bounded memory, with no heap bounce (unlike ZstdOutputStream /
ZstdInputStream, which copy through byte[] to fit java.io).
Each step compresses/decompresses as much of src as fits in dst and reports a
ZstdStreamResult (bytesConsumed, bytesProduced, remaining). Advance the
source by bytesConsumed, drain bytesProduced from dst, and for compression
finish with ZstdEndDirective.END until isComplete():

    try (ZstdCompressStream cs = new ZstdCompressStream(level)) {
        long off = 0;
        ZstdStreamResult r;
        do {
            // compress as much of the remaining source as fits in dst
            r = cs.compress(dst, src.asSlice(off), ZstdEndDirective.END);
            off += r.bytesConsumed();                       // advance the source
            sink.write(dst.asSlice(0, r.bytesProduced()));  // drain the output
        } while (!r.isComplete());
    }

Both drivers take an optional `ZstdDictionary`. Decompression mirrors the loop,
calling `decompress(dst, src)` until a result `isComplete()` (frame fully decoded).
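
The same consume/produce loop can be run with the JDK's own `Deflater` and `Inflater`, whose `ByteBuffer` methods have a similar shape. This is a stand-in, not zstd: it uses DEFLATE so that it runs without the library, and it drains into a heap `ByteArrayOutputStream` purely to keep the demo short. The destination buffer is assumed to be at least one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BoundedLoop {
    /** Compresses src through one fixed-size direct buffer of dstSize bytes. */
    static byte[] compress(ByteBuffer src, int dstSize) {
        Deflater deflater = new Deflater();
        ByteBuffer dst = ByteBuffer.allocateDirect(dstSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            deflater.setInput(src); // deflate() advances src.position() as it consumes
            deflater.finish();      // plays the role of the END directive
            while (!deflater.finished()) {
                dst.clear();
                deflater.deflate(dst); // produce as much as fits in dst
                drain(dst, sink);
            }
        } finally {
            deflater.end();
        }
        return sink.toByteArray();
    }

    /** Mirror image: decode until the stream reports completion. */
    static byte[] decompress(ByteBuffer src, int dstSize) {
        Inflater inflater = new Inflater();
        ByteBuffer dst = ByteBuffer.allocateDirect(dstSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            inflater.setInput(src);
            while (!inflater.finished()) {
                dst.clear();
                int produced = inflater.inflate(dst);
                if (produced == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated input");
                }
                drain(dst, sink);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return sink.toByteArray();
    }

    /** Moves whatever the codec produced into the sink. */
    private static void drain(ByteBuffer dst, ByteArrayOutputStream sink) {
        dst.flip();
        byte[] chunk = new byte[dst.remaining()];
        dst.get(chunk);
        sink.write(chunk, 0, chunk.length);
    }
}
```

Memory stays bounded by `dstSize` no matter how large the input is, which is the property the segment stream driver offers for off-heap data.
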
Streaming compression has a hidden cost the one-shot path does not: a streamed
frame does not record its decompressed size. zstd writes the content-size field
in the frame header only when the encoder knows the total up front — trivially
true for ZSTD_compress, but a streaming encoder is fed incrementally and closes
the frame without ever being told the total.
That field is exactly what the zero-copy decode path reads to size the output
arena. So a plain ZstdOutputStream frame cannot be decoded zero-copy:

    byte[] frame = streamCompress(data);        // no pledged size
    Zstd.decompressedSize(segmentOf(frame));    // throws: "decompressed size not stored in frame"
    dctx.decompress(arena, segmentOf(frame));   // same: it can't size the arena

The consumer is forced back onto the bounded streaming decoder (allocate, decode a
chunk, grow, repeat) or a guessed `maxSize`: the very heap bounce the segment API
exists to avoid.
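
Whether a frame carries its content size is visible in the first few header bytes. The sketch below is a hand-rolled reading of the zstd frame header layout (magic number, frame header descriptor, optional window descriptor and dictionary ID, then the content-size field); it ignores skippable frames and is meant only to illustrate the check. `FrameSizePeek` is a hypothetical name; real code should call `Zstd.decompressedSize`. JDK 22 or later is assumed.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.util.OptionalLong;

final class FrameSizePeek {
    private static final int MAGIC = 0xFD2FB528;
    private static final ValueLayout.OfShort LE_SHORT =
            ValueLayout.JAVA_SHORT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfInt LE_INT =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfLong LE_LONG =
            ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final int[] DICT_ID_BYTES = {0, 1, 2, 4};

    /** Returns the stored content size, or empty if the frame does not record one. */
    static OptionalLong contentSize(MemorySegment frame) {
        if (frame.byteSize() < 5 || frame.get(LE_INT, 0) != MAGIC) {
            throw new IllegalArgumentException("not a zstd frame");
        }
        int descriptor = frame.get(ValueLayout.JAVA_BYTE, 4) & 0xFF;
        int sizeFlag = descriptor >>> 6;                    // bits 7-6
        boolean singleSegment = (descriptor & 0x20) != 0;   // bit 5
        int dictFlag = descriptor & 0x03;                   // bits 1-0

        // Skip the window descriptor (absent for single-segment frames) and dict ID.
        long pos = 5 + (singleSegment ? 0 : 1) + DICT_ID_BYTES[dictFlag];

        switch (sizeFlag) {
            case 0:
                // A 1-byte size exists only for single-segment frames.
                return singleSegment
                        ? OptionalLong.of(frame.get(ValueLayout.JAVA_BYTE, pos) & 0xFF)
                        : OptionalLong.empty();
            case 1:
                // 2-byte sizes are stored with an offset of 256.
                return OptionalLong.of((frame.get(LE_SHORT, pos) & 0xFFFF) + 256L);
            case 2:
                return OptionalLong.of(frame.get(LE_INT, pos) & 0xFFFFFFFFL);
            default:
                return OptionalLong.of(frame.get(LE_LONG, pos));
        }
    }
}
```

An empty result corresponds to the "decompressed size not stored in frame" case: the arena form has nothing to size its allocation from.
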
ZstdOutputStream.withPledgedSize(out, level, total) closes the loop. Tell the
encoder the total before the first byte and it stamps the content size into the
header, so a downstream reader can size the output arena exactly and decode in one
shot:

    try (var zout = ZstdOutputStream.withPledgedSize(sink, 6, data.length)) {
        zout.write(data);   // pledge must match the bytes written
    }
    byte[] frame = sink.toByteArray();

    // downstream, in a memory-mapped reader:
    MemorySegment src = MemorySegment.ofBuffer(mmap);
    MemorySegment out = dctx.decompress(arena, src);   // one allocation, zero copies

This is the case where pledging is not a micro-optimization but a correctness
gate: it is the difference between a frame that participates in the zero-copy
decode path and one that does not. Pledge whenever the producer streams but the
total is known (file length, serialized record count, `Content-Length`). The pledge
must equal the bytes actually written; a mismatch raises an error on close.
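
The "pledge must equal the bytes written" contract can be pictured as a counting wrapper that verifies the total on close. The sketch below is a hypothetical illustration of that contract using only `java.io`; it is not how `ZstdOutputStream` is implemented, where the check presumably happens inside the encoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class PledgeCheckingStream extends FilterOutputStream {
    private final long pledged;
    private long written;

    PledgeCheckingStream(OutputStream out, long pledged) {
        super(out);
        this.pledged = pledged;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        written += len;
    }

    @Override
    public void close() throws IOException {
        super.close();
        if (written != pledged) {
            throw new IOException("pledged " + pledged + " bytes but wrote " + written);
        }
    }

    /** Returns true if writing data under the given pledge closes without error. */
    static boolean closesCleanly(long pledged, byte[] data) {
        PledgeCheckingStream stream = new PledgeCheckingStream(new ByteArrayOutputStream(), pledged);
        try {
            stream.write(data);
            stream.close();
            return true;
        } catch (IOException mismatch) {
            return false;
        }
    }
}
```

Failing at close, not at decode time, keeps a bad pledge from producing a frame whose header lies about its size.
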