33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,39 @@ All notable changes to this project are documented here. Format loosely follows
[Keep a Changelog](https://keepachangelog.com/); versions are released as `v*`
git tags, which trigger publication to Maven Central.

## [0.5]

### Added
- `ZstdCompressCtx.reset(ZstdResetDirective)` / `ZstdDecompressCtx.reset(...)` —
recycle a context's native state between frames without freeing and recreating
it. `SESSION_ONLY` keeps the level, parameters, and dictionary; `PARAMETERS` /
`SESSION_AND_PARAMETERS` restore the defaults. Binds `ZSTD_CCtx_reset` /
`ZSTD_DCtx_reset`.
- `ZstdCompressCtx.loadDictionary(...)` / `ZstdDecompressCtx.loadDictionary(...)`
(a `ZstdDictionary` or a native `MemorySegment`) and `refDictionary(...)` (a
pre-digested `ZstdCompressDict` / `ZstdDecompressDict`, attached by reference,
no copy). A sticky dictionary on the context lets compression combine a
dictionary with the advanced parameters (checksum, window log, long-distance
matching). That is impossible through the per-call `compress(src, dict)`
overloads, which route through the legacy dictionary path. A parameter
`reset(...)` clears the sticky dictionary.
Binds `ZSTD_CCtx_loadDictionary` / `ZSTD_DCtx_loadDictionary` (now on contexts,
not just streams), `ZSTD_CCtx_refCDict`, `ZSTD_DCtx_refDDict`.

### Changed
- `NativeLibrary.classifier()` now throws a clear `UnsatisfiedLinkError` naming
the unsupported CPU arch instead of silently mapping it to x86_64 (which
deferred failure to a cryptic `dlopen` error). Added an explicit `amd64`
branch so Linux JVMs (which report `os.arch=amd64`) still resolve x86_64.
([ea1ac84](https://github.com/dfa1/zstd-java/commit/ea1ac84))
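
To illustrate the fail-fast behaviour described in this entry, here is a minimal sketch of an architecture mapping. It is not the library's code: the class `ArchClassifier` and the method `normalizeArch` are hypothetical names, and only the `x86_64` / `amd64` handling and the `UnsatisfiedLinkError` come from the entry above.

```java
import java.util.Locale;

/** Hypothetical stand-in for the arch half of NativeLibrary.classifier(); not the library's code. */
final class ArchClassifier {
    private ArchClassifier() {}

    /** Maps a raw os.arch value to a canonical name, or fails with a clear error. */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        switch (arch) {
            case "x86_64":
            case "amd64": // Linux JVMs report os.arch=amd64 for x86_64
                return "x86_64";
            default:
                // Fail here, naming the arch, rather than guessing x86_64 and
                // letting a later native load fail with an unrelated message.
                throw new UnsatisfiedLinkError("unsupported CPU architecture: " + osArch);
        }
    }
}
```

A caller would pass `System.getProperty("os.arch")`; any other supported architecture would get its own explicit branch.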

### Fixed
- Native JARs are much smaller. The ELF shared library is now stripped at link
time (`-s`), dropping debug info (`libzstd.so` 4.0M -> ~650K), and the
multi-MB `.pdb` debug database and `.lib` import library that lld emits next
to the Windows `.dll` are no longer bundled (neither is needed at runtime).
Net: linux-x86_64 native jar 1.2M -> 285K, windows-x86_64 1.2M -> 372K.
([ea1ac84](https://github.com/dfa1/zstd-java/commit/ea1ac84))

## [0.4]

### Added
58 changes: 58 additions & 0 deletions docs/how-to.md
@@ -18,6 +18,64 @@ try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19);
Pick the level explicitly with `Zstd.maxCompressionLevel()` /
`minCompressionLevel()` when you need the extreme ends.

## Reset a context to recycle it

A context is already reusable across whole `compress` / `decompress` calls. Reset
goes further: it recycles the *native state* of one context — for pooled contexts,
or to abort a half-written frame and start clean — without freeing and recreating
it. Pick what to clear with `ZstdResetDirective`:

```java
try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19)) {
    byte[] a = cctx.compress(first);

    // Cheap: drop any unflushed frame state, keep the level and parameters.
    cctx.reset(ZstdResetDirective.SESSION_ONLY);
    byte[] b = cctx.compress(second);

    // Full wipe: parameters back to default, dictionary cleared, level reset to
    // Zstd.defaultCompressionLevel(). Only valid between frames, not mid-frame.
    cctx.reset(ZstdResetDirective.SESSION_AND_PARAMETERS);
}
```

`ZstdDecompressCtx.reset(...)` works the same way. Reuse alone amortises
allocation; reset lets a long-lived or pooled context return to a known state
without churning native memory.
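
The pooled use case mentioned above could look roughly like the following. This is a sketch under stated assumptions, not part of the library: `ResettingPool` is a hypothetical, single-threaded helper, and the type parameter `T` stands in for a context type such as `ZstdCompressCtx`. The reset action is passed in, so the pool itself has no dependency on zstd.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical single-threaded pool; T stands in for a context such as ZstdCompressCtx. */
final class ResettingPool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> resetter;

    ResettingPool(Supplier<T> factory, Consumer<T> resetter) {
        this.factory = factory;
        this.resetter = resetter;
    }

    /** Hands out an idle instance if one exists, otherwise creates a new one. */
    T borrow() {
        T ctx = idle.pollFirst();
        return ctx != null ? ctx : factory.get();
    }

    /** Resets the instance to a known state before making it available again. */
    void release(T ctx) {
        resetter.accept(ctx); // e.g. c -> c.reset(ZstdResetDirective.SESSION_ONLY)
        idle.addFirst(ctx);
    }

    int idleCount() {
        return idle.size();
    }
}
```

Resetting on release rather than on borrow means an aborted frame never leaks into the next caller. A real pool would also need thread safety and would have to close its contexts on shutdown, both of which this sketch omits.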

## Compress with a dictionary *and* advanced parameters

The per-call `compress(src, dict)` overloads take the legacy dictionary path,
which ignores the advanced parameters (checksum, window log, long-distance
matching) set on the context. To combine the two, make the dictionary *sticky*
with `loadDictionary` — then the normal `compress` path honours both:

```java
try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19).checksum(true)) {
    cctx.loadDictionary(dict);            // ZstdDictionary, or a native MemorySegment
    byte[] frame = cctx.compress(record); // dictionary + checksum, together
}
```

For a dictionary reused across a pool of contexts, digest it once and attach it
by reference — no per-call digesting, no copy. It pairs with `reset` for a
pooled, recycled context:

```java
try (ZstdCompressDict cdict = new ZstdCompressDict(dict, 19)) {
    // one cctx per pooled worker, all sharing the one digested dictionary
    try (ZstdCompressCtx cctx = new ZstdCompressCtx()) {
        cctx.refDictionary(cdict);                   // borrowed; cdict must outlive cctx
        byte[] a = cctx.compress(first);
        cctx.reset(ZstdResetDirective.SESSION_ONLY); // recycle, keep the dictionary
        byte[] b = cctx.compress(second);
    }
}
```

A loaded or referenced dictionary stays until replaced, cleared with `null`, or
dropped by a parameter `reset`. `ZstdDecompressCtx` mirrors all of this.
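
The lifetime rules above can be summarised as a small state model. The sketch below performs no compression and is not the library's implementation: `ContextStateModel` and its fields are hypothetical, the default level of 3 is an assumption about zstd's default, and the mid-frame check mirrors zstd's documented rule that parameters can only be reset between frames.

```java
/** Toy model of the reset and sticky-dictionary rules; it performs no compression. */
final class ContextStateModel {
    /** Same three names as the ZstdResetDirective values mentioned in this document. */
    enum Directive { SESSION_ONLY, PARAMETERS, SESSION_AND_PARAMETERS }

    // Assumption: 3 is zstd's default level; the library exposes it as Zstd.defaultCompressionLevel().
    static final int DEFAULT_LEVEL = 3;

    int level = DEFAULT_LEVEL;
    boolean checksum;        // stands in for any advanced parameter
    byte[] dictionary;       // null means no sticky dictionary
    boolean frameInProgress; // stands in for unflushed session state

    /** Replaces the sticky dictionary; passing null clears it. */
    void loadDictionary(byte[] dict) {
        dictionary = dict;
    }

    void reset(Directive directive) {
        if (directive == Directive.PARAMETERS && frameInProgress) {
            // Mirrors zstd's rule that parameters can only be reset between frames.
            throw new IllegalStateException("cannot reset parameters mid-frame");
        }
        if (directive != Directive.PARAMETERS) {
            frameInProgress = false;               // session part
        }
        if (directive != Directive.SESSION_ONLY) { // parameter part
            level = DEFAULT_LEVEL;
            checksum = false;
            dictionary = null;
        }
    }
}
```

In this model `SESSION_ONLY` leaves the level, parameters, and dictionary untouched, while either parameter directive returns all three to their defaults.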

## Compress many small payloads with a dictionary

For many small, similar payloads (log lines, JSON records, protobufs), a
18 changes: 10 additions & 8 deletions docs/supported.md
@@ -33,7 +33,7 @@ rather than the deprecated `ZSTD_getDecompressedSize`.
| Dictionary training (ZDICT) | 8 / 12 | trainFromBuffer, cover/fastCover optimizers, finalizeDictionary, getDictHeaderSize |
| Streaming — compress | 3 / 22 | `ZstdOutputStream` (compressStream2 + buffer sizes) |
| Streaming — decompress | 3 / 15 | `ZstdInputStream` (decompressStream + buffer sizes) |
-| Advanced parameters | 8 / 38 | all `ZSTD_cParameter` + `ZSTD_dParameter` via `ZstdCompressParameter`/`ZstdDecompressParameter`; `compress2`, `C/DCtx_setParameter`, `loadDictionary`, `c/dParam_getBounds`; MT inert on single-thread build |
+| Advanced parameters | 12 / 38 | all `ZSTD_cParameter` + `ZSTD_dParameter` via `ZstdCompressParameter`/`ZstdDecompressParameter`; `compress2`, `C/DCtx_setParameter`, `C/DCtx_reset`, `C/DCtx_loadDictionary`, `CCtx_refCDict`/`DCtx_refDDict`, `c/dParam_getBounds`; MT inert on single-thread build |
| Frame inspection | 10 / 13 | `ZstdFrame` + getFrameProgression; `_advanced` not bound |
| Memory sizing | 8 / 14 | sizeof_C/DCtx, sizeof_C/DDict, estimate C/DCtx + C/DDict size |
| Low-level block | 0 / 12 | expert block/continue API not bound |
@@ -63,10 +63,12 @@ rather than the deprecated `ZSTD_getDecompressedSize`.
| `ZSTD_compress2`, `ZSTD_CCtx_setParameter` | `ZstdCompressCtx.parameter` / `checksum` / `longDistanceMatching` / `windowLog` (all of `ZstdCompressParameter`) |
| `ZSTD_DCtx_setParameter` | `ZstdDecompressCtx.parameter` / `windowLogMax` (`ZstdDecompressParameter`) |
| `ZSTD_CCtx_setPledgedSrcSize` | `ZstdOutputStream.withPledgedSize` |
| `ZSTD_CCtx_reset`, `ZSTD_DCtx_reset` | `ZstdCompressCtx.reset` / `ZstdDecompressCtx.reset` (`ZstdResetDirective`) |
| `ZSTD_getDictID_fromCDict`, `ZSTD_getDictID_fromDDict` | `ZstdCompressDict.id()` / `ZstdDecompressDict.id()` |
| `ZSTD_getErrorString` | `ZstdErrorCode.description()` |
| `ZSTD_cParam_getBounds`, `ZSTD_dParam_getBounds` | `ZstdCompressParameter.bounds()` / `ZstdDecompressParameter.bounds()` (`ZstdBounds`) |
-| `ZSTD_CCtx_loadDictionary`, `ZSTD_DCtx_loadDictionary` | `ZstdOutputStream` / `ZstdInputStream` dictionary constructors |
+| `ZSTD_CCtx_loadDictionary`, `ZSTD_DCtx_loadDictionary` | `ZstdCompressCtx.loadDictionary` / `ZstdDecompressCtx.loadDictionary`; `ZstdOutputStream` / `ZstdInputStream` dictionary constructors |
| `ZSTD_CCtx_refCDict`, `ZSTD_DCtx_refDDict` | `ZstdCompressCtx.refDictionary` / `ZstdDecompressCtx.refDictionary` |
| `ZSTD_isFrame`, `ZSTD_findFrameCompressedSize`, `ZSTD_decompressBound`, `ZSTD_getDictID_fromFrame`, `ZSTD_getFrameHeader`, `ZSTD_isSkippableFrame`, `ZSTD_writeSkippableFrame`, `ZSTD_readSkippableFrame` | `ZstdFrame` (+ `ZstdFrameHeader`, `ZstdFrameType`, `ZstdSkippableContent`) |
| `ZSTD_getErrorCode` | `ZstdException.code()` (+ `ZstdErrorCode`) |
| `ZSTD_getFrameProgression` | `ZstdCompressStream.progress()` (`ZstdFrameProgression`) |
@@ -90,7 +92,7 @@ zstd-jni's JNI sources (v1.5.7-11, `src/main/native/*.c`). The latter is
symbol-exact, not functional equivalence: zstd-jni may expose an operation through
a different symbol than this library — e.g. it routes one-shot compression through
`ZSTD_compress2`, so `ZSTD_compress` reads `—` for it even though `Zstd.compress`
-works. zstd-jni references 53 of these symbols; this library binds 55. They
+works. zstd-jni references 53 of these symbols; this library binds 59. They
overlap on the modern context/streaming API and diverge mainly on zstd-jni's
sequence-producer hooks vs this library's frame-inspection and typed-error surface.

@@ -231,7 +233,7 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
| `ZSTD_resetDStream` | — ᵈ | — |
| `ZSTD_sizeof_DStream` | — | — |

-### Advanced parameters (8/38)
+### Advanced parameters (12/38)

| Symbol | Bound | zstd-jni |
|---|:---:|:---:|
@@ -245,11 +247,11 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
| `ZSTD_CCtx_loadDictionary` | ✅ | ✅ |
| `ZSTD_CCtx_loadDictionary_advanced` | — | — |
| `ZSTD_CCtx_loadDictionary_byReference` | — | — |
-| `ZSTD_CCtx_refCDict` | — | ✅ |
+| `ZSTD_CCtx_refCDict` | ✅ | ✅ |
| `ZSTD_CCtx_refPrefix` | — | — |
| `ZSTD_CCtx_refPrefix_advanced` | — | — |
| `ZSTD_CCtx_refThreadPool` | — | — |
-| `ZSTD_CCtx_reset` | — | ✅ |
+| `ZSTD_CCtx_reset` | ✅ | ✅ |
| `ZSTD_CCtx_setCParams` | — | — |
| `ZSTD_CCtx_setFParams` | — | — |
| `ZSTD_CCtx_setParameter` | ✅ | ✅ |
@@ -260,10 +262,10 @@ sequence-producer hooks vs this library's frame-inspection and typed-error surfa
| `ZSTD_DCtx_loadDictionary` | ✅ | ✅ |
| `ZSTD_DCtx_loadDictionary_advanced` | — | — |
| `ZSTD_DCtx_loadDictionary_byReference` | — | — |
-| `ZSTD_DCtx_refDDict` | — | ✅ |
+| `ZSTD_DCtx_refDDict` | ✅ | ✅ |
| `ZSTD_DCtx_refPrefix` | — | — |
| `ZSTD_DCtx_refPrefix_advanced` | — | — |
-| `ZSTD_DCtx_reset` | — | ✅ |
+| `ZSTD_DCtx_reset` | ✅ | ✅ |
| `ZSTD_DCtx_setFormat` | — ᵈ | — |
| `ZSTD_DCtx_setMaxWindowSize` | — | — |
| `ZSTD_DCtx_setParameter` | ✅ | ✅ |
@@ -4,6 +4,7 @@
import com.github.luben.zstd.ZstdDictDecompress;
import io.github.dfa1.zstd.Zstd;
import io.github.dfa1.zstd.ZstdCompressCtx;
import io.github.dfa1.zstd.ZstdCompressDict;
import io.github.dfa1.zstd.ZstdDecompressCtx;
import io.github.dfa1.zstd.ZstdDictionary;
import io.github.dfa1.zstd.ZstdInputStream;
@@ -124,6 +125,39 @@ void jniDictCompressJavaDictDecompress() {
        assertThat(restored).isEqualTo(record);
    }

    @Test
    void javaLoadedDictWithChecksumJniDictDecompress() {
        // A sticky loaded dictionary combined with an advanced parameter
        // (checksum), which takes the COMPRESS2 path, must still produce a
        // frame that zstd-jni decodes against the same dictionary.
        ZstdDictionary dict = trainDict();
        byte[] record = record(33);

        byte[] frame;
        try (ZstdCompressCtx ctx = new ZstdCompressCtx().checksum(true)) {
            ctx.loadDictionary(dict);
            frame = ctx.compress(record);
        }
        ZstdDictDecompress jniDict = new ZstdDictDecompress(dict.toByteArray());
        assertThat(com.github.luben.zstd.Zstd.decompress(frame, jniDict, record.length)).isEqualTo(record);
    }

    @Test
    void javaReferencedDigestedDictJniDictDecompress() {
        // A frame from a context referencing a digested CDict must decode in zstd-jni.
        ZstdDictionary dict = trainDict();
        byte[] record = record(44);

        byte[] frame;
        try (ZstdCompressDict cdict = new ZstdCompressDict(dict, Zstd.defaultCompressionLevel());
             ZstdCompressCtx ctx = new ZstdCompressCtx()) {
            ctx.refDictionary(cdict);
            frame = ctx.compress(record);
        }
        ZstdDictDecompress jniDict = new ZstdDictDecompress(dict.toByteArray());
        assertThat(com.github.luben.zstd.Zstd.decompress(frame, jniDict, record.length)).isEqualTo(record);
    }

    private ZstdDictionary trainDict() {
        List<byte[]> samples = new ArrayList<>();
        for (int i = 0; i < 3000; i++) {
18 changes: 18 additions & 0 deletions zstd/src/main/java/io/github/dfa1/zstd/Bindings.java
@@ -138,6 +138,11 @@ final class Bindings {
            NativeLibrary.lookup("ZSTD_CCtx_setParameter",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT, JAVA_INT));

    // size_t ZSTD_CCtx_reset(ZSTD_CCtx*, ZSTD_ResetDirective)
    static final MethodHandle CCTX_RESET =
            NativeLibrary.lookup("ZSTD_CCtx_reset",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT));

    // size_t ZSTD_compress2(ZSTD_CCtx*, void* dst, size_t dstCap, const void* src, size_t srcSize)
    // Uses the advanced parameters set on the context (unlike ZSTD_compressCCtx).
    static final MethodHandle COMPRESS2 =
@@ -149,6 +154,11 @@ final class Bindings {
            NativeLibrary.lookup("ZSTD_DCtx_setParameter",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT, JAVA_INT));

    // size_t ZSTD_DCtx_reset(ZSTD_DCtx*, ZSTD_ResetDirective)
    static final MethodHandle DCTX_RESET =
            NativeLibrary.lookup("ZSTD_DCtx_reset",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, JAVA_INT));

    // size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx*, unsigned long long pledgedSrcSize)
    static final MethodHandle CCTX_SET_PLEDGED_SRC_SIZE =
            NativeLibrary.lookup("ZSTD_CCtx_setPledgedSrcSize",
@@ -238,6 +248,10 @@ final class Bindings {
    static final MethodHandle COMPRESS_USING_CDICT =
            NativeLibrary.lookup("ZSTD_compress_usingCDict",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS, JAVA_LONG, ADDRESS, JAVA_LONG, ADDRESS));
    // size_t ZSTD_CCtx_refCDict(ZSTD_CCtx*, const ZSTD_CDict*)
    static final MethodHandle CCTX_REF_CDICT =
            NativeLibrary.lookup("ZSTD_CCtx_refCDict",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS));

    // ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize)
    static final MethodHandle CREATE_DDICT =
@@ -250,6 +264,10 @@ final class Bindings {
    static final MethodHandle DECOMPRESS_USING_DDICT =
            NativeLibrary.lookup("ZSTD_decompress_usingDDict",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS, JAVA_LONG, ADDRESS, JAVA_LONG, ADDRESS));
    // size_t ZSTD_DCtx_refDDict(ZSTD_DCtx*, const ZSTD_DDict*)
    static final MethodHandle DCTX_REF_DDICT =
            NativeLibrary.lookup("ZSTD_DCtx_refDDict",
                    FunctionDescriptor.of(JAVA_LONG, ADDRESS, ADDRESS));

    // --- dictionary training (ZDICT, from dictBuilder) ---

7 changes: 7 additions & 0 deletions zstd/src/main/java/io/github/dfa1/zstd/NativeCall.java
@@ -57,6 +57,13 @@ private static String errorName(long code) {
        }
    }

    /// Whether `seg` denotes "no segment": either a Java `null` reference or the
    /// [MemorySegment#NULL] zero-address sentinel. Both map to a null pointer in C,
    /// which the dictionary entry points read as "clear".
    static boolean isNull(MemorySegment seg) {
        return seg == null || MemorySegment.NULL.equals(seg);
    }

    /// Guards a zero-copy entry point: the segment handed to zstd must be backed
    /// by native (off-heap) memory, since its address is dereferenced in C. Fails
    /// fast with a clear message instead of the FFM linker's cryptic error.