From fe611f48bc58cea8b172c7d4579034f9f836321f Mon Sep 17 00:00:00 2001
From: Davide Angelocola
Date: Fri, 26 Jun 2026 14:17:09 +0200
Subject: [PATCH] =?UTF-8?q?docs:=20split=20README=20into=20Di=C3=A1taxis?=
 =?UTF-8?q?=20docs=20pages?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Move how-to guides, reference, and explanation out of the README into
docs/how-to.md, docs/reference.md, and docs/explanation.md. README now
carries the intro, doc table, quick-start, and license only.

Co-Authored-By: Claude Opus 4.8
---
 README.md           | 180 ++------------------------------------------
 docs/explanation.md |  25 ++++++
 docs/how-to.md      |  87 +++++++++++++++++++++
 docs/reference.md   |  48 ++++++++++++
 4 files changed, 168 insertions(+), 172 deletions(-)
 create mode 100644 docs/explanation.md
 create mode 100644 docs/how-to.md
 create mode 100644 docs/reference.md

diff --git a/README.md b/README.md
index 0142208..44f74a6 100644
--- a/README.md
+++ b/README.md
@@ -17,187 +17,23 @@ straight from your own data.
 
 ## Documentation
 
-The docs follow the [Diátaxis](https://diataxis.fr) framework — four kinds of
-documentation, each serving a different need:
+The docs follow the [Diátaxis](https://diataxis.fr) framework:
 
 | | Purpose | Start here |
 |---|---|---|
-| **[Tutorial](docs/tutorial.md)** | Learning by doing | [Getting started](docs/tutorial.md) |
-| **[How-to guides](#how-to-guides)** | Solving a specific task | [Hot paths](#compress-on-a-hot-path), [Dictionaries](#compress-many-small-payloads-with-a-dictionary), [Zero-copy](#avoid-heap-copies-with-memorysegment), [Self-built lib](#run-against-a-self-built-libzstd) |
-| **[Reference](#reference)** | Looking up facts | [Platforms](#supported-platforms), [API surface](#api-surface), [Symbol coverage](docs/supported.md), [Build](#build-from-source) |
-| **[Explanation](#explanation)** | Understanding the why | [Why FFM + Zig](#why-ffm-and-zig), [When zero-copy pays](docs/zero-copy.md), [Benchmarks](docs/benchmarks.md) |
-
----
-
-## Tutorial: Getting started
-
-New here? **[docs/tutorial.md](docs/tutorial.md)** takes you from a clean checkout
-to your first compress/decompress round-trip, step by step.
-
-## How-to guides
-
-Task-focused recipes. Each assumes you have the library on the classpath (see the
-[tutorial](#tutorial-getting-started)).
-
-### Compress on a hot path
-
-Reuse a context to amortise native allocation across many calls:
-
-```java
-try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19);
-     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
-    byte[] packed = cctx.compress(message);
-    byte[] restored = dctx.decompress(packed, message.length);
-}
-```
-
-Pick the level explicitly with `Zstd.maxCompressionLevel()` /
-`minCompressionLevel()` when you need the extreme ends.
-
-### Compress many small payloads with a dictionary
-
-For many small, similar payloads (log lines, JSON records, protobufs), a
-dictionary compresses each one far smaller than it could be alone. Train one on
-representative samples:
-
-```java
-ZstdDictionary dict = ZstdDictionary.train(sampleRecords, 16 * 1024);
-
-try (ZstdCompressCtx cctx = new ZstdCompressCtx();
-     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
-    byte[] packed = cctx.compress(record, dict);
-    byte[] restored = dctx.decompress(packed, record.length, dict);
-}
-
-byte[] persisted = dict.toByteArray(); // store / ship the dictionary
-ZstdDictionary reloaded = ZstdDictionary.of(persisted);
-```
-
-On a hot path, digest the dictionary once to skip per-call setup:
-
-```java
-try (ZstdCompressDict cdict = new ZstdCompressDict(dict, 19);
-     ZstdDecompressDict ddict = new ZstdDecompressDict(dict);
-     ZstdCompressCtx cctx = new ZstdCompressCtx();
-     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
-    byte[] packed = cctx.compress(record, cdict);
-    byte[] restored = dctx.decompress(packed, record.length, ddict);
-}
-```
-
-### Avoid heap copies with `MemorySegment`
-
-When your data is already off-heap — an `mmap` slice in, an arena buffer out —
-use the `MemorySegment` overloads to skip the heap `byte[]` bounce entirely. FFM
-hands zstd the segment address directly: no copy in, no copy out, no GC churn.
-
-```java
-try (Arena arena = Arena.ofConfined();
-     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
-    MemorySegment frame = reader.mmapSlice();      // already native
-    long n = Zstd.decompressedSize(frame);         // read header, no copy
-    MemorySegment out = arena.allocate(n);         // becomes the backing buffer
-    dctx.decompress(out, frame);                   // native → native
-}
-```
-
-There are matching `compress(dst, src)` / `decompress(dst, src)` overloads (plus
-dictionary variants) returning the number of bytes written. For *why and when*
-this pays off, see the [explanation](docs/zero-copy.md).
-
-### Run against a self-built libzstd
-
-To use a `libzstd` you built yourself instead of the bundled one, point the
-loader at it:
-
-```bash
-java -Dzstd.lib.path=/path/to/libzstd.dylib --enable-native-access=ALL-UNNAMED ...
-```
-
-Build any of the six targets from any host:
-
-```bash
-./scripts/build-zstd.sh
-# classifier: osx-aarch64 | osx-x86_64 | linux-x86_64 | linux-aarch64
-#             | windows-x86_64 | windows-aarch64
-```
-
-## Reference
-
-### Supported platforms
-
-The library — `io.github.dfa1.zstd:zstd` — ships as a pure-Java module plus one
-native artifact per platform:
-
-| OS      | aarch64 | x86_64 |
-|---------|:-------:|:------:|
-| macOS   |   ✅    |   ✅   |
-| Linux   |   ✅    |   ✅   |
-| Windows |   ✅    |   ✅   |
-
-### API surface
-
-| Type | Role |
-|---|---|
-| `Zstd` | one-shot `compress` / `decompress`, level + version queries, `compressBound`, `decompressedSize` |
-| `ZstdCompressCtx` / `ZstdDecompressCtx` | reusable contexts; `byte[]` and `MemorySegment` overloads, dictionary variants |
-| `ZstdDictionary` | train (`ZDICT`), load, persist, query dict id |
-| `ZstdCompressDict` / `ZstdDecompressDict` | pre-digested dictionaries for hot paths |
-| `ZstdFrame` | frame inspection: header, sizes, dict id, skippable frames |
-| `ZstdException` / `ZstdErrorCode` | typed errors mapped from zstd's sentinels |
-
-### Symbol coverage
-
-Which zstd C symbols are bound (and which deprecated ones are intentionally not),
-with a per-area breakdown and a comparison against zstd-jni:
-[docs/supported.md](docs/supported.md).
-
-### Runtime requirement
-
-Native access requires `--enable-native-access=ALL-UNNAMED` (or your module name)
-on the JVM command line.
-
-### Build from source
-
-Requires JDK 25+, Maven, and [Zig](https://ziglang.org/) on `PATH`.
+| **[Tutorial](docs/tutorial.md)** | Learning by doing | Clean checkout → first round-trip |
+| **[How-to guides](docs/how-to.md)** | Solving a specific task | Hot paths, dictionaries, zero-copy, self-built lib |
+| **[Reference](docs/reference.md)** | Looking up facts | Platforms, API surface, symbol coverage, build |
+| **[Explanation](docs/explanation.md)** | Understanding the why | Why FFM + Zig, when zero-copy pays, benchmarks |
 
 ```bash
 git clone --recurse-submodules https://github.com/dfa1/zstd-java.git
-cd zstd-java
-mvn test
+cd zstd-java && mvn test
 ```
 
-`scripts/build-zstd.sh` compiles `libzstd.{dylib,so,dll}` from the
-`third_party/zstd` submodule (pinned to tag `v1.5.7`) with `zig cc`, cross-compiling
-any of the six targets from any host.
+Requires JDK 25+ and `--enable-native-access=ALL-UNNAMED` at runtime.
 
-### License
+## License
 
 [BSD 3-Clause](LICENSE) — the same primary license as zstd, which is bundled under
 its BSD terms (zstd is dual BSD / GPLv2, © Meta Platforms, Inc.).
-
-## Explanation
-
-### Why FFM and Zig
-
-The bindings use the **Foreign Function & Memory API** rather than JNI: no
-hand-written C glue, no separate native compile step for the binding layer, and a
-direct path from Java to zstd's addresses — which is what makes the zero-copy
-`MemorySegment` API possible.
-
-The native library itself is built from vendored zstd source via **`zig cc`** as
-a drop-in C compiler. zstd is pure C with no build-system dependencies, so the
-sources are compiled directly — no autotools, no CMake. Zig bundles clang and
-libc for every target, enabling hermetic cross-compilation without a sysroot:
-any host can build all six platform artifacts.
-
-### When zero-copy pays off
-
-The `MemorySegment` fast path eliminates the heap `byte[]` bounce and the
-per-call allocation it implies. The reasoning, and the cases where it does and
-does not matter, is in [docs/zero-copy.md](docs/zero-copy.md).
-
-### Benchmarks
-
-Throughput and allocation versus zstd-jni (JNI) and aircompressor (pure Java),
-including an async-profiler breakdown: [docs/benchmarks.md](docs/benchmarks.md).
diff --git a/docs/explanation.md b/docs/explanation.md
new file mode 100644
index 0000000..341cdd9
--- /dev/null
+++ b/docs/explanation.md
@@ -0,0 +1,25 @@
+# Explanation
+
+## Why FFM and Zig
+
+The bindings use the **Foreign Function & Memory API** rather than JNI: no
+hand-written C glue, no separate native compile step for the binding layer, and a
+direct path from Java to zstd's addresses — which is what makes the zero-copy
+`MemorySegment` API possible.
+
+The native library itself is built from vendored zstd source via **`zig cc`** as
+a drop-in C compiler. zstd is pure C with no build-system dependencies, so the
+sources are compiled directly — no autotools, no CMake. Zig bundles clang and
+libc for every target, enabling hermetic cross-compilation without a sysroot:
+any host can build all six platform artifacts.
+
+## When zero-copy pays off
+
+The `MemorySegment` fast path eliminates the heap `byte[]` bounce and the
+per-call allocation it implies. The reasoning, and the cases where it does and
+does not matter, is in [zero-copy.md](zero-copy.md).
+
+## Benchmarks
+
+Throughput and allocation versus zstd-jni (JNI) and aircompressor (pure Java),
+including an async-profiler breakdown: [benchmarks.md](benchmarks.md).
diff --git a/docs/how-to.md b/docs/how-to.md
new file mode 100644
index 0000000..ce61ea2
--- /dev/null
+++ b/docs/how-to.md
@@ -0,0 +1,87 @@
+# How-to guides
+
+Task-focused recipes. Each assumes you have the library on the classpath (see the
+[tutorial](tutorial.md)).
+
+## Compress on a hot path
+
+Reuse a context to amortise native allocation across many calls:
+
+```java
+try (ZstdCompressCtx cctx = new ZstdCompressCtx().level(19);
+     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
+    byte[] packed = cctx.compress(message);
+    byte[] restored = dctx.decompress(packed, message.length);
+}
+```
+
+Pick the level explicitly with `Zstd.maxCompressionLevel()` /
+`minCompressionLevel()` when you need the extreme ends.
+
+## Compress many small payloads with a dictionary
+
+For many small, similar payloads (log lines, JSON records, protobufs), a
+dictionary compresses each one far smaller than it could be alone. Train one on
+representative samples:
+
+```java
+ZstdDictionary dict = ZstdDictionary.train(sampleRecords, 16 * 1024);
+
+try (ZstdCompressCtx cctx = new ZstdCompressCtx();
+     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
+    byte[] packed = cctx.compress(record, dict);
+    byte[] restored = dctx.decompress(packed, record.length, dict);
+}
+
+byte[] persisted = dict.toByteArray(); // store / ship the dictionary
+ZstdDictionary reloaded = ZstdDictionary.of(persisted);
+```
+
+On a hot path, digest the dictionary once to skip per-call setup:
+
+```java
+try (ZstdCompressDict cdict = new ZstdCompressDict(dict, 19);
+     ZstdDecompressDict ddict = new ZstdDecompressDict(dict);
+     ZstdCompressCtx cctx = new ZstdCompressCtx();
+     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
+    byte[] packed = cctx.compress(record, cdict);
+    byte[] restored = dctx.decompress(packed, record.length, ddict);
+}
+```
+
+## Avoid heap copies with `MemorySegment`
+
+When your data is already off-heap — an `mmap` slice in, an arena buffer out —
+use the `MemorySegment` overloads to skip the heap `byte[]` bounce entirely. FFM
+hands zstd the segment address directly: no copy in, no copy out, no GC churn.
+
+```java
+try (Arena arena = Arena.ofConfined();
+     ZstdDecompressCtx dctx = new ZstdDecompressCtx()) {
+    MemorySegment frame = reader.mmapSlice();      // already native
+    long n = Zstd.decompressedSize(frame);         // read header, no copy
+    MemorySegment out = arena.allocate(n);         // becomes the backing buffer
+    dctx.decompress(out, frame);                   // native → native
+}
+```
+
+There are matching `compress(dst, src)` / `decompress(dst, src)` overloads (plus
+dictionary variants) returning the number of bytes written. For *why and when*
+this pays off, see the [explanation](zero-copy.md).
+
+## Run against a self-built libzstd
+
+To use a `libzstd` you built yourself instead of the bundled one, point the
+loader at it:
+
+```bash
+java -Dzstd.lib.path=/path/to/libzstd.dylib --enable-native-access=ALL-UNNAMED ...
+```
+
+Build any of the six targets from any host:
+
+```bash
+./scripts/build-zstd.sh
+# classifier: osx-aarch64 | osx-x86_64 | linux-x86_64 | linux-aarch64
+#             | windows-x86_64 | windows-aarch64
+```
diff --git a/docs/reference.md b/docs/reference.md
new file mode 100644
index 0000000..3f2ea26
--- /dev/null
+++ b/docs/reference.md
@@ -0,0 +1,48 @@
+# Reference
+
+## Supported platforms
+
+The library — `io.github.dfa1.zstd:zstd` — ships as a pure-Java module plus one
+native artifact per platform:
+
+| OS      | aarch64 | x86_64 |
+|---------|:-------:|:------:|
+| macOS   |   ✅    |   ✅   |
+| Linux   |   ✅    |   ✅   |
+| Windows |   ✅    |   ✅   |
+
+## API surface
+
+| Type | Role |
+|---|---|
+| `Zstd` | one-shot `compress` / `decompress`, level + version queries, `compressBound`, `decompressedSize` |
+| `ZstdCompressCtx` / `ZstdDecompressCtx` | reusable contexts; `byte[]` and `MemorySegment` overloads, dictionary variants |
+| `ZstdDictionary` | train (`ZDICT`), load, persist, query dict id |
+| `ZstdCompressDict` / `ZstdDecompressDict` | pre-digested dictionaries for hot paths |
+| `ZstdFrame` | frame inspection: header, sizes, dict id, skippable frames |
+| `ZstdException` / `ZstdErrorCode` | typed errors mapped from zstd's sentinels |
+
+## Symbol coverage
+
+Which zstd C symbols are bound (and which deprecated ones are intentionally not),
+with a per-area breakdown and a comparison against zstd-jni:
+[supported.md](supported.md).
+
+## Runtime requirement
+
+Native access requires `--enable-native-access=ALL-UNNAMED` (or your module name)
+on the JVM command line.
+
+## Build from source
+
+Requires JDK 25+, Maven, and [Zig](https://ziglang.org/) on `PATH`.
+
+```bash
+git clone --recurse-submodules https://github.com/dfa1/zstd-java.git
+cd zstd-java
+mvn test
+```
+
+`scripts/build-zstd.sh` compiles `libzstd.{dylib,so,dll}` from the
+`third_party/zstd` submodule (pinned to tag `v1.5.7`) with `zig cc`, cross-compiling
+any of the six targets from any host.