PoC: Add fdeflate as a new backend by Shnatsel · Pull Request #545 · rust-lang/flate2-rs

Shnatsel · 2026-04-13T22:49:05Z

This is a very early proof-of-concept, just to get a sense for performance and API gaps. It is mostly vibe-coded and most definitely should not be merged.

Notable findings:

Some new APIs for private state need to be exposed from fdeflate, but nothing too crazy. This branch depends on my fork with the required APIs exposed (jankily).
The flate2 wrapper needs to maintain an internal window buffer since fdeflate uses the output buffer as a lookback window for back-references, but flate2 passes fresh output slices on each call. This introduces an extra in-memory copy in the inflate path.
Mid-stream flushing doesn't seem to be supported by fdeflate
Dictionary functionality also seems to be missing

Tests pass, minus the [0] input edge case (fails) and mid-stream flushing added in #498 (unsupported?). CI is erroneously green because I didn't add CI jobs for this backend yet.

crc32fast is also not the fastest CRC32 around (see #523) so performance could conceivably be pushed further, either via #523 or by adapting the zlib-rs implementation.

@fintelia FYI

Implement the fdeflate crate as an optional pure-Rust backend, selectable via the `fdeflate` Cargo feature. fdeflate is a fast DEFLATE implementation that uses only safe Rust code. Key implementation details: - The decompressor maintains an internal window buffer since fdeflate uses the output buffer as a lookback window for back-references, but flate2 passes fresh output slices on each call. - The compressor buffers all output internally and emits it on Finish, since fdeflate writes a single deflate block. - Accounts for fdeflate's bit-buffer over-read via the new Decompressor::unconsumed_bytes() API, ensuring accurate total_in tracking. - Mid-stream flush tests (Partial/Sync/Full) are gated behind cfg(any(feature = "any_zlib", feature = "miniz_oxide")) since fdeflate does not support mid-stream flushing. Known issue: roundtrip through write::DeflateEncoder piped into write::DeflateDecoder fails for some inputs (e.g. single byte [0]). Co-Authored-By: Claude <noreply@anthropic.com>

Instead of buffering all compressed output in a Vec<u8> and draining it later, drain the compressor's inner writer directly after each write_data() call. This avoids unbounded memory growth during compression and reduces unnecessary copying. Uses fdeflate's new get_writer_mut() API to access the inner Vec<u8> writer and drain produced bytes incrementally into the caller's output slice. Co-Authored-By: Claude <noreply@anthropic.com>

Shnatsel · 2026-04-13T23:24:48Z

I've adapted the benchmarking harness @folkertdev shared to also measure fdeflate using this PoC: https://github.com/Shnatsel/flate2_bench/tree/fdeflate

The results are shockingly good, with fdeflate outperforming even zlib-rs at decompression. It's possible that fdeflate is doing less work in this PoC, e.g. by skipping checksums; but still, this is very promising!

run.sh output with fdeflate on desktop Zen4

Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

rustc 1.97.0-nightly (14196dbfa 2026-04-12)
binary: rustc
commit-hash: 14196dbfa3eb7c30195251eac092b1b86c8a2d84
commit-date: 2026-04-12
host: x86_64-unknown-linux-gnu
release: 1.97.0-nightly
LLVM version: 22.1.2

-- inflate (chunks of 4096 bytes) --
target/release/flate2_bench_miniz_oxide
mean runtime 0.028137s (stdev of 0.000248) at 533.374 MB/s, ratio of 0.404
target/release/flate2_bench_fdeflate
mean runtime 0.018406s (stdev of 0.000335) at 815.348 MB/s, ratio of 0.404
target/release/flate2_bench_zlib_ng
mean runtime 0.024131s (stdev of 0.000032) at 621.911 MB/s, ratio of 0.404
target/release/flate2_bench_zlib_rs
mean runtime 0.020800s (stdev of 0.000088) at 721.516 MB/s, ratio of 0.404

-- deflate level 1 (chunks of 4096 bytes) --

target/release/flate2_bench_miniz_oxide
mean runtime 0.058150s (stdev of 0.002528) at 129.649 MB/s, ratio of 1.991
target/release/flate2_bench_fdeflate
mean runtime 0.040613s (stdev of 0.001354) at 237.573 MB/s, ratio of 1.555
target/release/flate2_bench_zlib_ng
mean runtime 0.044328s (stdev of 0.000423) at 183.417 MB/s, ratio of 1.846
target/release/flate2_bench_zlib_rs
mean runtime 0.047411s (stdev of 0.001216) at 171.491 MB/s, ratio of 1.846

-- deflate level 6 (chunks of 4096 bytes) --

target/release/flate2_bench_miniz_oxide
mean runtime 0.443965s (stdev of 0.000692) at 13.772 MB/s, ratio of 2.454
target/release/flate2_bench_fdeflate
mean runtime 0.267347s (stdev of 0.000694) at 23.136 MB/s, ratio of 2.426
target/release/flate2_bench_zlib_ng
mean runtime 0.162743s (stdev of 0.000310) at 37.848 MB/s, ratio of 2.436
target/release/flate2_bench_zlib_rs
mean runtime 0.170855s (stdev of 0.001357) at 36.051 MB/s, ratio of 2.436

-- deflate level 9 (chunks of 4096 bytes) --

target/release/flate2_bench_miniz_oxide
mean runtime 0.751213s (stdev of 0.005024) at 8.101 MB/s, ratio of 2.466
target/release/flate2_bench_fdeflate
mean runtime 0.332334s (stdev of 0.001516) at 18.511 MB/s, ratio of 2.439
target/release/flate2_bench_zlib_ng
mean runtime 0.335950s (stdev of 0.000510) at 18.051 MB/s, ratio of 2.475
target/release/flate2_bench_zlib_rs
mean runtime 0.335970s (stdev of 0.000383) at 18.050 MB/s, ratio of 2.475

run.sh output with fdeflate on Apple M4

Apple clang version 21.0.0 (clang-2100.0.123.102)
Target: arm64-apple-darwin25.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

rustc 1.97.0-nightly (14196dbfa 2026-04-12)
binary: rustc
commit-hash: 14196dbfa3eb7c30195251eac092b1b86c8a2d84
commit-date: 2026-04-12
host: aarch64-apple-darwin
release: 1.97.0-nightly
LLVM version: 22.1.2
\n -- inflate (chunks of 4096 bytes) -- \n
target/release/flate2_bench_miniz_oxide
mean runtime 0.036069s (stdev of 0.010199) at 416.072 MB/s, ratio of 0.404
target/release/flate2_bench_fdeflate
mean runtime 0.022164s (stdev of 0.006995) at 677.090 MB/s, ratio of 0.404
target/release/flate2_bench_zlib_ng
mean runtime 0.036059s (stdev of 0.012593) at 416.191 MB/s, ratio of 0.404
target/release/flate2_bench_zlib_rs
mean runtime 0.028451s (stdev of 0.009108) at 527.478 MB/s, ratio of 0.404
\n -- deflate level 1 (chunks of 4096 bytes) -- \n
target/release/flate2_bench_miniz_oxide
mean runtime 0.051375s (stdev of 0.000310) at 146.743 MB/s, ratio of 1.991
target/release/flate2_bench_fdeflate
mean runtime 0.031730s (stdev of 0.000100) at 304.089 MB/s, ratio of 1.555
target/release/flate2_bench_zlib_ng
mean runtime 0.047112s (stdev of 0.000163) at 172.578 MB/s, ratio of 1.846
target/release/flate2_bench_zlib_rs
mean runtime 0.051220s (stdev of 0.000208) at 158.735 MB/s, ratio of 1.846
\n -- deflate level 6 (chunks of 4096 bytes) -- \n
target/release/flate2_bench_miniz_oxide
mean runtime 0.276999s (stdev of 0.001565) at 22.074 MB/s, ratio of 2.454
target/release/flate2_bench_fdeflate
mean runtime 0.194151s (stdev of 0.004169) at 31.858 MB/s, ratio of 2.426
target/release/flate2_bench_zlib_ng
mean runtime 0.129773s (stdev of 0.002081) at 47.464 MB/s, ratio of 2.436
target/release/flate2_bench_zlib_rs
mean runtime 0.145596s (stdev of 0.001938) at 42.306 MB/s, ratio of 2.436
\n -- deflate level 9 (chunks of 4096 bytes) -- \n
target/release/flate2_bench_miniz_oxide
mean runtime 0.456074s (stdev of 0.001685) at 13.344 MB/s, ratio of 2.466
target/release/flate2_bench_fdeflate
mean runtime 0.242042s (stdev of 0.006304) at 25.417 MB/s, ratio of 2.439
target/release/flate2_bench_zlib_ng
mean runtime 0.274085s (stdev of 0.002296) at 22.126 MB/s, ratio of 2.475
target/release/flate2_bench_zlib_rs
mean runtime 0.260981s (stdev of 0.001494) at 23.237 MB/s, ratio of 2.475

Shnatsel · 2026-04-14T11:24:28Z

Compression is not yet as optimized as decompression in fdeflate, so there's still low-hanging fruit. A two-line change in image-rs/fdeflate#74 takes compression performance at level 6 from 23MB/s to 27MB/s, which is double the miniz_oxide performance, and at level 9 from 18MB/s to 22MB/s.

Compression ratios on silesia are slightly inferior to other backends. Compression heuristics have been tuned for PNG data, and might have to be adjusted to better handle other kinds of inputs.

Shnatsel · 2026-04-14T12:26:50Z

I've tried a corpus other than Silesia and found that fdeflate beats zlib-rs at compression speed by nearly 50% at levels 6 and 9. This suggests that zlib-ng and by proxy zlib-rs may be overfitted to perform well on Silesia but not on other kinds of data.

However, at compression level 1 fdeflate performance collapses, dipping even below its own level 6 compression speed, suggesting that level 1 may be overfitted to PNG data and will require changes.

You can find all the details at image-rs/fdeflate#75

fintelia · 2026-04-14T18:39:39Z

The flate2 wrapper needs to maintain an internal window buffer since fdeflate uses the output buffer as a lookback window for back-references, but flate2 passes fresh output slices on each call. This introduces an extra in-memory copy in the inflate path.

I think this is just fundamental to flate2's API. The copying needs to happen somewhere. Other backends just do it internally while fdeflate outsources the responsibility.

Mid-stream flushing doesn't seem to be supported by fdeflate

Started working on support here: image-rs/fdeflate#72. The unit tests in flate2 also assume fixed/stored blocks so image-rs/fdeflate#73 will also be needed to get them to work.

Dictionary functionality also seems to be missing

I think that miniz_oxide doesn't support this either. AFAIK dictionaries aren't really used much

Shnatsel · 2026-04-14T22:46:20Z

The apparent slowdown at compression level 1 on some corpora turned out to be entirely due to the jank in this PR and not fdeflate's fault.

With up to 80% of the computation time being wasted due to the aforementioned jank, a proper implementation should show much higher compression performance than the early benchmarks conducted on this PR.

Shnatsel · 2026-04-14T22:52:12Z

Looking at the inflate profile, both appear to compute checksums, so I don't think fdeflate is doing any less work here. It really does seem to be just faster than zlib-rs, which also mirrors our findings for the png crate.

Shnatsel · 2026-04-18T22:21:45Z

Flushing was added to fdeflate in image-rs/fdeflate#72

The remaining required changes to fdefalte are quite trivial.

This was only ever meant as a proof of concept, and the concept seem sufficiently proven, so I'll go ahead and close this.

Shnatsel and others added 2 commits April 13, 2026 23:25

Shnatsel changed the title ~~PoC: Add fdeflate as a new decompression backend~~ PoC: Add fdeflate as a new backend Apr 13, 2026

Shnatsel mentioned this pull request Apr 14, 2026

Poor performance at compression level 1 on Govdocs1 corpus image-rs/fdeflate#75

Closed

Shnatsel mentioned this pull request Apr 14, 2026

switch to zlib-rs by default #542

Draft

mstange mentioned this pull request Apr 16, 2026

Clarification on Decompressor::read documentation image-rs/fdeflate#76

Closed

Shnatsel mentioned this pull request Apr 17, 2026

Performance regression in zlib-rs backend, more zeroing of memory than before #544

Open

Shnatsel closed this Apr 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PoC: Add fdeflate as a new backend#545

PoC: Add fdeflate as a new backend#545
Shnatsel wants to merge 2 commits intorust-lang:mainfrom
Shnatsel:fdeflate-poc-2

Shnatsel commented Apr 13, 2026 •

edited

Loading

Uh oh!

Shnatsel commented Apr 13, 2026 •

edited

Loading

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

fintelia commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Shnatsel commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Shnatsel commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

fintelia commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 14, 2026

Uh oh!

Shnatsel commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Shnatsel commented Apr 13, 2026 •

edited

Loading

Shnatsel commented Apr 13, 2026 •

edited

Loading