Implement snappy compression in userland code #11
|
Output streaming is a bit challenging, as the compressed data starts with the uncompressed length, which therefore has to be known before the first byte is written.
One solution could be to do the following:

use io\streams\FileOutputStream;
use io\streams\compress\Snappy;
$snappy= new Snappy();
$out= new FileOutputStream('compressed.sn');
$out->write($snappy->length(strlen($data)));
$stream= $snappy->create($out);
$stream->write($data);
$stream->close();

...but that feels hacky. We could overload the second parameter to open(), as snappy does not use a compression level, which would give us:

use io\streams\FileOutputStream;
use io\streams\compress\Snappy;
$snappy= new Snappy();
$stream= $snappy->create(new FileOutputStream('compressed.sn'), strlen($data));
$stream->write($data);
$stream->close();

...but that would be inconsistent with other implementations. The classical options approach would give us something like this:

$stream= $snappy->create(new FileOutputStream('compressed.sn'), ['length' => strlen($data)]);

...but that's error-prone due to its "string-key" nature. We could solve this with an Options object:

$stream= $snappy->create(new FileOutputStream('compressed.sn'), new Options(length: strlen($data)));
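For reference, the length prefix that makes streaming awkward is, per the Snappy format description, the uncompressed length stored as a little-endian varint. A minimal sketch of that encoding in plain PHP follows. The names `encode_varint()` and `decode_varint()` are hypothetical helpers for illustration only and are not the `length()` implementation from this pull request.

```php
<?php
// Hypothetical helpers for illustration, not part of the library discussed here.

/** Encodes a non-negative integer as a little-endian base-128 varint. */
function encode_varint(int $n): string {
  $bytes= '';
  while ($n >= 0x80) {
    $bytes.= chr(($n & 0x7f) | 0x80);  // low 7 bits, continuation bit set
    $n>>= 7;
  }
  return $bytes.chr($n);               // last byte, continuation bit clear
}

/** Decodes a varint at the start of $bytes, returns [value, bytes consumed]. */
function decode_varint(string $bytes): array {
  $value= 0;
  $shift= 0;
  for ($i= 0, $l= strlen($bytes); $i < $l; $i++) {
    $b= ord($bytes[$i]);
    $value|= ($b & 0x7f) << $shift;
    if ($b < 0x80) return [$value, $i + 1];
    $shift+= 7;
  }
  throw new InvalidArgumentException('Truncated varint');
}

// The 2207250-byte input used in the tests would get this 4-byte prefix
echo bin2hex(encode_varint(2207250)), "\n";  // 92dc8601
```

Running the first bytes of a produced file through `decode_varint()` is one way to check whether the header matches the size of the original input.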
|
Integration testing buffered vs. unbuffered snappy compression shows the implementation has bugs:

# Calls compress()
$ xp snappy.script.php -c pdf.streaming > pdf.sn
pdf.streaming (2207250 -> 876717) 0.064 seconds & 2044.38 kB used / 6550.12 kB peak
# Calls open($out)
$ xp snappy.script.php -buf pdf.streaming pdf.sn
[.]
pdf.streaming (2207250 -> 876717) 0.074 seconds & 1426.95 kB used / 6832.72 kB peak
# Calls open($out, new Options(length: $size))
$ xp snappy.script.php -out pdf.streaming pdf.sn
[.]
pdf.streaming (2207250 -> 89427) 0.190 seconds & 1445.20 kB used / 1786.59 kB peak

All of these yield the following decompression error:

$ snappy -d pdf.sn > pdf.return
snappy: pdf.sn: compressed block of length 876717: expecting 2207250 bytes, got 909072

For comparison, this is what is expected:

$ snappy pdf.streaming > pdf.sn
pdf.streaming: 2207250 -> 2199987 (99.67%)
Discovered when integration-testing with the official test data from https://github.com/google/snappy/tree/main/testdata
|
Using the files from https://github.com/google/snappy/tree/main/testdata, copied to ./fixtures:

Integration testing for compress():

for file in $(ls -1 fixtures/* | grep -v baddata); do
  echo "== $file =="
  xp snappy.script.php -c "$file" > sn
  snappy -d sn > test
  diff -u test "$file" && echo "OK"
  rm sn test
done

✅ Works
|
Streaming, while a bit slower for small files, really shines with large files: the 584 MB video file compresses in 9 seconds instead of 34, with a peak memory usage of just 1.8 MB vs. 1.1 GB!
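The memory difference comes from never holding the whole input in memory at once. The following is only a rough sketch of that idea using plain PHP streams. It copies data in fixed-size chunks and reports time and peak memory, but performs no Snappy compression, and `measure()` and `copy_chunked()` are hypothetical names that do not appear in the test script used above.

```php
<?php
// Hypothetical helpers for illustration; no compression is performed here.

/** Runs $task and returns its result with elapsed seconds and peak memory in kB. */
function measure(callable $task): array {
  $start= microtime(true);
  $result= $task();
  return [
    'result'  => $result,
    'seconds' => microtime(true) - $start,
    'peak_kb' => memory_get_peak_usage() / 1024,
  ];
}

/** Copies $in to $out in chunks of $chunk bytes, returns number of bytes written. */
function copy_chunked($in, $out, int $chunk= 65536): int {
  $total= 0;
  while (!feof($in)) {
    $bytes= fread($in, $chunk);
    if (false === $bytes || '' === $bytes) break;
    $total+= fwrite($out, $bytes);   // only one chunk is held at a time
  }
  return $total;
}

// Usage with in-memory/temporary streams standing in for real files
$in= fopen('php://temp', 'w+');
fwrite($in, str_repeat('x', 1000000));
rewind($in);
$out= fopen('php://temp', 'w+');

$stats= measure(fn() => copy_chunked($in, $out));
printf(
  "%d bytes, %.3f seconds, %.2f kB peak\n",
  $stats['result'],
  $stats['seconds'],
  $stats['peak_kb']
);
```

A buffered variant would instead read the complete input into a string before processing it, which is where a peak in the gigabyte range for a 584 MB file would plausibly come from.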
|
Added to MongoDB in https://github.com/xp-forge/mongodb/releases/tag/v3.6.0
See https://en.wikipedia.org/wiki/Snappy_(compression), https://google.github.io/snappy/ and xp-forge/mongodb#62 (comment)