chore(core): implement ZSTD dict based compression #6967
Pull request overview
Implements an experimental ZSTD dictionary–based compression mode for QList to significantly reduce memory usage by sharing a per-thread dictionary state across all lists on the same shard thread.
Changes:
- Adds a new `list_experimental_zstd_dict_threshold` flag and wires it into list promotion/creation to enable dict compression above a malloc-usage threshold.
- Refactors the `QList` compression flow into a dict-based 3-stage state machine (train → bulk-compress → incremental per-node compress) backed by thread-local dict state with explicit shutdown.
- Adds ZSTD-related tests, including a new `QListZstdTest` suite.
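The 3-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from `qlist.cc`: `DictStep`, `DictFlags`, `NextDictStep` and `RecordOutcome` are hypothetical names, and only the three flag names are taken from the PR. It also assumes that a failure flag permanently disables dict compression for that list and that a threshold of 0 means "disabled".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the train -> bulk-compress -> per-node flow.
enum class DictStep { kSkip, kTrain, kBulkCompress, kCompressNode };

// Per-list state bits; the names mirror the bit fields listed in the PR.
struct DictFlags {
  bool dict_learning_failed = false;
  bool dict_compress_failed = false;
  bool dict_bulk_finished = false;
};

// Decides what a cool-off pass should do next for one list.
// thread_has_dict: whether this thread already holds a trained dictionary.
// malloc_used / threshold: list memory usage and the flag value (0 = disabled).
DictStep NextDictStep(const DictFlags& f, bool thread_has_dict,
                      std::size_t malloc_used, std::size_t threshold) {
  if (threshold == 0 || malloc_used < threshold) return DictStep::kSkip;
  // Assumption: once training or compression failed, do not retry.
  if (f.dict_learning_failed || f.dict_compress_failed) return DictStep::kSkip;
  if (!thread_has_dict) return DictStep::kTrain;
  if (!f.dict_bulk_finished) return DictStep::kBulkCompress;
  return DictStep::kCompressNode;
}

// Records the result of a step so the next pass picks the right stage.
void RecordOutcome(DictFlags& f, DictStep step, bool ok) {
  assert(step != DictStep::kSkip);  // a skipped pass has no outcome
  if (step == DictStep::kTrain) {
    f.dict_learning_failed = !ok;
  } else if (step == DictStep::kBulkCompress) {
    if (ok) {
      f.dict_bulk_finished = true;
    } else {
      f.dict_compress_failed = true;
    }
  }
  // kCompressNode: this sketch records nothing for single-node results.
}
```

Keeping the decision in one pure function makes each transition easy to unit-test without linking against ZSTD.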
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/server/rdb_save.cc | Minor formatting-only change in list RDB save path. |
| src/server/list_family.cc | Adds and wires list_experimental_zstd_dict_threshold to configure QList dict compression threshold. |
| src/server/engine_shard.cc | Calls QList::ShutdownThread() during shard thread-local teardown to free TLS dict state. |
| src/core/zstd_test.cc | Adds a test exercising ZSTD CDict memory estimation API (and enables advanced API via macro). |
| src/core/qlist_test.cc | Adds a new QListZstdTest suite covering dict compression behaviors. |
| src/core/qlist.h | Introduces ShutdownThread() API, renames compression helper, and adds bitfields tracking dict compression state. |
| src/core/qlist.cc | Implements TLS-backed dict training/compression and integrates it into CoolOff() logic. |
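The thread-local ownership pattern behind the `ShutdownThread()` and TLS rows above might look like the sketch below. It is an assumption-laden illustration: `DictState`, `tl_dict_state`, `GetOrCreateThreadDict`, `ThreadDict` and `ShutdownThreadDict` are hypothetical names, and a plain byte buffer stands in for the ZSTD dictionary objects so that the example needs no external library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the per-thread dictionary state. The real state would own
// ZSTD dictionary/context objects; a byte buffer is used here instead.
struct DictState {
  std::vector<char> dict_bytes;
};

// Raw thread-local pointer: one instance per thread, shared by every list
// on that thread, and freed explicitly rather than by a TLS destructor.
thread_local DictState* tl_dict_state = nullptr;

// Lazily creates the state on first use; later calls return the same object.
DictState* GetOrCreateThreadDict(std::size_t dict_size) {
  if (tl_dict_state == nullptr) {
    tl_dict_state = new DictState;
    tl_dict_state->dict_bytes.resize(dict_size);
  }
  return tl_dict_state;
}

// Accessor for code paths that require the dictionary to exist already.
const DictState& ThreadDict() {
  assert(tl_dict_state != nullptr);
  return *tl_dict_state;
}

// Called once during thread teardown; safe to call repeatedly.
void ShutdownThreadDict() {
  delete tl_dict_state;
  tl_dict_state = nullptr;
}
```

An explicit shutdown hook keeps the release point under the shard's control, which matters if the teardown order of other thread-local resources is significant.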
🤖 Augment PR Summary
Summary: Adds an experimental ZSTD dictionary-based compression mode for
Changes:
Technical Notes: The dict mode is mutually exclusive with depth-based LZF compression and is synchronous (may block shard thread during training/bulk pass).
augment review
Move ZstdDictState from per-QList unique_ptr to a thread-local raw pointer shared across all QList instances on the same thread.

Motivation: ZSTD_CDict and ZSTD_DDict take 200-300KB each and holding them per list kills the main motivation to save memory.

- Rename AllowCompression -> AllowLZFCompression
- Split TrainAndCompressZstdDict into TrainZstdDict, CompressWithZstdDict, and CompressNodeWithDict
- Add bit fields for tracking compression state: dict_learning_failed, dict_compress_failed, dict_bulk_finished
- CoolOff ZSTD branch: 3-state machine (train -> bulk compress -> per-node compress)
- Add ShutdownThread() to free thread-local dict on shard teardown
- Wire list_experimental_zstd_dict_threshold flag in list_family
- Add QListZstdTest suite with 6 tests

Test: `./dfly_bench -n 80000 -p 6379 -qps=0 -d 64 --key_maximum=10 --command="lpush __key__ foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5`

1. With list_experimental_zstd_dict_threshold=0 (disabled, default): used_memory_human: 2.56GiB, QPS: 1723935, P99 latency: 6ms
2. With list_experimental_zstd_dict_threshold=100000: used_memory_human: 32.84MiB, QPS: 1784274, P99 latency: 5.5ms

75 times less memory consumption for this synthetic use-case. For real-world use-cases I expect to have 3-10x reduction in some cases.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
ZSTD_DCtx_reset(tl_zstd_dict->dctx, ZSTD_reset_session_only);
size_t dsz = ZSTD_decompress_usingDDict(tl_zstd_dict->dctx, decompressed, node->sz,
                                        lzf->compressed, lzf->sz, ddict);
if (ZSTD_isError(dsz) || dsz != node->sz) {
DecompressRaw() can now return false for ZSTD failures, but callers (via TryDecompressInternal()/AccessForReads()) don’t treat that as a hard stop and will likely keep using node->entry as RAW listpack bytes while it’s still compressed; also the compression stats are decremented before success is known. Consider making decompression failure fatal again or propagating the failure so callers can abort and stats/state can be kept consistent.
Severity: high
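One way to address this, sketched under assumptions: decode into a scratch buffer first, and touch the node and the stats only after the decoder reports success and the size matches. `Node`, `Stats`, `Decoder` and `DecompressNode` below are hypothetical stand-ins rather than the QList types, and the decoder is injected as a callback so the example does not depend on ZSTD.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical counters kept alongside the list.
struct Stats {
  std::size_t compressed_nodes = 0;
};

// Hypothetical node: `entry` holds either raw or compressed bytes.
struct Node {
  std::string entry;
  std::size_t sz = 0;  // expected size of the raw payload
  bool compressed = false;
};

// Decoder callback: returns true and fills *out on success.
using Decoder =
    std::function<bool(const std::string& in, std::size_t expected, std::string* out)>;

// Returns false on failure and leaves both node and stats untouched, so the
// caller can abort instead of reading compressed bytes as raw data.
bool DecompressNode(Node& node, Stats& stats, const Decoder& decode) {
  if (!node.compressed) return true;  // already raw, nothing to do

  std::string raw;
  if (!decode(node.entry, node.sz, &raw) || raw.size() != node.sz) {
    return false;  // propagate the failure; no state was modified
  }

  // Success is known only here, so state and stats change together.
  node.entry = std::move(raw);
  node.compressed = false;
  assert(stats.compressed_nodes > 0);
  --stats.compressed_nodes;
  return true;
}
```

Callers would need to check the return value and stop on `false`. Whether the failure should instead be fatal, as the comment also suggests, is a separate policy decision that this sketch does not settle.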
@claude[agent] fix by adding a new commit. do not change the PR description or its title.
Addressed review feedback in 73efcba:
augment review
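Move ZstdDictState from per-QList unique_ptr to a thread-local raw
pointer shared across all QList instances on the same thread.
Motivation: ZSTD_CDict and ZSTD_DDict take 200-300KB each and holding them
per list kills the main motivation to save memory.
- Rename AllowCompression -> AllowLZFCompression
- Split TrainAndCompressZstdDict into TrainZstdDict, CompressWithZstdDict, and CompressNodeWithDict
- Add bit fields for tracking compression state: dict_learning_failed, dict_compress_failed, dict_bulk_finished
- CoolOff ZSTD branch: 3-state machine (train -> bulk compress -> per-node compress)
- Add ShutdownThread() to free thread-local dict on shard teardown
- Wire list_experimental_zstd_dict_threshold flag in list_family
- Add QListZstdTest suite with 6 tests
Test:
`./dfly_bench -n 80000 -p 6379 -qps=0 -d 64 --key_maximum=10 --command="lpush __key__ foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5`
1. With list_experimental_zstd_dict_threshold=0 (disabled, default): used_memory_human: 2.56GiB, QPS: 1723935, P99 latency: 6ms
2. With list_experimental_zstd_dict_threshold=100000: used_memory_human: 32.84MiB, QPS: 1784274, P99 latency: 5.5ms
75 times less memory consumption for this synthetic use-case.
For real-world use-cases I expect to have 3-10x reduction in some cases.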
Fixes #6899
Signed-off-by: Roman Gershman <roman@dragonflydb.io>