Skip to content

chore(core): implement ZSTD dict based compression#6967

Merged
romange merged 1 commit into
mainfrom
List_Compr_POC
Mar 26, 2026
Merged

chore(core): implement ZSTD dict based compression#6967
romange merged 1 commit into
mainfrom
List_Compr_POC

Conversation

@romange
Copy link
Copy Markdown
Collaborator

@romange romange commented Mar 24, 2026

Move ZstdDictState from per-QList unique_ptr to a thread-local raw
pointer shared across all QList instances on the same thread.
Motivation: ZSTD_CDict and ZSTD_DDict take 200-300KB each and holding them
per list kills the main motivation to save memory.

  • Rename AllowCompression -> AllowLZFCompression
  • Split TrainAndCompressZstdDict into TrainZstdDict,
    CompressWithZstdDict, and CompressNodeWithDict
  • Add bit fields for tracking compression state:
    dict_learning_failed, dict_compress_failed, dict_bulk_finished
  • CoolOff ZSTD branch: 3-state machine
    (train -> bulk compress -> per-node compress)
  • Add ShutdownThread() to free thread-local dict on shard teardown
  • Wire list_experimental_zstd_dict_threshold flag in list_family
  • Add QListZstdTest suite with 6 tests

Test: ./dfly_bench -n 80000 -p 6379 -qps=0 -d 64 --key_maximum=10 --command="lpush __key__ foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5

  1. With list_experimental_zstd_dict_threshold=0 (disabled, default): used_memory_human:2.56GiB QPS: 1723935, P99 lat: 6ms
  2. With list_experimental_zstd_dict_threshold=100000: used_memory_human: 32.84MiB, QPS: 1784274 P99 - 5.5ms

75 times less memory consumption for this synthetic use-case.
For real-world use-cases I expect to have 3-10x reduction in some cases.

Fixes #6899

Signed-off-by: Roman Gershman roman@dragonflydb.io

Copilot AI review requested due to automatic review settings March 24, 2026 19:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements an experimental ZSTD dictionary–based compression mode for QList to significantly reduce memory usage by sharing a per-thread dictionary state across all lists on the same shard thread.

Changes:

  • Adds a new list_experimental_zstd_dict_threshold flag and wires it into list promotion/creation to enable dict compression above a malloc-usage threshold.
  • Refactors QList compression flow into a dict-based 3-stage state machine (train → bulk-compress → incremental per-node compress) backed by thread-local dict state with explicit shutdown.
  • Adds ZSTD-related tests, including a new QListZstdTest suite.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/server/rdb_save.cc Minor formatting-only change in list RDB save path.
src/server/list_family.cc Adds and wires list_experimental_zstd_dict_threshold to configure QList dict compression threshold.
src/server/engine_shard.cc Calls QList::ShutdownThread() during shard thread-local teardown to free TLS dict state.
src/core/zstd_test.cc Adds a test exercising ZSTD CDict memory estimation API (and enables advanced API via macro).
src/core/qlist_test.cc Adds a new QListZstdTest suite covering dict compression behaviors.
src/core/qlist.h Introduces ShutdownThread() API, renames compression helper, and adds bitfields tracking dict compression state.
src/core/qlist.cc Implements TLS-backed dict training/compression and integrates it into CoolOff() logic.

Comment thread src/core/qlist.cc Outdated
Comment thread src/core/zstd_test.cc
Comment thread src/core/qlist_test.cc
Comment thread src/core/qlist.cc
Comment thread src/core/qlist.cc Outdated
@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Mar 24, 2026

🤖 Augment PR Summary

Summary: Adds an experimental ZSTD dictionary-based compression mode for QList that shares a per-thread dictionary to significantly reduce memory usage.

Changes:

  • Introduce a thread-local ZstdDictState (CDict/DDict plus reusable CCtx/DCtx) shared by all lists on a shard thread.
  • Add per-list state bits to manage dict training, bulk compression, and steady-state per-node compression.
  • Train dictionaries using core/dict_builder helpers and compress interior nodes with ZSTD_compress_usingCDict.
  • Replace the prior recompress-only helper with RecompressNode() to support dict recompression after reads/mutations.
  • Wire a new flag --list_experimental_zstd_dict_threshold to enable/trigger dict training by list malloc usage.
  • Ensure thread-local dict cleanup via QList::ShutdownThread() on shard teardown and in tests.
  • Update RDB list serialization to avoid persisting ZSTD bytes as RDB-LZF by first decompressing ZSTD nodes to raw.
  • Add a QListZstdTest suite covering read, push/pop, index access, and incremental compression behavior.

Technical Notes: The dict mode is mutually exclusive with depth-based LZF compression and is synchronous (may block shard thread during training/bulk pass).

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 4 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/core/qlist.cc
Comment thread src/core/qlist.cc
Comment thread src/core/qlist.cc
Comment thread src/core/qlist.cc Outdated
@romange
Copy link
Copy Markdown
Collaborator Author

romange commented Mar 25, 2026

augment review

Move ZstdDictState from per-QList unique_ptr to a thread-local raw
pointer shared across all QList instances on the same thread.
Motivation: ZSTD_CDict and ZSTD_DDict take 200-300KB each and holding them
per list kills the main motivation to save memory.

- Rename AllowCompression -> AllowLZFCompression
- Split TrainAndCompressZstdDict into TrainZstdDict,
  CompressWithZstdDict, and CompressNodeWithDict
- Add bit fields for tracking compression state:
  dict_learning_failed, dict_compress_failed, dict_bulk_finished
- CoolOff ZSTD branch: 3-state machine
  (train -> bulk compress -> per-node compress)
- Add ShutdownThread() to free thread-local dict on shard teardown
- Wire list_experimental_zstd_dict_threshold flag in list_family
- Add QListZstdTest suite with 6 tests

Test: `./dfly_bench  -n 80000 -p 6379 -qps=0  -d 64 --key_maximum=10  --command="lpush __key__ foooooooooobbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaarrrrrrrrrrrrrrrrrrrrrrrrr" --pipeline=5`

1. With list_experimental_zstd_dict_threshold=0 (disabled, default): used_memory_human:2.56GiB QPS: 1723935, P99 lat: 6ms
2. With list_experimental_zstd_dict_threshold=100000: used_memory_human: 32.84MiB, QPS: 1784274 P99 - 5.5ms

75 times less memory consumption for this synthetic use-case.
For real-world use-cases I expect to have 3-10x reduction in some cases.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/core/qlist.cc
ZSTD_DCtx_reset(tl_zstd_dict->dctx, ZSTD_reset_session_only);
size_t dsz = ZSTD_decompress_usingDDict(tl_zstd_dict->dctx, decompressed, node->sz,
lzf->compressed, lzf->sz, ddict);
if (ZSTD_isError(dsz) || dsz != node->sz) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DecompressRaw() can now return false for ZSTD failures, but callers (via TryDecompressInternal()/AccessForReads()) don’t treat that as a hard stop and will likely keep using node->entry as RAW listpack bytes while it’s still compressed; also the compression stats are decremented before success is known. Consider making decompression failure fatal again or propagating the failure so callers can abort and stats/state can be kept consistent.

Severity: high

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude can you fix?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude[agent] fix by adding a new commit. do not change the PR description or its title.

Comment thread src/core/qlist.cc
@Claude Claude AI changed the title chore(core): implement ZSTD dict based compression Addressing PR comments Mar 25, 2026
@romange romange changed the title Addressing PR comments chore(core): implement ZSTD dict based compression Mar 25, 2026
@Claude
Copy link
Copy Markdown
Contributor

Claude AI commented Mar 25, 2026

augment review

Addressed review feedback in 73efcba:

  • Fixed brittle test assertion in EstimateDictSize (now checks for valid result and > 0)
  • Added recompression path for ZSTD-dict mode to properly handle nodes marked for recompression after reads

@Claude Claude AI changed the title chore(core): implement ZSTD dict based compression Addressing PR comments Mar 25, 2026
Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/core/qlist.cc
Comment thread src/core/qlist.cc Outdated
@romange romange changed the title Addressing PR comments chore(core): implement ZSTD dict based compression Mar 25, 2026
@romange
Copy link
Copy Markdown
Collaborator Author

romange commented Mar 26, 2026

augment review

@romange romange requested a review from vyavdoshenko March 26, 2026 06:28
Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread src/core/qlist.cc
Copy link
Copy Markdown
Contributor

@vyavdoshenko vyavdoshenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@romange romange merged commit a90b685 into main Mar 26, 2026
26 checks passed
@romange romange deleted the List_Compr_POC branch March 26, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

epic: adaptive list compression

4 participants