[WIP] Add custom compaction scheduler with space amplification trigger #189

Draft
airhorns wants to merge 1 commit into main from custom-compaction

@airhorns (Contributor) commented Mar 18, 2026

Summary

  • Adds a custom SlateDB CompactionScheduler that extends size-tiered compaction with a space amplification trigger: when non-bottom-run data exceeds 25% of the bottom run, forces a full compaction to drop tombstones
  • More aggressive L0 compaction with min_compaction_sources=2 (vs default 4)
  • Three new Prometheus per-shard gauges: silo_shard_l0_sst_count, silo_shard_sorted_run_count, silo_shard_space_amplification_percent
  • WebUI shard page now shows space amplification % with color-coded health indicators (green/amber/red)
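
As a rough illustration of the trigger described above, here is a minimal sketch of the space amplification check. The function names, the use of byte sizes, and the choice to count L0 data as non-bottom-run data are assumptions made for this example; they are not taken from the PR's code or from SlateDB's API.

```rust
/// Hypothetical helper: space amplification as a percentage of the bottom run.
///
/// `sorted_run_bytes` is assumed to be ordered newest to oldest, so the last
/// entry is the bottom run. L0 bytes are counted as non-bottom data here,
/// which is an assumption of this sketch.
pub fn space_amplification_percent(l0_bytes: u64, sorted_run_bytes: &[u64]) -> Option<u64> {
    // Without a bottom run (or with an empty one) the ratio is undefined.
    let (&bottom, upper) = sorted_run_bytes.split_last()?;
    if bottom == 0 {
        return None;
    }
    // Sum in u128 so that large shards cannot overflow the multiplication.
    let non_bottom: u128 =
        u128::from(l0_bytes) + upper.iter().map(|&b| u128::from(b)).sum::<u128>();
    let percent = non_bottom * 100 / u128::from(bottom);
    Some(u64::try_from(percent).unwrap_or(u64::MAX))
}

/// Hypothetical helper: true when the amplification strictly exceeds the threshold.
pub fn should_force_full_compaction(
    l0_bytes: u64,
    sorted_run_bytes: &[u64],
    threshold_percent: u64,
) -> bool {
    matches!(
        space_amplification_percent(l0_bytes, sorted_run_bytes),
        Some(p) if p > threshold_percent
    )
}
```

With a 100-byte bottom run, a 20-byte upper run, and 10 bytes in L0, the sketch reports 30%, which is above the default 25% threshold and would request a full compaction.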

Test plan

  • cargo test --test compaction_scheduler_tests — 8 new integration tests
  • cargo test --test compact_shard_tests — existing compaction tests pass
  • cargo test --test shard_cleanup_tests — cleanup tests pass
  • cargo clippy — no new warnings
  • Manual: start dev server, open shard page, verify space amp display and compaction hints

🤖 Generated with Claude Code


Note

Medium Risk
Changes core SlateDB compaction scheduling and wiring, which can materially affect write/read amplification and storage usage under load. Also adds periodic manifest reads for metrics/UI, which could introduce overhead or new failure modes if misused.

Overview
Adds a new custom SlateDB CompactionScheduler (SiloCompactionScheduler) that keeps size-tiered compaction behavior but forces a full compaction when space amplification exceeds a configurable threshold (default 25%), and makes L0 compaction more aggressive (default min_compaction_sources=2). The scheduler is wired into shard DB creation and includes validate/generate handling plus unit tests for option parsing and space-amp calculation.
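
The option handling mentioned above could look roughly like the following sketch. Only `min_compaction_sources` and the defaults (2 and 25%) come from this PR; the struct name, the `space_amp_threshold_percent` key, the string-map input, and the validation rule are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical options struct; field names are assumptions of this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SchedulerOptions {
    pub min_compaction_sources: usize,
    pub space_amp_threshold_percent: u64,
}

impl Default for SchedulerOptions {
    fn default() -> Self {
        // Defaults as stated in the PR description.
        Self {
            min_compaction_sources: 2,
            space_amp_threshold_percent: 25,
        }
    }
}

/// Parse options from a string map, falling back to defaults for missing keys.
pub fn parse_options(raw: &HashMap<String, String>) -> Result<SchedulerOptions, String> {
    let mut opts = SchedulerOptions::default();
    if let Some(v) = raw.get("min_compaction_sources") {
        opts.min_compaction_sources = v
            .parse::<usize>()
            .map_err(|e| format!("invalid min_compaction_sources {v:?}: {e}"))?;
    }
    // Hypothetical key name for the space amplification threshold.
    if let Some(v) = raw.get("space_amp_threshold_percent") {
        opts.space_amp_threshold_percent = v
            .parse::<u64>()
            .map_err(|e| format!("invalid space_amp_threshold_percent {v:?}: {e}"))?;
    }
    // Assumed validation: merging fewer than two sources is not a compaction.
    if opts.min_compaction_sources < 2 {
        return Err("min_compaction_sources must be at least 2".to_string());
    }
    Ok(opts)
}
```

Rejecting malformed values up front, rather than silently using defaults, keeps a misconfigured shard from running with a threshold the operator did not intend.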

Exposes compaction health via a new LsmState.space_amplification_percent, three new per-shard Prometheus gauges, and a server-side periodic poll (~5s) to update them. The WebUI shard detail page now displays space amplification with color-coded health and updated warning thresholds.
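
To make the color-coded display concrete, here is a small sketch of how a space amplification percentage might be mapped to a health color. The type names and the idea of passing thresholds as parameters are assumptions; the PR text does not state the actual green/amber/red cutoffs, so none are hard-coded here.

```rust
/// Hypothetical health levels matching the green/amber/red indicators.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Green,
    Amber,
    Red,
}

/// Hypothetical thresholds; the real cutoffs are not given in the PR text.
pub struct HealthThresholds {
    /// Percentages at or above this value are shown as amber.
    pub amber_at: u64,
    /// Percentages at or above this value are shown as red.
    pub red_at: u64,
}

/// Classify a space amplification percentage against the given thresholds.
pub fn classify(space_amp_percent: u64, t: &HealthThresholds) -> Health {
    if space_amp_percent >= t.red_at {
        Health::Red
    } else if space_amp_percent >= t.amber_at {
        Health::Amber
    } else {
        Health::Green
    }
}
```

Checking the red cutoff first means a misordered configuration (amber above red) still reports the more severe state.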

Written by Cursor Bugbot for commit 7af3dae. This will update automatically on new commits.

@airhorns marked this pull request as ready for review March 19, 2026 00:41
SlateDB's default Size-Tiered Compaction doesn't clean up tombstones
fast enough for Silo's high-churn workload where task keys are created
and deleted within minutes, causing ~96% CPU from scanning accumulated
tombstones. This adds a custom CompactionScheduler that extends STC
with a space amplification safety valve (inspired by RocksDB's Universal
Compaction): when non-bottom-run data exceeds 25% of the bottom run,
a full compaction is forced to drop tombstones.

Key changes:
- Custom scheduler with min_compaction_sources=2 (vs STC's 4) and
  configurable space amplification threshold (default 25%)
- Three new Prometheus gauges per shard: l0_sst_count, sorted_run_count,
  space_amplification_percent (updated every ~5s in the reaper loop)
- WebUI shard page shows space amp % with color-coded health indicators
  (green/amber/red thresholds)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@airhorns force-pushed the custom-compaction branch from d704326 to 7af3dae March 19, 2026 01:31
@airhorns marked this pull request as draft March 19, 2026 22:27
@airhorns changed the title from "Add custom compaction scheduler with space amplification trigger" to "[WIP] Add custom compaction scheduler with space amplification trigger" Mar 19, 2026