diff --git a/docs/design/ADVISOR.md b/docs/design/ADVISOR.md
new file mode 100644
index 0000000..cc76eb5
--- /dev/null
+++ b/docs/design/ADVISOR.md
@@ -0,0 +1,96 @@
+# Design: Per-shard background advisor (expert selection + bounded knob autotuning)
+
+Issue: #126. Decisions: ADR-0013 (advisor off/shadow default posture), ADR-0008
+(S3-FIFO default eviction). Related: #91 (safety guardrails, ADVISOR_SAFETY.md),
+#48 (EvictionPolicy trait, EVICTION.md), #49 (W-TinyLFU filter, WTINYLFU.md), #85
+(config snapshot, CONFIG.md), #13 (no per-request inference), #88 (parent,
+decomposed).
+
+## Goal and scope
+
+This specifies the per-shard background advisor: an off-path loop that weights
+experts over the deterministic policy set and autotunes a bounded knob set, then
+publishes its choice through the atomic config-snapshot swap that
+ADVISOR_SAFETY.md (#91) defines. The request loop stays inference-free and
+deterministic (#13); the advisor only observes counters and proposes snapshots.
+The off/shadow default posture is fixed by ADR-0013 and is not re-decided here.
+
+In scope: the expert-weighting mechanism, the policy set it selects among, the
+bounded knob set, the cadence/hysteresis shape, the snapshot-swap coupling, and
+the binding to the EvictionPolicy trait (#48). Out of scope: numeric tuning
+(retune interval, marginal-gain threshold), deferred to the harness (#8); the
+safety envelope (bounds enforcement, rollback, kill-switch, seeding), owned by
+#91; the promotion gate (#154).
+
+## Design
+
+### Expert weighting
+
+- Each shard runs a regret-minimizing / contextual-bandit controller that weights
+  experts off the hot path. LeCaR maintains weights over two experts with regret
+  minimization and beats ARC by more than 18x at small cache-to-working-set
+  ratios [lecar-regret-min-18x]; CACHEUS generalizes this to an adaptive mixture
+  selected per workload primitive [cacheus-experts]. We borrow the controller as
+  the off-path selector and reject per-request ensemble evaluation, which would
+  reintroduce hot-path cost (#13).
+
+### Policy set
+
+- The experts are the cheap deterministic policies already behind the trait:
+  SIEVE (one FIFO, a hand, a visited bit) [sieve-algorithm], a W-TinyLFU
+  admission filter [wtinylfu-caffeine-sketch] (the non-ML floor, WTINYLFU.md
+  #49), and sampled LRU/LFU. The controller selects among them and tunes their
+  knobs; it never invents a policy. The default eviction core remains S3-FIFO
+  with its small/main split [s3fifo-small-main-split] (ADR-0008), which the
+  advisor may select but does not replace as the baseline.
+
+### Bounded knob set
+
+- The advisor tunes a small, bounded set: the active policy, sample count, LFU
+  log-factor [redis-lfu-log-factor] and decay, ghost size, and
+  slab/encoding/compression thresholds. The set is deliberately small so the
+  search space is enumerable and every proposal maps to a documented knob. Bounds
+  enforcement, clamping, and rejection are the safety spec's job (#91).
+
+### Cadence and hysteresis
+
+- The loop retunes on a fixed cadence (interval deferred to #8), proposes at most
+  the bounded knob deltas, and respects the per-knob hysteresis band and cooldown
+  from #91 so it cannot flap. A proposal is published only after it beats the
+  current snapshot on replay (the gate in #154).
+
+### Snapshot swap and trait binding
+
+- The advisor never mutates live policy directly. It builds an immutable seeded
+  snapshot and hands it to the atomic RCU pointer swap (#91), monotonically
+  versioned and coordinated with #85. The hot path reads the active policy and
+  knobs through the EvictionPolicy trait (#48); a swap changes which trait impl
+  and which knob values the shard uses on the next access, with no reader lock and
+  no torn read.
+
+## Open questions
+
+- Retune interval and the marginal-gain threshold for accepting a proposal
+  (deferred to the harness #8).
+- Per-primitive context features the bandit conditions on, and whether the expert
+  set is fixed or extensible per tenant.
+- Whether expert weights persist across restart or reset to the static baseline.
+- How shadow-mode recommendations (ADR-0013) are surfaced before active tuning.
+
+## Acceptance and test hooks
+
+- The request loop performs no inference and is deterministic under replay (#13).
+- The advisor proposes only knobs in the bounded set, each within its #91 bounds.
+- A proposal reaches live policy only via the atomic versioned snapshot swap
+  (#91/#85), never by direct mutation.
+- Policy selection routes through the EvictionPolicy trait (#48); a swap changes
+  the active impl with no reader lock.
+- With the advisor off or in shadow the engine behaves identically to the static
+  baseline (ADR-0013).
+
+## References
+
+- ADR-0013, ADR-0008; issues #91, #48, #49, #85, #13, #154, #88; specs
+  ADVISOR_SAFETY.md, EVICTION.md, WTINYLFU.md, CONFIG.md.
+- Claims: [lecar-regret-min-18x], [cacheus-experts], [wtinylfu-caffeine-sketch],
+  [sieve-algorithm], [s3fifo-small-main-split], [redis-lfu-log-factor].
diff --git a/docs/design/ADVISOR_SAFETY.md b/docs/design/ADVISOR_SAFETY.md
new file mode 100644
index 0000000..7278783
--- /dev/null
+++ b/docs/design/ADVISOR_SAFETY.md
@@ -0,0 +1,108 @@
+# Design: Advisor safety guardrails (bounds, hysteresis, rollback, kill-switch)
+
+Issue: #91. Decisions: ADR-0013 (advisor off/shadow default posture), ADR-0008
+(S3-FIFO default eviction). Related: #126 (advisor mechanism, ADVISOR.md), #85
+(config sources, CONFIG.md), #48 (EvictionPolicy trait, EVICTION.md), #154
+(promotion gate), #88 (parent, decomposed).
+
+## Goal and scope
+
+The background advisor is the only place ML touches IronCache and it never runs
+on the hot path. This spec is the safety envelope around it: per-knob bounds,
+anti-oscillation, automatic rollback, a hard kill-switch to a known-good static
+baseline, and a seeded monotonic-versioned config-snapshot contract the hot path
+reads. The governing guarantee is that the advisor can only ever match or improve
+the static baseline, never regress below it. With the advisor off the cache is
+correct and fast, which the queueing result on hit-path contention motivates
+[hit-ratio-can-hurt-throughput].
+
+In scope: knob min/max bounds, hysteresis band plus cooldown, the regression
+detector and rollback, the kill-switch, and the immutable seeded snapshot the
+advisor publishes and the hot path consumes. Out of scope: the advisor objective
+function and the expert algorithms (#126); knob storage and reload semantics
+(#85); the policies behind the EvictionPolicy trait (#48); the off/shadow default
+posture (ADR-0013).
+
+## Design
+
+### Per-knob bounds
+
+- Every tunable knob has a documented, enforced min and max; an out-of-range
+  proposal is clamped or rejected, never applied. The knob set is the bounded set
+  ADVISOR.md (#126) defines (active policy, Redis-style sample count
+  [redis-maxmemory-samples-5], LFU log-factor and decay
+  [redis-lfu-morris-counter-params], ghost size, slab/encoding/compression
+  thresholds). Bounds are a property of the snapshot schema, so a malformed or
+  adversarial proposal cannot widen them.
+
+### Hysteresis and cooldown
+
+- Each knob carries a hysteresis band and a cooldown timer. A change applies only
+  when the measured signal crosses the band, and no further change to that knob is
+  permitted until the cooldown elapses. This provably bounds change frequency and
+  stops flapping near a threshold, which a rate limit alone does not.
+
+### Regression detector and rollback
+
+- After a swap the detector compares live throughput-per-core and hit ratio
+  against the pre-change snapshot over the cooldown window. A measured regression
+  in either rolls the active snapshot back to the immediately prior one. Both
+  signals matter because a higher hit ratio can still lower throughput on a
+  relink-bound policy [hit-ratio-can-hurt-throughput]; FIFO-class policies
+  (S3-FIFO [s3fifo-small-main-split], SIEVE [sieve-algorithm]) avoid that, but the
+  detector does not assume it.
+
+### Kill-switch to the static baseline
+
+- A persistent or repeated breach trips a kill-switch that atomically reverts to a
+  static baseline of W-TinyLFU admission [wtinylfu-caffeine-sketch] over a
+  FIFO-class core, the deterministic floor any learned change must first beat
+  [sieve-simpler-than-lru-nsdi24]. The kill-switch is operator-forceable and is
+  the boot default (ADR-0013). The baseline is chosen over last-known-good because
+  a learned snapshot can itself be subtly bad, whereas the static path is
+  provably correct and fast.
+
+### Seeded versioned RCU snapshot contract
+
+- The hot path reads an immutable config snapshot through a single atomic pointer
+  swap (RCU-style); readers never block and never see a torn set. The advisor
+  publishes a new snapshot only after a candidate beats the current one on a
+  sampled replay (the gate detailed in #154). Each snapshot carries a seed and a
+  strictly monotonic version, coordinated with the config layers in #85. A given
+  seed plus an input replay yields identical eviction decisions, the determinism
+  invariant rollback and audit depend on.
+
+## Open questions
+
+- Regression thresholds and window length per knob class (throughput vs hit-ratio
+  sensitivity); numeric values deferred to the harness (#8).
+- Cooldown duration and band width per knob, and whether they are themselves
+  bounded-tunable.
+- Whether a kill-switch trip is sticky until operator reset or auto-clears after
+  a quiet period.
+- Seed scope (per-shard vs global) and how it is recorded in the snapshot.
+- Maximum knob delta per step (bounded step vs jump to any in-range value).
+
+## Acceptance and test hooks
+
+- Every knob has an enforced min/max; an out-of-range proposal is clamped or
+  rejected (schema test).
+- A soak under a shifting workload shows hysteresis and cooldown bound change
+  frequency with no oscillation.
+- An injected throughput or hit-ratio regression triggers rollback to the prior
+  snapshot.
+- The kill-switch reverts to the static baseline atomically, is the boot default,
+  and is operator-forceable.
+- With the advisor disabled the cache is correct and no path regresses below the
+  static baseline [hit-ratio-can-hurt-throughput].
+- A seeded snapshot plus input replay yields identical eviction decisions, and
+  snapshots are published and consumed with monotonic versioning (#85).
+
+## References
+
+- ADR-0013, ADR-0008; issues #126, #85, #48, #154, #88; specs ADVISOR.md,
+  CONFIG.md, EVICTION.md, WTINYLFU.md.
+- Claims: [hit-ratio-can-hurt-throughput], [s3fifo-small-main-split],
+  [sieve-algorithm], [wtinylfu-caffeine-sketch], [sieve-simpler-than-lru-nsdi24],
+  [redis-maxmemory-samples-5], [redis-lfu-morris-counter-params],
+  [lecar-regret-min-18x], [cacheus-experts].
diff --git a/docs/design/README.md b/docs/design/README.md
index 5bb2a24..4972cec 100644
--- a/docs/design/README.md
+++ b/docs/design/README.md
@@ -100,3 +100,9 @@ Specs added as the M1 milestone progresses.
   READWRITE, replica routing, bounded staleness surfaced to clients) (#147).
 - [NODE_LIFECYCLE.md](NODE_LIFECYCLE.md): cluster bootstrap and node lifecycle
   (seed/MEET join, learner to voter to slot-owner promotion, add/remove-node) (#149).
+- [ADVISOR_SAFETY.md](ADVISOR_SAFETY.md): the advisor safety envelope (per-knob
+  bounds, hysteresis/cooldown, regression detect + rollback, kill-switch, RCU
+  snapshot contract) (#91).
+- [ADVISOR.md](ADVISOR.md): the per-shard background advisor (LeCaR/bandit expert
+  weighting, bounded knobs, atomic RCU config swap, EvictionPolicy-trait binding,
+  shadow/off default per ADR-0013) (#126).
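
Illustration, not part of the patch: the per-knob bounds and the hysteresis-plus-cooldown rule in ADVISOR_SAFETY.md can be read as a small gate in front of every knob change. The Rust sketch below is one possible reading, not the project's implementation. `KnobGuard`, `Decision`, and `observe` are hypothetical names, the clock is an assumed abstract tick counter, and the band is modeled as a low/high pair around a threshold. Where the spec allows "clamped or rejected", this sketch clamps. All numeric values are placeholders, since the real ones are deferred to the harness (#8).

```rust
/// Hypothetical guard for one integer knob (for example a sample count).
/// Assumes min <= max; `clamp` panics otherwise.
#[derive(Debug, Clone)]
pub struct KnobGuard {
    pub value: u32,               // current knob value
    pub min: u32,                 // enforced lower bound
    pub max: u32,                 // enforced upper bound
    pub band_lo: f64,             // a signal below this may lower the knob
    pub band_hi: f64,             // a signal above this may raise the knob
    pub cooldown: u64,            // ticks that must elapse between changes
    pub last_change: Option<u64>, // tick of the last applied change, if any
}

/// Outcome of one observation (hypothetical; the spec defines no such type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Applied(u32), // knob moved to this value
    InsideBand,   // signal did not cross the hysteresis band
    CoolingDown,  // a change was applied too recently
    AtBound,      // clamping left the value unchanged
}

impl KnobGuard {
    /// Feed one measured signal at tick `now`; move the knob by at most `step`.
    pub fn observe(&mut self, signal: f64, step: u32, now: u64) -> Decision {
        // Cooldown: no further change until enough ticks have passed.
        if let Some(t) = self.last_change {
            if now.saturating_sub(t) < self.cooldown {
                return Decision::CoolingDown;
            }
        }
        // Hysteresis: act only when the signal leaves the [band_lo, band_hi] band.
        let target = if signal > self.band_hi {
            self.value.saturating_add(step)
        } else if signal < self.band_lo {
            self.value.saturating_sub(step)
        } else {
            return Decision::InsideBand;
        };
        // Bounds: an out-of-range target is clamped, never applied as-is.
        let clamped = target.clamp(self.min, self.max);
        if clamped == self.value {
            return Decision::AtBound;
        }
        self.value = clamped;
        self.last_change = Some(now);
        Decision::Applied(clamped)
    }
}
```

Because the cooldown check runs before anything else, a knob changes at most once per cooldown window regardless of how the signal moves. That is the bound on change frequency the spec asks for, and the band additionally suppresses changes while the signal hovers near the threshold.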