ELares · Jun 14, 2026
@@ -0,0 +1,112 @@
+# Design: Advisor decision and audit trail
+
+Issue: #153. Decisions: ADR-0013 (advisor default posture is shadow/off).
+Related: ADVISOR_SAFETY.md (#91, the safety mechanism this records),
+ADVISOR.md (#126, the controller that emits events), ADVISOR_PROMOTION.md
+(#154, the gate whose verdicts are logged), OBSERVABILITY.md (#86/#152, the
+INFO/metrics surfaces), CONFIG.md (#85, the versioned snapshot store).
+
+## Goal and scope
+
+The advisor retunes deterministic knobs (active eviction policy, sampled count,
+LFU log-factor and decay, ghost size, slab/encoding/compression thresholds), so
+an operator must be able to answer "what did it change, why, and did it help?"
+after the fact. This spec owns the durable, tamper-evident decision/audit log and
+its INFO + `/metrics` projection. It is the diagnostic backbone for the #91
+rollback and the record of every #154 promotion verdict. In scope: the event
+schema, durability and tamper-evidence, retention, the queryable surface, and
+shadow-mode emission. Out of scope: the safety mechanism itself (#91), the
+promotion decision (#154), and the metric-registry transport (#86/#152).
+
+## Design
+
+### What an event records
+
+- One append-only record per advisor action and per safety event. A knob-change
+  record carries: monotonic snapshot version (from #91/#85), wall and logical
+  time, knob id, from-value, to-value, the triggering expert or objective delta
+  (which bandit/regret expert won and by how much [cacheus-experts]
+  [lecar-regret-minimization-smallcache]), the replay evidence that it beat the
+  static baseline (the #154 margin), and the seed. Safety records cover rollback
+  and kill-switch trips with cause (which metric regressed, by how much, over
+  which window). The objective the delta is measured against is hit ratio scored
+  off the hot path, never a per-request shadow simulation
+  [hit-ratio-can-hurt-throughput].
+
+### Tamper-evidence and durability
+
+- The log is a hash-chained append-only journal: each record commits the prior
+  record's digest, so any edit or deletion in the middle breaks the chain and is
+  detectable on read. It is written through the same fail-closed io_uring write
+  path the persistence umbrella defines (PERSISTENCE.md, #58), not a side file, so
+  a crash cannot silently lose the tail. The chain is verified at boot, and a break
+  is surfaced as a distinct INFO field and metric instead of causing a panic.
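
The chaining rule above can be illustrated with a minimal sketch. It assumes SHA-256 over a canonical JSON encoding and an all-zero genesis digest; neither choice is fixed by this spec, and the names `record_digest`, `append`, and `first_break` are hypothetical. The io_uring write path is not modeled.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed all-zero digest anchoring the first record


def record_digest(prev_digest: str, payload: dict) -> str:
    """Digest over the previous digest plus a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + body).encode("utf-8")).hexdigest()


def append(journal: list, payload: dict) -> None:
    """Append a record that commits to the digest of the record before it."""
    prev = journal[-1]["digest"] if journal else GENESIS
    journal.append(
        {"prev": prev, "payload": payload, "digest": record_digest(prev, payload)}
    )


def first_break(journal: list, anchor: str = GENESIS):
    """Return the index of the first record that fails verification, or None."""
    prev = anchor
    for i, rec in enumerate(journal):
        if rec["prev"] != prev or rec["digest"] != record_digest(prev, rec["payload"]):
            return i
        prev = rec["digest"]
    return None


journal = []
append(journal, {"event": "knob_change", "knob": "sampled_count", "from": 5, "to": 8})
append(journal, {"event": "rollback", "cause": "hit_ratio_regressed"})
append(journal, {"event": "knob_change", "knob": "ghost_size", "from": 1024, "to": 2048})
assert first_break(journal) is None

journal[1]["payload"]["cause"] = "edited"  # tamper with a middle record
assert first_break(journal) == 1           # reported as a break, not raised
```

Returning the break index rather than raising mirrors the stated intent that a broken chain is reported through INFO and a metric.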
+
+### Surfaced via INFO and /metrics
+
+- Current advisor state lives in the native `# IronCache` INFO section (#152): the
+  posture (off/shadow/active per ADR-0013), the live snapshot version, the active
+  expert, the count of changes/rollbacks/kill-switch trips, and the last verdict.
+  The same counters are Prometheus series in the versioned registry (#152) under a
+  bounded label set (knob id from a fixed allow-list, no free-form cardinality).
+  The decision log is not a high-cardinality metric: `/metrics` exposes aggregate
+  counters and gauges, while the per-record detail is read through the query
+  surface, keeping the scrape cheap (the OBSERVABILITY.md cardinality rule).
+
+### Queryable surface
+
+- A read-only admin verb returns recent records filtered by knob, version range,
+  or event type (rollback/kill-switch/promotion), bounded in count like SLOWLOG.
+  Records are immutable; there is no mutating verb on the journal. The query path
+  is gated by the same auth posture as other introspection (MONITOR/metrics auth
+  decision, SECRETS.md #145), and any secret-bearing field is redacted there too.
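
A minimal sketch of the bounded, read-only filter described above, assuming records are plain dictionaries with `event`, `knob`, and `snapshot` keys. The function name `query`, its parameters, and the default `limit` are hypothetical; the verb's real shape is an open question, and auth and redaction are not modeled.

```python
def query(records, knob=None, event=None, min_version=None, max_version=None,
          limit=10):
    """Return up to `limit` matching records, newest first, as copies."""
    if limit <= 0:
        return []
    out = []
    for rec in reversed(records):  # records are stored oldest first
        if knob is not None and rec.get("knob") != knob:
            continue
        if event is not None and rec["event"] != event:
            continue
        if min_version is not None and rec["snapshot"] < min_version:
            continue
        if max_version is not None and rec["snapshot"] > max_version:
            continue
        out.append(dict(rec))  # copy so a caller cannot mutate the journal
        if len(out) >= limit:
            break
    return out


log = [
    {"event": "knob_change", "knob": "sampled_count", "snapshot": 40},
    {"event": "rollback", "snapshot": 41},
    {"event": "knob_change", "knob": "ghost_size", "snapshot": 42},
    {"event": "knob_change", "knob": "sampled_count", "snapshot": 43},
]
assert [r["snapshot"] for r in query(log, knob="sampled_count")] == [43, 40]
assert [r["snapshot"] for r in query(log, event="rollback")] == [41]
```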
+
+### Emitted even in shadow mode
+
+- In shadow mode the advisor mutates nothing live (ADR-0013) yet records every
+  recommendation it would have applied, with the same schema and the would-be
+  from/to and replay evidence. This is the evidence the #90 headroom study and the
+  #154 gate consume to decide whether active tuning is ever justified
+  [wtinylfu-caffeine-sketch]: shadow logging is the safe first rung of the
+  off -> shadow -> active ladder, producing an auditable trail before any knob
+  moves.
+
+### Retention
+
+- Retention is bounded and configurable: the last N records (a ring) or all
+  records since the current snapshot version, whichever set is larger, so the full
+  causal history of the live config is always present even after the ring wraps.
+  Rollback and kill-switch records are retained at a higher floor than routine
+  knob changes, because they are the post-incident record. Eviction of old records
+  re-anchors the hash chain with a checkpoint digest so tamper-evidence survives
+  truncation.
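
The retention rule can be sketched as a choice of cut point. This assumes records are ordered oldest first with non-decreasing `snapshot` values, and that the checkpoint is simply the digest of the last evicted record. The name `retain` is hypothetical, and the higher floor for safety records is omitted because its defaults are still open.

```python
def retain(records, ring_size, live_snapshot):
    """Return (kept_records, checkpoint_digest) under the larger-of rule."""
    ring_start = max(0, len(records) - ring_size)
    # Index of the first record belonging to the live snapshot version.
    live_start = next(
        (i for i, r in enumerate(records) if r["snapshot"] >= live_snapshot),
        len(records),
    )
    cut = min(ring_start, live_start)  # the earlier start keeps the larger set
    # Digest of the last evicted record re-anchors verification of the rest.
    checkpoint = records[cut - 1]["digest"] if cut > 0 else None
    return records[cut:], checkpoint


snapshots = [1, 1, 2, 2, 3, 3, 3, 3, 3, 3]
records = [{"snapshot": s, "digest": f"d{i}"} for i, s in enumerate(snapshots)]

kept, checkpoint = retain(records, ring_size=4, live_snapshot=3)
assert len(kept) == 6 and checkpoint == "d3"  # live history exceeds the ring

kept, checkpoint = retain(records, ring_size=8, live_snapshot=3)
assert len(kept) == 8 and checkpoint == "d1"  # the ring exceeds live history
```

A verifier would then start from the checkpoint digest instead of the genesis digest when walking the retained records.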
+
+## Open questions
+
+- The admin verb's exact name/shape (a SLOWLOG-style RESP reply vs a CONFIG-style
+  subcommand), settled with the #150 admin-command surface.
+- Whether the journal is per-shard (matching the shared-nothing core) with a
+  merged read view, or a single core-0-owned log, and the seed scope this implies
+  (the #91 per-shard-vs-global seed open question).
+- Default retention floors for routine vs safety records, and whether the chain
+  checkpoint digest is itself exported for external verification.
+
+## Acceptance and test hooks
+
+- Every applied knob change and every rollback/kill-switch trip produces exactly
+  one chained record carrying snapshot version, from/to, trigger, margin, seed,
+  and cause; a mid-journal edit is detected as a chain break on read.
+- In shadow mode no knob mutates live yet the recommendation log grows with full
+  schema (asserted against ADR-0013 posture).
+- INFO advisor fields and the `/metrics` counters agree with the journal contents
+  and stay within the #152 cardinality bound under an adversarial knob workload.
+- A seeded replay reproduces an identical event stream (the #91 determinism
+  invariant projected onto the log).
+
+## References
+
+- ADR-0013; issues #153, #91, #126, #154, #90, #85, #86, #152, #150, #145, #58,
+  #1; specs ADVISOR.md, ADVISOR_SAFETY.md, ADVISOR_PROMOTION.md, OBSERVABILITY.md,
+  CONFIG.md, SECRETS.md, PERSISTENCE.md.
+- Claims: [cacheus-experts], [lecar-regret-minimization-smallcache],
+  [hit-ratio-can-hurt-throughput], [wtinylfu-caffeine-sketch].
@@ -0,0 +1,112 @@
+# Design: Advisor evaluation and promotion gate
+
+Issue: #154. Decisions: ADR-0013 (advisor default posture is shadow/off).
+Related: ADVISOR_SAFETY.md (#91, live rollback, distinct from this pre-promotion
+gate), ADVISOR.md (#126, the controller whose candidates are gated),
+ADVISOR_AUDIT.md (#153, which records each verdict), TESTING.md/BENCHMARK.md
+(#95/#96/#93, the replay harness and oracle), CONFIG.md (#85, the snapshot store).
+
+## Goal and scope
+
+The hard project rule is that an advisor change must beat the tuned static
+baseline on replayed traces before it may act. This spec owns that promotion gate:
+an offline-replay plus shadow-A/B pipeline that proves a candidate config beats
+the live static baseline by a quantified, harness-tuned margin before the
+controller is allowed to publish it. It turns the one-time #90 headroom study and
+the #93 offline oracle into a continuous gating pipeline that makes the
+"no regression below baseline" target enforceable. #153 records what happened;
+this decides what is allowed. In scope: the baseline definition, the two gate
+stages, the acceptance margin, and the no-regression sign-off. Out of scope: live
+rollback after promotion (#91), the controller internals (#126), and the oracle
+implementation (#93).
+
+## Design
+
+### The baseline a candidate must beat
+
+- The gate's reference is the tuned static baseline: W-TinyLFU admission
+  [wtinylfu-caffeine-sketch] over the SIEVE/S3-FIFO eviction floor
+  [sieve-simpler-than-lru-nsdi24] [s3fifo-small-main-split], with its own knobs
+  tuned per trace first so the advisor competes against the best deterministic
+  effort, not a strawman (the #90 measurement hazard). This is the same static
+  baseline #91's kill-switch reverts to, so "beats baseline on replay" and
+  "kill-switch target" name one config.
+
+### Stage 1: offline replay against the oracle
+
+- A candidate config is replayed over the trace corpus in the benchmark-only
+  oracle harness (#93), scoring hit ratio at matched cache sizes and reporting the
+  gap to the Belady-MIN ceiling and the per-policy gap table [lhd-hit-density].
+  The candidate must land closer to MIN than the tuned baseline does, by at least
+  the acceptance margin. Scoring is hit ratio off the hot path only; the gate
+  never runs a per-access shadow simulator on a live request, because a higher hit
+  ratio reached by hot-path surgery can lower throughput
+  [hit-ratio-can-hurt-throughput]. Learned-Belady predictors appear here only as
+  offline ceilings (the #13 non-goal), never as a deployable policy
+  [parrot-imitation-belady-icml20] [lrb-relaxed-belady-gbm].
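
As a sketch of the Stage 1 scoring, the check below computes the hit-ratio gain and the fraction of the baseline-to-MIN gap closed per trace. It assumes the candidate must clear the margin on every trace, which the spec leaves open (every trace vs a weighted majority); the names `gap_closed` and `stage1_verdict` and all numbers are illustrative, not measured results.

```python
def gap_closed(baseline_hr, candidate_hr, min_hr):
    """Fraction of the baseline-to-MIN gap that the candidate closes."""
    gap = min_hr - baseline_hr
    if gap <= 0.0:
        return 0.0  # baseline already at the ceiling: nothing left to close
    return (candidate_hr - baseline_hr) / gap


def stage1_verdict(rows, margin):
    """rows: (trace, baseline_hr, candidate_hr, min_hr) at one matched cache size.

    Assumption: the candidate must gain at least `margin` hit ratio on every
    trace. Returns (passed, table) where table rows are (trace, gain, closed).
    """
    table = [(t, c - b, gap_closed(b, c, m)) for t, b, c, m in rows]
    passed = bool(table) and all(gain >= margin for _, gain, _ in table)
    return passed, table


# Illustrative hit ratios only.
rows = [("trace-a", 0.60, 0.66, 0.80), ("trace-b", 0.70, 0.705, 0.90)]
passed, table = stage1_verdict(rows, margin=0.02)
assert passed is False  # trace-b gains too little, so the candidate is rejected
```

The per-trace table is what would be recorded alongside the verdict, so a rejection remains explainable after the fact.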
+
+### Stage 2: shadow A/B against the live baseline
+
+- A candidate that passes Stage 1 runs in shadow against live traffic (ADR-0013):
+  the live baseline serves requests while the candidate is scored on the same
+  access stream off the hot path. The gate compares candidate vs baseline hit
+  ratio over a window and requires the candidate to win by the margin with a
+  no-regression sign-off on the watched throughput-per-core signal. Only a
+  candidate that clears both stages becomes eligible for the controller to publish
+  as a new snapshot. In shadow posture the controller still publishes nothing
+  live; the gate only records eligibility (#153).
+
+### Acceptance margin and sign-off
+
+- The margin is harness-tuned, not a slogan: a minimum marginal hit-ratio gain
+  over the tuned baseline at the cache-to-working-set ratios IronCache actually
+  runs, defended against the operational cost of an adaptive component (the #90
+  open question). The margin is set conservatively because the adaptive gain
+  concentrates on small caches and can evaporate or invert on the large,
+  frequency-dominated caches IronCache expects [lecar-regret-minimization-smallcache]
+  [cacheus-experts]; the expert pool here is the cheap O(1) controller, not a
+  per-request ensemble [lecar-regret-min-18x]. A candidate inside the noise band,
+  or that regresses throughput-per-core, is rejected, not promoted.
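
The rejection rules above can be sketched as one verdict function over a shadow window. It assumes the margin and noise band are absolute hit-ratio differences and that a throughput-per-core figure is available for both configs; how the candidate's throughput is obtained is not specified here. `ShadowWindow`, `verdict`, and `tput_tolerance` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowWindow:
    baseline_hit_ratio: float
    candidate_hit_ratio: float
    baseline_tput_per_core: float
    candidate_tput_per_core: float


def verdict(w, margin, noise_band, tput_tolerance=0.0):
    """Return 'eligible' or a rejection reason for one shadow window."""
    gain = w.candidate_hit_ratio - w.baseline_hit_ratio
    floor = w.baseline_tput_per_core * (1.0 - tput_tolerance)
    if w.candidate_tput_per_core < floor:
        return "reject: throughput-per-core regression"
    if abs(gain) <= noise_band:
        return "reject: inside noise band"
    if gain < margin:
        return "reject: below margin"
    return "eligible"


# Illustrative numbers only.
win = ShadowWindow(0.70, 0.74, 1.0e6, 1.0e6)
assert verdict(win, margin=0.02, noise_band=0.005) == "eligible"

slow = ShadowWindow(0.70, 0.74, 1.0e6, 0.9e6)
assert verdict(slow, margin=0.02, noise_band=0.005).startswith("reject")
```

Checking throughput first means a hit-ratio win can never mask a throughput loss, matching the no-regression sign-off.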
+
+### Relationship to live rollback
+
+- The promotion gate is pre-action and offline-plus-shadow; #91 rollback is
+  post-action and live. A change must clear this gate to act at all; once acting,
+  #91's regression detector can still revert it and the kill-switch can still drop
+  to baseline. The two compose: the gate minimizes how often rollback fires by
+  never letting an unproven change act, and rollback covers the residual case
+  where replay and shadow did not predict the live result.
+
+## Open questions
+
+- The exact acceptance margin per knob class and the shadow-A/B window length,
+  shared with #91's threshold/window open decision and calibrated on the corpus.
+- Trace-corpus weighting for the verdict (the #90 in-memory-KV weighting), and
+  whether Stage 1 must pass on every corpus trace or on a weighted majority.
+- Whether shadow A/B is per-shard or global, and how candidate scoring is
+  isolated from the live serving path's cache state.
+- Re-promotion cadence: how often a previously rejected candidate may be re-tried
+  as the workload drifts, without flapping.
+
+## Acceptance and test hooks
+
+- A candidate that does not beat the tuned static baseline by the margin in Stage
+  1 replay is never promoted; the gap-to-MIN table (#93) is recorded for the
+  verdict (#153).
+- A candidate that passes Stage 1 but loses or only ties the shadow A/B, or
+  regresses throughput-per-core, is rejected with a no-regression sign-off failure
+  [hit-ratio-can-hurt-throughput].
+- In shadow posture (ADR-0013) the gate records eligibility but the controller
+  publishes nothing live.
+- A seeded replay of the same candidate and trace yields an identical verdict (the
+  #91 determinism invariant applied to the gate).
+
+## References
+
+- ADR-0013; issues #154, #91, #126, #153, #90, #93, #95, #96, #85, #13, #1; specs
+  ADVISOR.md, ADVISOR_SAFETY.md, ADVISOR_AUDIT.md, TESTING.md, BENCHMARK.md,
+  CONFIG.md.
+- Claims: [wtinylfu-caffeine-sketch], [sieve-simpler-than-lru-nsdi24],
+  [s3fifo-small-main-split], [lhd-hit-density], [hit-ratio-can-hurt-throughput],
+  [parrot-imitation-belady-icml20], [lrb-relaxed-belady-gbm],
+  [lecar-regret-minimization-smallcache], [cacheus-experts], [lecar-regret-min-18x].
@@ -106,3 +106,8 @@ Specs added as the M1 milestone progresses.
 - [ADVISOR.md](ADVISOR.md): the per-shard background advisor (LeCaR/bandit expert
   weighting, bounded knobs, atomic RCU config swap, EvictionPolicy-trait binding,
   shadow/off default per ADR-0013) (#126).
+- [ADVISOR_AUDIT.md](ADVISOR_AUDIT.md): the durable tamper-evident advisor
+  decision/audit log (knob deltas, trigger, snapshot version, replay evidence,
+  rollback/kill events), surfaced via INFO/metrics, emitted even in shadow (#153).
+- [ADVISOR_PROMOTION.md](ADVISOR_PROMOTION.md): the offline-replay + shadow-A/B
+  promotion gate proving a change beats the static baseline before it acts (#154).