Merged
50 changes: 50 additions & 0 deletions .github/workflows/adr-governance.yml
@@ -0,0 +1,50 @@
# SPDX-License-Identifier: MIT OR Apache-2.0
#
# ADR governance (online, NON-BLOCKING).
#
# Completes issue #4 rule 6 "bind a closed [DECISION] issue to the existence of
# its ADR" without making the offline docs build depend on the GitHub API. The
# offline, deterministic gate (scripts/ci/check-adr-index.sh, run by docs.yml)
# remains the hard gate on every PR; this job is advisory: it queries the repo's
# own issues via the Actions GITHUB_TOKEN and REPORTS closed decision-needed
# issues that have no matching ADR, and ADR Issue: headers that point at a
# missing, still-open, or mislabeled issue. It never fails the build.
#
# Triggers: a weekly schedule (so the decision trail is reconciled as decisions
# close, even between docs PRs), on demand, and a non-blocking pull_request
# trigger scoped to the binding files so a regression in the report shows on the
# PR that caused it. See docs/adr/README.md.
name: adr-governance

on:
  schedule:
    # Weekly, Monday 06:17 UTC. Off the hour to avoid the scheduler rush.
    - cron: "17 6 * * 1"
  workflow_dispatch: {}
  # Re-run when the binding logic or the ADR records themselves change, so a
  # regression in the report is visible on the PR that introduced it (still
  # non-blocking: the script exits 0 on any mismatch, and main requires no
  # status check).
  pull_request:
    paths:
      - "docs/adr/**"
      - "scripts/ci/check-adr-decision-binding.sh"
      - ".github/workflows/adr-governance.yml"

# Read-only: list issues and read the checked-out ADR files. No write scopes.
permissions:
  contents: read
  issues: read

jobs:
  adr-decision-binding:
    name: closed decision-needed issues <-> ADR Issue headers
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Reconcile closed decisions against ADRs (report only)
        # gh reads GH_TOKEN; the script exits 0 even on mismatch (advisory).
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPO: ${{ github.repository }}
        run: sh scripts/ci/check-adr-decision-binding.sh
159 changes: 159 additions & 0 deletions docs/AI_PIPELINE.md
@@ -0,0 +1,159 @@
# AI-assisted design pipeline

IronCache is designed by an LLM agent pipeline, not authored freehand. This
runbook documents that pipeline: the loop that mined the prior art into
[`prior-art/claims.yaml`](prior-art/claims.yaml) and produced the
pre-implementation [`AUDIT.md`](AUDIT.md), and the gates that keep every numeric
or version-specific assertion in the design tree sourced, unique, and
human-approved.

This is a process and governance document. It sits beside the
[charter](CHARTER.md) and the [ADR governance](adr/README.md), not under
`docs/design/`: it describes how the design is produced, not a subsystem to be
implemented. It realizes the design ratified in #94 (decomposed from #88); the
AI-Driven concern of that design is tracked as #127.

## Why a pipeline, not freehand authoring

Cache design lives or dies on numbers: hit ratio, tail latency, bytes saved,
eviction quality. Agent-proposed mechanisms tend to cite numbers that are
plausible but unreproduced, source-free, or pinned to a workload that is not
ours. The pipeline treats each numeric claim as a hypothesis to be falsified,
ties it to a version-pinned source, and refuses to admit it to the design tree
until it survives an adversarial re-check and an offline gate. The reproduction
discipline is borrowed directly from the load-aware-caching literature: a
mechanism enters only after reproduced measurement on independent traces
[lrb-model-and-traffic-reduction]. The broader ML-for-caching framing
([lecar-regret-min-18x], [cacheus-experts], [wtinylfu-caffeine-sketch]) is
adapted, not borrowed wholesale: agent proposals may cite it, but a borrowed
number counts as reproduced only once it has been re-derived against our own
fixtures (see "Harness-blocked: the trace-replay reproduction bar").

Per the tenet order (Compatible > Efficient > Simple > Scalable > AI-Driven),
this pipeline is dev-time infrastructure. It is independent of the runtime
advisor (#88's AI-Driven engine feature): no model runs on the request path, and
nothing here ships in the binary.

## The loop

```
prior-art questions
        |
        v
[1] agent fan-out mining ........ one agent per source/dimension
        |                         -> draft claims with version-pinned sources
        v
[2] adversarial verifier ........ independent, refute-by-default
        |                         -> re-checks load-bearing claims vs primary
        |                            sources; verdict per claim
        v
claims.yaml  (descriptive source of truth, per-claim verification block)
        |
        v
[3] offline citation/uniqueness gate ... scripts/ci/check-prior-art-claims.sh
        |                                (already live, hard gate in CI)
        v
[4] human PR review ............. final authority; no agent auto-merge path
```

### 1. Agent fan-out mining

Research agents fan out, one per source or research dimension, and mine primary
sources (papers, release notes, source code, benchmarks) into draft claims. Each
draft claim is recorded in [`prior-art/claims.yaml`](prior-art/claims.yaml) with
a kebab-case `id`, the `system` and pinned `version` it describes, the `claim`
prose, the measured `value`, a `source_url`, an `accessed_date`, and a
`confidence` with a `confidence_reason`. Claims are strictly **descriptive**:
they record what an upstream system does at a pinned version, never what
IronCache should do. Prescriptive IronCache decisions live in the design issues
and the ADRs, never in the claims file.
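
As a minimal sketch of the record shape described above, the fields can be modelled as a Python dataclass. The field names come from the list above; the class name `Claim`, the `problems` helper, and the kebab-case regular expression are illustrative assumptions, not the schema or validation the repository actually enforces.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed kebab-case shape: lowercase alphanumeric runs joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


@dataclass(frozen=True)
class Claim:
    """Hypothetical in-memory view of one descriptive entry in claims.yaml."""

    id: str
    system: str
    version: str
    claim: str
    value: str
    source_url: str
    accessed_date: str
    confidence: str
    confidence_reason: str

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means the record looks well-formed."""
        found = []
        if not KEBAB_CASE.match(self.id):
            found.append(f"id {self.id!r} is not kebab-case")
        # Every field listed in the runbook is treated as required here (assumption).
        for name, val in vars(self).items():
            if not str(val).strip():
                found.append(f"field {name!r} is empty")
        return found
```

A draft claim whose `problems()` list is non-empty would be sent back to the miner before it reaches the verifier; that routing is likewise an assumption of this sketch.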

### 2. Independent adversarial verifier

A second, independent pass re-checks the load-bearing and lower-confidence
claims with a refute-by-default stance: the verifier tries to break each claim
against a fresh fetch of the primary source rather than confirm the miner's
reading. The verdict and evidence are recorded in each claim's `verification`
block (`confirmed` / `corrected` / `refuted` / `uncertain` / `self-verified`),
with a `best_source_url` and a `note` quoting the supporting text. Where the
verdict is `corrected`, `value` becomes the corrected value and the miner's
original reading is preserved under `original_value`. The same fan-out plus
adversarial-confirmation method was applied to the whole issue tree in the
pre-implementation audit; see [`AUDIT.md`](AUDIT.md) (re-verified claims carry
`verification.reaudited`).

The verifier and the miner are run as distinct passes so the check is genuinely
independent rather than the same agent grading its own homework.
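
The `corrected` rule above can be sketched as a small pure function. The verdict names, `best_source_url`, `note`, and `original_value` come from this runbook; the function name `record_verdict`, the `verdict` key inside the block, and placing `original_value` at the top level of the claim are assumptions made only for illustration.

```python
from __future__ import annotations

import copy

VERDICTS = {"confirmed", "corrected", "refuted", "uncertain", "self-verified"}


def record_verdict(
    claim: dict,
    verdict: str,
    best_source_url: str,
    note: str,
    corrected_value: str | None = None,
) -> dict:
    """Return a copy of `claim` with a verification block attached."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if (verdict == "corrected") != (corrected_value is not None):
        raise ValueError("corrected_value is required for, and only for, 'corrected'")
    updated = copy.deepcopy(claim)  # never mutate the miner's draft in place
    if verdict == "corrected":
        # Preserve the miner's reading, then overwrite value with the correction.
        updated["original_value"] = updated["value"]
        updated["value"] = corrected_value
    updated["verification"] = {
        "verdict": verdict,  # key name is an assumption
        "best_source_url": best_source_url,
        "note": note,
    }
    return updated
```

Returning a copy rather than editing the draft keeps the miner's output and the verifier's output separately inspectable, which matches the intent of running them as distinct passes.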

### 3. Offline citation and uniqueness gate (live)

[`scripts/ci/check-prior-art-claims.sh`](../scripts/ci/check-prior-art-claims.sh)
is the hard, offline, deterministic gate and runs on every docs PR (workflow
[`docs.yml`](../.github/workflows/docs.yml)). It asserts:

- every claim `id` in `claims.yaml` is unique; and
- every bracketed `[id]` citation in the prose (PRIOR_ART, CHARTER, GLOSSARY,
INVARIANTS, NON_GOALS, THREAT_MODEL, every `docs/design/*.md`, and every
`docs/experiments/*.md`) resolves to a claim that exists in `claims.yaml`.

It does **not** re-fetch sources: upstream value drift is caught by
`accessed_date` going stale and by periodic re-verification, not by this script.
Its ADR sibling [`check-adr-index.sh`](../scripts/ci/check-adr-index.sh) applies
the same citation rule to ADR records. Together they guarantee the design tree
never cites a claim id that does not exist and never silently duplicates one.
This runbook is a process doc, not a design spec, so it is not in either
script's scan set; it still cites only ids that exist in `claims.yaml`.
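
The two assertions above can be restated as a short Python sketch. This is not the shell script itself: the function name `check_claims`, the in-memory inputs, and the citation regular expression (a bracketed kebab-case id with at least one hyphen, not followed by `(`) are assumptions chosen to keep the example self-contained.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed citation shape: [kebab-case-id], excluding Markdown links like [text](url).
CITATION = re.compile(r"\[([a-z0-9]+(?:-[a-z0-9]+)+)\](?!\()")


def check_claims(claim_ids: list[str], prose_by_file: dict[str, str]) -> list[str]:
    """Return gate errors; an empty list means both assertions hold."""
    errors = []
    counts = Counter(claim_ids)
    # Assertion 1: every claim id is unique.
    for cid, n in sorted(counts.items()):
        if n > 1:
            errors.append(f"duplicate claim id: {cid} ({n} times)")
    # Assertion 2: every bracketed citation resolves to an existing claim.
    known = set(counts)
    for path, text in sorted(prose_by_file.items()):
        for cited in CITATION.findall(text):
            if cited not in known:
                errors.append(f"{path}: unresolved citation [{cited}]")
    return errors
```

Like the gate it imitates, the sketch works purely on local text and never fetches a source, so it stays deterministic and offline.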

### 4. Human merge gate (final authority)

A human PR review is the documented final authority over all agent output. There
is no agent auto-merge path: green CI is necessary but never sufficient. A
reviewer confirms the claim's source supports the stated value, that the
mechanism it backs respects the tenet order, and that any decision it settles is
recorded as an ADR per [adr/README.md](adr/README.md) and #4. A failed
verification quarantines the claim and blocks the mechanism that depends on it;
unsourced numbers are never merged.

## Harness-blocked: the trace-replay reproduction bar

The #94 design also specifies a stronger bar than citation hygiene: a mechanism
should enter the design tree only after its numbers are **reproduced** by
deterministic trace replay on N independent traces, banded
[lrb-model-and-traffic-reduction]. That bar is **deferred** today because it is
harness-blocked: IronCache has no engine code and no built benchmark, test, or
oracle harness yet. The harness is *designed* (the benchmark and memory-model
harness in #8; the conformance/differential/DST stack in #95 and the Valkey
differential oracle in #96), but not *built*, and the Belady oracle that the
reproduction bar needs is still open work (#93). There is nothing to replay
traces against yet.

Until that harness is built, the live pipeline enforces the gates it *can*
enforce without one: version-pinned sourcing and the independent adversarial
re-check (steps 1 and 2 above), plus the offline citation gate (step 3). Numeric claims are admitted as **cited and
adversarially verified**, explicitly **not** as **reproduced**. When the harness
lands, the reproduction bar attaches as an additional, blocking gate on numeric
claims (a `verification.reproduced` verdict over the trace corpus), and this
runbook will be updated to make trace replay a merge requirement rather than a
deferred goal. The harness-blocked experiments are catalogued with the rest of
the deferred research design.

## Provenance summary

- `claims.yaml` is the single descriptive source of truth; prose agrees with it,
and the file wins on any disagreement.
- Every load-bearing number in the prose carries an `[id]` into `claims.yaml`.
- Mining is adversarially verified; verification verdicts are recorded per
claim; the offline gate enforces citation existence and id uniqueness in CI.
- Humans hold the merge gate; agents never auto-merge.
- Trace-replay numeric reproduction is specified (#94) but deferred until the
harness (designed in #8, #95/#96; Belady oracle still open in #93) is built.

## References

- #94: the AI-assisted pipeline design this runbook realizes (decomposed from
#88; the AI-Driven concern is tracked as #127).
- #4: ADR index, decision register, and the citation/decision governance these
gates plug into.
- #8, #95, #96, #93: the harness work that unblocks the trace-replay bar.
- [AUDIT.md](AUDIT.md): the pre-implementation application of this same
fan-out-then-adversarial-confirm method to the whole issue tree.
13 changes: 10 additions & 3 deletions docs/adr/README.md
@@ -62,6 +62,13 @@ runs in CI and is offline and deterministic. It fails when:
- an ADR file is not listed in `INDEX.md`.

Binding a *closed* `[DECISION]` issue to the existence of its ADR (issue #4
-rule 6) requires the GitHub API and is tracked in #166 as a separate
-(non-blocking) check rather than gating every offline docs build; the offline
-gate above keeps the records themselves honest in the meantime.
+rule 6) requires the GitHub API, so it lives in a separate, non-blocking job:
+[`../../scripts/ci/check-adr-decision-binding.sh`](../../scripts/ci/check-adr-decision-binding.sh),
+run by the [`adr-governance`](../../.github/workflows/adr-governance.yml)
+workflow on a weekly schedule, on demand, and on PRs that touch the binding
+files. It lists closed issues labeled `decision-needed` and reconciles them
+against ADR `Issue:` headers in both directions: a closed decision with no ADR
+that names it, and an ADR `Issue:` header pointing at a missing, still-open, or
+unlabeled issue. That job is advisory and reports to the run summary; it never
+fails the build. The offline gate above remains the hard gate and keeps the ADR
+records themselves honest.
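
The two-direction reconciliation described above can be sketched in Python as plain set and dictionary logic. The real check is a shell script driven by `gh`; the function name `reconcile`, the input shapes, the finding messages, and the issue numbers and ADR file names used with it are all hypothetical.

```python
from __future__ import annotations


def reconcile(
    closed_decisions: set[int],
    adr_issue_headers: dict[str, int],
    issue_states: dict[int, tuple[str, bool]],
) -> list[str]:
    """Report mismatches between closed decision issues and ADR Issue: headers.

    issue_states maps issue number -> (state, has the decision-needed label).
    """
    findings = []
    # Direction 1: a closed decision that no ADR names.
    named = set(adr_issue_headers.values())
    for number in sorted(closed_decisions - named):
        findings.append(f"closed decision #{number} has no ADR naming it")
    # Direction 2: an ADR header pointing at a missing, open, or unlabeled issue.
    for adr, number in sorted(adr_issue_headers.items()):
        if number not in issue_states:
            findings.append(f"{adr}: Issue #{number} is missing")
            continue
        state, labeled = issue_states[number]
        if state != "closed":
            findings.append(f"{adr}: Issue #{number} is still open")
        elif not labeled:
            findings.append(f"{adr}: Issue #{number} lacks the decision-needed label")
    return findings
```

Because the job is advisory, a caller would print the findings to the run summary and still exit 0, leaving the offline gate as the only check that can fail a build.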