From 2b5dca69b77cb90c671bdd6d2d2bc2730119506f Mon Sep 17 00:00:00 2001
From: Zeke
Date: Sat, 13 Jun 2026 22:16:51 -0700
Subject: [PATCH 1/2] ci+docs: ADR-decision binding gate + AI-pipeline runbook
 (closes #166, closes #127)

#166: a NON-BLOCKING online ADR<->decision binding check.
scripts/ci/check-adr-decision-binding.sh (POSIX sh, gh) reconciles closed
decision-needed issues against ADR Issue: headers in both directions.
.github/workflows/adr-governance.yml runs it weekly, on demand, and on PRs
touching the binding files, with read-only permissions; it exits 0 on a
mismatch (advisory) and exits 2 only on an environment failure. The offline
check-adr-index.sh remains the hard gate. docs/adr/README.md now points at
the live job instead of the 'tracked in #166' note. Completes #4 rule 6.

#127: docs/AI_PIPELINE.md, the runbook for the AI-assisted design pipeline
(agent fan-out mining -> independent adversarial verifier -> offline
citation/uniqueness gate -> human merge gate), with the trace-replay
reproduction bar documented as harness-blocked and deferred. Absorbs #94.

CI passes.

Signed-off-by: Zeke
---
 .github/workflows/adr-governance.yml     |  50 +++++++
 docs/AI_PIPELINE.md                      | 159 +++++++++++++++++++++
 docs/adr/README.md                       |  13 +-
 scripts/ci/check-adr-decision-binding.sh | 170 +++++++++++++++++++++++
 4 files changed, 389 insertions(+), 3 deletions(-)
 create mode 100644 .github/workflows/adr-governance.yml
 create mode 100644 docs/AI_PIPELINE.md
 create mode 100755 scripts/ci/check-adr-decision-binding.sh

diff --git a/.github/workflows/adr-governance.yml b/.github/workflows/adr-governance.yml
new file mode 100644
index 0000000..433880d
--- /dev/null
+++ b/.github/workflows/adr-governance.yml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: MIT OR Apache-2.0
+#
+# ADR governance (online, NON-BLOCKING).
+#
+# Completes issue #4 rule 6 "bind a closed [DECISION] issue to the existence of
+# its ADR" without making the offline docs build depend on the GitHub API. The
+# offline, deterministic gate (scripts/ci/check-adr-index.sh, run by docs.yml)
+# remains the hard gate on every PR; this job is advisory: it queries the repo's
+# own issues via the Actions GITHUB_TOKEN and REPORTS closed decision-needed
+# issues that have no matching ADR, and ADR Issue: headers that point at a
+# missing, still-open, or unlabeled issue. A mismatch never fails the build.
+#
+# Triggers: a weekly schedule (so the decision trail is reconciled as decisions
+# close, even between docs PRs), on demand, and a non-blocking pull_request
+# trigger scoped to the binding files so a regression in the report shows on the
+# PR that caused it. See docs/adr/README.md.
+name: adr-governance
+
+on:
+  schedule:
+    # Weekly, Monday 06:17 UTC. Off the hour to avoid the scheduler rush.
+    - cron: "17 6 * * 1"
+  workflow_dispatch: {}
+  # Re-run when the binding logic or the ADR records themselves change, so a
+  # regression in the report is visible on the PR that introduced it (still
+  # non-blocking: the script exits 0 on any mismatch, and main requires no
+  # status check).
+  pull_request:
+    paths:
+      - "docs/adr/**"
+      - "scripts/ci/check-adr-decision-binding.sh"
+      - ".github/workflows/adr-governance.yml"
+
+# Read-only: list issues and read the checked-out ADR files. No write scopes.
+permissions:
+  contents: read
+  issues: read
+
+jobs:
+  adr-decision-binding:
+    name: closed decision-needed issues <-> ADR Issue headers
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Reconcile closed decisions against ADRs (report only)
+        # gh reads GH_TOKEN. A mismatch exits 0 (advisory); env failures exit 2.
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_REPO: ${{ github.repository }}
+        run: sh scripts/ci/check-adr-decision-binding.sh
diff --git a/docs/AI_PIPELINE.md b/docs/AI_PIPELINE.md
new file mode 100644
index 0000000..dcaf170
--- /dev/null
+++ b/docs/AI_PIPELINE.md
@@ -0,0 +1,159 @@
+# AI-assisted design pipeline
+
+IronCache is designed by an LLM agent pipeline, not authored freehand. This
+runbook documents that pipeline: the loop that mined the prior art into
+[`prior-art/claims.yaml`](prior-art/claims.yaml) and produced the
+pre-implementation [`AUDIT.md`](AUDIT.md), and the gates that keep every numeric
+or version-specific assertion in the design tree sourced, unique, and
+human-approved.
+
+This is a process and governance document. It sits beside the
+[charter](CHARTER.md) and the [ADR governance](adr/README.md), not under
+`docs/design/`: it describes how the design is produced, not a subsystem to be
+implemented. It realizes the design ratified in #94 (decomposed from #88); the
+AI-Driven concern of that design is tracked as #127.
+
+## Why a pipeline, not freehand authoring
+
+Cache design lives or dies on numbers: hit ratio, tail latency, bytes saved,
+eviction quality. Agent-proposed mechanisms tend to cite numbers that are
+plausible but unreproduced, source-free, or pinned to a workload that is not
+ours. The pipeline treats each numeric claim as a hypothesis to be falsified,
+ties it to a version-pinned source, and refuses to admit it to the design tree
+until it survives an adversarial re-check and an offline gate. The reproduction
+discipline is borrowed directly from the load-aware-caching literature: a
+mechanism enters only after reproduced measurement on independent traces
+[lrb-model-and-traffic-reduction]. The broader ML-for-caching framing
+([lecar-regret-min-18x], [cacheus-experts], [wtinylfu-caffeine-sketch]) is
+adapted, not borrowed wholesale: agent proposals may cite it, but a borrowed
+number counts as reproduced only once it is re-derived against our own
+fixtures (see "Harness-blocked: the trace-replay reproduction bar").
+
+Per the tenet order (Compatible > Efficient > Simple > Scalable > AI-Driven),
+this pipeline is dev-time infrastructure. It is independent of the runtime
+advisor (#88's AI-Driven engine feature): no model runs on the request path, and
+nothing here ships in the binary.
+
+## The loop
+
+```
+  prior-art questions
+        |
+        v
+  [1] agent fan-out mining ........ one agent per source/dimension
+        |   -> draft claims with version-pinned sources
+        v
+  [2] adversarial verifier ........ independent, refute-by-default
+        |   -> re-checks load-bearing claims vs primary
+        |      sources; verdict per claim
+        v
+  claims.yaml  (descriptive source of truth, per-claim verification block)
+        |
+        v
+  [3] offline citation/uniqueness gate ... scripts/ci/check-prior-art-claims.sh
+        |   (already live, hard gate in CI)
+        v
+  [4] human PR review ............. final authority; no agent auto-merge path
+```
+
+### 1. Agent fan-out mining
+
+Research agents fan out, one per source or research dimension, and mine primary
+sources (papers, release notes, source code, benchmarks) into draft claims. Each
+draft claim is recorded in [`prior-art/claims.yaml`](prior-art/claims.yaml) with
+a kebab-case `id`, the `system` and pinned `version` it describes, the `claim`
+prose, the measured `value`, a `source_url`, an `accessed_date`, and a
+`confidence` with a `confidence_reason`. Claims are strictly **descriptive**:
+they record what an upstream system does at a pinned version, never what
+IronCache should do. Prescriptive IronCache decisions live in the design issues
+and the ADRs, never in the claims file.
+
+### 2. Independent adversarial verifier
+
+A second, independent pass re-checks the load-bearing and lower-confidence
+claims with a refute-by-default stance: the verifier tries to break each claim
+against a fresh fetch of the primary source rather than confirm the miner's
+reading. The verdict and evidence are recorded in each claim's `verification`
+block (`confirmed` / `corrected` / `refuted` / `uncertain` / `self-verified`),
+with a `best_source_url` and a `note` quoting the supporting text. Where the
+verdict is `corrected`, `value` becomes the corrected value and the miner's
+original reading is preserved under `original_value`. The same fan-out plus
+adversarial-confirmation method was applied to the whole issue tree in the
+pre-implementation audit; see [`AUDIT.md`](AUDIT.md) (re-verified claims carry
+`verification.reaudited`).
+
+The verifier and the miner are run as distinct passes so the check is genuinely
+independent rather than the same agent grading its own homework.
+
+### 3. Offline citation and uniqueness gate (live)
+
+[`scripts/ci/check-prior-art-claims.sh`](../scripts/ci/check-prior-art-claims.sh)
+is the hard, offline, deterministic gate and runs on every docs PR (workflow
+[`docs.yml`](../.github/workflows/docs.yml)). It asserts:
+
+- every claim `id` in `claims.yaml` is unique; and
+- every bracketed `[id]` citation in the prose (PRIOR_ART, CHARTER, GLOSSARY,
+  INVARIANTS, NON_GOALS, THREAT_MODEL, every `docs/design/*.md`, and every
+  `docs/experiments/*.md`) resolves to a claim that exists in `claims.yaml`.
+
+It does **not** re-fetch sources: upstream value drift is caught by
+`accessed_date` going stale and by periodic re-verification, not by this script.
+Its ADR sibling [`check-adr-index.sh`](../scripts/ci/check-adr-index.sh) applies
+the same citation rule to ADR records. Together they guarantee that the design
+tree cites only claim ids that exist and that no claim id is defined twice.
+This runbook is a process doc, not a design spec, so it is not in either
+script's scan set; it still cites only ids that exist in `claims.yaml`.
+
+### 4. Human merge gate (final authority)
+
+A human PR review is the documented final authority over all agent output. There
+is no agent auto-merge path: green CI is necessary but never sufficient. A
+reviewer confirms that the claim's source supports the stated value, that the
+mechanism it backs respects the tenet order, and that any decision it settles is
+recorded as an ADR per [adr/README.md](adr/README.md) and #4. A failed
+verification quarantines the claim and blocks the mechanism that depends on it;
+unsourced numbers are never merged.
+
+## Harness-blocked: the trace-replay reproduction bar
+
+The #94 design also specifies a stronger bar than citation hygiene: a mechanism
+should enter the design tree only after its numbers are **reproduced** by
+deterministic trace replay on N independent traces, banded
+[lrb-model-and-traffic-reduction]. That bar is **deferred** today because it is
+harness-blocked: IronCache has no engine code and no built benchmark, test, or
+oracle harness yet. The harness is *designed* (the benchmark and memory-model
+harness in #8; the conformance/differential/DST stack in #95 and the Valkey
+differential oracle in #96), but not *built*, and the Belady oracle that the
+reproduction bar needs is still open work (#93). There is nothing to replay
+traces against yet.
+
+Until that harness is built, the live pipeline enforces only what needs no
+harness: version-pinned sourcing with an adversarial re-check (step 2), and
+citation existence plus id uniqueness via the offline gate (step 3). Numeric
+claims are admitted as **cited and adversarially verified**, explicitly **not**
+as **reproduced**. When the harness lands, the reproduction bar attaches as an
+additional, blocking gate on numeric claims (a `verification.reproduced` verdict
+over the trace corpus), and this runbook will be updated to make trace replay a
+merge requirement rather than a deferred goal. The harness-blocked experiments
+are catalogued with the rest of the deferred research design.
+
+## Provenance summary
+
+- `claims.yaml` is the single descriptive source of truth; prose agrees with it,
+  and the file wins on any disagreement.
+- Every load-bearing number in the prose carries an `[id]` into `claims.yaml`.
+- Load-bearing and lower-confidence claims get a recorded adversarial verdict;
+  the offline gate enforces citation existence and id uniqueness in CI.
+- Humans hold the merge gate; agents never auto-merge.
+- Trace-replay numeric reproduction is specified (#94) but deferred until the
+  harness (designed in #8, #95/#96; Belady oracle still open in #93) is built.
+
+## References
+
+- #94: the AI-assisted pipeline design this runbook realizes (decomposed from
+  #88; its AI-Driven concern is tracked in #127).
+- #4: ADR index, decision register, and the citation/decision governance these
+  gates plug into.
+- #8, #95, #96, #93: the harness work that unblocks the trace-replay bar.
+- [AUDIT.md](AUDIT.md): the pre-implementation application of this same
+  fan-out-then-adversarial-confirm method to the whole issue tree.
diff --git a/docs/adr/README.md b/docs/adr/README.md
index c5a3634..50af9e6 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -62,6 +62,13 @@ runs in CI and is offline and deterministic. It fails when:
 - an ADR file is not listed in `INDEX.md`.
 
 Binding a *closed* `[DECISION]` issue to the existence of its ADR (issue #4
-rule 6) requires the GitHub API and is tracked in #166 as a separate
-(non-blocking) check rather than gating every offline docs build; the offline
-gate above keeps the records themselves honest in the meantime.
+rule 6) requires the GitHub API, so it lives in a separate, non-blocking job:
+[`../../scripts/ci/check-adr-decision-binding.sh`](../../scripts/ci/check-adr-decision-binding.sh),
+run by the [`adr-governance`](../../.github/workflows/adr-governance.yml)
+workflow on a weekly schedule, on demand, and on PRs that touch the binding
+files. It lists closed issues labeled `decision-needed` and reconciles them
+against ADR `Issue:` headers in both directions: a closed decision with no ADR
+that names it, and an ADR `Issue:` header pointing at a missing, still-open, or
+unlabeled issue. The job is advisory and reports to the run summary: a mismatch
+never fails it, and only an environment failure (no `gh`, no token) does. The
+offline gate above remains the hard gate and keeps the ADR records honest.
diff --git a/scripts/ci/check-adr-decision-binding.sh b/scripts/ci/check-adr-decision-binding.sh
new file mode 100755
index 0000000..bfa0e8a
--- /dev/null
+++ b/scripts/ci/check-adr-decision-binding.sh
@@ -0,0 +1,170 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: MIT OR Apache-2.0
+#
+# ONLINE, NON-BLOCKING reconciliation of the decision trail (issue #4 rule 6,
+# split out as #166). The offline sibling scripts/ci/check-adr-index.sh is the
+# hard gate on ADR records, citations, supersession, and INDEX listing; it
+# cannot bind a *closed* [DECISION] issue to the existence of its ADR without
+# the GitHub API, so that binding lives here.
+#
+# This script lists CLOSED issues labeled 'decision-needed' and reconciles them
+# against the 'Issue:' back-link headers of the ADR records, in BOTH directions:
+#
+#   A. a closed decision-needed issue with NO ADR whose 'Issue:' header names it
+#      (a decision was made and closed but never recorded);
+#   B. an ADR 'Issue:' header that names an issue which does not exist, is still
+#      OPEN, or is not labeled 'decision-needed' (the ADR's back-link is stale).
+#
+# It REPORTS mismatches and exits 0 regardless: this is an advisory governance
+# signal for human follow-up, not a merge gate. It exits 2 only on environment
+# problems (no gh, no token, or the initial issue listing fails), so a silent
+# misconfiguration does not read as "all clear".
+#
+# Requirements: POSIX sh, gh (authenticated via GH_TOKEN/GITHUB_TOKEN), and the
+# repo checked out. No network beyond gh against this repository's own API.
+set -eu
+
+LABEL="decision-needed"
+
+# Repo root = two levels up from scripts/ci. Resolve ADR dir from there.
+SCRIPT_DIR=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
+ROOT=$(CDPATH= cd -- "$SCRIPT_DIR/../.." && pwd)
+ADR_DIR="$ROOT/docs/adr"
+
+# GH_REPO lets gh target the right repo in Actions; fall back to the origin
+# remote inferred by gh when run locally.
+REPO_ARG=""
+if [ "${GH_REPO:-}" != "" ]; then
+  REPO_ARG="--repo $GH_REPO"
+fi
+
+if ! command -v gh >/dev/null 2>&1; then
+  echo "ERROR: gh (GitHub CLI) not found; this online check needs it" >&2
+  exit 2
+fi
+if [ "${GH_TOKEN:-}" = "" ] && [ "${GITHUB_TOKEN:-}" = "" ]; then
+  echo "ERROR: no GH_TOKEN/GITHUB_TOKEN in the environment for gh" >&2
+  exit 2
+fi
+
+WORK=$(mktemp -d)
+trap 'rm -rf "$WORK"' EXIT INT TERM
+
+closed="$WORK/closed_decisions"   # closed decision-needed issue numbers
+adr_issues="$WORK/adr_issues"     # issue numbers named by ADR Issue: headers
+adr_map="$WORK/adr_map"           # "<issue> <adr-file>" pairs for messages
+
+# --- 1. closed decision-needed issues (paginated) -------------------------
+# --jq emits one number per line; --limit caps the result count (well above the
+# current count). gh paginates internally to satisfy --limit.
+if ! gh issue list $REPO_ARG --state closed --label "$LABEL" --limit 1000 \
+    --json number --jq '.[].number' >"$closed" 2>"$WORK/err1"; then
+  echo "ERROR: gh failed listing closed '$LABEL' issues:" >&2
+  cat "$WORK/err1" >&2
+  exit 2
+fi
+sort -n -u "$closed" -o "$closed"
+
+# --- 2. issue numbers named by ADR 'Issue:' headers -----------------------
+# Header forms seen in the tree: "Issue: #41" and "Issue: #82, #119". Skip the
+# template (#N is not a number). Record both a flat number list and the
+# number->file map for human-readable messages.
+: >"$adr_issues"
+: >"$adr_map"
+for f in "$ADR_DIR"/[0-9][0-9][0-9][0-9]-*.md; do
+  [ -e "$f" ] || continue
+  base=$(basename "$f")
+  [ "$base" = "0000-template.md" ] && continue
+  # First 'Issue:' line of the record; pull every #NNN token out of it.
+  hdr=$(grep -m1 -E '^Issue:' "$f" 2>/dev/null || true)
+  [ -n "$hdr" ] || continue
+  nums=$(printf '%s\n' "$hdr" | grep -oE '#[0-9]+' | tr -d '#')
+  for n in $nums; do
+    printf '%s\n' "$n" >>"$adr_issues"
+    printf '%s %s\n' "$n" "$base" >>"$adr_map"
+  done
+done
+sort -n -u "$adr_issues" -o "$adr_issues"
+
+# --- 3. direction A: closed decision with no ADR -------------------------
+# In $closed but not in $adr_issues.
+a_miss="$WORK/a_miss"
+comm -23 "$closed" "$adr_issues" >"$a_miss"
+
+# --- 4. direction B: ADR header names a bad issue ------------------------
+# Each ADR-named issue must exist, be CLOSED, and carry the decision-needed
+# label; else the back-link is stale. Any gh error here is reported as MISSING.
+b_miss="$WORK/b_miss"
+: >"$b_miss"
+while IFS= read -r n; do
+  [ -n "$n" ] || continue
+  # One API call per ADR-referenced issue; the set is small (a few per ADR).
+  meta=$(gh issue view "$n" $REPO_ARG --json state,labels \
+    --jq '.state + "|" + ([.labels[].name] | join(","))' 2>/dev/null || true)
+  files=$(awk -v k="$n" '$1==k{printf "%s ", $2}' "$adr_map")
+  if [ -z "$meta" ]; then
+    printf '%s|MISSING|%s\n' "$n" "$files" >>"$b_miss"
+    continue
+  fi
+  state=${meta%%|*}
+  labels=${meta#*|}
+  if [ "$state" != "CLOSED" ] && [ "$state" != "closed" ]; then
+    printf '%s|OPEN|%s\n' "$n" "$files" >>"$b_miss"
+  elif ! printf '%s' "$labels" | tr ',' '\n' | grep -qx "$LABEL"; then
+    printf '%s|UNLABELED|%s\n' "$n" "$files" >>"$b_miss"
+  fi
+done <"$adr_issues"
+
+# --- 5. report (stdout, plus the Step Summary if set) ---------------------
+out() {
+  printf '%s\n' "$1"
+  if [ "${GITHUB_STEP_SUMMARY:-}" != "" ]; then
+    printf '%s\n' "$1" >>"$GITHUB_STEP_SUMMARY"
+  fi
+}
+
+# Count non-empty lines (grep -c exits 1 on none, hence || true under set -e).
+n_closed=$(grep -c . "$closed" 2>/dev/null || true); n_closed=${n_closed:-0}
+n_a=$(grep -c . "$a_miss" 2>/dev/null || true); n_a=${n_a:-0}
+n_b=$(grep -c . "$b_miss" 2>/dev/null || true); n_b=${n_b:-0}
+
+out "## ADR <-> decision binding (advisory, non-blocking)"
+out ""
+out "Closed \`$LABEL\` issues scanned: $n_closed"
+out ""
+
+if [ "$n_a" -eq 0 ]; then
+  out "### A. Closed decisions with no ADR: none"
+else
+  out "### A. Closed \`$LABEL\` issues with no ADR \`Issue:\` header naming them ($n_a)"
+  while IFS= read -r n; do
+    [ -n "$n" ] && out "- #$n closed but no ADR records it"
+  done <"$a_miss"
+fi
+out ""
+
+if [ "$n_b" -eq 0 ]; then
+  out "### B. Stale ADR \`Issue:\` headers: none"
+else
+  out "### B. ADR \`Issue:\` headers pointing at a missing/open/unlabeled issue ($n_b)"
+  while IFS='|' read -r n why files; do
+    [ -n "$n" ] || continue
+    case "$why" in
+      MISSING)   out "- #$n does not exist or could not be fetched (referenced by: ${files})" ;;
+      OPEN)      out "- #$n is still OPEN (referenced by: ${files})" ;;
+      UNLABELED) out "- #$n is not labeled \`$LABEL\` (referenced by: ${files})" ;;
+      *)         out "- #$n ($why) (referenced by: ${files})" ;;
+    esac
+  done <"$b_miss"
+fi
+out ""
+
+total=$((n_a + n_b))
+if [ "$total" -eq 0 ]; then
+  out "OK: every closed decision has an ADR and every ADR back-link is valid."
+else
+  out "NOTE: the $total mismatch(es) above are advisory and do not fail the build. The offline gate (check-adr-index.sh) runs separately and is unaffected. Resolve each by adding the missing ADR or fixing the ADR \`Issue:\` header."
+fi + +# Advisory: always succeed. Environment failures exited 2 earlier. +exit 0 From 866a2ef0172f74d92a290e8c8e756e3af69cc386 Mon Sep 17 00:00:00 2001 From: Zeke Date: Sat, 13 Jun 2026 22:21:41 -0700 Subject: [PATCH 2/2] fix: lexical sort for comm in the ADR-decision binding check GNU comm (CI ubuntu) aborts with 'not in sorted order' when fed numerically- sorted input; BSD comm (macOS) tolerates it, so the job passed locally but failed in Actions. Sort lexically under LC_ALL=C so sort and comm agree on every runner. Verified exit 0 under both LC_ALL=C sh and dash; the advisory report is unchanged. Signed-off-by: Zeke --- scripts/ci/check-adr-decision-binding.sh | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/scripts/ci/check-adr-decision-binding.sh b/scripts/ci/check-adr-decision-binding.sh index bfa0e8a..3762a8b 100755 --- a/scripts/ci/check-adr-decision-binding.sh +++ b/scripts/ci/check-adr-decision-binding.sh @@ -23,6 +23,10 @@ # Requirements: POSIX sh, gh (authenticated via GH_TOKEN/GITHUB_TOKEN), and the # repo checked out. No network beyond gh against this repository's own API. set -eu +# Stable byte collation so `sort` and `comm` agree: GNU comm (CI) aborts with +# "not in sorted order" if fed numerically-sorted input, while BSD comm (macOS) +# tolerates it. Sort lexically under LC_ALL=C and comm is happy on every runner. +export LC_ALL=C LABEL="decision-needed" @@ -63,7 +67,7 @@ if ! gh issue list $REPO_ARG --state closed --label "$LABEL" --limit 1000 \ cat "$WORK/err1" >&2 exit 2 fi -sort -n -u "$closed" -o "$closed" +sort -u "$closed" -o "$closed" # --- 2. issue numbers named by ADR 'Issue:' headers ----------------------- # Header forms seen in the tree: "Issue: #41" and "Issue: #82, #119". Skip the @@ -84,7 +88,7 @@ for f in "$ADR_DIR"/[0-9][0-9][0-9][0-9]-*.md; do printf '%s %s\n' "$n" "$base" >>"$adr_map" done done -sort -n -u "$adr_issues" -o "$adr_issues" +sort -u "$adr_issues" -o "$adr_issues" # --- 3. 
direction A: closed decision with no ADR ------------------------- # In $closed but not in $adr_issues.
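
The following is a minimal sketch of direction A as the script behaves after the second patch. It uses toy issue numbers in place of live `gh` output. `closed_without_adr` and `issue_numbers` are hypothetical helpers, not functions defined in the script. Like the script, the sketch relies on the widely available `grep -m1`/`grep -o` and `mktemp -d` extensions. Under `LC_ALL=C`, `sort -u` orders the closed list as 119, 41, 82, which is the byte order `comm` expects. The numerically sorted list 41, 82, 119 is what GNU comm flagged.

```shell
#!/bin/sh
# Sketch only: toy data, no gh calls. Mirrors the header parsing and the
# sort/comm step of check-adr-decision-binding.sh after the LC_ALL=C fix.
set -eu
export LC_ALL=C

# Hypothetical helper: print every #NNN token (without the '#') from the
# first 'Issue:' line of an ADR body read on stdin, one number per line.
issue_numbers() {
  grep -m1 -E '^Issue:' | grep -oE '#[0-9]+' | tr -d '#'
}

# Hypothetical helper: closed_without_adr CLOSED ADR
# Both arguments are whitespace-separated issue numbers. Prints the closed
# issues that no ADR names, one per line, in byte (LC_ALL=C) order.
closed_without_adr() {
  tmp=$(mktemp -d)
  # Unquoted on purpose: word splitting turns the list into one item per line.
  printf '%s\n' $1 | sort -u >"$tmp/closed"
  printf '%s\n' $2 | sort -u >"$tmp/adr"
  # Same sort and comm collation, so GNU comm sees sorted input.
  comm -23 "$tmp/closed" "$tmp/adr"
  rm -rf "$tmp"
}

adr_nums=$(printf 'Title\nIssue: #82, #119\n' | issue_numbers)
closed_without_adr "41 119 82" "$adr_nums"
```

Run as-is, it prints `41`: #41 is closed, but the only ADR header names #82 and #119.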