From 006fe59977868f541058a7944a92b69177830f18 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Villase=C3=B1or=20Montfort?= <195970+montfort@users.noreply.github.com> Date: Tue, 16 Jun 2026 12:04:40 -0600 Subject: [PATCH] =?UTF-8?q?fix(framework):=20audit-prompt=20hardening=20?= =?UTF-8?q?=E2=80=94=20enforce=20independence=20+=20surface=20output=20con?= =?UTF-8?q?tract=20(#261)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A real 5-auditor cross-family cycle exposed two prompt-design gaps that let auditors drift from the intended discipline (neither is a CLI mechanics bug). Problem A — auditor independence was not enforced. When audits run sequentially, each report lands in the shared .straymark/audits// dir, and a later auditor can read the earlier ones. One real auditor produced a meta-consolidation of the four prior reports instead of an independent audit — its "convergence" was copied, not independent, which silently inflates both its rating and the consolidated review's confidence. - audit-prompt.md (EN + ES): ABSOLUTE RULE now forbids reading/grepping/ referencing any other report-*.md; "Your role" + "What you must NOT do" reworded (a sibling report may already be on disk — do not open it). - straymark-audit-review (4 runtime copies): contamination guard that detects reports referencing siblings and excludes them from convergence/dedup + rating. A prompt rule is weak; verify at review time too. Problem B — the output contract was buried at the end of a ~3,900-line resolved prompt, and the auditor patterned its frontmatter after the embedded AILOGs (different schema), producing off-contract reports. - New "Output contract (read this first)" block right after the ABSOLUTE RULE: required frontmatter + the four finding categories + an explicit "DELIBERATELY DIFFERENT from the AILOG/AIDEC frontmatter" disambiguation, plus a Frontmatter note beside the embedded AILOGs. (Categories were already defined before the output format, so no reorder was needed.) All audit tests green (skill 12, template 9, charter_audit 21, audit 9). Closes #261. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 23 +++++++++++++++++++ .../workflows/straymark-audit-review.md | 2 ++ .../skills/straymark-audit-review/SKILL.md | 2 ++ .../skills/straymark-audit-review/SKILL.md | 2 ++ .../skills/straymark-audit-review/SKILL.md | 2 ++ dist/.straymark/audit-prompts/audit-prompt.md | 20 ++++++++++++++-- .../audit-prompts/i18n/es/audit-prompt.md | 20 ++++++++++++++-- 7 files changed, 67 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 53cf9164..c2b47795 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,29 @@ and this project uses [independent versioning](README.md#versioning) for Framewo --- +## [Unreleased] + +### Changed (Framework — audit prompt hardening, #261) + +- **Auditor independence is now enforced in the audit prompt** (`audit-prompts/audit-prompt.md`, + EN + ES). The ABSOLUTE RULE forbids reading, grepping, or referencing any other auditor's + `report-*.md` under `.straymark/audits/` — for this Charter or any other — because cross-model + convergence is signal only when each auditor reached it independently. The "Your role" and + "What you must NOT do" sections were reworded to match (a sibling report may already be on + disk; do not open it). +- **The output contract is surfaced near the top** of the prompt (new "Output contract (read + this first)" section, right after the ABSOLUTE RULE) instead of only at the end of a long + prompt: required report frontmatter, the four finding categories, and an explicit warning that + the report frontmatter is **deliberately different** from the embedded AILOG/AIDEC frontmatter + the auditor reads (the mimicry that drifted real reports off-schema). A "Frontmatter note" was + also added beside the embedded AILOGs. +- **`straymark-audit-review`** (skill, all runtimes) gained a **contamination guard**: it now + scans each report for signs it read its siblings (references to another `report-*.md`, a + cross-auditor comparison table, "I verified all N findings from the prior audit") and + excludes contaminated reports from the convergence/dedup math and the auditor rating. + +--- + ## CLI 3.25.0 / Core 0.5.0 — Architecture Plan track A1 (EXPERIMENTAL) The textual + authoring half of Loom's Architecture Plan view (Spec 002) — the operator's "where are we?" answered from the terminal, ahead of the visual overlay (A2). All surfaces are **EXPERIMENTAL (Loom v0)** and may change without a deprecation cycle. diff --git a/dist/.agent/workflows/straymark-audit-review.md b/dist/.agent/workflows/straymark-audit-review.md index 7f52eb03..b415e3a8 100644 --- a/dist/.agent/workflows/straymark-audit-review.md +++ b/dist/.agent/workflows/straymark-audit-review.md @@ -46,6 +46,8 @@ For each `.straymark/audits//report-*.md`: Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. +**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here. + ### 3. Verify every finding against actual code This is the substantive step. For EACH finding in the master list: diff --git a/dist/.claude/skills/straymark-audit-review/SKILL.md b/dist/.claude/skills/straymark-audit-review/SKILL.md index 813a4f12..276d3d90 100644 --- a/dist/.claude/skills/straymark-audit-review/SKILL.md +++ b/dist/.claude/skills/straymark-audit-review/SKILL.md @@ -48,6 +48,8 @@ For each `.straymark/audits//report-*.md`: Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. +**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here. + ### 3. Verify every finding against actual code This is the substantive step. For EACH finding in the master list: diff --git a/dist/.codex/skills/straymark-audit-review/SKILL.md b/dist/.codex/skills/straymark-audit-review/SKILL.md index 681f2919..1410e5bb 100644 --- a/dist/.codex/skills/straymark-audit-review/SKILL.md +++ b/dist/.codex/skills/straymark-audit-review/SKILL.md @@ -47,6 +47,8 @@ For each `.straymark/audits//report-*.md`: Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. +**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here. + ### 3. Verify every finding against actual code This is the substantive step. For EACH finding in the master list: diff --git a/dist/.gemini/skills/straymark-audit-review/SKILL.md b/dist/.gemini/skills/straymark-audit-review/SKILL.md index 681f2919..1410e5bb 100644 --- a/dist/.gemini/skills/straymark-audit-review/SKILL.md +++ b/dist/.gemini/skills/straymark-audit-review/SKILL.md @@ -47,6 +47,8 @@ For each `.straymark/audits//report-*.md`: Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. +**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here. + ### 3. Verify every finding against actual code This is the substantive step. For EACH finding in the master list: diff --git a/dist/.straymark/audit-prompts/audit-prompt.md b/dist/.straymark/audit-prompts/audit-prompt.md index e20934b3..3d5633aa 100644 --- a/dist/.straymark/audit-prompts/audit-prompt.md +++ b/dist/.straymark/audit-prompts/audit-prompt.md @@ -56,6 +56,7 @@ Specifically, you are FORBIDDEN from: - Running code generators (`go generate`, `sqlc generate`, `wire`, `cargo build` with filesystem effects, `npm install`, etc.). - Applying "fixes" or "improvements" to the code, even if you believe they are correct. - Reformatting, renaming, or reorganizing existing files. +- Reading, opening, grepping, or referencing **another auditor's report** (`report-*.md`, `auditor-*.md`, or any scratch file) under `.straymark/audits/` — for this Charter or any other. Your audit must be **independent**: an audit that reads, cites, summarizes, or "cross-verifies against" another auditor's report is contaminated and will be discarded. Cross-auditor convergence is signal ONLY when each auditor reached it *without* seeing the others — a copied agreement is worthless. The ONLY thing you may write is your audit report file at the canonical path shown in **Output format** below. That is the ONLY file you have permission to create. @@ -67,11 +68,24 @@ If a test fails, **REPORT IT**. Do NOT repair it. --- +## Output contract (read this first) + +You are about to read a lot — the Charter, the originating AILOGs, the diff — before you reach the full **Output format** at the very end of this prompt. Lock these invariants in now, so the long read does not pull your report toward the wrong shape: + +1. **You write exactly one file**: your audit report, at the canonical path in **Output format**. Nothing else (see the ABSOLUTE RULE). +2. **Required report frontmatter** (validated against `{{schema_path}}`): `audit_role`, `auditor`, `charter_id`, `git_range`, `prompt_used`, `audited_at`, `findings_total`, `findings_by_category` — where `findings_by_category` has exactly the four keys `hallucination`, `implementation_gap`, `real_debt`, `false_positive`. `evidence_citations` and `audit_quality` are optional but recommended. +3. **The four finding categories** (`hallucination`, `implementation_gap`, `real_debt`, `false_positive`) are defined under **Finding categorization** below — *before* the point where you must assign them. +4. **⚠️ Your report frontmatter is DELIBERATELY DIFFERENT from the AILOG/AIDEC frontmatter you are about to read.** The AILOGs embedded below use keys like `id` / `status` / `confidence` / `risk_level` / `agent`. Your report does **not** — it uses the audit keys in (2). Do not mimic the surrounding documents; follow the schema. + +This is a summary. The authoritative, complete format (frontmatter + every body section) is in **Output format** at the end of this prompt — write your report against that, not against this digest. + +--- + ## Your role You are an independent code auditor. Your job is to verify that the implementation of a specific Charter fulfills the declared tasks and files, find real bugs in the code, and identify security risks. **You are NOT a cheerleader** — reporting "no issues" when bugs exist is worse than reporting a false positive. -StrayMark orchestrates cross-model audits: typically another auditor from a **different model family** is reviewing the same Charter in parallel. Your value lies in applying evidence discipline (citing `file:line` of files you actually opened) and severity calibration against the real config, not in cosmetically converging with the other auditor. +StrayMark orchestrates cross-model audits: another auditor from a **different model family** reviews the same Charter — sometimes alongside you, sometimes before you, so their `report-*.md` may already sit in `.straymark/audits/{{charter_id}}/`. **You must not read it** (see the ABSOLUTE RULE). Your value lies in *independent* evidence discipline (citing `file:line` of files you actually opened) and severity calibration against the real config — not in converging with, or even glancing at, another auditor's report. An agreement you reached by reading theirs is not convergence; it is contamination. --- @@ -105,6 +119,8 @@ The authoritative source of scope is the Charter file at `{{charter_path}}`. Rea These AILOGs document the rationale and the emergent risks during execution. **Read them before auditing** — the `R` risks already documented there are NOT new findings, they are consciously accepted trade-offs. +> **Frontmatter note.** These AILOGs carry their own frontmatter (`id`, `status`, `confidence`, `risk_level`, `agent`). That is **not** the shape of your audit report — your report uses the audit schema in **Output format**. Read the AILOGs for their content; do not let their frontmatter become a template for yours. + ``` {{ailog_paths}} ``` @@ -312,7 +328,7 @@ Does the implementation meet the closure criterion declared by `{{charter_id}}`? - **DO NOT declare Critical or High severity** without having verified that the real driver, flag, role, or deployment of the project triggers the bug. See Step 5. Declaring "critical regression" based on a stubbed component or a disabled flag invalidates the audit through false inflation. - **DO NOT report** that a file "does not exist" without having searched with the correct path (including naming-convention variants used by the project). - **DO NOT copy the file structure** without verifying content. -- **DO NOT ignore** the prior-audits folders (typically `audit/` or `.straymark/audits/`) — they contain prior analyses you are NOT meant to audit (they were audited already, or they are meta-evidence of the process, not project code). +- **DO NOT audit, and DO NOT read for cross-reference, the audit folders** (`audit/` or `.straymark/audits/`). They hold other auditors' reports and prior analyses — neither project code for you to audit, nor input to your findings. In particular, do not open this cycle's sibling `report-*.md` files (see the ABSOLUTE RULE on independence): your audit must stand on the code you read yourself. - **DO NOT run** destructive or generative commands. Only read/verify commands (`go vet`, `go build`, `go test`; `cargo check`, `cargo test --no-run`; `npm run lint`, `npm test`; or their equivalents). - **DO NOT consult external sources** beyond what is provided in this prompt and the repository files you open via tool call. The audit must be reproducible from the prompt + the repo + the available read tools. diff --git a/dist/.straymark/audit-prompts/i18n/es/audit-prompt.md b/dist/.straymark/audit-prompts/i18n/es/audit-prompt.md index 78ffb5e5..22c251ea 100644 --- a/dist/.straymark/audit-prompts/i18n/es/audit-prompt.md +++ b/dist/.straymark/audit-prompts/i18n/es/audit-prompt.md @@ -53,6 +53,7 @@ Concretamente, tienes PROHIBIDO: - Ejecutar generadores de código (`go generate`, `sqlc generate`, `wire`, `cargo build` con efectos en el filesystem, npm install, etc.). - Aplicar "fixes" o "mejoras" al código, aunque creas que son correctas. - Reformatear, renombrar o reorganizar archivos existentes. +- Leer, abrir, grepear o referenciar **el reporte de otro auditor** (`report-*.md`, `auditor-*.md`, o cualquier archivo de borrador) bajo `.straymark/audits/` — de este Charter o de cualquier otro. Tu auditoría debe ser **independiente**: una auditoría que lee, cita, resume o "contrasta contra" el reporte de otro auditor está contaminada y será descartada. La convergencia entre auditores es señal SOLO cuando cada uno llegó a ella *sin* ver a los demás — un acuerdo copiado no vale nada. Lo ÚNICO que puedes escribir es tu archivo de reporte de auditoría en la ruta canónica que aparece en la sección **Formato de salida** más abajo. Ese es el ÚNICO archivo que tienes permiso de crear. @@ -64,11 +65,24 @@ Si un test falla, **REPÓRTALO**. NO lo arregles. --- +## Contrato de salida (lee esto primero) + +Vas a leer mucho —el Charter, los AILOGs originantes, el diff— antes de llegar al **Formato de salida** completo al final de este prompt. Fija estos invariantes ahora, para que la lectura larga no arrastre tu reporte hacia la forma equivocada: + +1. **Escribes exactamente un archivo**: tu reporte de auditoría, en la ruta canónica del **Formato de salida**. Nada más (ver la REGLA ABSOLUTA). +2. **Frontmatter requerido del reporte** (validado contra `{{schema_path}}`): `audit_role`, `auditor`, `charter_id`, `git_range`, `prompt_used`, `audited_at`, `findings_total`, `findings_by_category` — donde `findings_by_category` lleva exactamente las cuatro claves `hallucination`, `implementation_gap`, `real_debt`, `false_positive`. `evidence_citations` y `audit_quality` son opcionales pero recomendados. +3. **Las cuatro categorías de hallazgo** (`hallucination`, `implementation_gap`, `real_debt`, `false_positive`) están definidas en **Categorización de hallazgos** más abajo — *antes* del punto donde debes asignarlas. +4. **⚠️ El frontmatter de tu reporte es DELIBERADAMENTE DISTINTO del frontmatter de los AILOG/AIDEC que vas a leer.** Los AILOGs embebidos abajo usan claves como `id` / `status` / `confidence` / `risk_level` / `agent`. Tu reporte **no** — usa las claves de auditoría de (2). No imites los documentos circundantes; sigue el esquema. + +Esto es un resumen. El formato autoritativo y completo (frontmatter + cada sección del cuerpo) está en **Formato de salida** al final de este prompt — escribe tu reporte contra ese, no contra este digesto. + +--- + ## Tu rol Eres un auditor de código independiente. Tu trabajo es verificar que la implementación de un Charter específico cumple con las tareas y archivos declarados, encontrar bugs reales en el código, e identificar riesgos de seguridad. **NO eres un cheerleader** — reportar "sin problemas" cuando existen bugs es peor que reportar un falso positivo. -StrayMark orquesta auditorías cross-modelo: típicamente otro auditor de una **familia de modelo distinta** está revisando el mismo Charter en paralelo. Tu valor está en aplicar la disciplina de evidencia (citar `archivo:línea` de archivos que abriste) y la calibración de severidad contra la config real, no en convergir cosméticamente con el otro auditor. +StrayMark orquesta auditorías cross-modelo: otro auditor de una **familia de modelo distinta** revisa el mismo Charter — a veces a la par tuya, a veces antes que tú, así que su `report-*.md` puede ya estar en `.straymark/audits/{{charter_id}}/`. **No debes leerlo** (ver la REGLA ABSOLUTA). Tu valor está en la disciplina de evidencia *independiente* (citar `archivo:línea` de archivos que abriste) y la calibración de severidad contra la config real — no en convergir con, ni siquiera echar un vistazo a, el reporte de otro auditor. Un acuerdo al que llegaste leyendo el suyo no es convergencia; es contaminación. --- @@ -102,6 +116,8 @@ La fuente autoritativa de alcance es el archivo del Charter en `{{charter_path}} Estos AILOGs documentan la racional y los riesgos emergentes durante la ejecución. **Léelos antes de auditar** — los R que ya están documentados ahí NO son hallazgos nuevos, son trade-offs aceptados conscientemente. +> **Nota sobre el frontmatter.** Estos AILOGs llevan su propio frontmatter (`id`, `status`, `confidence`, `risk_level`, `agent`). Esa **no** es la forma de tu reporte de auditoría — tu reporte usa el esquema de auditoría del **Formato de salida**. Lee los AILOGs por su contenido; no dejes que su frontmatter se vuelva la plantilla del tuyo. + ``` {{ailog_paths}} ``` @@ -309,7 +325,7 @@ Observaciones sobre código que NO es parte del alcance de este Charter pero que - **NO declares severidad Crítica o Alta** sin haber verificado que el driver, flag, rol o deployment real del proyecto dispara el bug. Ver Paso 5. Declarar "regresión crítica" basándote en un componente stub o un flag desactivado invalida la auditoría por falsa inflación. - **NO reportes** que un archivo "no existe" sin haber buscado con la ruta correcta (incluyendo variantes de naming convention del proyecto). - **NO copies la estructura de archivos** sin verificar contenido. -- **NO ignores** las carpetas de auditorías previas (típicamente `audit/` o `.straymark/audits/`) — contienen análisis previos que NO debes auditar (ya fueron auditadas o son meta-evidencia del proceso, no código del proyecto). +- **NO audites, y NO leas para contrastar, las carpetas de auditorías** (`audit/` o `.straymark/audits/`). Contienen reportes de otros auditores y análisis previos — ni código del proyecto que debas auditar, ni insumo para tus hallazgos. En particular, no abras los `report-*.md` hermanos de este ciclo (ver la REGLA ABSOLUTA sobre independencia): tu auditoría debe sostenerse sobre el código que leíste tú mismo. - **NO ejecutes** comandos destructivos o generativos. Solo comandos de lectura/verificación (`go vet`, `go build`, `go test`; `cargo check`, `cargo test --no-run`; `npm run lint`, `npm test`; o sus equivalentes). - **NO consultes fuentes externas** más allá de lo provisto en este prompt y de los archivos del repositorio que abras vía tool call. La auditoría debe ser reproducible desde el prompt + el repo + las herramientas de lectura disponibles.