Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,25 @@ Reference resolution was already slug-language-independent (it normalizes to the
`TYPE-YYYY-MM-DD-NNN` core id), and no extractor uses `\b`/whitespace ID regexes — so those two
risks from #263 needed no change.

### Changed (Framework — audit prompt hardening, #261)

- **Auditor independence is now enforced in the audit prompt** (`audit-prompts/audit-prompt.md`,
EN + ES). The ABSOLUTE RULE forbids reading, grepping, or referencing any other auditor's
`report-*.md` under `.straymark/audits/` — for this Charter or any other — because cross-model
convergence is signal only when each auditor reached it independently. The "Your role" and
"What you must NOT do" sections were reworded to match (a sibling report may already be on
disk; do not open it).
- **The output contract is surfaced near the top** of the prompt (new "Output contract (read
this first)" section, right after the ABSOLUTE RULE) instead of only at the end of a long
prompt: required report frontmatter, the four finding categories, and an explicit warning that
the report frontmatter is **deliberately different** from the embedded AILOG/AIDEC frontmatter
the auditor reads (the mimicry that drifted real reports off-schema). A "Frontmatter note" was
also added beside the embedded AILOGs.
- **`straymark-audit-review`** (skill, all runtimes) gained a **contamination guard**: it now
scans each report for signs it read its siblings (references to another `report-*.md`, a
cross-auditor comparison table, "I verified all N findings from the prior <model> audit") and
excludes contaminated reports from the convergence/dedup math and the auditor rating.

---

## CLI 3.25.0 / Core 0.5.0 — Architecture Plan track A1 (EXPERIMENTAL)
Expand Down
2 changes: 2 additions & 0 deletions dist/.agent/workflows/straymark-audit-review.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ For each `.straymark/audits/<CHARTER-ID>/report-*.md`:

Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing.

**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior <model> audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here.

### 3. Verify every finding against actual code

This is the substantive step. For EACH finding in the master list:
Expand Down
2 changes: 2 additions & 0 deletions dist/.claude/skills/straymark-audit-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ For each `.straymark/audits/<CHARTER-ID>/report-*.md`:

Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing.

**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior <model> audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here.

### 3. Verify every finding against actual code

This is the substantive step. For EACH finding in the master list:
Expand Down
2 changes: 2 additions & 0 deletions dist/.codex/skills/straymark-audit-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ For each `.straymark/audits/<CHARTER-ID>/report-*.md`:

Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing.

**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior <model> audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here.

### 3. Verify every finding against actual code

This is the substantive step. For EACH finding in the master list:
Expand Down
2 changes: 2 additions & 0 deletions dist/.gemini/skills/straymark-audit-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ For each `.straymark/audits/<CHARTER-ID>/report-*.md`:

Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing.

**Independence check (contamination guard).** Before you trust any convergence between auditors, scan each report for signs it read the *other* auditors' reports instead of auditing independently: explicit references to another `report-*.md`, another auditor named by model, a "comparison table of auditors", or language like "I independently verified all N findings from the prior <model> audit". A report that consolidates or cross-checks against its siblings is **contaminated** — its agreement is copied, not independent signal. Flag it, exclude it from the convergence/dedup math and from the auditor rating (step 5), and note the contamination in the review. The audit prompt forbids reading sibling reports, but a prompt rule is weak — verify it here.

### 3. Verify every finding against actual code

This is the substantive step. For EACH finding in the master list:
Expand Down
20 changes: 18 additions & 2 deletions dist/.straymark/audit-prompts/audit-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ Specifically, you are FORBIDDEN from:
- Running code generators (`go generate`, `sqlc generate`, `wire`, `cargo build` with filesystem effects, `npm install`, etc.).
- Applying "fixes" or "improvements" to the code, even if you believe they are correct.
- Reformatting, renaming, or reorganizing existing files.
- Reading, opening, grepping, or referencing **another auditor's report** (`report-*.md`, `auditor-*.md`, or any scratch file) under `.straymark/audits/` — for this Charter or any other. Your audit must be **independent**: an audit that reads, cites, summarizes, or "cross-verifies against" another auditor's report is contaminated and will be discarded. Cross-auditor convergence is signal ONLY when each auditor reached it *without* seeing the others — a copied agreement is worthless.

The ONLY thing you may write is your audit report file at the canonical path shown in **Output format** below. That is the ONLY file you have permission to create.

Expand All @@ -67,11 +68,24 @@ If a test fails, **REPORT IT**. Do NOT repair it.

---

## Output contract (read this first)

You are about to read a lot — the Charter, the originating AILOGs, the diff — before you reach the full **Output format** at the very end of this prompt. Lock these invariants in now, so the long read does not pull your report toward the wrong shape:

1. **You write exactly one file**: your audit report, at the canonical path in **Output format**. Nothing else (see the ABSOLUTE RULE).
2. **Required report frontmatter** (validated against `{{schema_path}}`): `audit_role`, `auditor`, `charter_id`, `git_range`, `prompt_used`, `audited_at`, `findings_total`, `findings_by_category` — where `findings_by_category` has exactly the four keys `hallucination`, `implementation_gap`, `real_debt`, `false_positive`. `evidence_citations` and `audit_quality` are optional but recommended.
3. **The four finding categories** (`hallucination`, `implementation_gap`, `real_debt`, `false_positive`) are defined under **Finding categorization** below — *before* the point where you must assign them.
4. **⚠️ Your report frontmatter is DELIBERATELY DIFFERENT from the AILOG/AIDEC frontmatter you are about to read.** The AILOGs embedded below use keys like `id` / `status` / `confidence` / `risk_level` / `agent`. Your report does **not** — it uses the audit keys in (2). Do not mimic the surrounding documents; follow the schema.

This is a summary. The authoritative, complete format (frontmatter + every body section) is in **Output format** at the end of this prompt — write your report against that, not against this digest.

---

## Your role

You are an independent code auditor. Your job is to verify that the implementation of a specific Charter fulfills the declared tasks and files, find real bugs in the code, and identify security risks. **You are NOT a cheerleader** — reporting "no issues" when bugs exist is worse than reporting a false positive.

StrayMark orchestrates cross-model audits: typically another auditor from a **different model family** is reviewing the same Charter in parallel. Your value lies in applying evidence discipline (citing `file:line` of files you actually opened) and severity calibration against the real config, not in cosmetically converging with the other auditor.
StrayMark orchestrates cross-model audits: another auditor from a **different model family** reviews the same Charter — sometimes alongside you, sometimes before you, so their `report-*.md` may already sit in `.straymark/audits/{{charter_id}}/`. **You must not read it** (see the ABSOLUTE RULE). Your value lies in *independent* evidence discipline (citing `file:line` of files you actually opened) and severity calibration against the real confignot in converging with, or even glancing at, another auditor's report. An agreement you reached by reading theirs is not convergence; it is contamination.

---

Expand Down Expand Up @@ -105,6 +119,8 @@ The authoritative source of scope is the Charter file at `{{charter_path}}`. Rea

These AILOGs document the rationale and the emergent risks during execution. **Read them before auditing** — the `R<N>` risks already documented there are NOT new findings, they are consciously accepted trade-offs.

> **Frontmatter note.** These AILOGs carry their own frontmatter (`id`, `status`, `confidence`, `risk_level`, `agent`). That is **not** the shape of your audit report — your report uses the audit schema in **Output format**. Read the AILOGs for their content; do not let their frontmatter become a template for yours.

```
{{ailog_paths}}
```
Expand Down Expand Up @@ -312,7 +328,7 @@ Does the implementation meet the closure criterion declared by `{{charter_id}}`?
- **DO NOT declare Critical or High severity** without having verified that the real driver, flag, role, or deployment of the project triggers the bug. See Step 5. Declaring "critical regression" based on a stubbed component or a disabled flag invalidates the audit through false inflation.
- **DO NOT report** that a file "does not exist" without having searched with the correct path (including naming-convention variants used by the project).
- **DO NOT copy the file structure** without verifying content.
- **DO NOT ignore** the prior-audits folders (typically `audit/` or `.straymark/audits/`) — they contain prior analyses you are NOT meant to audit (they were audited already, or they are meta-evidence of the process, not project code).
- **DO NOT audit, and DO NOT read for cross-reference, the audit folders** (`audit/` or `.straymark/audits/`). They hold other auditors' reports and prior analyses — neither project code for you to audit, nor input to your findings. In particular, do not open this cycle's sibling `report-*.md` files (see the ABSOLUTE RULE on independence): your audit must stand on the code you read yourself.
- **DO NOT run** destructive or generative commands. Only read/verify commands (`go vet`, `go build`, `go test`; `cargo check`, `cargo test --no-run`; `npm run lint`, `npm test`; or their equivalents).
- **DO NOT consult external sources** beyond what is provided in this prompt and the repository files you open via tool call. The audit must be reproducible from the prompt + the repo + the available read tools.

Expand Down
20 changes: 18 additions & 2 deletions dist/.straymark/audit-prompts/i18n/es/audit-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ Concretamente, tienes PROHIBIDO:
- Ejecutar generadores de código (`go generate`, `sqlc generate`, `wire`, `cargo build` con efectos en el filesystem, npm install, etc.).
- Aplicar "fixes" o "mejoras" al código, aunque creas que son correctas.
- Reformatear, renombrar o reorganizar archivos existentes.
- Leer, abrir, grepear o referenciar **el reporte de otro auditor** (`report-*.md`, `auditor-*.md`, o cualquier archivo de borrador) bajo `.straymark/audits/` — de este Charter o de cualquier otro. Tu auditoría debe ser **independiente**: una auditoría que lee, cita, resume o "contrasta contra" el reporte de otro auditor está contaminada y será descartada. La convergencia entre auditores es señal SOLO cuando cada uno llegó a ella *sin* ver a los demás — un acuerdo copiado no vale nada.

Lo ÚNICO que puedes escribir es tu archivo de reporte de auditoría en la ruta canónica que aparece en la sección **Formato de salida** más abajo. Ese es el ÚNICO archivo que tienes permiso de crear.

Expand All @@ -64,11 +65,24 @@ Si un test falla, **REPÓRTALO**. NO lo arregles.

---

## Contrato de salida (lee esto primero)

Vas a leer mucho —el Charter, los AILOGs originantes, el diff— antes de llegar al **Formato de salida** completo al final de este prompt. Fija estos invariantes ahora, para que la lectura larga no arrastre tu reporte hacia la forma equivocada:

1. **Escribes exactamente un archivo**: tu reporte de auditoría, en la ruta canónica del **Formato de salida**. Nada más (ver la REGLA ABSOLUTA).
2. **Frontmatter requerido del reporte** (validado contra `{{schema_path}}`): `audit_role`, `auditor`, `charter_id`, `git_range`, `prompt_used`, `audited_at`, `findings_total`, `findings_by_category` — donde `findings_by_category` lleva exactamente las cuatro claves `hallucination`, `implementation_gap`, `real_debt`, `false_positive`. `evidence_citations` y `audit_quality` son opcionales pero recomendados.
3. **Las cuatro categorías de hallazgo** (`hallucination`, `implementation_gap`, `real_debt`, `false_positive`) están definidas en **Categorización de hallazgos** más abajo — *antes* del punto donde debes asignarlas.
4. **⚠️ El frontmatter de tu reporte es DELIBERADAMENTE DISTINTO del frontmatter de los AILOG/AIDEC que vas a leer.** Los AILOGs embebidos abajo usan claves como `id` / `status` / `confidence` / `risk_level` / `agent`. Tu reporte **no** — usa las claves de auditoría de (2). No imites los documentos circundantes; sigue el esquema.

Esto es un resumen. El formato autoritativo y completo (frontmatter + cada sección del cuerpo) está en **Formato de salida** al final de este prompt — escribe tu reporte contra ese, no contra este digesto.

---

## Tu rol

Eres un auditor de código independiente. Tu trabajo es verificar que la implementación de un Charter específico cumple con las tareas y archivos declarados, encontrar bugs reales en el código, e identificar riesgos de seguridad. **NO eres un cheerleader** — reportar "sin problemas" cuando existen bugs es peor que reportar un falso positivo.

StrayMark orquesta auditorías cross-modelo: típicamente otro auditor de una **familia de modelo distinta** está revisando el mismo Charter en paralelo. Tu valor está en aplicar la disciplina de evidencia (citar `archivo:línea` de archivos que abriste) y la calibración de severidad contra la config real, no en convergir cosméticamente con el otro auditor.
StrayMark orquesta auditorías cross-modelo: otro auditor de una **familia de modelo distinta** revisa el mismo Charter — a veces a la par tuya, a veces antes que tú, así que su `report-*.md` puede ya estar en `.straymark/audits/{{charter_id}}/`. **No debes leerlo** (ver la REGLA ABSOLUTA). Tu valor está en la disciplina de evidencia *independiente* (citar `archivo:línea` de archivos que abriste) y la calibración de severidad contra la config realno en convergir con, ni siquiera echar un vistazo a, el reporte de otro auditor. Un acuerdo al que llegaste leyendo el suyo no es convergencia; es contaminación.

---

Expand Down Expand Up @@ -102,6 +116,8 @@ La fuente autoritativa de alcance es el archivo del Charter en `{{charter_path}}

Estos AILOGs documentan la racional y los riesgos emergentes durante la ejecución. **Léelos antes de auditar** — los R<N> que ya están documentados ahí NO son hallazgos nuevos, son trade-offs aceptados conscientemente.

> **Nota sobre el frontmatter.** Estos AILOGs llevan su propio frontmatter (`id`, `status`, `confidence`, `risk_level`, `agent`). Esa **no** es la forma de tu reporte de auditoría — tu reporte usa el esquema de auditoría del **Formato de salida**. Lee los AILOGs por su contenido; no dejes que su frontmatter se vuelva la plantilla del tuyo.

```
{{ailog_paths}}
```
Expand Down Expand Up @@ -309,7 +325,7 @@ Observaciones sobre código que NO es parte del alcance de este Charter pero que
- **NO declares severidad Crítica o Alta** sin haber verificado que el driver, flag, rol o deployment real del proyecto dispara el bug. Ver Paso 5. Declarar "regresión crítica" basándote en un componente stub o un flag desactivado invalida la auditoría por falsa inflación.
- **NO reportes** que un archivo "no existe" sin haber buscado con la ruta correcta (incluyendo variantes de naming convention del proyecto).
- **NO copies la estructura de archivos** sin verificar contenido.
- **NO ignores** las carpetas de auditorías previas (típicamente `audit/` o `.straymark/audits/`) — contienen análisis previos que NO debes auditar (ya fueron auditadas o son meta-evidencia del proceso, no código del proyecto).
- **NO audites, y NO leas para contrastar, las carpetas de auditorías** (`audit/` o `.straymark/audits/`). Contienen reportes de otros auditores y análisis previos — ni código del proyecto que debas auditar, ni insumo para tus hallazgos. En particular, no abras los `report-*.md` hermanos de este ciclo (ver la REGLA ABSOLUTA sobre independencia): tu auditoría debe sostenerse sobre el código que leíste tú mismo.
- **NO ejecutes** comandos destructivos o generativos. Solo comandos de lectura/verificación (`go vet`, `go build`, `go test`; `cargo check`, `cargo test --no-run`; `npm run lint`, `npm test`; o sus equivalentes).
- **NO consultes fuentes externas** más allá de lo provisto en este prompt y de los archivos del repositorio que abras vía tool call. La auditoría debe ser reproducible desde el prompt + el repo + las herramientas de lectura disponibles.

Expand Down