diff --git a/docs/benchmark-ownership.md b/docs/benchmark-ownership.md
index fd0bbb6..ea7c8c6 100644
--- a/docs/benchmark-ownership.md
+++ b/docs/benchmark-ownership.md
@@ -37,8 +37,8 @@ the Ethos source tree or rewrite Ethos parser behavior.
 ## Claim Rule

 No public performance, footprint, quality, table, heading, or parser-speed superlative claim is
-allowed from either repository until signed G1/G2/G3 result files exist for the required hosts
-and the claim audit says the wording is supported.
+allowed from either repository unless a claim audit maps the exact wording to accepted evidence
+and an explicit public-report or release decision allows that surface.

 Allowed current wording:

@@ -63,4 +63,5 @@ cross-platform rendered-crop byte-identity claims
 4. Record cross-host or claim-affecting evidence in `ethos/docs/validation/` or in signed
    `ethos-bench/benchmarks/results/gate-zero/` files, depending on whether the evidence is a
    product-boundary validation or a benchmark result.
-5. Fill ADR-0005 only after the required G1/G2/G3 result files and evidence bundles exist.
+5. Amend ADR-0005 or add a successor decision only after the required result files and evidence
+   bundles exist for that decision.
diff --git a/docs/decisions/README.md b/docs/decisions/README.md
index dd244c5..39bdf06 100644
--- a/docs/decisions/README.md
+++ b/docs/decisions/README.md
@@ -5,11 +5,11 @@ Every closing PRD §15 open question gets an ADR here. Output-changing merges re
 | ADR | Title | Status |
 | --- | --- | --- |
 | 0000 | Gate Zero decider | Accepted |
-| 0001 | Staffing confirmation vs plan schedule | **Proposed — blocks Week 0 exit** |
+| 0001 | Staffing confirmation vs plan schedule | Accepted |
 | 0002 | PDFium two-phase distribution path | Accepted |
 | 0003 | Deterministic font policy | Accepted |
 | 0004 | Licensing and dependency policy | Accepted |
-| 0005 | Gate Zero decision | Template (filled week 4) |
+| 0005 | Gate Zero decision | Accepted - PROCEED |
 | 0006 | Package identifiers | Accepted |
 | 0007 | Trust layer first | Accepted |
 | 0008 | Gate Zero G2 footprint policy | Accepted |
diff --git a/docs/execution-status.md b/docs/execution-status.md
index e306960..2ece0a5 100644
--- a/docs/execution-status.md
+++ b/docs/execution-status.md
@@ -2,11 +2,11 @@

 Date: 2026-06-16
 Owner: product / decider
-Status: Pre-alpha / Milestone A implementation. Week 0 governance is accepted, WS-ENGINE Phase 1 has a real narrow PDFium path, WS-VERIFY-ALPHA has real deterministic evidence checks over native Ethos JSON and pinned OpenDataLoader output, WS-HARNESS has fail-closed readiness scaffolding, the Gate Zero corpus/hardware manifest and direct competitor lock are frozen/signed, ADR-0006 closes package identifier/trademark validation, ADR-0007 locks the product direction, and the public-source preflight is green for a source-only pre-alpha GitHub push. Signed host result generation still blocks Gate Zero, public benchmark reports, releases, packages, and all performance/quality claims. The next controlled-run handoff is `docs/gate-zero-evidence-runbook.md`.
+Status: Pre-alpha / Milestone B entry. Week 0 governance is accepted, WS-ENGINE Phase 1 has a real narrow PDFium path, WS-VERIFY-ALPHA has real deterministic evidence checks over native Ethos JSON and pinned OpenDataLoader output, WS-HARNESS has fail-closed readiness scaffolding, the Gate Zero corpus/hardware manifest and direct competitor lock are frozen/signed, ADR-0005 records an accepted `PROCEED` decision for internal Milestone B continuation, ADR-0006 closes package identifier/trademark validation, ADR-0007 locks the product direction, and the public-source preflight is green for a source-only pre-alpha GitHub push. Public benchmark reports, releases, packages, production positioning, and all performance/quality/footprint claims remain blocked. The controlled-run handoff remains `docs/gate-zero-evidence-runbook.md`; the accepted decision record is `docs/decisions/ADR-0005-gate-zero-decision.md`.

 ## Current Reality

-The repository is still pre-alpha, but it is no longer only contract/scaffold code. Real parsing and real alpha verification exist. They are narrow, fixture-backed, and not yet Gate Zero-proven.
+The repository is still pre-alpha, but it is no longer only contract/scaffold code. Real parsing and real alpha verification exist. They are narrow, fixture-backed, and now have an accepted internal Gate Zero decision for roadmap control. That decision is not a public benchmark report, release approval, package approval, production approval, or claim approval.

 The committed implementation now includes:

@@ -21,23 +21,28 @@ The committed implementation now includes:
 - `make verify-alpha` is the current alpha trust-loop command: it checks native examples, synthetic OpenDataLoader-style examples, pinned real OpenDataLoader grounded/ungrounded examples, schema validation, usage diagnostics for malformed citations and malformed OpenDataLoader-style inputs, byte-identical repeated verification reports, byte-identical native crop descriptors, summary diagnostics for an ungrounded native case, and foreign fixture manifest hash binding.
 - Native Ethos verification can emit deterministic, schema-backed crop descriptor JSON artifacts through `--crop-dir`; these bind `document_fingerprint`, page, bbox, and check ids. Native `crop_ref` filenames are logical evidence references derived from document fingerprint, check id, and page, while descriptors still record the exact observed bbox. When `--crop-source-pdf` is supplied, the CLI validates source-PDF fingerprint binding and emits PNG crop artifacts whose filenames, byte hashes, dimensions, and source fingerprint are bound from the descriptor. `make verify-rendered-crops` checks same-host repeated-run stability for the rendered artifact path, and `make compare-rendered-crops` classifies two rendered-crop runs by separating logical evidence identity from rendered artifact byte equality. Cross-platform rendered image determinism is not claimed; the 2026-06-14 macOS arm64 vs Linux x64 validation record in `docs/validation/rendered-crops-2026-06-14.md` preserved document fingerprint and `payload_sha256` but failed rendered artifact byte equality because the evidence bbox differed slightly across platforms.

-Still absent or not claimable: reproducible benchmark result JSON, executed competitor comparisons, public speed/quality/footprint claims, OCR/image-only support, real table extraction, mature list/heading/layout semantics, semantic/arithmetic verification beyond deterministic evidence lookup, Phase 2 project-maintained PDFium builds, release packaging, and full frozen-corpus multi-platform determinism evidence.
+Still absent or not claimable: public benchmark reports, public competitor-comparison claims, public speed/quality/footprint claims, OCR/image-only support, real table extraction, mature list/heading/layout semantics, semantic/arithmetic verification beyond deterministic evidence lookup, Phase 2 project-maintained PDFium builds, release packaging, and claim-audit approval for any public result wording.

 ## Human / External Blockers

 PM execution packet: `benchmarks/gate-zero/FREEZE_PACKET.md`.

+Resolved control point: ADR-0005 is accepted with `PROCEED` for internal Milestone B
+continuation. Its indexed result files and evidence bundles live in the sibling `ethos-bench`
+repository. This does not unblock public benchmark reports, releases, packages, production
+positioning, or claim wording.
+
 | ID | Blocker | Required output | Owner | Blocks |
 | --- | --- | --- | --- | --- |
-| H1 | Generate signed Gate Zero host results | `../ethos-bench/benchmarks/results/gate-zero/{macos-arm64,linux-x64}/g1.json` plus G2/G3 result files are produced from the frozen manifest and pinned lock | Benchmark owner / decider | Valid Gate Zero run, public benchmark trust |
-| H2 | Execute pinned competitor comparisons | Harness executes the pinned OpenDataLoader, EdgeParse, LiteParse, and PyMuPDF4LLM artifacts and records signed comparison rows where applicable in `ethos-bench` | Benchmark owner | Public competitor comparison |
-| H3 | Accept package identifier ADR | Closed by ADR-0006 acceptance on 2026-06-15 | Devrel / decider | Unblocked package identifier/trademark gate; broader public-release checklist still applies |
+| H1 | Execute and review public-safe competitor comparison flow | Harness executes the pinned OpenDataLoader, EdgeParse, LiteParse, and PyMuPDF4LLM artifacts where applicable, then records reviewable comparison rows in `ethos-bench` without unsupported wording | Benchmark owner | Public competitor-comparison report |
+| H2 | Complete public release/package checklist | `docs/public-release-checklist.md` is complete, release/package artifacts are explicitly approved, and claim-language gates still pass | Devrel / decider | Public releases, packages, and production positioning |
+| H3 | Approve public result wording | Claim audit maps any proposed public result sentence to accepted evidence and rejects unsupported wording | Benchmark owner / decider | Public benchmark/result language |

-The corpus/hardware freeze and direct competitor pins are recorded in `benchmarks/gate-zero/manifest.json` and `benchmarks/competitors.lock.json`. The remaining blockers are result production and signed evidence, not manifest/pin placeholders.
+The corpus/hardware freeze and direct competitor pins are recorded in `benchmarks/gate-zero/manifest.json` and `benchmarks/competitors.lock.json`. The remaining blockers are public-report, public-release, and claim-wording blockers, not manifest/pin placeholders.

 ## Current Milestone Posture

-Milestone A is partially implemented, not complete. The product can demonstrate a narrow parser-backed grounding loop today, but cannot yet claim Gate Zero readiness or public benchmark credibility.
+Milestone A has an accepted internal Gate Zero decision for roadmap control, so Milestone B work may proceed internally. The product can demonstrate a narrow parser-backed grounding loop today, but the decision cannot be used as public benchmark credibility.

 | Work item | Current status | Remaining blocker |
 | --- | --- | --- |
@@ -48,8 +53,8 @@ Milestone A is partially implemented, not complete. The product can demonstrate
 | Font policy groundwork | Partially landed: substitution table and profile policy are present; fixture output uses deterministic substitution IDs | Bundled fallback asset hashing and broader font/CID validation remain open |
 | Schema/example validation | Landed: schemas, examples, deterministic profile, referential integrity, and bbox sanity pass the `jsonschema` validation gate | Contract changes still require explicit versioning and compatibility review |
 | Trust-layer implementation | Landed: `ethos verify` quote/value/presence/table-cell checks, explicit quote-containment labeling, normalized equality for value/table-cell checks, stale and unverifiable fingerprint handling, unsupported claim reporting, structured capability limits, native Ethos JSON path, ODL-style adapter path with synthetic table/cell mapping, pinned real OpenDataLoader 2.4.7 grounded/ungrounded fixtures, foreign fixture manifest hash validation, crop-ref evidence plumbing, stable logical native crop refs, native crop descriptor artifacts, raw BGRA crop rendering in `ethos-pdf`, CLI PNG crop artifact production for bound native source PDFs, same-host rendered crop repeatability check, rendered-crop run comparison helper, strict citation/config input validation, citation input schema, and demo fixtures | Still needed: evidence matching against richer source structures, semantic/arithmetic claim handling by explicit non-v1 design, real OpenDataLoader table-cell grounding, broader adapter hardening against real output, and a decision on whether cross-platform rendered crop artifact equality is worth pursuing after the current macOS/Linux bbox drift finding |
-| WS-HARNESS readiness | Partially landed: readiness path is green for frozen corpus/hardware and pinned competitors, and fails closed if those records regress | Actual benchmark runner outputs, install-size/RSS/timing collection, competitor execution, and cross-host determinism evidence are still missing |
+| WS-HARNESS readiness | Partially landed: readiness path is green for frozen corpus/hardware and pinned competitors, Gate Zero evidence preflight validates the current `ethos-bench` handoff, and gates fail closed if those records regress | Public-safe comparison report flow, release/package approval, claim-wording approval, and future evidence-refresh workflow still need hardening |

 ## PM Rule

-Public language stays at "pre-alpha / Milestone A implementation" until the remaining external blockers are closed and Gate Zero has reproducible result JSON. Do not describe Ethos as benchmark-validated, release-ready, or broadly parser-complete. Internal parser work should proceed only when it supports Gate Zero evidence or the trust layer; the product-differentiating path remains verification and grounding first, with parser expansion serving that path.
+Public language stays at "pre-alpha / Milestone B internal continuation" until the remaining external blockers are closed and the claim audit approves specific wording. Do not describe Ethos as benchmark-validated, release-ready, production-ready, or broadly parser-complete. Internal parser work should proceed only when it supports accepted evidence paths or the trust layer; the product-differentiating path remains verification and grounding first, with parser expansion serving that path.
diff --git a/docs/gate-zero-evidence-runbook.md b/docs/gate-zero-evidence-runbook.md
index cceaf5b..81f3eef 100644
--- a/docs/gate-zero-evidence-runbook.md
+++ b/docs/gate-zero-evidence-runbook.md
@@ -136,7 +136,8 @@ For the cross-host G3 bundle, use `GATE_ZERO_PLATFORM=cross-platform` and

 ## Decision Step

-Fill `docs/decisions/ADR-0005-gate-zero-decision.md` only after:
+ADR-0005 is accepted for the current internal Gate Zero decision. For a future amendment or
+successor decision, update the decision record only after:

 - required G1 files exist for both recorded hosts;
 - required G2 files exist for both recorded hosts;
@@ -144,7 +145,7 @@ Fill `docs/decisions/ADR-0005-gate-zero-decision.md` only after:
 - evidence bundles exist for the source result files;
 - the decider has reviewed the result JSON and reproduction sidecars.

-Before filling the ADR, run:
+Before amending the decision record, run:

 ```bash
 python3 .github/scripts/gate_zero_evidence_preflight.py decision --ethos-bench ../ethos-bench
@@ -154,9 +155,12 @@ This checks the expected `ethos-bench` result paths, timestamped evidence bundle
 complete reproduction environments, and bundle checksum manifests. It does not decide whether
 Gate Zero passes.

-Until that ADR is filled, public language remains:
+Even after ADR-0005 acceptance, public language remains:

 ```text
 Ethos is pre-alpha. It verifies whether AI citations are grounded in document evidence across
 native Ethos JSON and supported foreign parser outputs.
 ```
+
+ADR-0005 authorizes internal Milestone B continuation only. It does not authorize public benchmark
+reports, releases, packages, production positioning, or unsupported result wording.
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 65f7a3e..0ebb7d5 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -12,16 +12,17 @@ Current PM status and blockers: `docs/execution-status.md`.
 | Milestone | Window | Contents | Gate |
 | --- | --- | --- | --- |
 | Week 0 | pre-kickoff | ADRs, governance, corpus freeze, CI bootstrap, competitor pins | All 11 rows done; clock starts |
-| A | weeks 1-8 | Contracts (5 schemas, c14n, deterministic profile), trust-boundary artifacts (`GroundingSource`, verification schemas, OpenDataLoader adapter stub, `ethos verify` CLI stub), PDFium Phase 1 spike, harness + competitor adapters, CLI skeleton | **Gate Zero**: G1 throughput, G2 footprint, G3 determinism - PROCEED / G1_RETRY / FALLBACK (ADR-0005). A is incomplete without the trust-boundary artifacts. |
+| A | weeks 1-8 | Contracts (5 schemas, c14n, deterministic profile), trust-boundary artifacts (`GroundingSource`, verification schemas, OpenDataLoader adapter stub, `ethos verify` CLI stub), PDFium Phase 1 spike, harness + competitor adapters, CLI skeleton | **Gate Zero**: ADR-0005 is accepted as `PROCEED` for internal Milestone B continuation. This is not public benchmark, release, package, production, or claim approval. |
 | B | weeks 9-14 | **`ethos verify` alpha first**: native Ethos JSON + OpenDataLoader verification demo, stale fingerprint checks, capability-limited reports, deterministic evidence matching; then reading order, blocks, headings, lists, Markdown/text exporters, Python wheel scaffold, quality dashboard, Windows x64 nightly determinism | 13-B exit checklist |
 | C | weeks 15-22 | Simple/bordered tables; RAG chunker + citations; non-text region coordinates; security report + default-chunk exclusion; debug overlay; internal benchmark snapshot | 13-C exit + first checkpoint |
 | D | weeks 23-30 | `verify_citations` v1; crop API; sandbox/subprocess backend; Node beta and MCP experimental only if staffed or accepted by release-scope ADR | 13-D exit |
 | E | weeks 31-40 | Public benchmark report (reproducible, labeled tiers); PDFium Phase 2 project-maintained builds; stable CLI/Python docs; proof-of-trust demos; **Public Beta** | Release 1 claim audit + public-beta checkpoint |
 | F / Release 2 | post-E | Complex tables, formula/LaTeX, chart classification, optional enrichment modules (never base) | Scoped after E from beta fixtures |

-Fallback charter: if Gate Zero fails on G2/G3 (or a failed G1 retry), Ethos pivots to the
-parser-agnostic trust layer — standalone `ethos-verify` + chunk/citation tooling over foreign
-parser output. The trust layer ships as a Milestone B alpha either way.
+Fallback charter: ADR-0005 selected `PROCEED`. If a future Gate Zero successor decision rejects
+G2/G3 evidence, or rejects G1 after a bounded retry path, Ethos pivots to the parser-agnostic trust
+layer — standalone `ethos-verify` + chunk/citation tooling over foreign parser output. The trust
+layer remains the first Milestone B product path either way.

 Surface labels in Release 1: CLI + Python **stable**. Node **beta** and MCP **experimental** ship
 only if staffed or accepted by release-scope ADR before public claims.
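The Decision Step preconditions in the runbook change (G1 and G2 result files for both recorded hosts, plus evidence bundles whose checksum manifests must match) can be illustrated with a minimal Python sketch. It is not `gate_zero_evidence_preflight.py` and does not reproduce that script's checks. The `benchmarks/results/gate-zero/{macos-arm64,linux-x64}/g1.json` layout comes from the execution-status table; the function names, the `g2.json` filename, and the `SHA256SUMS` manifest format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Host directories taken from the documented g1.json path; "g2.json" is an
# assumed filename that mirrors that layout (not confirmed by the docs).
HOSTS = ("macos-arm64", "linux-x64")
GATE_FILES = ("g1.json", "g2.json")


def missing_result_files(bench_root: Path) -> list[Path]:
    """Return required per-host result files that do not exist yet."""
    results = bench_root / "benchmarks" / "results" / "gate-zero"
    required = [results / host / name for host in HOSTS for name in GATE_FILES]
    return [path for path in required if not path.is_file()]


def checksum_mismatches(bundle_dir: Path, manifest_name: str = "SHA256SUMS") -> list[str]:
    """Check bundle files against a hypothetical sha256sum-style manifest.

    Assumes each non-empty line reads '<hex digest>  <relative path>'.
    Returns one problem string per missing, altered, or malformed entry.
    """
    problems = []
    manifest = (bundle_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, sep, rel_path = line.partition("  ")
        if not sep:
            problems.append(f"malformed manifest line: {line}")
            continue
        target = bundle_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

A caller would pass the sibling checkout, for example `missing_result_files(Path("../ethos-bench"))`, and treat any non-empty result as "not ready to amend the decision record". Like the documented preflight, this sketch only checks that inputs exist and are intact; it does not decide whether Gate Zero passes.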