feat(tests): canonical acceptance harness skeleton (L0-a)#1174
Conversation
L0-a slice of Q00#1170 — minimal manual acceptance harness per Q00#1157's SSOT acceptance gate. Discovers ``tests/canonical/<slug>/`` directories, validates fixture shape, and parametrizes per-scenario test bodies. Ships one initial scenario (``cli-todo``). ## Summary The harness is intentionally narrow per the 2026-05-22 minimal-substrate audit (Q00#1157): - **No nightly CI workflow.** The maintainer runs the suite manually when assessing SSOT close-readiness. - **No recorded-replay layer.** Live ``ouroboros_auto`` invocation costs tokens; opt-in via ``OUROBOROS_RUN_CANONICAL=1``. - **No cost budget tracking.** Each invocation is the operator's intent. - **No refresh-rotation ownership policy.** No replay layer to refresh. - **No divergence detection.** Single mode. What lands: - ``tests/canonical/__init__.py`` (empty package marker). - ``tests/canonical/README.md`` documenting both cost regimes and the scenario directory contract. - ``tests/canonical/conftest.py``: scenario discovery, ``CanonicalScenario`` frozen dataclass, ``pytest_generate_tests`` hook so new scenario directories are auto-parametrized, ``live_run_enabled`` fixture gating the live invocation behind the env var. - ``tests/canonical/test_canonical.py``: 6 shape-check assertions per scenario (goal non-empty, domain_class snake_case, completion_mode canonical, runtime_probe_kinds string typed, wall_clock_budget positive, matrix non-empty) plus the live-run skip placeholder. - ``tests/canonical/cli-todo/goal.txt`` + ``expected.yaml`` — the first canonical scenario (habit-tracker CLI). The L1 catalog cross-validation tests (verifying ``domain_class`` round-trips through ``TaskClassProfile``) land in a follow-up PR after Q00#1173 (L1-a) merges to main. This PR keeps the harness self-contained. ## Test plan - [x] ``uv run pytest tests/canonical/ -v`` → 6 passed, 1 skipped (live-run gate). - [x] ``uv run ruff check tests/canonical/`` → clean. - [x] ``uv run ruff format tests/canonical/`` → clean. - [x] ``uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py`` → clean. - [x] Broad regression (``uv run pytest tests/ -q --ignore=tests/integration --ignore=tests/unit/cli``) → 8578 passed, 3 skipped. ## What is NOT in this PR - Live ``ouroboros_auto`` invocation wiring (the ``test_scenario_live_run_or_skip`` body) — follow-up sub-PR once L1-a catalog is on main. - L1 catalog cross-validation tests — follow-up sub-PR after Q00#1173. - Additional scenarios (``webhook-receiver``, ``vertical-slice-refactor``, ``2d-kart-racer``) — L0-b / L0-c / L0-d follow-ups. ## References - Q00#1157 — Meta SSOT for ``ooo auto`` (L0 lane body). - Q00#1170 — L0 design issue (minimal redesign as of 2026-05-22). - Q00#1173 — L1-a catalog data (cross-validation tests will follow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
b6b6947for PR #1174
Review record:
505c2976-12ab-4b92-90e4-6e8212970556
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to perform an evidence-based review: the command runner fails before starting the shell with bwrap: No permissions to create a new namespace, and no MCP resources are available for the snapshot or diff. I did not inspect /tmp/pr_diff_1174.patch, the changed files, or the review comments, so this is not a substantive approval.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Merge-readiness rationale — re-audited against #961 / #1157 SSOTsAfter re-reviewing this PR against the #961 AgentOS roadmap SSOT and the 1. SSOT alignment — direction-correct, not drift#1157 L0 lane (the contract this PR implements) explicitly mandates:
This PR ships exactly that shape:
#961 meta-SSOT already lists #1174 in Track B as:
So the PR is on-roadmap, in the right tier slot, and not a sequencing risk 2. Over-engineering check — passes
The shape is the minimum that lets future PRs add scenarios + wire live-run 3. Local verification (re-run on
|
PR Review SummaryVerdictAPPROVE
Scope Reviewed
Blocking IssuesNone. WarningsNone. Mutation-Test ThinkingThe "behavior" introduced by this PR is the fixture-validation contract
Likely surviving mutants — all in the loader's negative paths Mutants the current tests do catch reliably — every positive-shape Additional tests recommended (non-blocking, fits L0-b follow-up):
Complexity / CRAP-style Risk
Test Quality Assessment (6/7)
Security / Operational Risk
Looks Good
Final RecommendationAPPROVE — ready to merge. The PR delivers exactly the L0-a contract specified by #1157, stays |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Branch: feat/prL0a | 6 files, +436/-0 | CI: Bridge TypeScript pass 16s https://github.com/Q00/ouroboros/actions/runs/26285103937/job/77370684966
Scope: diff-only
HEAD checked: b6b69476e1ec829885c99aeaa4b02304efe61580
What Improved
- Added a narrow
tests/canonical/harness with auto-discovered scenario fixtures, schema shape checks, and an initialcli-todofixture. - Preserved the intended no-CI/no-replay/no-budget L0-a posture; the default canonical suite runs without live LLM cost.
Issue #1170 Requirements
| Requirement | Status |
|---|---|
tests/canonical/ directory layout. |
MET — added under tests/canonical/. |
Self-contained cli-todo scenario fixture. |
MET — tests/canonical/cli-todo/goal.txt and expected.yaml exist. |
| Manual pytest harness for canonical scenarios. | PARTIAL — parent suite works for shape checks, but documented per-scenario command collects 0 tests. |
Runner invokes ouroboros_auto and asserts documented terminal state. |
NOT MET / DEFERRED — PR declares live invocation as follow-up, but README currently claims it works. |
| Adding a new scenario requires no infrastructure change. | MET — discovery iterates scenario directories with goal.txt. |
| Single human-readable summary line per scenario. | NOT MET — no summary reporter/output path is implemented beyond pytest’s default item output. |
| No nightly CI, replay, cost budget, refresh policy, or divergence detection. | MET — no such infrastructure was added. |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| No prior substantive blockers were reported; prior bot review approved while stating it could not inspect PR contents due sandbox failure. | MODIFIED — current HEAD was inspected and two behavior-contract blockers were verified. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | tests/canonical/README.md:43 |
Medium | 95% | The README’s “Full live run” contract says OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -v actually invokes ouroboros_auto and asserts the terminal state, but current HEAD unconditionally skips after opt-in at tests/canonical/test_canonical.py:124. I verified the parent command runs 6 shape checks and 1 skip, so the behavior-defining docs overclaim the acceptance runner and would mislead maintainers using this as the #1170 gate. |
| 2 | tests/canonical/README.md:51 |
Medium | 95% | The documented single-scenario command targets tests/canonical/cli-todo/, but that directory contains only goal.txt and expected.yaml; all test bodies live in tests/canonical/test_canonical.py. I verified OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/cli-todo/ -v collects 0 tests and exits with pytest code 5, so the per-scenario manual execution path required by the L0-a issue/README does not work. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/canonical/cli-todo/expected.yaml:10 |
Low | 90% | The fixture comment names test_scenario_completion_mode_matches_catalog, but that test does not exist in this PR and catalog validation is explicitly deferred. Update the comment or add the intended test when the L1 catalog follow-up lands. |
Test Coverage
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run pytest tests/canonical/ -v passes with 6 passed and 1 skipped. The new shape-check logic in tests/canonical/conftest.py and tests/canonical/test_canonical.py has happy-path coverage for the included fixture, but the documented live-run and single-scenario command paths are not covered and one currently collects no tests. Therefore not all newly introduced behavior-defining harness paths have corresponding working tests.
Design / Roadmap Gate
The PR is directionally aligned with the #1170/#1157 minimal-substrate gate by keeping the harness manual and avoiding replay/CI/cost infrastructure. The blockers are not roadmap disagreements; they are current-HEAD contract mismatches where the new behavior-defining README claims live and per-scenario execution modes that the shipped harness does not provide.
Merge Recommendation
- Merge after correcting the README/runner mismatch and making the documented single-scenario command execute the intended canonical checks, or replacing that command with one that works against the parent parametrized test module.
ouroboros-agent[bot]
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Branch: feat/prL0a | 6 files, +436/-0 | CI: Bridge TypeScript pass 16s https://github.com/Q00/ouroboros/actions/runs/26285103937/job/77370684966
Scope: diff-only
HEAD checked: b6b69476e1ec829885c99aeaa4b02304efe61580
What Improved
- Added a narrow
tests/canonical/harness with auto-discovered scenario fixtures, schema shape checks, and an initialcli-todofixture. - Preserved the intended no-CI/no-replay/no-budget posture for the default shape-check suite.
Issue #1170 Requirements
| Requirement | Status |
|---|---|
tests/canonical/ directory layout. |
MET — added under tests/canonical/. |
Self-contained cli-todo scenario fixture. |
MET — tests/canonical/cli-todo/goal.txt and tests/canonical/cli-todo/expected.yaml exist. |
| Manual pytest harness for canonical scenarios. | PARTIAL — parent shape-check suite works, but the documented per-scenario command collects 0 tests. |
Runner invokes ouroboros_auto and asserts documented terminal state. |
NOT MET / DEFERRED — code and PR body defer this, but README currently documents it as available. |
| Adding a new scenario requires no infrastructure change. | MET — discovery iterates scenario directories with goal.txt in tests/canonical/conftest.py:82. |
| Single human-readable summary line per scenario. | NOT MET — no summary output path is implemented beyond normal pytest item output. |
| No nightly CI, replay, cost budget, refresh policy, or divergence detection. | MET — no such infrastructure was added. |
Prior Findings Status
| Prior Finding | Status |
|---|---|
Direct prev_review.txt had no substantive blockers and stated review inputs could not be inspected. |
MODIFIED — current HEAD and artifacts were inspected; two behavior-contract blockers are verified. |
README live-run command overclaims actual ouroboros_auto invocation. |
MAINTAINED — tests/canonical/README.md:43 still claims invocation while tests/canonical/test_canonical.py:124 skips after opt-in. |
| README single-scenario command collects no tests. | MAINTAINED — tests/canonical/README.md:51 still points at tests/canonical/cli-todo/, which has no test modules. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | tests/canonical/README.md:43 |
Medium | 95% | The documented “Full live run” says OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -v invokes ouroboros_auto and asserts terminal state, but current HEAD unconditionally skips the opted-in live path at tests/canonical/test_canonical.py:124. I verified the command still reports 6 passed, 1 skipped, so the behavior-defining docs overclaim the manual acceptance gate. |
| 2 | tests/canonical/README.md:51 |
Medium | 95% | The documented single-scenario command targets tests/canonical/cli-todo/, but that directory contains only goal.txt and expected.yaml; all test bodies live in tests/canonical/test_canonical.py. I verified OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/cli-todo/ -v collects 0 tests and exits with pytest code 5, so the per-scenario manual execution path does not work. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/canonical/cli-todo/expected.yaml:10 |
Low | 90% | The fixture comment references test_scenario_completion_mode_matches_catalog, but that test is not present and catalog validation is explicitly deferred. Update the comment now or add the named test with the L1 catalog follow-up. |
Test Coverage
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run pytest tests/canonical/ -v passes with 6 passed and 1 skipped. OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -v also skips the live path, and OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/cli-todo/ -v collects 0 tests. The new shape-check logic has happy-path coverage for the included fixture, but not all newly documented/manual runner behavior has corresponding working coverage.
Design / Roadmap Gate
The PR aligns with the minimal-substrate/no-CI/no-replay direction in design_context.md, and the deferred live wiring is traceable in the PR body. The current checked-out docs still expose the deferred live runner as an available manual contract at tests/canonical/README.md:43 and a broken per-scenario command at tests/canonical/README.md:51, so the design-gate documentation and executable boundary are not yet consistent.
Merge Recommendation
- Merge after correcting the README/manual command contract or wiring the documented live and per-scenario execution paths.
ouroboros-agent[bot]
Resolve the two README/code mismatches surfaced by the latest review on PR Q00#1174: 1. `tests/canonical/README.md` "Full live run" section claimed `OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -v` actually invokes `ouroboros_auto` against each scenario. In L0-a the live wiring is deferred — `test_scenario_live_run_or_skip` unconditionally `pytest.skip`s with a typed reason after the opt- in env var is checked, so the documented invocation behavior is not yet available. Reword the section to call out the L0-a state ("opt-in still skips with a typed reason; shape-check tests still run") while keeping the future L0-b semantics described. 2. `tests/canonical/README.md` "Run a single scenario" section pointed at `tests/canonical/cli-todo/`, but the scenario directory contains only `goal.txt` and `expected.yaml` — pytest collects zero tests there. The actual test bodies live in `tests/canonical/test_canonical.py` and are parametrized per scenario via `pytest_generate_tests`. Replace the command with the working filter form: `uv run pytest tests/canonical/ -v -k <slug>`. 3. `tests/canonical/cli-todo/expected.yaml` had a comment referencing `test_scenario_completion_mode_matches_catalog`, which is not in the harness and is deferred until Q00#1173 (L1-a catalog data) is available on `main`. Update the comment to note the round-trip test is a follow-up, not yet present. No code change. The hermetic shape-check suite is unchanged (still 6 passed, 1 skipped). `uv run pytest tests/canonical/ -v -k cli-todo` now collects and passes the per-scenario tests, replacing the previously documented zero-collection command.
Addressed REQUEST_CHANGES blockers from the latest reviewPushed
Verification
No code change — only docs aligned to what the code already does. The deferred-live posture itself is unchanged and continues to honor the L0-a / L0-b split spelled out in #1170. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
05f2912for PR #1174
Review record:
2d3468c1-67ad-4c75-931d-c2a116e2ec75
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to assess architecture or implementation because the review inputs were inaccessible in this environment.
Policy Notes
- Omitted 1 finding(s) that referenced files outside the current PR changed-files scope.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Branch: feat/prL0a | 6 files, +447/-0 | CI: Bridge TypeScript pass 18s https://github.com/Q00/ouroboros/actions/runs/26289815437/job/77386756519
Scope: diff-only
HEAD checked: 05f2912ca184d6c433d5cb8eb8ee12f2221bbf85
What Improved
- The prior README/code mismatches are fixed at current HEAD: live invocation is now explicitly deferred in L0-a, and the single-scenario command uses the working
-k cli-todofilter. - The PR adds a narrow
tests/canonical/fixture contract, auto-discovery, positive shape checks, and the initialcli-todoscenario without adding CI/replay/cost-budget machinery.
Issue #1170 Requirements
| Requirement | Status |
|---|---|
tests/canonical/ directory layout. |
MET — added under tests/canonical/. |
Self-contained cli-todo scenario fixture. |
MET — tests/canonical/cli-todo/goal.txt and tests/canonical/cli-todo/expected.yaml exist. |
| Manual pytest harness for canonical scenarios. | PARTIAL — parent and -k cli-todo commands work, but live ouroboros_auto execution is explicitly deferred. |
Runner invokes ouroboros_auto and asserts terminal state. |
DEFERRED BY PR NON-GOAL — tests/canonical/test_canonical.py:124 skips even after opt-in, matching the PR body’s L0-a boundary. |
| Adding a new scenario requires no infrastructure change. | MET — tests/canonical/conftest.py:82 auto-discovers scenario directories with goal.txt; no one-line parametrization addition is needed. |
| Single human-readable summary line per scenario. | NOT MET — no summary output path is implemented; current per-scenario live body only skips at tests/canonical/test_canonical.py:108. |
| No nightly CI, replay, cost budget, refresh policy, or divergence detection. | MET — no such infrastructure was added. |
Prior Findings Status
| Prior Finding | Status |
|---|---|
Direct prev_review.txt contained no substantive blockers and said inputs were inaccessible. |
MODIFIED — current HEAD and artifacts were inspected directly; substantive review now finds remaining requirement/test-adequacy gaps. |
README live-run command overclaimed actual ouroboros_auto invocation. |
WITHDRAWN — tests/canonical/README.md:43 now states live wiring lands in L0-b and L0-a skips with a typed reason. |
| README single-scenario command collected 0 tests. | WITHDRAWN — tests/canonical/README.md:53 now documents filtering the parent suite with -k cli-todo, and I verified it selects the scenario tests. |
expected.yaml referenced a non-existent catalog round-trip test. |
WITHDRAWN — tests/canonical/cli-todo/expected.yaml:9 now states shape-only validation and explicitly defers catalog round-trip coverage. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | tests/canonical/test_canonical.py:108 |
Medium | 90% | #1170 requires the L0 runner to produce a single human-readable summary line per scenario for maintainers to copy into SSOT progress comments, but the only per-scenario runner body is test_scenario_live_run_or_skip, which either skips when the env var is unset or skips again after opt-in at lines 119-124. There is no terminal-summary hook, emitted scenario result, or copyable per-scenario summary in the current HEAD, so a concrete linked-issue acceptance criterion remains unimplemented and untested. |
| 2 | tests/canonical/conftest.py:107 |
Medium | 85% | The new scenario schema loader adds multiple behavior-defining rejection paths for fixture rot (missing expected.yaml, empty goal, YAML parse failure, non-mapping YAML, missing required keys, invalid completion_mode) across lines 107-149, but the submitted tests exercise only the single well-formed cli-todo happy path. Because this PR establishes the canonical fixture schema contract, the failure/edge cases introduced by the new validation logic need corresponding tests before this can be treated as a trustworthy acceptance harness. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|
Test Coverage
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run pytest tests/canonical/ -v passes with 6 passed and 1 skipped. SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run pytest tests/canonical/ -v -k cli-todo passes with 5 passed, 1 skipped, 1 deselected. OUROBOROS_RUN_CANONICAL=1 still skips the live path as now documented. Shape-check happy paths are covered, but not all newly added schema validation failure branches or the linked-issue summary-output requirement have corresponding tests.
Design / Roadmap Gate
Design context aligns with the minimal-substrate/non-goals direction: no nightly CI, replay layer, cost budget, refresh policy, or divergence detection were added. The PR is traceable to #1170 L0-a and clearly defers live ouroboros_auto wiring, but it still misses the linked design-gate requirement for a copyable per-scenario summary line and lacks edge-case tests for the schema contract it introduces.
Merge Recommendation
- Do not merge yet; add the per-scenario summary output required by #1170 and focused tests for the loader’s malformed-fixture branches, then re-run the canonical pytest suite.
ouroboros-agent[bot]
No code changes; this refreshes PR automation after local verification confirmed the repair PR closes the Q00#1174 post-merge canonical-harness blockers without expanding L0-a behavior. Constraint: Latest ouroboros-agent design notes could not inspect the source snapshot/diff, so the PR needs a fresh review signal rather than another code change. Rejected: Add more reporter abstraction | The current formatter plus pytest terminal hook is the minimal Q00#1170 summary-line contract. Confidence: high Scope-risk: narrow Directive: Keep L0-a live invocation deferred until the L0-b wiring slice; do not turn this empty review refresh into behavior change. Tested: uv run pytest tests/canonical/ -v; uv run pytest tests/canonical/ -v -k cli-todo; uv run ruff check tests/canonical/; uv run ruff format --check tests/canonical/; uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py tests/canonical/test_conftest.py Not-tested: Live ouroboros_auto invocation remains intentionally deferred to L0-b. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1174
Branch: feat/prL0a | 6 files, +447/-0 | CI: Bridge TypeScript pass 18s https://github.com/Q00/ouroboros/actions/runs/26289815437/job/77386756519
Scope: architecture-level
HEAD checked: 05f2912ca184d6c433d5cb8eb8ee12f2221bbf85
What Improved
- Added a discoverable
tests/canonical/<slug>/fixture layout with deterministic parametrization. - Added hermetic shape checks for canonical scenario files and a nonempty matrix guard.
- Documented the intended split between cheap fixture validation and costly manual live runs.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
Discover tests/canonical/<slug>/ directories automatically |
Satisfied |
| Validate fixture shape without LLM cost | Satisfied |
Ship initial cli-todo scenario |
Partially satisfied; fixture exists but its domain_class is not valid against current registered profiles |
Provide opt-in manual live run via OUROBOROS_RUN_CANONICAL=1 |
Not satisfied; the opt-in path still unconditionally skips |
| Meaningful tests for newly added acceptance logic | Partially satisfied; shape logic is covered, but catalog contract and live acceptance behavior are not |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — No prior PR #1174 review artifact was present in the supplied metadata. Historical checked-in review concerns about Codex skill resolution and run --runtime hermes were spot-checked at current HEAD and are withdrawn for this audit; the maintained concerns are modified to the canonical acceptance boundary added by this merge. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | tests/canonical/cli-todo/expected.yaml:14 | High | 95% | The first canonical scenario pins domain_class: cli, but current HEAD only registers built-in auto domain profiles named coding and research (DEFAULT_REGISTRY.get("cli") returns None). Because the new harness only validates lowercase string shape, the canonical fixture can pass while describing a domain the runtime cannot resolve, so the acceptance corpus is already drifted from the consumer contract it is meant to protect. |
| 2 | tests/canonical/test_canonical.py:124 | High | 90% | The opt-in live path still unconditionally skips after OUROBOROS_RUN_CANONICAL=1 is set, and pytest exits green (6 passed, 1 skipped). That makes the documented manual acceptance command unable to exercise ouroboros_auto or fail on product/runtime regressions, so the merged “acceptance harness” is not yet an acceptance gate. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Verified
uv run pytest tests/canonical/ -qpasses only shape tests:6 passed, 1 skipped. - Verified
OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -qstill reports6 passed, 1 skipped, proving the live gate is not wired. - Verified current domain registry contains
['coding', 'research']and nocliprofile. - Missing coverage: cross-validation of
expected.yamlagainst the current domain/profile catalog. - Missing coverage: any live or mocked
ouroboros_autoinvocation assertion for terminal state, completion mode, or runtime probe expectations.
Design / Roadmap Gate
Affected-boundary reasoning: this PR adds a canonical acceptance surface, so the relevant contract is not just whether YAML parses. The boundary includes scenario metadata producers/consumers, current auto domain/profile resolution, manual live-run semantics, and operator interpretation of pytest success. At current HEAD, the fixture corpus can encode a domain the runtime does not know, and the opt-in live command cannot execute the product path. That leaves the SSOT acceptance boundary observational rather than enforceable.
Merge Recommendation
Post-merge corrective work is needed: align canonical scenario metadata with the current domain catalog, add a catalog cross-validation test, and make the opt-in live path either execute/assert the acceptance run or fail explicitly instead of green-skipping.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: post_merge_audit
merge_eligible: false
head_sha: 05f2912
source_read_ok: true
diff_read_ok: true
blocking_count: 0
Summary
L0-a slice of #1170 — minimal manual acceptance harness per #1157's SSOT acceptance gate. Discovers
tests/canonical/<slug>/directories, validates fixture shape, and parametrizes per-scenario test bodies. Ships one initial scenario (cli-todo).The harness is intentionally narrow per the 2026-05-22 minimal-substrate audit (#1157):
OUROBOROS_RUN_CANONICAL=1.What lands
tests/canonical/__init__.py(empty package marker).tests/canonical/README.md— both cost regimes + scenario directory contract.tests/canonical/conftest.py— scenario discovery,CanonicalScenariofrozen dataclass,pytest_generate_testshook so new scenarios auto-parametrize,live_run_enabledfixture gating live invocation.tests/canonical/test_canonical.py— 6 shape-check assertions per scenario + live-run skip placeholder.tests/canonical/cli-todo/goal.txt+expected.yaml— first canonical scenario.What is NOT in this PR
ouroboros_autoinvocation wiring — follow-up sub-PR.webhook-receiver,vertical-slice-refactor,2d-kart-racer) — L0-b/c/d follow-ups.Test plan
uv run pytest tests/canonical/ -v→ 6 passed, 1 skipped.uv run ruff check tests/canonical/→ clean.uv run ruff format tests/canonical/→ clean.uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py→ clean.Refs #1157 (L0 lane), #1170 (L0 design issue), #1173 (L1-a catalog data — cross-validation tests follow).