test(canonical): emit L0 summary and lock fixture failures#1182
Conversation
Address the post-merge Q00#1174 review by adding the missing L0-a copyable scenario summary output and by locking malformed fixture rejection paths with focused tests. Keep the live ouroboros_auto invocation deferred to L0-b so the fix stays within the minimal-substrate boundary.\n\nConstraint: PR Q00#1174 was already merged, so follow-up work must land through a new repair PR.\nRejected: Wire live ouroboros_auto execution now | That would expand beyond the L0-a boundary and reintroduce token-cost behavior before L0-b.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future live-run changes behind OUROBOROS_RUN_CANONICAL and preserve the copyable summary-line contract.\nTested: uv run pytest tests/canonical/ -v; uv run ruff check tests/canonical/; uv run ruff format --check tests/canonical/; uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py tests/canonical/test_conftest.py\nNot-tested: Live ouroboros_auto invocation remains intentionally deferred to L0-b.
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
77a49d8for PR #1182
Review record:
acea178c-8ffe-4b41-9d04-168694108600
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to assess architecture or implementation because the source snapshot and diff could not be read in this environment.
Policy Notes
- Omitted 1 finding(s) that referenced files outside the current PR changed-files scope.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Merge-readiness rationale — follow-up to merged #1174I re-audited merged #1174 against the #961 AgentOS roadmap SSOT, #1157 What this PR fixesThe latest #1174 bot review identified two remaining issues:
This PR addresses both directly:
Why this stays aligned with the SSOTs
Over-engineering checkThis is intentionally small: one formatter, one pytest terminal-summary hook, one focused test module, and one README example. It does not introduce a custom reporter framework or premature live-run plumbing. The VerificationLocal verification on
GitHub checks are also all green: Ruff Lint, MyPy, Python tests for 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, and enforce-boundary. Bot / review status
VerdictReady to merge. This PR closes the actionable post-merge gaps from #1174 without expanding L0-a beyond the approved minimal-substrate boundary. |
PR Review SummaryVerdictApprove Scope Reviewed
Blocking IssuesNone. WarningsNone. Mutation-Test Thinking
Complexity / CRAP-style Risk
Test Quality Assessment (6/7)
Security / Operational RiskNone. The PR remains test/docs-only, uses the existing Looks Good
Final RecommendationAPPROVE — ready to merge. The PR is narrow, evidence-backed, and fixes the merged #1174 gaps without introducing over-engineering or roadmap drift. |
No code changes; this refreshes PR automation after local verification confirmed the repair PR closes the Q00#1174 post-merge canonical-harness blockers without expanding L0-a behavior. Constraint: Latest ouroboros-agent design notes could not inspect the source snapshot/diff, so the PR needs a fresh review signal rather than another code change. Rejected: Add more reporter abstraction | The current formatter plus pytest terminal hook is the minimal Q00#1170 summary-line contract. Confidence: high Scope-risk: narrow Directive: Keep L0-a live invocation deferred until the L0-b wiring slice; do not turn this empty review refresh into behavior change. Tested: uv run pytest tests/canonical/ -v; uv run pytest tests/canonical/ -v -k cli-todo; uv run ruff check tests/canonical/; uv run ruff format --check tests/canonical/; uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py tests/canonical/test_conftest.py Not-tested: Live ouroboros_auto invocation remains intentionally deferred to L0-b. Co-authored-by: OmX <omx@oh-my-codex.dev>
Final merge-readiness rationale — PR #1182 @
|
PR Review SummaryVerdictApprove Scope Reviewed
Blocking IssuesNone. WarningsNone. Mutation-Test Thinking
Complexity / CRAP-style Risk
Test Quality Assessment (6/7)
Security / Operational RiskNone. This is a test/docs-only repair. It introduces no network calls, subprocess execution, credential handling, live LLM invocation, filesystem mutation outside pytest fixture loading, or production runtime behavior. YAML parsing remains on the existing Looks Good
Final RecommendationAPPROVE — ready to merge. PR #1182 is a narrow, evidence-backed repair that makes the canonical harness evidence copyable and locks the loader failure contract without over-engineering or drifting from the AgentOS SSOT direction. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Automated queue-authoritative review run completed for this PR.
Summary
- Trigger: manual.requeue
- Queue reason: manual.requeue
- Review kind: recovery
- GitHub review event: COMMENT
- Merge eligible: false
Blockers
None identified in this automated pass.
Design / Roadmap Gate
Review scope uses affected-boundary analysis, not changed-lines-only. State machine, persistence, replay, and consumer contract surfaces remain in scope even when the exact lines are outside the diff.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: recovery
merge_eligible: false
head_sha: b1f6126
source_read_ok: true
diff_read_ok: true
blocking_count: 1
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Automated queue-authoritative review run completed for this PR.
Summary
- Trigger: review.stale_requeue
- Queue reason: stale_head_sha_requeued
- Review kind: full
- GitHub review event: COMMENT
- Merge eligible: false
Blockers
None identified in this automated pass.
Design / Roadmap Gate
Review scope uses affected-boundary analysis, not changed-lines-only. State machine, persistence, replay, and consumer contract surfaces remain in scope even when the exact lines are outside the diff.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: full
merge_eligible: false
head_sha: b1f6126
source_read_ok: true
diff_read_ok: true
blocking_count: 0
Add a direct test for the pytest terminal-summary hook so the Q00#1170 copyable evidence line cannot disappear while the pure formatter still passes. Constraint: PR Q00#1182 must stay test/docs-only and keep live ouroboros_auto deferred to L0-b. Rejected: A pytester subprocess test | the hook is thin and a fake reporter proves the behavior without adding slow harness infrastructure. Confidence: high Scope-risk: narrow Directive: Keep future reporter expansion evidence-driven; do not introduce a custom reporter framework for L0-a. Tested: uv run pytest tests/canonical/ -v; uv run pytest tests/canonical/ -v -k cli-todo; uv run ruff check tests/canonical/; uv run ruff format --check tests/canonical/; uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py tests/canonical/test_conftest.py Not-tested: Live ouroboros_auto invocation remains deferred to L0-b. Co-authored-by: OmX <omx@oh-my-codex.dev>
Independent merge-readiness rationale — PR #1182I re-reviewed this PR against the #961 AgentOS SSOT, #1157 minimal-substrate direction, and #1170 L0 canonical-harness contract. I ignored bot verdicts and checked the implementation and tests directly. What this PR does:
Why this is aligned with #961 and not over-engineered:
Independent checks I ran locally:
Merge recommendation: after the direct terminal-summary hook regression test, this PR is mergeable as the correct minimal repair for the canonical harness L0-a slice. |
PR Review SummaryVerdictApprove Scope Reviewed
Blocking IssuesNone. WarningsNone. Mutation-Test Thinking
Complexity / CRAP-style Risk
Test Quality Assessment (6/7)
Security / Operational RiskNone. This PR only touches tests/docs and does not alter runtime execution, permissions, secrets, network calls, or production state. Looks Good
Final RecommendationApprove and merge. The PR now has direct evidence for the manual summary contract, adequate malformed-fixture coverage, low complexity, and no runtime/security risk. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
c75aa2bfor PR #1182
Review record:
f35c0cd2-218b-42ea-913a-2bddd8286bc5
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to complete the review: every filesystem read command failed before execution because the sandbox wrapper could not create a namespace (bwrap: No permissions to create a new namespace). I did not run any git commands.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1182
Branch: fix/pr1174-canonical-harness-review | 3 files, +170/-1 | CI: Bridge TypeScript pass 12s https://github.com/Q00/ouroboros/actions/runs/26318715515/job/77483301775
Scope: architecture-level
HEAD checked: c75aa2b8deb9f6658194fc3495a395955c5050b8
What Improved
- Added direct coverage for the canonical pytest terminal-summary hook, proving it emits the copyable per-scenario line through the reporter boundary.
- Existing malformed-fixture tests cover missing
expected.yaml, emptygoal.txt, YAML parse failures, non-mapping YAML, missing required keys, and invalidcompletion_mode. - The canonical suite now produces the documented summary line in an actual local pytest run.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
| Emit one copyable canonical summary line per discovered scenario | Satisfied by tests/canonical/conftest.py:217 and covered by tests/canonical/test_conftest.py:96. |
Keep L0-a hermetic and defer live ouroboros_auto invocation |
Satisfied; live path still skips at tests/canonical/test_canonical.py:124. |
| Cover malformed fixture rejection paths | Satisfied by focused loader tests in tests/canonical/test_conftest.py:40 through tests/canonical/test_conftest.py:80. |
| Document manual summary-line contract | Satisfied in tests/canonical/README.md:33 through tests/canonical/README.md:40. |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — Prior concerns about the terminal summary hook being behavior-defining but unproven are withdrawn: tests/canonical/test_conftest.py:96 now directly exercises pytest_terminal_summary, and the local pytest run emitted the documented line. Prior concerns about malformed fixture rejection coverage remain resolved by the existing tests at tests/canonical/test_conftest.py:40, tests/canonical/test_conftest.py:46, tests/canonical/test_conftest.py:56, tests/canonical/test_conftest.py:62, tests/canonical/test_conftest.py:68, and tests/canonical/test_conftest.py:74. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Ran
uv run pytest tests/canonical/ -v: 14 passed, 1 skipped. - CI artifact shows Bridge TypeScript, MyPy, Ruff, Python 3.12/3.13/3.14 tests, enforce-boundary, and enforce-envelope all passed.
- Live
ouroboros_autoexecution remains intentionally skipped attests/canonical/test_canonical.py:124for L0-a/L0-b scope separation.
Design / Roadmap Gate
Affected-boundary review covered scenario discovery, fixture loading, pytest parametrization, terminal-summary reporting, documentation, and the deferred live-run surface. The reporter boundary now has test evidence, malformed fixture paths fail as test failures through _load_scenario, and the live state-machine/persistence/replay boundary remains explicitly out of scope for L0-a rather than partially wired. No current-HEAD blocker found in the audited boundary.
Merge Recommendation
Post-merge audit comment only: no revert or corrective follow-up is required for PR #1182 based on current HEAD evidence.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: post_merge_audit
merge_eligible: false
head_sha: c75aa2b
source_read_ok: true
diff_read_ok: true
blocking_count: 0
Summary
Follow-up repair for merged #1174 after ouroboros-agent[bot]'s latest substantive review found two remaining L0-a blockers:
This PR keeps the #1157/#1170 minimal-substrate boundary intact: it does not wire live
ouroboros_auto, does not add CI/replay/cost-budget machinery, and leaves live invocation deferred to L0-b.What changed
Adds
format_canonical_summary_line()and apytest_terminal_summaryhook that prints one copyable line per discovered scenario, e.g.:Documents that summary line in
tests/canonical/README.md.Adds focused tests for malformed fixture cases:
expected.yamlgoal.txtcompletion_modeAdds a formatter test pinning the copyable summary line.
Roadmap / design alignment
Test plan
uv run pytest tests/canonical/ -v→ 13 passed, 1 skipped, with canonical summary line printed.uv run ruff check tests/canonical/→ clean.uv run ruff format --check tests/canonical/→ clean.uv run mypy tests/canonical/conftest.py tests/canonical/test_canonical.py tests/canonical/test_conftest.py→ clean.Refs #1174, #1170, #1157, #961.