fix(tests): align canonical live-run evidence#1195
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
ae8850cfor PR #1195
Review record:
afd367e6-14cf-4d95-8e39-12474bd9ce6f
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to assess architecture or implementation design because the PR artifacts and changed files were not readable in the current command sandbox.
Policy Notes
- Omitted 1 finding(s) that referenced files outside the current PR changed-files scope.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1195
Branch: fix/canonical-live-evidence-docs | 3 files, +23/-24 | CI: Bridge TypeScript pass 12s https://github.com/Q00/ouroboros/actions/runs/26325082985/job/77501196509
Scope: architecture-level
HEAD checked: ae8850cbd851fc4c0d7ad48322b3c50f563ce9a9
What Improved
- Canonical summary marker now reports
live=opt_ininstead oflive=deferred_l0b. - README copy now describes the live canonical run as an explicit maintainer opt-in.
- Summary-line regression tests were updated to pin the new marker.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
Change canonical summary marker from live=deferred_l0b to live=opt_in |
Met |
| Update README copy so canonical live run is no longer described as deferred to L0-b | Partially met; README is updated, but tests/canonical/conftest.py:3 still describes the live invocation as future work |
| Update summary-line regression tests | Met |
| Remove stale deferred-live evidence from canonical harness boundary | Not met |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — No prior review concerns were available in the provided artifacts, so none were maintained, modified, or withdrawn. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | tests/canonical/conftest.py:3 | Medium | 95% | The PR is meant to remove stale “live path is deferred” evidence, but the canonical harness module still says this is the “L0-a slice” and that the actual ouroboros_auto invocation lands in a follow-up sub-PR at lines 3-6. Current HEAD already has the live invocation in tests/canonical/test_canonical.py:178 and calls it at tests/canonical/test_canonical.py:246, while the updated README now tells maintainers the path is wired. Leaving this contradiction in the harness boundary preserves exactly the stale acceptance evidence this PR claims to fix. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/canonical/ -q: 17 passed, 1 skipped. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/canonical/ -q -k cli-todo: 8 passed, 1 skipped, 9 deselected. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff check tests/canonical/: passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff format --check tests/canonical/: passed.
Design / Roadmap Gate
Affected-boundary review covered the copyable summary contract, README operator guidance, pytest terminal summary, and the opt-in live-run boundary. The functional summary marker is updated and covered by tests, but the same harness module still carries stale deferred-live documentation, so the acceptance evidence boundary remains internally inconsistent.
Merge Recommendation
Do not merge until the stale tests/canonical/conftest.py module docstring is updated to match the current opt-in live-run contract.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: full
merge_eligible: false
head_sha: ae8850c
source_read_ok: true
diff_read_ok: true
blocking_count: 0
shaun0927
left a comment
There was a problem hiding this comment.
Pure docs/evidence-line correction matching the post-#1191 reality. #1157 explicitly names this PR as the remaining L0 cleanup. The live=deferred_l0b → live=opt_in flip is the right copyable summary marker now that OUROBOROS_RUN_CANONICAL=1 is wired, and the README scrub of the "Once live wiring lands in L0-b" / "skips with a typed reason" wording is consistent with the canonical README "live-run path" section. Tests already pin the new marker. Safe to merge first.
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1195 |
| HEAD checked | ae8850cbd851fc4c0d7ad48322b3c50f563ce9a9 |
| Request ID | req_1779668968_23 |
| Review record | 648bfd52-daba-4ab4-89b6-8a891aa5df8b |
What Improved
- Updates the canonical acceptance harness status from deferred live execution to explicit opt-in live execution.
- Aligns README examples and summary-line tests with the current
OUROBOROS_RUN_CANONICAL=1behavior.
Issue Requirements
| Requirement | Status |
|---|---|
| #1157 / #1170 canonical acceptance harness manual opt-in live path | Satisfied by the existing OUROBOROS_RUN_CANONICAL=1 live test path and this PR’s updated summary/docs language. |
Prior Findings Status
No prior human or inline review comments were present in the provided artifacts.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| None. |
Non-blocking Suggestions
| 1 | tests/canonical/conftest.py:3 | Documentation | The module docstring still says the actual ouroboros_auto invocation lands in a follow-up sub-PR, but tests/canonical/test_canonical.py already contains the opt-in live invocation path. This is stale commentary only; the changed summary-line contract is correct. |
Test Coverage Notes
- Reviewed changed tests in
tests/canonical/test_conftest.pyand surrounding live-run coverage intests/canonical/test_canonical.py. - Could not execute the focused pytest suite in this container:
uvis unavailable,pythonis unavailable, andpython3has nopytestinstalled.
Design Notes
The change is narrow and matches the current harness architecture: hermetic shape/catalog checks remain the default, while live LLM execution is clearly opt-in.
Design / Roadmap Gate
Affected boundary is the canonical harness reporting contract. The changed summary string, README examples, and unit assertions agree with the existing runtime boundary in test_scenario_live_run_or_skip: no live execution occurs unless OUROBOROS_RUN_CANONICAL is set.
Directional Notes
Maintainer-memory signals pushed review focus toward evidence-backed acceptance behavior and avoiding misleading status claims. Current source supports live=opt_in; no blocker found.
Test Coverage
- Reviewed changed tests in
tests/canonical/test_conftest.pyand surrounding live-run coverage intests/canonical/test_canonical.py. - Could not execute the focused pytest suite in this container:
uvis unavailable,pythonis unavailable, andpython3has nopytestinstalled.
Merge Recommendation
Approve. No blocking runtime, contract, persistence, or test-harness regressions were found in the changed files or affected canonical test path.
Review-Metadata:
verdict: APPROVE
head_sha: ae8850c
request_id: req_1779668968_23
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
ouroboros_autopath.live=deferred_l0btolive=opt_in.Review context
This follows up the post-v0.39.1 canonical harness sequence: #1174 added the L0 skeleton, #1182 added copyable summary evidence, and #1191 wired the opt-in live run. After #1191, the README and summary line still said the live run was deferred to L0-b, which could mislead maintainers reading the canonical acceptance evidence.
Validation
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/canonical/ -q-> 17 passed, 1 skippedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/canonical/ -q -k cli-todo-> 8 passed, 1 skipped, 9 deselectedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff check tests/canonical/-> passedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff format --check tests/canonical/-> passedgit diff --check-> passedrg -n "deferred_l0b|Once the live wiring lands|actual invocation lands|once L0-b lands|skips with a typed reason" tests/canonical-> no matches