fix: resolve post-merge review blocker foundations#1205
Conversation
014e34b to
04816f6
Compare
04816f6 to
bf55d40
Compare
shaun0927
left a comment
There was a problem hiding this comment.
Thanks for the omnibus — most of the included fixes are clean (_matches_web_service independence, defaulted_sections CLI render, status --limit > 0, mechanical-eval evidence linkage, TimeoutExpired.stdout bytes handling, workflow lifecycle same-ts restart). I'd like to land it, but five hunks need to come out first because they either reverse a merged design decision in the ooo auto SSOT (#1157) or collide with an actively-open sibling PR explicitly named by that SSOT as the canonical landing spot.
Reversals of merged decisions
1. DeferredProbe.passed flipped True→False (reverses #1181)
The class docstring on main is explicit: "returns a probe-PASS RuntimeEvidence with outcome='deferred' so the verifier can flag the gap without failing the grade." Combined with the #1190 contract ("explicit probe passed=False ... downgrade the terminal to BLOCKED"), flipping to False makes every canonical scenario whose TaskClassProfile includes a deferred probe kind block PRODUCT_COMPLETE. That's a global regression of the L3 minimal-substrate behavior, not a safety improvement.
2. L1-b LIBRARY fallback removed (reverses #1177 + #1188)
DomainInference.single is documented on main as "Return the single matched class, or fallback for unmatched" and derive_domain_from_ledger returns fallback=LIBRARY, reason='unmatched'. #1188's body explicitly states "unmatched → apply the safe LIBRARY fallback from L1-b". Switching to next(iter(...)) if is_single else None silently drops the safest-completion-gate fallback; the test rewrite that asserts active_task_class is None for unmatched cements the reversal. If we want to retire LIBRARY-as-fallback that should be an own issue + a DomainInference.single change at the source, not a quiet flip at the consumer.
Collisions with sibling open PRs (#1157 explicitly names them)
3. state.transition() last_error_code clearing collides with #1194
#1157 names #1194 as the remaining L2 follow-up for stop_reason_code clearing. #1194's condition is if next_phase is not AutoPhase.BLOCKED (clears only when leaving BLOCKED); this PR uses if error is None (clears on any error-less transition). The semantics diverge on transition(BLOCKED, msg, error=None) and on non-BLOCKED→non-BLOCKED with an explicit error string. Two PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this hunk and letting #1194 land first.
4. active_task_class MCP meta collides with #1196
#1157 names #1196 as the remaining L1 MCP-metadata cleanup. #1196's whole point is "Preserve the field as null when inference was ambiguous or unavailable, so clients can distinguish an empty computed value from a missing protocol field." That requires unconditional meta["active_task_class"] = result.active_task_class. The conditional if result.active_task_class is not None here silently opts out of that contract.
5. firewall setdefault → direct assignment reverses #1193
#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values. Forcing assignment removes that affordance without a follow-up issue documenting the new contract. If "stale override leaks" is the concern, it should be tracked separately — #1193 was a deliberate setdefault choice.
Suggested path to merge-ready
Apply the five suggestions below (one per hunk), then this PR carries only the legitimate scope:
_matches_web_serviceindependence (good — restores #1177 pattern-independence intent)defaulted_sectionsCLI renderingstatus --limit > 0validation- mechanical-eval
evidence_event_idslock TimeoutExpired.stdout/stderrbytes-safe decode- workflow lifecycle same-timestamp restart ordering + regression test
- MCP meta surface for
interview_closure_mode+runtime_probe_evidence(these aren't covered by #1194/#1196) product_status='not_verified_complete'downgrade for unfinished Ralph plugin runs — please confirm there's a documented decision for this; if so a one-line comment citing the issue would help future readers.
Suggested merge order to avoid conflicts: #1195 → #1196 → #1194 → #1206 → #1213 → (this PR rebased).
| return RuntimeEvidence( | ||
| probe_kind=self.kind, | ||
| passed=True, | ||
| passed=False, |
There was a problem hiding this comment.
Reverts the #1181 contract. The class docstring three lines up still says "returns a probe-PASS ... without failing the grade", so on main consumers depend on PASS. Flipping to False combined with #1190's passed=False → BLOCKED rule will block every PRODUCT_COMPLETE that touches a deferred probe kind. Please keep the PASS semantics here and open a follow-up issue if there's evidence the deferred placeholder should fail-closed.
| passed=False, | |
| passed=True, |
| # a disambiguation hook resolves the class. | ||
| _inference = derive_domain_from_ledger(ledger) | ||
| _task_class = _inference.single | ||
| _task_class = next(iter(_inference.classes)) if _inference.is_single else None |
There was a problem hiding this comment.
DomainInference.single on main is documented as "Return the single matched class, or fallback for unmatched", and derive_domain_from_ledger returns fallback=LIBRARY with reason='unmatched' precisely so the consumer here applies the safe completion gate. Switching to next(iter(_inference.classes)) if _inference.is_single else None drops the unmatched→LIBRARY path that #1177 / #1188 deliberately wired up — and the test edit later in this PR flips test_envelope_uses_library_fallback_when_unmatched to assert no fallback, which is a behavioral reversal. Please revert to the property accessor (or, if the fallback should be retired, raise that as its own issue against DomainInference.single).
| _task_class = next(iter(_inference.classes)) if _inference.is_single else None | |
| _task_class = _inference.single |
| self.last_progress_message = message | ||
| self.last_error = error | ||
| if error is None: | ||
| self.last_error_code = None |
There was a problem hiding this comment.
#1157 names #1194 as the canonical landing spot for stop-reason-code clearing, and #1194's hunk uses if next_phase is not AutoPhase.BLOCKED. The two conditions diverge on transition(BLOCKED, msg, error=None) (this PR clears; #1194 preserves) and on non-BLOCKED → non-BLOCKED with an explicit error string (this PR preserves; #1194 clears). Two open PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this clearing here and letting #1194 land first (the test additions in test_stop_reason_code.py can stay if they pass against #1194's condition).
| self.last_error_code = None | |
| self.last_error = error |
| ) | ||
| if plugin_home is not None: | ||
| env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home)) | ||
| env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home) |
There was a problem hiding this comment.
#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values (e.g. for an .omx-style overlay or test harness). Forcing direct assignment removes that affordance and quietly reverses #1193's API choice. If the goal is to prevent stale override leaks across invocations, that needs its own issue and a documented contract change — it shouldn't ride in as an unannounced revert.
| env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home) | |
| env.setdefault("OUROBOROS_PLUGIN_WORKDIR", str(workdir)) | |
| env.setdefault( | |
| "OUROBOROS_PLUGIN_OUTPUT_DIR", | |
| str(workdir / ".ouroboros" / "plugin-artifacts" / manifest.name), | |
| ) | |
| if plugin_home is not None: | |
| env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home)) |
| if result.interview_closure_mode is not None: | ||
| meta["interview_closure_mode"] = result.interview_closure_mode | ||
| if result.active_task_class is not None: | ||
| meta["active_task_class"] = result.active_task_class |
There was a problem hiding this comment.
Collides with open PR #1196, which #1157 names as the canonical MCP-metadata cleanup for active_task_class. #1196's explicit decision is that the field is always present, null when unavailable, so clients can tell ambiguous inference apart from a missing protocol field. The conditional if ... is not None here silently downgrades to the looser contract. Please drop this two-line block and let #1196 own the surfacing (its insertion is at the unconditional _result_meta body section).
| meta["active_task_class"] = result.active_task_class | |
| if result.interview_closure_mode is not None: | |
| meta["interview_closure_mode"] = result.interview_closure_mode |
Drops the five hunks called out in PR review against #1157 SSOT / merged design decisions, while keeping every legitimate fix in this PR: - Restore DeferredProbe.passed=True (#1181 contract: probe-PASS placeholder so the L3 verifier flags the gap without failing the grade). - Restore `_inference.single` consumer call so unmatched ledgers still apply the safe LIBRARY fallback (#1177 + #1188 decision). - Drop the `if error is None` last_error_code clearing — sibling PR #1194 owns this hunk with the stricter `next_phase is not BLOCKED` condition. - Restore `env.setdefault(...)` for the three plugin runtime env vars introduced by #1193 so downstream entrypoints can still pre-seed. - Drop the conditional `active_task_class` MCP meta block — sibling PR #1196 owns it with unconditional null surfacing (clients need to tell ambiguous-inference apart from a missing protocol field). Kept fixes: `_matches_web_service` independence, `defaulted_sections` CLI rendering, `status --limit > 0` validation, mechanical-eval evidence linkage, TimeoutExpired stdout/stderr bytes-safe decode, workflow lifecycle same-timestamp restart ordering, MCP meta surfacing for `interview_closure_mode` + `runtime_probe_evidence`, plugin-mode Ralph `product_status=not_verified_complete` downgrade. Verification: - uv run pytest tests/unit/auto/test_pipeline_task_class_envelope.py tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/cli/test_status_run.py tests/unit/cli/test_auto_command.py tests/unit/plugin/test_firewall.py tests/unit/orchestrator/test_workflow_lifecycle_events.py tests/integration/test_mechanical_eval_projection.py tests/unit/auto/test_stop_reason_code.py -q → 243 passed - uv run ruff check src tests → passed - uv run ruff format --check src tests → passed
shaun0927
left a comment
There was a problem hiding this comment.
Force push 70e2ef7 applies all five revert suggestions from the previous review:
DeferredProbe.passed=Truerestored (#1181 contract)_inference.singlerestored (#1177 + #1188 LIBRARY fallback)state.transition()last_error_code hunk dropped (let #1194 own it)firewall.py setdefaultrestored (#1193 contract)auto_handler.pyconditionalactive_task_classdropped (let #1196 own it)
Test envelope assertion and DeferredProbe test reverted to match. CI is green across the 8 checks. The PR now carries only the legitimate scope I called out as keepers in the previous review.
LGTM. Ready to merge after the smaller sibling PRs (#1195 → #1196 → #1194) land — those don't conflict with this branch but landing them first keeps the SSOT history readable.
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1205 |
| HEAD checked | 70e2ef725869f2f765f1c97a0de2ef6ba4e22119 |
| Request ID | req_1779687957_1 |
| Review record | 5caf20cb-47b8-4275-ad7e-fe7495548957 |
What Improved
- Surfaces defaulted ledger sections in CLI output.
- Tightens
status run --limitmalformed-input handling before MCP dispatch. - Preserves timeout stdout/stderr previews for runtime probes.
- Adds MCP metadata for interview closure mode, runtime probe evidence, and unverified plugin-mode Ralph completions.
- Adjusts workflow lifecycle replay ordering for same-timestamp terminal/restart boundaries.
Issue Requirements
| Requirement | Status |
|---|---|
| No linked issue requirement captured | N/A |
Prior Findings Status
Prior inline concerns were checked against the current diff. The earlier runtime deferred-probe PASS concern is not present in the current hunk; pipeline/state/firewall hunks referenced by previous comments are no longer in this diff; the active-task-class MCP metadata conflict is also not present. No prior blocker was re-raised.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/auto_handler.py:1302 | Low | Medium | Add a focused assertion for the new runtime_probe_evidence MCP metadata shape. The implementation is straightforward, but this is now a public client contract and only adjacent surface fixtures were updated. |
Non-blocking Suggestions
| 1 | src/ouroboros/mcp/tools/auto_handler.py:1302 | Test coverage | Nice-to-have direct test for serialized runtime probe evidence in _result_meta. |
Test Coverage Notes
- Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
- Could not execute pytest locally:
pythonis absent,python3is 3.11, andpytestis not installed; importing some project modules also hits newer Python syntax.
Design Notes
The changes are additive or narrowly corrective at existing boundaries: CLI presentation, MCP result metadata, runtime probe payload normalization, and lifecycle replay ordering. No new persistence model or cross-layer ownership split was introduced.
Design / Roadmap Gate
Affected boundaries reviewed: CLI-to-MCP argument validation, MCP metadata compatibility, runtime probe evidence serialization, workflow lifecycle replay ordering, and mechanical projection evidence links. The current changes do not create a validated contract, persistence, replay, or compatibility failure. Same-timestamp lifecycle ambiguity remains guarded in conformance/next-runnable paths.
Directional Notes
Maintainer-memory emphasis on runtime reality over optimistic surfaces shaped the check of product-completion metadata and runtime evidence surfacing. Current HEAD evidence supports the unverified plugin completion marker and timeout output preservation.
Test Coverage
- Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
- Could not execute pytest locally:
pythonis absent,python3is 3.11, andpytestis not installed; importing some project modules also hits newer Python syntax.
Merge Recommendation
Approve. I found no blocking runtime, API-contract, persistence, or replay issue in the current changed source. The only remaining item is a non-blocking test hardening suggestion for the new MCP runtime evidence metadata.
Review-Metadata:
verdict: APPROVE
head_sha: 70e2ef7
request_id: req_1779687957_1
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
Original PR blocker coverage
Addresses blocker foundations for #1151, #1148, #1146, #1193, #1191, #1188, #1181, #1177, #1133, #1134, and #1132.
Verification