fix: resolve post-merge review blocker foundations by Q00 · Pull Request #1205 · Q00/ouroboros

Q00 · 2026-05-24T10:20:08Z

Summary

clear stale auto stop reason codes on recovery transitions
expose missing auto result metadata: interview_closure_mode, active_task_class, runtime probe evidence, and external Ralph product status
render defaulted sections in CLI ledger output
force plugin runtime env path contracts instead of inheriting stale overrides
make runtime timeout/deferred evidence safe for consumers
fix task-class inference overlap and unmatched no-op behavior
classify invalid status-run limits as malformed input
make workflow lifecycle same-timestamp restarts replay to the fresh run
lock mechanical-eval verdict evidence event linkage

Original PR blocker coverage

Addresses blocker foundations for #1151, #1148, #1146, #1193, #1191, #1188, #1181, #1177, #1133, #1134, and #1132.

Verification

uv run ruff check src tests
uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/auto/test_pipeline_task_class_envelope.py tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py tests/unit/cli/test_auto_command.py tests/unit/plugin/test_firewall.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/cli/test_status_run.py tests/unit/orchestrator/test_workflow_lifecycle_events.py tests/integration/test_mechanical_eval_projection.py -q

shaun0927

Thanks for the omnibus — most of the included fixes are clean (_matches_web_service independence, defaulted_sections CLI render, status --limit > 0, mechanical-eval evidence linkage, TimeoutExpired.stdout bytes handling, workflow lifecycle same-ts restart). I'd like to land it, but five hunks need to come out first because they either reverse a merged design decision in the ooo auto SSOT (#1157) or collide with an actively-open sibling PR explicitly named by that SSOT as the canonical landing spot.

Reversals of merged decisions

1. DeferredProbe.passed flipped True→False (reverses #1181)
The class docstring on main is explicit: "returns a probe-PASS RuntimeEvidence with outcome='deferred' so the verifier can flag the gap without failing the grade." Combined with the #1190 contract ("explicit probe passed=False ... downgrade the terminal to BLOCKED"), flipping to False makes every canonical scenario whose TaskClassProfile includes a deferred probe kind block PRODUCT_COMPLETE. That's a global regression of the L3 minimal-substrate behavior, not a safety improvement.

2. L1-b LIBRARY fallback removed (reverses #1177 + #1188)
DomainInference.single is documented on main as "Return the single matched class, or fallback for unmatched" and derive_domain_from_ledger returns fallback=LIBRARY, reason='unmatched'. #1188's body explicitly states "unmatched → apply the safe LIBRARY fallback from L1-b". Switching to next(iter(...)) if is_single else None silently drops the safest-completion-gate fallback; the test rewrite that asserts active_task_class is None for unmatched cements the reversal. If we want to retire LIBRARY-as-fallback that should be an own issue + a DomainInference.single change at the source, not a quiet flip at the consumer.

Collisions with sibling open PRs (#1157 explicitly names them)

3. state.transition() last_error_code clearing collides with #1194
#1157 names #1194 as the remaining L2 follow-up for stop_reason_code clearing. #1194's condition is if next_phase is not AutoPhase.BLOCKED (clears only when leaving BLOCKED); this PR uses if error is None (clears on any error-less transition). The semantics diverge on transition(BLOCKED, msg, error=None) and on non-BLOCKED→non-BLOCKED with an explicit error string. Two PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this hunk and letting #1194 land first.

4. active_task_class MCP meta collides with #1196
#1157 names #1196 as the remaining L1 MCP-metadata cleanup. #1196's whole point is "Preserve the field as null when inference was ambiguous or unavailable, so clients can distinguish an empty computed value from a missing protocol field." That requires unconditional meta["active_task_class"] = result.active_task_class. The conditional if result.active_task_class is not None here silently opts out of that contract.

5. firewall setdefault → direct assignment reverses #1193
#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values. Forcing assignment removes that affordance without a follow-up issue documenting the new contract. If "stale override leaks" is the concern, it should be tracked separately — #1193 was a deliberate setdefault choice.

Suggested path to merge-ready

Apply the five suggestions below (one per hunk), then this PR carries only the legitimate scope:

_matches_web_service independence (good — restores #1177 pattern-independence intent)
defaulted_sections CLI rendering
status --limit > 0 validation
mechanical-eval evidence_event_ids lock
TimeoutExpired.stdout/stderr bytes-safe decode
workflow lifecycle same-timestamp restart ordering + regression test
MCP meta surface for interview_closure_mode + runtime_probe_evidence (these aren't covered by #1194/#1196)
product_status='not_verified_complete' downgrade for unfinished Ralph plugin runs — please confirm there's a documented decision for this; if so a one-line comment citing the issue would help future readers.

Suggested merge order to avoid conflicts: #1195 → #1196 → #1194 → #1206 → #1213 → (this PR rebased).

shaun0927 · 2026-05-25T05:35:28Z

        return RuntimeEvidence(
            probe_kind=self.kind,
-            passed=True,
+            passed=False,


Reverts the #1181 contract. The class docstring three lines up still says "returns a probe-PASS ... without failing the grade", so on main consumers depend on PASS. Flipping to False combined with #1190's passed=False → BLOCKED rule will block every PRODUCT_COMPLETE that touches a deferred probe kind. Please keep the PASS semantics here and open a follow-up issue if there's evidence the deferred placeholder should fail-closed.

Suggested change

passed=False,

passed=True,

shaun0927 · 2026-05-25T05:35:28Z

+                    # a disambiguation hook resolves the class.
                    _inference = derive_domain_from_ledger(ledger)
-                    _task_class = _inference.single
+                    _task_class = next(iter(_inference.classes)) if _inference.is_single else None


DomainInference.single on main is documented as "Return the single matched class, or fallback for unmatched", and derive_domain_from_ledger returns fallback=LIBRARY with reason='unmatched' precisely so the consumer here applies the safe completion gate. Switching to next(iter(_inference.classes)) if _inference.is_single else None drops the unmatched→LIBRARY path that #1177 / #1188 deliberately wired up — and the test edit later in this PR flips test_envelope_uses_library_fallback_when_unmatched to assert no fallback, which is a behavioral reversal. Please revert to the property accessor (or, if the fallback should be retired, raise that as its own issue against DomainInference.single).

Suggested change

_task_class = next(iter(_inference.classes)) if _inference.is_single else None

_task_class = _inference.single

shaun0927 · 2026-05-25T05:35:28Z

        self.last_progress_message = message
        self.last_error = error
+        if error is None:
+            self.last_error_code = None


#1157 names #1194 as the canonical landing spot for stop-reason-code clearing, and #1194's hunk uses if next_phase is not AutoPhase.BLOCKED. The two conditions diverge on transition(BLOCKED, msg, error=None) (this PR clears; #1194 preserves) and on non-BLOCKED → non-BLOCKED with an explicit error string (this PR preserves; #1194 clears). Two open PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this clearing here and letting #1194 land first (the test additions in test_stop_reason_code.py can stay if they pass against #1194's condition).

Suggested change

self.last_error_code = None

self.last_error = error

shaun0927 · 2026-05-25T05:35:28Z

        )
        if plugin_home is not None:
-            env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home))
+            env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home)


#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values (e.g. for an .omx-style overlay or test harness). Forcing direct assignment removes that affordance and quietly reverses #1193's API choice. If the goal is to prevent stale override leaks across invocations, that needs its own issue and a documented contract change — it shouldn't ride in as an unannounced revert.

Suggested change

env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home)

env.setdefault("OUROBOROS_PLUGIN_WORKDIR", str(workdir))

env.setdefault(

"OUROBOROS_PLUGIN_OUTPUT_DIR",

str(workdir / ".ouroboros" / "plugin-artifacts" / manifest.name),

)

if plugin_home is not None:

env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home))

shaun0927 · 2026-05-25T05:35:28Z

+    if result.interview_closure_mode is not None:
+        meta["interview_closure_mode"] = result.interview_closure_mode
+    if result.active_task_class is not None:
+        meta["active_task_class"] = result.active_task_class


Collides with open PR #1196, which #1157 names as the canonical MCP-metadata cleanup for active_task_class. #1196's explicit decision is that the field is always present, null when unavailable, so clients can tell ambiguous inference apart from a missing protocol field. The conditional if ... is not None here silently downgrades to the looser contract. Please drop this two-line block and let #1196 own the surfacing (its insertion is at the unconditional _result_meta body section).

Suggested change

meta["active_task_class"] = result.active_task_class

if result.interview_closure_mode is not None:

meta["interview_closure_mode"] = result.interview_closure_mode

Drops the five hunks called out in PR review against #1157 SSOT / merged design decisions, while keeping every legitimate fix in this PR: - Restore DeferredProbe.passed=True (#1181 contract: probe-PASS placeholder so the L3 verifier flags the gap without failing the grade). - Restore `_inference.single` consumer call so unmatched ledgers still apply the safe LIBRARY fallback (#1177 + #1188 decision). - Drop the `if error is None` last_error_code clearing — sibling PR #1194 owns this hunk with the stricter `next_phase is not BLOCKED` condition. - Restore `env.setdefault(...)` for the three plugin runtime env vars introduced by #1193 so downstream entrypoints can still pre-seed. - Drop the conditional `active_task_class` MCP meta block — sibling PR #1196 owns it with unconditional null surfacing (clients need to tell ambiguous-inference apart from a missing protocol field). Kept fixes: `_matches_web_service` independence, `defaulted_sections` CLI rendering, `status --limit > 0` validation, mechanical-eval evidence linkage, TimeoutExpired stdout/stderr bytes-safe decode, workflow lifecycle same-timestamp restart ordering, MCP meta surfacing for `interview_closure_mode` + `runtime_probe_evidence`, plugin-mode Ralph `product_status=not_verified_complete` downgrade. Verification: - uv run pytest tests/unit/auto/test_pipeline_task_class_envelope.py tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/cli/test_status_run.py tests/unit/cli/test_auto_command.py tests/unit/plugin/test_firewall.py tests/unit/orchestrator/test_workflow_lifecycle_events.py tests/integration/test_mechanical_eval_projection.py tests/unit/auto/test_stop_reason_code.py -q → 243 passed - uv run ruff check src tests → passed - uv run ruff format --check src tests → passed

shaun0927

Force push 70e2ef7 applies all five revert suggestions from the previous review:

DeferredProbe.passed=True restored (#1181 contract)
_inference.single restored (#1177 + #1188 LIBRARY fallback)
state.transition() last_error_code hunk dropped (let #1194 own it)
firewall.py setdefault restored (#1193 contract)
auto_handler.py conditional active_task_class dropped (let #1196 own it)

Test envelope assertion and DeferredProbe test reverted to match. CI is green across the 8 checks. The PR now carries only the legitimate scope I called out as keepers in the previous review.

LGTM. Ready to merge after the smaller sibling PRs (#1195 → #1196 → #1194) land — those don't conflict with this branch but landing them first keeps the SSOT history readable.

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1205 |
| HEAD checked | 70e2ef725869f2f765f1c97a0de2ef6ba4e22119 |
| Request ID | req_1779687957_1 |
| Review record | 5caf20cb-47b8-4275-ad7e-fe7495548957 |

What Improved

Surfaces defaulted ledger sections in CLI output.
Tightens status run --limit malformed-input handling before MCP dispatch.
Preserves timeout stdout/stderr previews for runtime probes.
Adds MCP metadata for interview closure mode, runtime probe evidence, and unverified plugin-mode Ralph completions.
Adjusts workflow lifecycle replay ordering for same-timestamp terminal/restart boundaries.

Issue Requirements

Requirement	Status
No linked issue requirement captured	N/A

Prior Findings Status

Prior inline concerns were checked against the current diff. The earlier runtime deferred-probe PASS concern is not present in the current hunk; pipeline/state/firewall hunks referenced by previous comments are no longer in this diff; the active-task-class MCP metadata conflict is also not present. No prior blocker was re-raised.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
1	src/ouroboros/mcp/tools/auto_handler.py:1302	Low	Medium	Add a focused assertion for the new `runtime_probe_evidence` MCP metadata shape. The implementation is straightforward, but this is now a public client contract and only adjacent surface fixtures were updated.

Non-blocking Suggestions

Test Coverage Notes

Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
Could not execute pytest locally: python is absent, python3 is 3.11, and pytest is not installed; importing some project modules also hits newer Python syntax.

Design Notes

The changes are additive or narrowly corrective at existing boundaries: CLI presentation, MCP result metadata, runtime probe payload normalization, and lifecycle replay ordering. No new persistence model or cross-layer ownership split was introduced.

Design / Roadmap Gate

Affected boundaries reviewed: CLI-to-MCP argument validation, MCP metadata compatibility, runtime probe evidence serialization, workflow lifecycle replay ordering, and mechanical projection evidence links. The current changes do not create a validated contract, persistence, replay, or compatibility failure. Same-timestamp lifecycle ambiguity remains guarded in conformance/next-runnable paths.

Directional Notes

Maintainer-memory emphasis on runtime reality over optimistic surfaces shaped the check of product-completion metadata and runtime evidence surfacing. Current HEAD evidence supports the unverified plugin completion marker and timeout output preservation.

Test Coverage

Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
Could not execute pytest locally: python is absent, python3 is 3.11, and pytest is not installed; importing some project modules also hits newer Python syntax.

Merge Recommendation

Approve. I found no blocking runtime, API-contract, persistence, or replay issue in the current changed source. The only remaining item is a non-blocking test hardening suggestion for the new MCP runtime evidence metadata.

Review-Metadata:
verdict: APPROVE
head_sha: 70e2ef7
request_id: req_1779687957_1
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 mentioned this pull request May 24, 2026

Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961

Open

Q00 force-pushed the fix/review-blockers-foundation branch from 014e34b to 04816f6 Compare May 24, 2026 11:00

fix: resolve post-merge review blocker foundations

bf55d40

Q00 force-pushed the fix/review-blockers-foundation branch from 04816f6 to bf55d40 Compare May 24, 2026 11:01

shaun0927 requested changes May 25, 2026

View reviewed changes

This was referenced May 25, 2026

fix(auto): clear stale stop reason codes #1194

Merged

fix(auto): expose active task class in MCP meta #1196

Merged

fix: keep plugin artifact hooks self-contained #1206

Merged

shaun0927 approved these changes May 25, 2026

View reviewed changes

ouroboros-agent Bot approved these changes May 25, 2026

View reviewed changes

shaun0927 merged commit 265aedb into main May 25, 2026
8 checks passed

shaun0927 deleted the fix/review-blockers-foundation branch May 25, 2026 08:06

Q00 mentioned this pull request May 25, 2026

test(auto): assert runtime probe metadata shape #1222

Merged

shaun0927 mentioned this pull request May 25, 2026

Fix auto skip-run closure diagnostics #1229

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve post-merge review blocker foundations#1205

fix: resolve post-merge review blocker foundations#1205
shaun0927 merged 2 commits into
mainfrom
fix/review-blockers-foundation

Q00 commented May 24, 2026

Uh oh!

shaun0927 left a comment

Uh oh!

shaun0927 May 25, 2026

Uh oh!

shaun0927 May 25, 2026

Uh oh!

shaun0927 May 25, 2026

Uh oh!

shaun0927 May 25, 2026

Uh oh!

shaun0927 May 25, 2026

Uh oh!

shaun0927 left a comment

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	_task_class = next(iter(_inference.classes)) if _inference.is_single else None
	_task_class = _inference.single

	meta["active_task_class"] = result.active_task_class
	if result.interview_closure_mode is not None:
	meta["interview_closure_mode"] = result.interview_closure_mode

Conversation

Q00 commented May 24, 2026

Summary

Original PR blocker coverage

Verification

Uh oh!

shaun0927 left a comment

Choose a reason for hiding this comment

Reversals of merged decisions

Collisions with sibling open PRs (#1157 explicitly names them)

Suggested path to merge-ready

Uh oh!

shaun0927 May 25, 2026

Choose a reason for hiding this comment

Uh oh!

shaun0927 May 25, 2026

Choose a reason for hiding this comment

Uh oh!

shaun0927 May 25, 2026

Choose a reason for hiding this comment

Uh oh!

shaun0927 May 25, 2026

Choose a reason for hiding this comment

Uh oh!

shaun0927 May 25, 2026

Choose a reason for hiding this comment

Uh oh!

shaun0927 left a comment

Choose a reason for hiding this comment

Uh oh!

ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

Review — ouroboros-agent[bot]

Metadata

What Improved

Issue Requirements

Prior Findings Status

Blockers

Follow-up Findings

Non-blocking Suggestions

Test Coverage Notes

Design Notes

Design / Roadmap Gate

Directional Notes

Test Coverage

Merge Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants