Skip to content

fix: resolve post-merge review blocker foundations#1205

Merged
shaun0927 merged 2 commits into
mainfrom
fix/review-blockers-foundation
May 25, 2026
Merged

fix: resolve post-merge review blocker foundations#1205
shaun0927 merged 2 commits into
mainfrom
fix/review-blockers-foundation

Conversation

@Q00
Copy link
Copy Markdown
Owner

@Q00 Q00 commented May 24, 2026

Summary

  • clear stale auto stop reason codes on recovery transitions
  • expose missing auto result metadata: interview_closure_mode, active_task_class, runtime probe evidence, and external Ralph product status
  • render defaulted sections in CLI ledger output
  • force plugin runtime env path contracts instead of inheriting stale overrides
  • make runtime timeout/deferred evidence safe for consumers
  • fix task-class inference overlap and unmatched no-op behavior
  • classify invalid status-run limits as malformed input
  • make workflow lifecycle same-timestamp restarts replay to the fresh run
  • lock mechanical-eval verdict evidence event linkage

Original PR blocker coverage

Addresses blocker foundations for #1151, #1148, #1146, #1193, #1191, #1188, #1181, #1177, #1133, #1134, and #1132.

Verification

  • uv run ruff check src tests
  • uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/auto/test_pipeline_task_class_envelope.py tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py tests/unit/cli/test_auto_command.py tests/unit/plugin/test_firewall.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/cli/test_status_run.py tests/unit/orchestrator/test_workflow_lifecycle_events.py tests/integration/test_mechanical_eval_projection.py -q

@Q00 Q00 force-pushed the fix/review-blockers-foundation branch from 04816f6 to bf55d40 Compare May 24, 2026 11:01
Copy link
Copy Markdown
Collaborator

@shaun0927 shaun0927 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the omnibus — most of the included fixes are clean (_matches_web_service independence, defaulted_sections CLI render, status --limit > 0, mechanical-eval evidence linkage, TimeoutExpired.stdout bytes handling, workflow lifecycle same-ts restart). I'd like to land it, but five hunks need to come out first because they either reverse a merged design decision in the ooo auto SSOT (#1157) or collide with an actively-open sibling PR explicitly named by that SSOT as the canonical landing spot.

Reversals of merged decisions

1. DeferredProbe.passed flipped True→False (reverses #1181)
The class docstring on main is explicit: "returns a probe-PASS RuntimeEvidence with outcome='deferred' so the verifier can flag the gap without failing the grade." Combined with the #1190 contract ("explicit probe passed=False ... downgrade the terminal to BLOCKED"), flipping to False makes every canonical scenario whose TaskClassProfile includes a deferred probe kind block PRODUCT_COMPLETE. That's a global regression of the L3 minimal-substrate behavior, not a safety improvement.

2. L1-b LIBRARY fallback removed (reverses #1177 + #1188)
DomainInference.single is documented on main as "Return the single matched class, or fallback for unmatched" and derive_domain_from_ledger returns fallback=LIBRARY, reason='unmatched'. #1188's body explicitly states "unmatched → apply the safe LIBRARY fallback from L1-b". Switching to next(iter(...)) if is_single else None silently drops the safest-completion-gate fallback; the test rewrite that asserts active_task_class is None for unmatched cements the reversal. If we want to retire LIBRARY-as-fallback that should be an own issue + a DomainInference.single change at the source, not a quiet flip at the consumer.

Collisions with sibling open PRs (#1157 explicitly names them)

3. state.transition() last_error_code clearing collides with #1194
#1157 names #1194 as the remaining L2 follow-up for stop_reason_code clearing. #1194's condition is if next_phase is not AutoPhase.BLOCKED (clears only when leaving BLOCKED); this PR uses if error is None (clears on any error-less transition). The semantics diverge on transition(BLOCKED, msg, error=None) and on non-BLOCKED→non-BLOCKED with an explicit error string. Two PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this hunk and letting #1194 land first.

4. active_task_class MCP meta collides with #1196
#1157 names #1196 as the remaining L1 MCP-metadata cleanup. #1196's whole point is "Preserve the field as null when inference was ambiguous or unavailable, so clients can distinguish an empty computed value from a missing protocol field." That requires unconditional meta["active_task_class"] = result.active_task_class. The conditional if result.active_task_class is not None here silently opts out of that contract.

5. firewall setdefault → direct assignment reverses #1193
#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values. Forcing assignment removes that affordance without a follow-up issue documenting the new contract. If "stale override leaks" is the concern, it should be tracked separately — #1193 was a deliberate setdefault choice.

Suggested path to merge-ready

Apply the five suggestions below (one per hunk), then this PR carries only the legitimate scope:

  • _matches_web_service independence (good — restores #1177 pattern-independence intent)
  • defaulted_sections CLI rendering
  • status --limit > 0 validation
  • mechanical-eval evidence_event_ids lock
  • TimeoutExpired.stdout/stderr bytes-safe decode
  • workflow lifecycle same-timestamp restart ordering + regression test
  • MCP meta surface for interview_closure_mode + runtime_probe_evidence (these aren't covered by #1194/#1196)
  • product_status='not_verified_complete' downgrade for unfinished Ralph plugin runs — please confirm there's a documented decision for this; if so a one-line comment citing the issue would help future readers.

Suggested merge order to avoid conflicts: #1195#1196#1194#1206#1213 → (this PR rebased).

return RuntimeEvidence(
probe_kind=self.kind,
passed=True,
passed=False,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverts the #1181 contract. The class docstring three lines up still says "returns a probe-PASS ... without failing the grade", so on main consumers depend on PASS. Flipping to False combined with #1190's passed=False → BLOCKED rule will block every PRODUCT_COMPLETE that touches a deferred probe kind. Please keep the PASS semantics here and open a follow-up issue if there's evidence the deferred placeholder should fail-closed.

Suggested change
passed=False,
passed=True,

Comment thread src/ouroboros/auto/pipeline.py Outdated
# a disambiguation hook resolves the class.
_inference = derive_domain_from_ledger(ledger)
_task_class = _inference.single
_task_class = next(iter(_inference.classes)) if _inference.is_single else None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DomainInference.single on main is documented as "Return the single matched class, or fallback for unmatched", and derive_domain_from_ledger returns fallback=LIBRARY with reason='unmatched' precisely so the consumer here applies the safe completion gate. Switching to next(iter(_inference.classes)) if _inference.is_single else None drops the unmatched→LIBRARY path that #1177 / #1188 deliberately wired up — and the test edit later in this PR flips test_envelope_uses_library_fallback_when_unmatched to assert no fallback, which is a behavioral reversal. Please revert to the property accessor (or, if the fallback should be retired, raise that as its own issue against DomainInference.single).

Suggested change
_task_class = next(iter(_inference.classes)) if _inference.is_single else None
_task_class = _inference.single

Comment thread src/ouroboros/auto/state.py Outdated
self.last_progress_message = message
self.last_error = error
if error is None:
self.last_error_code = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1157 names #1194 as the canonical landing spot for stop-reason-code clearing, and #1194's hunk uses if next_phase is not AutoPhase.BLOCKED. The two conditions diverge on transition(BLOCKED, msg, error=None) (this PR clears; #1194 preserves) and on non-BLOCKED → non-BLOCKED with an explicit error string (this PR preserves; #1194 clears). Two open PRs writing the same hunk with different conditions will conflict on merge. Suggest dropping this clearing here and letting #1194 land first (the test additions in test_stop_reason_code.py can stay if they pass against #1194's condition).

Suggested change
self.last_error_code = None
self.last_error = error

Comment thread src/ouroboros/plugin/firewall.py Outdated
)
if plugin_home is not None:
env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home))
env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1193 just landed these three env vars with env.setdefault(...) so downstream entrypoints could pre-seed values (e.g. for an .omx-style overlay or test harness). Forcing direct assignment removes that affordance and quietly reverses #1193's API choice. If the goal is to prevent stale override leaks across invocations, that needs its own issue and a documented contract change — it shouldn't ride in as an unannounced revert.

Suggested change
env["OUROBOROS_PLUGIN_HOME"] = str(plugin_home)
env.setdefault("OUROBOROS_PLUGIN_WORKDIR", str(workdir))
env.setdefault(
"OUROBOROS_PLUGIN_OUTPUT_DIR",
str(workdir / ".ouroboros" / "plugin-artifacts" / manifest.name),
)
if plugin_home is not None:
env.setdefault("OUROBOROS_PLUGIN_HOME", str(plugin_home))

Comment thread src/ouroboros/mcp/tools/auto_handler.py Outdated
if result.interview_closure_mode is not None:
meta["interview_closure_mode"] = result.interview_closure_mode
if result.active_task_class is not None:
meta["active_task_class"] = result.active_task_class
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collides with open PR #1196, which #1157 names as the canonical MCP-metadata cleanup for active_task_class. #1196's explicit decision is that the field is always present, null when unavailable, so clients can tell ambiguous inference apart from a missing protocol field. The conditional if ... is not None here silently downgrades to the looser contract. Please drop this two-line block and let #1196 own the surfacing (its insertion is at the unconditional _result_meta body section).

Suggested change
meta["active_task_class"] = result.active_task_class
if result.interview_closure_mode is not None:
meta["interview_closure_mode"] = result.interview_closure_mode

Drops the five hunks called out in PR review against #1157 SSOT / merged
design decisions, while keeping every legitimate fix in this PR:

- Restore DeferredProbe.passed=True (#1181 contract: probe-PASS placeholder
  so the L3 verifier flags the gap without failing the grade).
- Restore `_inference.single` consumer call so unmatched ledgers still
  apply the safe LIBRARY fallback (#1177 + #1188 decision).
- Drop the `if error is None` last_error_code clearing — sibling PR #1194
  owns this hunk with the stricter `next_phase is not BLOCKED` condition.
- Restore `env.setdefault(...)` for the three plugin runtime env vars
  introduced by #1193 so downstream entrypoints can still pre-seed.
- Drop the conditional `active_task_class` MCP meta block — sibling PR
  #1196 owns it with unconditional null surfacing (clients need to tell
  ambiguous-inference apart from a missing protocol field).

Kept fixes: `_matches_web_service` independence, `defaulted_sections` CLI
rendering, `status --limit > 0` validation, mechanical-eval evidence
linkage, TimeoutExpired stdout/stderr bytes-safe decode, workflow
lifecycle same-timestamp restart ordering, MCP meta surfacing for
`interview_closure_mode` + `runtime_probe_evidence`, plugin-mode Ralph
`product_status=not_verified_complete` downgrade.

Verification:
- uv run pytest tests/unit/auto/test_pipeline_task_class_envelope.py
  tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py
  tests/unit/orchestrator/test_runtime_evidence.py
  tests/unit/cli/test_status_run.py tests/unit/cli/test_auto_command.py
  tests/unit/plugin/test_firewall.py
  tests/unit/orchestrator/test_workflow_lifecycle_events.py
  tests/integration/test_mechanical_eval_projection.py
  tests/unit/auto/test_stop_reason_code.py -q  →  243 passed
- uv run ruff check src tests  →  passed
- uv run ruff format --check src tests  →  passed
Copy link
Copy Markdown
Collaborator

@shaun0927 shaun0927 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force push 70e2ef7 applies all five revert suggestions from the previous review:

  1. DeferredProbe.passed=True restored (#1181 contract)
  2. _inference.single restored (#1177 + #1188 LIBRARY fallback)
  3. state.transition() last_error_code hunk dropped (let #1194 own it)
  4. firewall.py setdefault restored (#1193 contract)
  5. auto_handler.py conditional active_task_class dropped (let #1196 own it)

Test envelope assertion and DeferredProbe test reverted to match. CI is green across the 8 checks. The PR now carries only the legitimate scope I called out as keepers in the previous review.

LGTM. Ready to merge after the smaller sibling PRs (#1195#1196#1194) land — those don't conflict with this branch but landing them first keeps the SSOT history readable.

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1205 |
| HEAD checked | 70e2ef725869f2f765f1c97a0de2ef6ba4e22119 |
| Request ID | req_1779687957_1 |
| Review record | 5caf20cb-47b8-4275-ad7e-fe7495548957 |

What Improved

  • Surfaces defaulted ledger sections in CLI output.
  • Tightens status run --limit malformed-input handling before MCP dispatch.
  • Preserves timeout stdout/stderr previews for runtime probes.
  • Adds MCP metadata for interview closure mode, runtime probe evidence, and unverified plugin-mode Ralph completions.
  • Adjusts workflow lifecycle replay ordering for same-timestamp terminal/restart boundaries.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A

Prior Findings Status

Prior inline concerns were checked against the current diff. The earlier runtime deferred-probe PASS concern is not present in the current hunk; pipeline/state/firewall hunks referenced by previous comments are no longer in this diff; the active-task-class MCP metadata conflict is also not present. No prior blocker was re-raised.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 src/ouroboros/mcp/tools/auto_handler.py:1302 Low Medium Add a focused assertion for the new runtime_probe_evidence MCP metadata shape. The implementation is straightforward, but this is now a public client contract and only adjacent surface fixtures were updated.

Non-blocking Suggestions

| 1 | src/ouroboros/mcp/tools/auto_handler.py:1302 | Test coverage | Nice-to-have direct test for serialized runtime probe evidence in _result_meta. |

Test Coverage Notes

  • Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
  • Could not execute pytest locally: python is absent, python3 is 3.11, and pytest is not installed; importing some project modules also hits newer Python syntax.

Design Notes

The changes are additive or narrowly corrective at existing boundaries: CLI presentation, MCP result metadata, runtime probe payload normalization, and lifecycle replay ordering. No new persistence model or cross-layer ownership split was introduced.

Design / Roadmap Gate

Affected boundaries reviewed: CLI-to-MCP argument validation, MCP metadata compatibility, runtime probe evidence serialization, workflow lifecycle replay ordering, and mechanical projection evidence links. The current changes do not create a validated contract, persistence, replay, or compatibility failure. Same-timestamp lifecycle ambiguity remains guarded in conformance/next-runnable paths.

Directional Notes

Maintainer-memory emphasis on runtime reality over optimistic surfaces shaped the check of product-completion metadata and runtime evidence surfacing. Current HEAD evidence supports the unverified plugin completion marker and timeout output preservation.

Test Coverage

  • Reviewed changed tests and affected existing suites: domain inference, CLI status run, workflow lifecycle projection/conformance, mechanical eval projection, auto MCP/CLI surface tests.
  • Could not execute pytest locally: python is absent, python3 is 3.11, and pytest is not installed; importing some project modules also hits newer Python syntax.

Merge Recommendation

Approve. I found no blocking runtime, API-contract, persistence, or replay issue in the current changed source. The only remaining item is a non-blocking test hardening suggestion for the new MCP runtime evidence metadata.

Review-Metadata:
verdict: APPROVE
head_sha: 70e2ef7
request_id: req_1779687957_1
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927 shaun0927 merged commit 265aedb into main May 25, 2026
8 checks passed
@shaun0927 shaun0927 deleted the fix/review-blockers-foundation branch May 25, 2026 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants