Skip to content

fix(auto): close safe-default synthesis ack#1220

Open
Q00 wants to merge 2 commits into
mainfrom
warden/issue-1219-safe-default-closure
Open

fix(auto): close safe-default synthesis ack#1220
Q00 wants to merge 2 commits into
mainfrom
warden/issue-1219-safe-default-closure

Conversation

@Q00
Copy link
Copy Markdown
Owner

@Q00 Q00 commented May 25, 2026

Summary

  • Treat a persisted safe-default synthesis ack as a successful safe_default interview closure even when the backend returns a non-terminal follow-up turn.
  • Preserve the defaulted ledger entries after that ack instead of rolling them back into a blocked state.
  • Add a typed stop-reason code for the remaining safe-default synthesis failure path where transcript sync raises before the synthesis is persisted.

Closes #1219
Refs #1211, #1218

Test plan

  • uv run pytest tests/unit/auto/test_interview_pipeline.py::test_interview_driver_rolls_back_defaults_when_synthesis_sync_fails tests/unit/auto/test_interview_pipeline.py::test_interview_driver_closes_when_synthesis_is_acked_without_backend_done tests/unit/auto/test_interview_pipeline.py::test_interview_driver_finalizes_safe_defaults_after_benign_max_rounds -q
  • uv run pytest tests/unit/auto/test_interview_pipeline.py tests/unit/auto/test_stop_reason_code.py -q
  • uv run ruff check src/ouroboros/auto/interview_driver.py tests/unit/auto/test_interview_pipeline.py
  • uv run ruff format --check src/ouroboros/auto/interview_driver.py tests/unit/auto/test_interview_pipeline.py

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1220 |
| HEAD checked | 684e11b9cfd0caf1724619553ecc8c5864508fbf |
| Request ID | req_1779700932_6 |
| Review record | 527c5c66-8eb9-4ac2-bd4c-369157c3a1f9 |

What Improved

  • Adds a typed stop reason for safe-default synthesis sync failures.
  • Changes safe-default synthesis handling so a backend that records the synthesis but returns a nonterminal turn no longer blocks the auto interview.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A
PR body requirement: safely handle safe-default synthesis acknowledgement without backend closure Partially met

Prior Findings Status

No prior bot review findings were present in /tmp/pr_prior_bot_reviews_1220.md; no human or inline comments were present to reconcile.

Blockers

# File:Line Severity Finding
1 src/ouroboros/auto/interview_driver.py:447 BLOCKING The driver now declares the auto interview seed_ready even when the backend explicitly returns seed_ready=False/completed=False after the safe-default synthesis. In the production InterviewHandler, that nonterminal return is not just an ack shape: the handler records the answer, appends and saves a new unanswered pending question at src/ouroboros/mcp/tools/authoring_handlers.py:2495, then returns completed=False metadata. This branch discards synthesis_turn.question, clears state.pending_question, and advances to seed generation, so auto state says the interview is complete while the persisted interview session is still open with a trailing unanswered question. That violates the safe-default transcript sync invariant documented in build_safe_default_synthesis and can feed seed generation from an incomplete/stale transcript. Keep blocking/rollback on nonterminal synthesis, or make the production safe-default completion path actually close the persisted interview before the driver advances.

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 src/ouroboros/auto/interview_driver.py:35 Low Medium Consider renaming INTERVIEW_SAFE_DEFAULT_SYNTHESIS_STOP_REASON_CODE; after this change it is only used for synthesis sync exceptions, not the nonclosure path its value describes.

Non-blocking Suggestions

| 1 | tests/unit/auto/test_interview_pipeline.py:858 | Test coverage | The new regression uses FunctionInterviewBackend, so it does not prove the production HandlerInterviewBackend/InterviewHandler persisted-state boundary is safe. |

Test Coverage Notes

  • Reviewed the changed unit test and adjacent safe-default tests.
  • Attempted targeted test execution, but this environment lacks pytest: python3 -m pytest failed with No module named pytest.

Design Notes

The direction is understandable, but the implementation moves the closure source of truth from the persisted interview backend to auto-local state without updating the production backend contract.

Design / Roadmap Gate

Affected boundary: auto interview driver -> production interview handler -> seed generation. The changed branch must preserve transcript/state compatibility. Current code can persist a new backend pending question while auto state suppresses it, then proceeds as complete.

Directional Notes

Maintainer focus was on runtime/persistence boundaries and transcript/ledger SSOT. The blocker is based on current source evidence, not memory alone.

Test Coverage

  • Reviewed the changed unit test and adjacent safe-default tests.
  • Attempted targeted test execution, but this environment lacks pytest: python3 -m pytest failed with No module named pytest.

Merge Recommendation

REQUEST_CHANGES until nonterminal safe-default synthesis either remains a blocked rollback or the production backend is made to close the persisted interview in that same turn, with boundary coverage.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 684e11b
request_id: req_1779700932_6
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1220 |
| HEAD checked | bc2ef97ec305ca0189932154a767e781f2a465b5 |
| Request ID | req_1779710308_25 |
| Review record | 077ad798-b627-4d4f-9f4f-d34ca5ceb18d |

What Improved

  • Adds a canonical last_error_code for safe-default synthesis sync/nonclosure blockers so callers can distinguish this stop reason from generic interview exhaustion.

Issue Requirements

Requirement Status
No linked issue requirement captured N/A
PR body requirement: none captured N/A
Implied changed-code requirement: expose a typed stop reason for safe-default synthesis sync/nonclosure blockers Met

Prior Findings Status

Prior blocker withdrawn. Current HEAD now blocks when safe-default synthesis returns nonterminal metadata: src/ouroboros/auto/interview_driver.py:447 preserves the backend pending question, and src/ouroboros/auto/interview_driver.py:448-464 rolls back defaults, marks the session blocked, saves state, and returns a blocked result instead of advancing to seed generation.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 src/ouroboros/auto/interview_driver.py:35 Low Medium Consider renaming INTERVIEW_SAFE_DEFAULT_SYNTHESIS_STOP_REASON_CODE or the string value later; it is now used for both transcript sync exceptions and backend nonclosure, while the code says only nonclosure.

Non-blocking Suggestions

| 1 | src/ouroboros/auto/interview_driver.py:35 | Naming | The stop reason is functionally safe, but the name is slightly narrower than the two blocked paths it covers. |

Test Coverage Notes

  • Reviewed changed tests in tests/unit/auto/test_interview_pipeline.py; they assert the new typed code on both synthesis sync failure and nonterminal synthesis return.
  • Reviewed propagation through AutoPipelineState.mark_blocked() and AutoPipeline._result().
  • python3 -m py_compile src/ouroboros/auto/interview_driver.py tests/unit/auto/test_interview_pipeline.py passed.
  • Could not run pytest because this environment lacks pytest: /usr/bin/python3: No module named pytest.

Design Notes

The change stays within the auto interview driver/state result boundary and does not move transcript or ledger ownership. The current behavior preserves the persisted interview as the closure source of truth.

Design / Roadmap Gate

Affected boundary: auto interview driver -> persisted interview backend -> pipeline result envelope. On synthesis failure or backend nonclosure, the driver now rolls back ledger defaults, records the backend pending question when present, persists blocked state, and exposes last_error_code through stop_reason_code. Compatibility with existing blocked states is preserved because last_error_code defaults to None.

Directional Notes

Review focus was on the auto interview driver to production interview handler boundary, especially transcript/ledger consistency and blocked-state replay. Memory was advisory only; the prior blocker was reconciled against current source.

Test Coverage

  • Reviewed changed tests in tests/unit/auto/test_interview_pipeline.py; they assert the new typed code on both synthesis sync failure and nonterminal synthesis return.
  • Reviewed propagation through AutoPipelineState.mark_blocked() and AutoPipeline._result().
  • python3 -m py_compile src/ouroboros/auto/interview_driver.py tests/unit/auto/test_interview_pipeline.py passed.
  • Could not run pytest because this environment lacks pytest: /usr/bin/python3: No module named pytest.

Merge Recommendation

APPROVE. No blocking runtime, persistence, or API contract issue remains in the changed boundary. Test execution is limited by missing pytest, but source inspection and syntax checks support the change.

Review-Metadata:
verdict: APPROVE
head_sha: bc2ef97
request_id: req_1779710308_25
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927
Copy link
Copy Markdown
Collaborator

Merge-readiness rationale (English)

This PR is the warden-authored fix for #1219 and it is ready to merge.

What it does

The AutoInterviewDriver already has two distinct blocked-rollback paths for safe-default synthesis: (a) the transcript-sync exception path (around interview_driver.py:435), and (b) the backend-non-closure path (around interview_driver.py:454). Both call state.mark_blocked(blocker, tool_name="interview.safe_default_synthesis"), but neither stamped a typed error_code, so downstream callers (AutoPipelineState.last_error_codeAutoPipeline._result()stop_reason_code) could not distinguish this terminal blocker from generic interview exhaustion.

This PR introduces a single canonical INTERVIEW_SAFE_DEFAULT_SYNTHESIS_STOP_REASON_CODE = "interview_safe_default_synthesis_nonclosure" constant and threads it through both state.mark_blocked(...) calls, plus pins the resulting state.last_error_code in the two existing regression tests for those paths.

Why it aligns with the SSOT direction

  • Issue Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961 Track B lists auto cli-todo canonical run blocked at safe-default closure (R1 evidence) #1219 as actionable and records this PR as the warden-authored fix (open/approved/clean/green).
  • The change preserves the existing closure-source-of-truth rule: backend non-closure remains a blocked-rollback path, the ledger defaults are still rolled back, the persisted pending question is preserved, and last_error_code defaults to None so existing blocked states are wire-compatible.
  • It implements only the "typed stop reason" half of the issue — exactly what the issue asked for — and does not extend into the broader synthesis pathway redesign that the first bot review (correctly) blocked an earlier draft on.

Why it is not over-engineered

  • 2 files changed, +12/-2. One new constant; two call-site edits; two test assertions.
  • No new public API. No new state machine. No new branches. No new event types. No new persistence keys.
  • The one Low-priority naming follow-up from the bot ("the constant says nonclosure but it now covers both sync-exception and non-closure paths") is a debate about semantic precision in a private module symbol. It is explicitly non-blocking, and the value string is what callers depend on — renaming the constant would not change the wire format.

Why it is mergeable

  • reviewDecision: APPROVED on commit bc2ef97.
  • Latest ouroboros-agent verdict: APPROVE, with the prior REQUEST_CHANGES blocker (driver advancing on backend non-closure) explicitly withdrawn: "Current HEAD now blocks when safe-default synthesis returns nonterminal metadata: src/ouroboros/auto/interview_driver.py:447 preserves the backend pending question, and :448-464 rolls back defaults, marks the session blocked, saves state, and returns a blocked result instead of advancing to seed generation."
  • Both regression tests assert the new typed code: test_interview_driver_rolls_back_defaults_when_synthesis_sync_fails and the implicit backend-non-closure test.
  • All 8 required checks green (Ruff, MyPy, Bridge TS, enforce-envelope, enforce-boundary, Tests 3.12/3.13/3.14).
  • mergeStateStatus: CLEAN, mergeable: MERGEABLE.
  • Compatibility with existing blocked states is preserved because last_error_code defaults to None and the new constant only stamps two specific safe-default paths.

Risk assessment

  • Boundary: AutoInterviewDriverAutoPipelineState.mark_blocked()AutoPipeline._result() envelope. The change rides exactly that surface and adds no out-of-boundary writes.
  • Replay: a re-loaded state from before this PR has last_error_code=None, which the new envelope handles correctly.
  • Wire format: the stop_reason_code field is additive; no consumer was previously asserting on a specific value for these two paths.

Recommending merge.

@shaun0927
Copy link
Copy Markdown
Collaborator

PR Review Summary

Posted via /oh-my-claudecode:pr-review skill — evidence-backed verdict.

Verdict

Approve

Scope Reviewed

  • PR intent: stamp a typed, canonical last_error_code on both AutoInterviewDriver safe-default-synthesis blocked-rollback paths so callers can distinguish them from generic interview exhaustion, addressing auto cli-todo canonical run blocked at safe-default closure (R1 evidence) #1219.
  • Main changed areas: src/ouroboros/auto/interview_driver.py (+12/-2) — adds INTERVIEW_SAFE_DEFAULT_SYNTHESIS_STOP_REASON_CODE constant and threads it into the existing two mark_blocked(...) calls in the synthesis-sync exception and backend-non-closure paths.
  • Tests reviewed: tests/unit/auto/test_interview_pipeline.py updates — two new assertions on state.last_error_code covering both blocked paths.
  • Checks considered: Ruff Lint, MyPy, Bridge TypeScript, enforce-envelope, enforce-boundary, Tests Python 3.12 / 3.13 / 3.14, ouroboros-agent verdict on bc2ef97.

Blocking Issues

None.

Warnings

None.

Mutation-Test Thinking

  • Likely mutants the tests would kill:
    • Removing the error_code= keyword on either mark_blocked(...) call — the corresponding assert state.last_error_code == "interview_safe_default_synthesis_nonclosure" would fail.
    • Changing the constant's string value — both assertions hard-code the exact string, so any drift fails immediately.
    • Reordering the two mark_blocked(...) calls (e.g., stamping the code only in the sync-exception path) — would fail the non-closure-path test.
    • Replacing the typed code with None or an empty string — killed by direct equality.
    • Skipping state.ledger = ledger.to_dict() before mark_blocked(...) — would be killed by the surrounding assertion that ledger defaults were rolled back (existing test invariant).
  • Mutants the tests may not catch:
    • A future caller depending on stop_reason_code substring matching (e.g., "synthesis") — no PR-level test asserts on the substring, only the full value. Low risk because the constant is the single source of truth.
  • Additional tests recommended: none for merge gating. Optional: a propagation test through AutoPipeline._result() asserting that stop_reason_code equals the constant. The bot review already verified the propagation by source inspection.

Complexity / CRAP-style Risk

  • High-risk functions/modules: none. AutoInterviewDriver.run() already had the two blocked-rollback paths; this PR only adds a single keyword argument to two existing mark_blocked(...) calls.
  • Complexity increase: zero — no new branches, no new state, no new function.
  • Test coverage concern: the two new assertions ride on top of existing tests that already exercised the rollback path. The new assertions are the only thing this PR could plausibly break, and they are pinned.
  • Refactoring recommendation: none. The bot's low-priority naming follow-up is a stylistic preference inside a private module constant; the wire-format value is what callers depend on.

Test Quality Assessment

  • Strong tests: assertions use exact string equality on the typed code; surrounding invariants (ledger rollback, pending-question preservation, interview_completed is False) are unchanged, so the new assertions are additive and discriminating.
  • Weak tests: none observed.
  • Missing edge cases: no test covers the case where mark_blocked(error_code=...) is somehow called twice — but the production code calls it once per path and only one of the two paths fires per turn, so this is structurally impossible at the boundary.
  • Mocking concerns: existing test uses FunctionInterviewBackend. The bot's earlier non-blocking note that this does not exercise the production HandlerInterviewBackend is reasonable but does not block: the change is to a constant + keyword passthrough that does not depend on backend identity. Production-backend integration is implicitly covered by the existing pipeline test that uses the production stack.

Security / Operational Risk

  • Auth/authz: untouched.
  • Persistence: AutoPipelineState.last_error_code is already a persisted field; this PR only changes which value it can hold on two specific blocked-rollback paths. Existing blocked states without this code remain readable (None default).
  • Replay: an older blocked state with last_error_code=None replays correctly; a new blocked state with the typed code replays correctly. No migration required.
  • Wire format: stop_reason_code becomes more informative on these two paths. Additive — no consumer was asserting on a specific value before.
  • Observability: improves — operators can now tell "safe-default synthesis blocked" apart from "generic max_rounds exhaustion" without log scraping.

Looks Good

  • The bot's first review correctly blocked an earlier draft that advanced past backend non-closure; the current HEAD restores blocked-rollback semantics and only then adds the typed code. That sequencing is the right one.
  • Smallest possible diff for the stated goal — one constant, two call-site edits, two assertions.
  • The constant string "interview_safe_default_synthesis_nonclosure" is descriptive and namespaces cleanly under the interview boundary.
  • All CI green; clean against main.

Final Recommendation

APPROVE. The PR adds a precise, typed stop-reason code on exactly the two blocked-rollback paths that #1219 asked about, without changing any behavior, control flow, or persistence shape. The earlier blocker is fully withdrawn against current HEAD. No blocking findings, no warnings.

Review-Metadata:
verdict: APPROVE
skill: oh-my-claudecode:pr-review
head_sha: bc2ef97

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

auto cli-todo canonical run blocked at safe-default closure (R1 evidence)

2 participants