Summary
R1 live canonical run (OUROBOROS_RUN_CANONICAL=1 pytest tests/canonical/ -k cli-todo on 265aedb4) terminated with phase=blocked at safe-default synthesis. ooo auto therefore cannot complete the cli-todo canonical scenario, which violates SSOT #1157 success condition #1 (all four canonical goals reach status=complete). Two coupled defects produce a single symptom:
B1 — Logic: the safe-default fallback cannot close the interview when the backend accepts the synthesis answer but does not flag the resulting turn as seed_ready/completed. The defaults are then rolled back and the auto pipeline exits blocked.
B2 — Envelope: the resulting blocker is emitted with last_error_code=None, violating the canonical 8-code mapping contract introduced by feat(auto): canonical stop_reason_code for interview-layer blockers #1151.
This issue tracks the behavioural fix. #1211 (open) is the observability complement for the same decision point and should land alongside, not instead of, the work proposed here. Bug A (the test-harness is_ok() / unwrap_err() mis-call that previously masked this evidence behind a TypeError) is fixed in #1218.
Reproduction
# Requires #1218 to be merged; main at 265aedb4 cannot reproduce without it.
OUROBOROS_RUN_CANONICAL=1 uv run pytest tests/canonical/ -k cli-todo -v
# ~350 s wall time, ~$1 LLM cost. Expected terminal state: phase=blocked.
Local evidence is preserved in .ooo-observability/R1-cli-todo-20260525-1739.log on the reporter's worktree (auto_session_id=auto_3f44b20d63b7, interview_id=interview_169770e8f45c48cf).
R1 evidence — interview ambiguity trajectory (13 rounds, never crossed the 0.2 gate)
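| round | overall ambiguity | goal | constraints | success_criteria | ready? |
|---|---|---|---|---|---|
| 3 | 0.298 | 0.78 | 0.62 | 0.68 | False |
| 4 | 0.304 | 0.78 | 0.62 | 0.66 | False |
| 5 | 0.540 | 0.55 | 0.35 | 0.45 | False |
| 6 | 0.401 | 0.74 | 0.52 | 0.49 | False |
| 7 | 0.464 | 0.68 | 0.46 | 0.42 | False |
| 8 | 0.361 | 0.72 | 0.55 | 0.62 | False |
| 9 | 0.421 | 0.72 | 0.42 | 0.55 | False |
| 10 | 0.353 | 0.74 | 0.55 | 0.62 | False |
| 11 | 0.397 | 0.72 | 0.55 | 0.50 | False |
| 12 | 0.373 | 0.72 | 0.58 | 0.55 | False |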
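The overall column appears to be a weighted complement of the three dimension scores: every row in the trajectory table satisfies overall = 1 - (0.4 * goal + 0.3 * constraints + 0.3 * success_criteria) to three decimals. The weights were fitted to these ten rows rather than read from the scorer, and reading the gate as "ready when overall <= 0.2" is likewise an assumption. The sketch recomputes each row under those assumptions.

```python
import math

# (goal, constraints, success_criteria, reported overall) per round,
# copied from the R1 trajectory table.
TRAJECTORY = {
    3: (0.78, 0.62, 0.68, 0.298),
    4: (0.78, 0.62, 0.66, 0.304),
    5: (0.55, 0.35, 0.45, 0.540),
    6: (0.74, 0.52, 0.49, 0.401),
    7: (0.68, 0.46, 0.42, 0.464),
    8: (0.72, 0.55, 0.62, 0.361),
    9: (0.72, 0.42, 0.55, 0.421),
    10: (0.74, 0.55, 0.62, 0.353),
    11: (0.72, 0.55, 0.50, 0.397),
    12: (0.72, 0.58, 0.55, 0.373),
}

# Assumption: weights inferred by fitting the table, not taken from the scorer.
WEIGHTS = (0.4, 0.3, 0.3)
READINESS_GATE = 0.2  # assumption: "ready" means overall <= gate


def overall_ambiguity(goal: float, constraints: float, success_criteria: float) -> float:
    """Return 1 minus the weighted clarity of the three dimensions."""
    w_goal, w_cons, w_succ = WEIGHTS
    return 1.0 - (w_goal * goal + w_cons * constraints + w_succ * success_criteria)


def check_trajectory() -> list:
    """Recompute every row; fail if a reported value disagrees."""
    rows = []
    for rnd, (goal, cons, succ, reported) in sorted(TRAJECTORY.items()):
        computed = overall_ambiguity(goal, cons, succ)
        # Reported values are rounded to three decimals.
        if not math.isclose(computed, reported, abs_tol=5e-4):
            raise AssertionError((rnd, computed, reported))
        rows.append((rnd, round(computed, 3), computed <= READINESS_GATE))
    return rows


if __name__ == "__main__":
    for rnd, overall, ready in check_trajectory():
        print(f"round {rnd:>2}: overall={overall:.3f} ready={ready}")
```

Under this reading, crossing the gate needs a weighted clarity of at least 0.8, while the best round (3) reached 0.702.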
The overall score oscillates between 0.298 and 0.540 and never converges below the 0.2 readiness gate. After round 12, the auto driver enters the safe-default fallback at 08:45:41.
R1 evidence — terminal state (recovered from log; JSON dump was not retained on disk)
phase=blocked
blocker: safe-default synthesis did not close the persisted interview: backend_done=False, ledger defaults rolled back
last_error_code=None
seed_origin=none
runtime_probe_evidence=[]
Suspect code
B1 — Logic: safe-default cannot close when backend silently no-ops synthesis
src/ouroboros/auto/interview_driver.py:440-454
state.interview_session_id = synthesis_turn.session_id
state.pending_question = synthesis_turn.question
if not (synthesis_turn.seed_ready or synthesis_turn.completed):
    _revert_safe_default_entries(ledger, finalization.defaulted_sections)
    blocker = (
        "safe-default synthesis did not close the persisted interview: "
        "backend_done=False, ledger defaults rolled back"
    )
    state.ledger = ledger.to_dict()
    state.mark_blocked(blocker, tool_name="interview.safe_default_synthesis")
    record_authoring_backend(state)
    self._save(state)
    return AutoInterviewResult(
        "blocked", state.interview_session_id, ledger, self.max_rounds, blocker
    )
When the Socratic backend accepts the synthesis answer but never flags seed_ready/completed on the resulting turn (the cli-todo runtime_context gap reproduces this reliably), #1167's policy rolls every default back and exits blocked. The backend appears to treat the driver-injected synthesis as just another user response, not a terminator.
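One way to make the B1 decision explicit is sketched below. Every name in it (NonClosurePolicy, SynthesisTurn, decide_after_synthesis, the backend_done label and the ledger_satisfied flag) is hypothetical. REVERT models today's #1167 behaviour and the other two members model the options listed under sub-task B1; that both alternatives still block when the ledger is not satisfied is an assumption, not something this issue specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class NonClosurePolicy(Enum):
    """Hypothetical: what to do when the backend acks synthesis without flagging the turn done."""

    REVERT = "revert"  # current #1167 behaviour
    BACKEND_ECHO_CLOSE = "backend_echoed_ledger_satisfied"  # proposed third closure mode
    FAIL_FORWARD_LEDGER_ONLY = "ledger_only"  # deterministic ledger_only close


@dataclass(frozen=True)
class SynthesisTurn:
    seed_ready: bool = False
    completed: bool = False


def decide_after_synthesis(
    turn: SynthesisTurn, ledger_satisfied: bool, policy: NonClosurePolicy
) -> Tuple[str, Optional[str]]:
    """Return (outcome, closure_mode) for the safe-default synthesis turn."""
    if turn.seed_ready or turn.completed:
        # Backend flagged the turn as done; R1 never reached this branch.
        return "closed", "backend_done"
    if policy is NonClosurePolicy.REVERT or not ledger_satisfied:
        # Roll defaults back and block (today's path for every non-closing turn).
        return "blocked", None
    # Either alternative policy closes using its own mode label.
    return "closed", policy.value
```

Keeping the policy as an explicit parameter lets a regression test pin each branch, including the R1 shape where the synthesis turn carries neither flag.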
B2 — Envelope: last_error_code never set for this blocker
src/ouroboros/auto/state.py:626-636
Both safe-default failure sites in interview_driver.py (lines 434 and 449) call mark_blocked(blocker, tool_name="interview.safe_default_synthesis") without passing error_code=, so last_error_code defaults to None. The terminal envelope carries the rich blocker text but no canonical code, which breaks the #1151 8-code mapping contract.
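A minimal sketch of the B2 mechanism, assuming only what this issue states about mark_blocked: it takes the blocker text, a tool_name keyword, and an optional error_code= that defaults to None. AutoStateStub and its last_error/last_tool_name fields are hypothetical; the suspect code the issue points to is src/ouroboros/auto/state.py:626-636. The code name is the one proposed in sub-task B2.

```python
from dataclasses import dataclass
from typing import Optional

BLOCKER = (
    "safe-default synthesis did not close the persisted interview: "
    "backend_done=False, ledger defaults rolled back"
)
# Proposed in sub-task B2; the final name must come from the #1151 alphabet.
NONCLOSURE_CODE = "INTERVIEW_SAFE_DEFAULT_SYNTHESIS_NONCLOSURE"


@dataclass
class AutoStateStub:
    """Hypothetical stand-in for the auto state object."""

    phase: str = "interview"
    last_error: Optional[str] = None
    last_tool_name: Optional[str] = None
    last_error_code: Optional[str] = None

    def mark_blocked(
        self, blocker: str, *, tool_name: str, error_code: Optional[str] = None
    ) -> None:
        # error_code defaults to None, so a caller that omits it produces a
        # blocked envelope with no canonical code.
        self.phase = "blocked"
        self.last_error = blocker
        self.last_tool_name = tool_name
        self.last_error_code = error_code


# Today's call sites (interview_driver.py:434 and :449) omit error_code=.
r1_state = AutoStateStub()
r1_state.mark_blocked(BLOCKER, tool_name="interview.safe_default_synthesis")

# The B2 fix passes the canonical code explicitly.
fixed_state = AutoStateStub()
fixed_state.mark_blocked(
    BLOCKER,
    tool_name="interview.safe_default_synthesis",
    error_code=NONCLOSURE_CODE,
)
```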
Sub-tasks
B1 (logic) — src/ouroboros/auto/interview_driver.py:440-454. Decide closure policy when the backend ack is content-only: either extend the safe-default contract with a third closure mode (alongside mutual_agreement and ledger_only) that accepts "backend echoed, ledger satisfied" as a close, or fail forward into a deterministic ledger_only close instead of reverting defaults. Document the chosen policy on feat(auto): safe-default closure mode + partial-unsafe blocker code (PR-B2) #1167.
B2 (envelope) — Add INTERVIEW_SAFE_DEFAULT_SYNTHESIS_NONCLOSURE (or an equivalent code; it must be drawn from the feat(auto): canonical stop_reason_code for interview-layer blockers #1151 alphabet) and pass it as error_code= at both mark_blocked call sites in interview_driver.py:434 and 449. Add a regression test under tests/auto/ asserting that every safe-default blocker emits a non-None last_error_code from the documented alphabet.
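A sketch of the envelope check the B2 regression test needs, written as a plain helper so it runs without pytest. The alphabet set is a placeholder holding only the code proposed in this issue, and representing the envelope as a dict is an assumption; the field names (phase, last_error_code, seed_origin) come from the R1 terminal state.

```python
from typing import Any, Mapping

# Placeholder alphabet: this issue names only the proposed code. A real test
# would import the eight #1151 codes rather than list them here.
CANONICAL_ERROR_CODES = frozenset({"INTERVIEW_SAFE_DEFAULT_SYNTHESIS_NONCLOSURE"})


def check_blocked_envelope(envelope: Mapping[str, Any]) -> None:
    """Raise AssertionError if a blocked envelope lacks a canonical error code."""
    if envelope.get("phase") != "blocked":
        return  # the contract only constrains blocked terminals
    code = envelope.get("last_error_code")
    if code is None:
        raise AssertionError("blocked envelope has last_error_code=None")
    if code not in CANONICAL_ERROR_CODES:
        raise AssertionError(f"non-canonical last_error_code: {code!r}")


# Fields recovered from the R1 terminal state; this envelope should fail.
R1_ENVELOPE = {"phase": "blocked", "last_error_code": None, "seed_origin": "none"}
FIXED_ENVELOPE = {
    **R1_ENVELOPE,
    "last_error_code": "INTERVIEW_SAFE_DEFAULT_SYNTHESIS_NONCLOSURE",
}
```

In a real test under tests/auto/, the envelope would come from driving both safe-default failure sites (interview_driver.py:434 and 449) rather than from a literal.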
Test-harness coordination
… TypeError. Reproduction above assumes fix(canonical): call Result.is_ok/is_err as properties #1218 has merged.
… _Ok test stub with method-shape is_ok()/unwrap_err() that mirrors the broken API. When fix(canonical): call Result.is_ok/is_err as properties #1218 lands, that stub must be aligned with the real Result property API; otherwise it silently re-introduces shape drift. See the coordination note on fix(canonical): call Result.is_ok/is_err as properties #1218.
Prior art / related work
… stop_reason_code for interview-layer blockers; B2 fills a gap in that alphabet.
fix: log safe-default synthesis nonclosure #1211 (open) emits a structured auto.interview.safe_default_synthesis_nonclosure event. It is observability-only by design and complements this issue rather than substituting for it.
Cross-refs
Constraints (per evidence-driven minimal-substrate policy)
… main.