fix(auto): honor prompt-declared non_goals in unsafe-context matcher #1221
shaun0927 wants to merge 3 commits
`_unsafe_context_reason` already excludes ledger NON_GOAL entries because confirmed non-goals are explicit exclusions, but that exclusion only fires after the interview has structured them into the ledger. Callers that pre-declare their non-goals in the free-form goal string (handoff prompts and scripted `ooo auto` invocations that bundle the seven canonical interview slots in the request body) leak those exclusion words into the matcher before the interview can register them, flipping the gate into "ambiguous external side effect" on the user's own exclusion text.
Add a line-anchored `_strip_prompt_non_goal_sections` pre-pass that drops any `non_goals:` / `non-goals:` / `non goals:` / `excludes:` / `out-of-scope:` section (including bullet-list bodies) from the goal string before unsafe-context matching. Inline prose mentions of "non-goals" without a trailing colon are intentionally untouched.
Repro from `ouroboros-plugins` Issue #28 (Superpowers AgentOS L3a):
    ouroboros auto --skip-run --max-interview-rounds 8 \
      "Add bounded retry to a network client. non_goals: implementing a production deploy, ... constraints: filesystem:read and filesystem:write only; ..."
Before this fix the matcher fires on "deploy" inside `non_goals:` at round 8, marks every gap as unsafe, and blocks with `auto.interview.safe_default.unsafe_context_match pattern_name='ambiguous external side effect'`. After this fix the same prompt closes through the normal safe-default path because the non-goals body is excluded from matcher input, matching what the NON_GOAL ledger exclusion already promises.
Tests: 17 new cases in `tests/unit/auto/test_safe_defaults_prompt_non_goals.py` covering header variants, bullet-list bodies, blank-line termination, inline-prose preservation, idempotency, and four `_unsafe_context_reason` integration cases. Existing 153 safe-defaults / interview tests still pass.
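The header variants named in the description can be illustrated with a line-anchored pattern. This is a minimal sketch and not the code from the PR: `NON_GOAL_HEADER` and `is_non_goal_header` are hypothetical names, and the pattern actually used by `_strip_prompt_non_goal_sections` may differ.

```python
import re

# Hypothetical pattern, for illustration only.
# Anchored at the start of the line, so "non-goals" inside running prose
# (no header position, no trailing colon) never matches.
NON_GOAL_HEADER = re.compile(
    r"^[ \t]*(?:non[ _-]goals|excludes|out-of-scope)[ \t]*:",
    re.IGNORECASE,
)


def is_non_goal_header(line: str) -> bool:
    """Return True when the line opens a declared exclusion section."""
    return NON_GOAL_HEADER.match(line) is not None


samples = (
    "non_goals: deploy",
    "Non-Goals:",
    "out-of-scope: releases",
    "We agreed on the non-goals earlier",
)
for sample in samples:
    print(f"{sample!r} -> {is_non_goal_header(sample)}")
```

Only the first three samples are treated as headers; the prose mention is left alone because the match is anchored to the line start and requires the colon.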
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|---|---|
| PR | #1221 |
| HEAD checked | 223d3a50b5200d151ff11be31c01c5073cd9c28d |
| Request ID | req_1779701717_8 |
| Review record | e37ac873-518f-4a86-85a9-a6fa19f7529b |
Recovery Notes
First recoverable review artifact generated from codex analysis log.
What Improved
- Prevents declared `non_goals:` / `excludes:` / `out-of-scope:` prompt sections from being treated as active unsafe scope during safe-default finalization.
Issue Requirements
| Requirement | Status |
|---|---|
| PR body requirements | N/A - PR body contains no material requirement text. |
| Keep declared prompt non-goals from tripping unsafe-context matching | Partially met - declared sections are stripped, but the stripping is too broad and can remove later active unsafe scope. |
| Preserve unsafe-context blocking for active deploy/release/publish intent | Partially met - covered when active intent is outside a stripped section, but not when it follows an inline non-goal header without a blank/header separator. |
Prior Findings Status
No prior bot review findings were present in /tmp/pr_prior_bot_reviews_1221.md; nothing to maintain or withdraw.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/auto/safe_defaults.py:612 | BLOCKING | `_strip_prompt_non_goal_sections()` enters skipping on any non-goal header and then drops every following non-blank, non-labelled line. Because `state.goal` is free-form caller input and feeds the unsafe-context gate before safe-default finalization, a prompt like `Goal: Build CLI\nnon_goals: no credentials\nDeploy to production` has the active `Deploy to production` line removed before `_unsafe_context_reason()` runs, so the gate returns safe instead of blocking an external side effect. This should fail closed: for inline headers with a same-line body, only strip that line, or only treat clearly indented/bulleted continuation lines as part of the non-goal section. |
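The fail-closed rule suggested in this finding can be sketched as follows. This is an illustrative reimplementation, not the helper from the PR: `strip_non_goal_sections` and `NON_GOAL_HEADER` are hypothetical names and the header pattern is an assumption. A header with a same-line body drops only its own line; only an empty header opens a multi-line body, and that body continues only through indented or bulleted lines.

```python
import re

# Assumed header pattern; the real one in safe_defaults.py may differ.
NON_GOAL_HEADER = re.compile(
    r"^[ \t]*(?:non[ _-]goals|excludes|out-of-scope)[ \t]*:", re.IGNORECASE
)


def strip_non_goal_sections(goal: str) -> str:
    """Drop declared non-goal sections, keeping anything ambiguous."""
    kept = []
    skipping = False
    for line in goal.splitlines():
        header = NON_GOAL_HEADER.match(line)
        if header:
            # Inline body: remove this line only. Empty header: a body follows.
            skipping = not line[header.end():].strip()
            continue
        if skipping:
            is_indented = line[:1] in (" ", "\t") and bool(line.strip())
            is_bullet = line.lstrip().startswith(("- ", "* "))
            if is_indented or is_bullet:
                continue  # structurally part of the non-goal body
            skipping = False  # a blank line or unindented prose ends the section
        kept.append(line)
    return "\n".join(kept)


prompt = "Goal: Build CLI\nnon_goals: no credentials\nDeploy to production"
print(strip_non_goal_sections(prompt))
```

With this rule the `Deploy to production` line stays in the matcher input, so a gate that scans the sanitized text can still block on it.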
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87 | Medium | High | Add a regression case for an inline non-goals header followed immediately by an unlabelled active unsafe instruction, so the sanitizer cannot silently hide positive unsafe scope. |
Non-blocking Suggestions
| # | File:Line | Category | Suggestion |
|---|---|---|---|
| 1 | tests/unit/auto/test_safe_defaults_prompt_non_goals.py:1 | Test coverage | The new tests cover blank-line and next-header termination, but not the fail-closed ambiguity where an inline non-goal line is followed by unindented prose. |
Test Coverage Notes
- Reviewed the new unit tests and adjacent safe-default tests in `tests/unit/auto/test_interview_pipeline.py`.
- Could not run pytest in this environment: `python3 -m pytest` failed because `pytest` is not installed; importing the package also required missing runtime dependency `structlog`.
- Exercised the changed helper path with a stubbed `structlog` and confirmed the unsafe active line after an inline `non_goals:` header is stripped.
Design Notes
The direction matches the existing ledger rule that confirmed NON_GOAL entries are exclusions, but the free-form prompt sanitizer needs a tighter section boundary because it sits in a safety gate.
Design / Roadmap Gate
Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. The new sanitizer is therefore on the runtime safety path for autonomous closure. It is compatible with ledger NON_GOAL semantics, but the current continuation rule is too permissive for free-form input and can hide active unsafe scope before matching.
Directional Notes
Maintainer posture focused this review on safe-default fail-closed behavior: prompt parsing cannot be allowed to convert caller-controlled unsafe instructions into invisible text without a strong boundary.
Merge Recommendation
REQUEST_CHANGES until the non-goal section stripping fails closed for unlabelled active prose after inline headers and has a regression test for that case.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 223d3a5
request_id: req_1779701717_8
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|---|---|
| PR | #1221 |
| HEAD checked | eea294b057dde3dd9e30eadfc8ead51c50c96bde |
| Request ID | req_1779709971_22 |
| Review record | f54ac847-490d-4740-b18b-f68b5d343736 |
Recovery Notes
First recoverable review artifact generated from codex analysis log.
What Improved
- Prevents declared inline and clearly delimited `non_goals:` / `excludes:` / `out-of-scope:` prompt sections from being treated as active unsafe scope during safe-default finalization.
- Adds regression coverage for active unsafe prose after an inline non-goal header.
Issue Requirements
| Requirement | Status |
|---|---|
| PR body requirements | N/A - PR body contains no material requirement text. |
| Keep declared prompt non-goals from tripping unsafe-context matching | Partially met - inline bodies and simple indented/list bodies work, but indented labelled body lines are preserved and still trip the unsafe matcher. |
| Preserve unsafe-context blocking for active deploy/release/publish intent | Met - active unsafe prose outside a non-goal section is still covered by the added tests and current matcher path. |
Prior Findings Status
Prior blocker modified/withdrawn: the earlier issue where an inline `non_goals:` header swallowed following unindented active prose appears fixed by `skipping = not line[header_match.end():].strip()` at src/ouroboros/auto/safe_defaults.py:623 and the regression at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:154. The remaining blocker is a different section-boundary bug for indented labelled continuation lines.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/auto/safe_defaults.py:632 | BLOCKING | `_strip_prompt_non_goal_sections()` checks `_PROMPT_SECTION_HEADER` before checking whether the line is indented, so an indented labelled item inside a non-goals body is treated as a new active section and preserved. For example, `non_goals:\n deploy: production\nactors: user` leaves `deploy: production` in the matcher input, causing `_unsafe_context_reason()` to block safe-default finalization even though the line is structurally inside the declared non-goal section. This contradicts the helper’s own contract that indented continuation lines are stripped, and it keeps a common YAML-ish structured prompt shape broken for the changed runtime boundary. |
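The ordering problem described in this finding can be reproduced in a small sketch. All names here (`NON_GOAL_HEADER`, `SECTION_HEADER`, `strip_sections`) are hypothetical and the patterns are assumptions. The `indent_first` switch exists only to contrast the two check orders; it is not part of the PR.

```python
import re

NON_GOAL_HEADER = re.compile(
    r"^[ \t]*(?:non[ _-]goals|excludes|out-of-scope)[ \t]*:", re.IGNORECASE
)
# Assumed shape of a generic "label:" line such as "actors: user".
SECTION_HEADER = re.compile(r"^[ \t]*[A-Za-z][\w -]*:")


def strip_sections(goal: str, indent_first: bool) -> str:
    kept = []
    skipping = False
    for line in goal.splitlines():
        header = NON_GOAL_HEADER.match(line)
        if header:
            skipping = not line[header.end():].strip()
            continue
        if skipping:
            indented = line[:1] in (" ", "\t") and bool(line.strip())
            labelled = SECTION_HEADER.match(line) is not None
            if indent_first:
                # Indentation wins: an indented "label:" line is still body.
                ends_section = not indented
            else:
                # Label wins: any "label:" line is treated as a new section.
                ends_section = labelled or not indented
            if not ends_section:
                continue
            skipping = False
        kept.append(line)
    return "\n".join(kept)


prompt = "non_goals:\n  deploy: production\nactors: user"
print(strip_sections(prompt, indent_first=False))  # keeps "  deploy: production"
print(strip_sections(prompt, indent_first=True))   # keeps only "actors: user"
```

Checking indentation first keeps the nested `deploy: production` entry scoped under the non-goal header, while the unindented `actors:` line still ends the section.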
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/unit/auto/test_safe_defaults_prompt_non_goals.py:70 | Medium | High | Add a regression test for indented labelled non-goal body lines such as non_goals:\n deploy: production, so section-boundary detection does not preserve nested non-goal metadata as active unsafe scope. |
Non-blocking Suggestions
None.
Test Coverage Notes
- Reviewed `tests/unit/auto/test_safe_defaults_prompt_non_goals.py` and adjacent safe-default tests in `tests/unit/auto/test_interview_pipeline.py`.
- Could not run pytest: `python3 -m pytest ...` failed because `pytest` is not installed.
- Direct import also required missing `structlog`; with a stubbed `structlog`, I exercised the helper path and confirmed the indented labelled non-goal body remains in matcher input and returns `ambiguous external side effect`.
Design Notes
The design direction is sound: string-level prompt non-goals should align with ledger NON_GOAL exclusion semantics. The parser boundary needs one more tightening so nested structured non-goal content is not misclassified as active scope.
Design / Roadmap Gate
Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before safe-defaulting unresolved Seed sections. The new sanitizer is therefore on the runtime convergence path. It preserves active unindented unsafe prose, but currently misclassifies indented labelled non-goal body lines, creating a direct prompt-format compatibility failure for structured handoff/scripted auto goals.
Directional Notes
Maintainer focus was on fail-closed safety without turning explicit exclusions into false unsafe scope. Memory was advisory only; the blocker above is based on current source behavior in the changed helper.
Merge Recommendation
REQUEST_CHANGES until indented labelled continuation lines inside declared non-goal sections are stripped or otherwise handled consistently with the documented section-body contract, with a regression test for that shape.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: eea294b
request_id: req_1779709971_22
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|---|---|
| PR | #1221 |
| HEAD checked | e8637baa1abd9b6fcf54c7ac1f6b1f85777b99eb |
| Request ID | req_1779719262_32 |
| Review record | 8aae4ca3-809e-4e23-bd46-d7d4cb46713a |
Recovery Notes
First recoverable review artifact generated from codex analysis log.
What Improved
- Prevents declared `non_goals:` / `excludes:` / `out-of-scope:` sections in free-form goals from being treated as active unsafe scope during safe-default finalization.
- Adds regression coverage for inline non-goal bodies, multiline/bulleted bodies, indented labelled body lines, active unsafe prose after inline headers, and repeated exclusion sections.
Issue Requirements
| Requirement | Status |
|---|---|
| No linked issue / PR body requirement captured | N/A |
Prior Findings Status
Prior blocker #1 from the first review is withdrawn: current src/ouroboros/auto/safe_defaults.py:623 only keeps skipping after an empty non-goal header, so inline non-goal bodies no longer swallow following active prose; tests/unit/auto/test_safe_defaults_prompt_non_goals.py:168 covers that regression.
Prior blocker #1 from the second review is withdrawn: current src/ouroboros/auto/safe_defaults.py:632 checks indentation before section-header detection, so indented labelled body lines like deploy: production remain scoped under the non-goal block; tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87 covers that shape.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| None. |
Non-blocking Suggestions
None.
Test Coverage Notes
- Reviewed `tests/unit/auto/test_safe_defaults_prompt_non_goals.py` and adjacent safe-default tests in `tests/unit/auto/test_interview_pipeline.py`.
- Could not run pytest: `/usr/bin/python3: No module named pytest`.
- Direct import also required missing `structlog`; with a stubbed `structlog`, manually exercised the prior blocker cases and confirmed active deploy prose is preserved while indented labelled non-goal body lines are stripped.
Design Notes
The approach is scoped to the unsafe-context gate and preserves the existing ledger rule that NON_GOAL entries are exclusions, not active scope. The helper now fails closed for unindented active prose while allowing structurally clear exclusion sections.
Design / Roadmap Gate
Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. Current HEAD preserves active unsafe goal text, ledger-derived unsafe values, and user interview answers, while excluding confirmed non-goal sections and policy-authored defaults. The prior section-boundary regressions appear covered by source and tests.
Directional Notes
Maintainer focus was on the safe-default runtime boundary: caller-controlled prompt text must not hide active unsafe instructions, but explicit exclusions also must not block benign autonomous closure. Memory was advisory only; no blocker is raised from memory.
Merge Recommendation
APPROVE. I found no current blocking runtime or contract issue in the changed safe-default sanitizer. Test execution is limited by missing local dependencies, but the added tests and manual smoke checks cover the previously failing boundary cases.
Review-Metadata:
verdict: APPROVE
head_sha: e8637ba
request_id: req_1779719262_32
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Merge-readiness rationale (English)
This PR is ready to merge. Here is the chain of reasoning, tied back to issue #961's Track B (
What the PR does
This change introduces
Why it aligns with the SSOT direction
Why it is not over-engineered
Why it is mergeable
Risk assessment
Recommending merge.
PR Review Summary
Posted via
Verdict
Approve
Scope Reviewed
Blocking Issues
None.
Warnings
None.
Mutation-Test Thinking
Complexity / CRAP-style Risk
Test Quality Assessment
Security / Operational Risk
Looks Good
Final Recommendation
APPROVE. The PR is a fail-closed string-level companion to an already-shipped ledger-level invariant. The helper's section-boundary logic has been hardened across two review rounds with explicit regression tests, and the integration tests both prove the new behaviour and pin that active unsafe prose still trips the gate. No blocking findings, no warnings.
Review-Metadata:
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|---|---|
| PR | #1221 |
| HEAD checked | e8637baa1abd9b6fcf54c7ac1f6b1f85777b99eb |
| Request ID | req_1779721799_1 |
| Review record | f6cbc35f-8b79-4fef-a3d1-9ddf7c71b468 |
Recovery Notes
First recoverable review artifact generated from codex analysis log.
What Improved
- Adds string-level handling so declared `non_goals:` / `excludes:` / `out-of-scope:` goal sections are not treated as active unsafe scope before the interview ledger can classify them.
- Adds regression tests for inline non-goal bodies, bulleted bodies, indented labelled body lines, active unsafe prose after inline headers, and repeated exclusion sections.
Issue Requirements
| Requirement | Status |
|---|---|
| Keep declared prompt non-goals from tripping unsafe-context matching | Partially met |
| Preserve unsafe-context blocking for active deploy/release/publish intent outside non-goal sections | Partially met |
| No material PR-body requirement text captured | N/A |
Prior Findings Status
Prior blocker about inline non-goal headers swallowing following unindented active prose is withdrawn: current code only keeps skipping after an empty header, and the test at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:168 covers it. Prior blocker about indented labelled body lines being preserved is withdrawn: current code strips indented body lines, with coverage at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87. The latest prior approval is modified by the new blocker listed under Blockers: same-indentation or uniformly indented labelled sections can still be hidden.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/auto/safe_defaults.py:632 | BLOCKING | `_strip_prompt_non_goal_sections()` drops every indented line inside a non-goal block before checking whether that line is actually the next labelled section. Because `_PROMPT_NON_GOAL_HEADER` explicitly accepts leading whitespace, uniformly indented handoff bodies are in scope, but a prompt like `non_goals:\n credentials: customer secrets\n constraints: must deploy to production after merging` strips the indented `constraints:` line and `_unsafe_context_reason()` returns `None`. This hides active unsafe scope from the safe-default gate that `AutoInterviewDriver` feeds with caller-controlled `state.goal`, allowing autonomous default finalization where it should block. Track the indentation level of the non-goal header, or otherwise preserve same/lower-indent labelled section headers, and add a regression for an indented seven-slot prompt with unsafe `constraints:`. |
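One way to read the "track the indentation level of the non-goal header" suggestion is sketched below. It is an assumption-laden illustration, not the fix that was or will be committed: `strip_non_goal_sections`, `indent_width`, and `NON_GOAL_HEADER` are hypothetical names, and the example prompt assumes the non-goal header itself is indented to the same level as the later `constraints:` line.

```python
import re

# Assumed header pattern; leading whitespace is accepted, as the finding notes.
NON_GOAL_HEADER = re.compile(
    r"^[ \t]*(?:non[ _-]goals|excludes|out-of-scope)[ \t]*:", re.IGNORECASE
)


def indent_width(line: str) -> int:
    """Number of leading space/tab characters."""
    return len(line) - len(line.lstrip(" \t"))


def strip_non_goal_sections(goal: str) -> str:
    kept = []
    header_indent = None  # None means we are not inside a non-goal body
    for line in goal.splitlines():
        header = NON_GOAL_HEADER.match(line)
        if header:
            has_inline_body = bool(line[header.end():].strip())
            # Inline body: drop this line only. Empty header: remember its indent.
            header_indent = None if has_inline_body else indent_width(line)
            continue
        if header_indent is not None:
            text = line.strip()
            width = indent_width(line)
            deeper = width > header_indent
            bullet = text.startswith(("- ", "* ")) and width == header_indent
            if text and (deeper or bullet):
                continue  # body line nested under the header
            header_indent = None  # same/lower indent or blank line ends the body
        kept.append(line)
    return "\n".join(kept)


prompt = (
    "Goal: Build CLI\n"
    "  non_goals:\n"
    "    credentials: customer secrets\n"
    "  constraints: must deploy to production after merging"
)
print(strip_non_goal_sections(prompt))
```

Here the nested `credentials:` entry is removed, while the `constraints:` line at the header's own indentation survives and remains visible to the unsafe-context matcher.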
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| None. |
Non-blocking Suggestions
| None. |
Test Coverage Notes
- Reviewed `tests/unit/auto/test_safe_defaults_prompt_non_goals.py` and adjacent safe-default/interview tests.
- Could not run pytest: `/usr/bin/python3: No module named pytest`.
- Direct import required missing `structlog`; with a small `structlog` stub and `PYTHONPATH=src`, I manually confirmed the indented prompt case sanitizes down to only the goal line and returns `None` from `_unsafe_context_reason()`.
Design Notes
The string-level exclusion matches the existing ledger NON_GOAL policy, but the parser needs indentation-aware section boundaries because it sits directly in the autonomous safe-default safety gate.
Design / Roadmap Gate
Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. The new sanitizer is recomputed at runtime and not persisted, but it can remove active unsafe sections from the matcher input. Compatibility with indented handoff/bundled prompt formats is therefore part of the safety contract, not just formatting tolerance.
Directional Notes
Review focus was the runtime safety boundary for caller-controlled prompt text. Memory was advisory only; the blocker is based on current source behavior in the changed helper.
Merge Recommendation
REQUEST_CHANGES until the non-goal stripper preserves later labelled sections in uniformly indented prompts and covers that fail-closed case with a regression test.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: e8637ba
request_id: req_1779721799_1
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
Why
`_unsafe_context_reason` already documents that ledger `NON_GOAL` entries are excluded because confirmed non-goals are explicit exclusions, not active unsafe scope. That exclusion only fires after the interview has structured the user's exclusion text into the ledger.
Callers that pre-declare their non-goals inside the free-form goal string — handoff prompts, scripted `ooo auto --skip-run` invocations, and any caller that bundles the seven canonical interview slots in the request body — leak those exclusion words into the matcher before the interview can register them. The matcher then trips on the user's own exclusion text and blocks the run with `auto.interview.safe_default.unsafe_context_match pattern_name='ambiguous external side effect'`.
This is the structural blocker behind every L3-style downstream-handoff verification on the `ouroboros-plugins` AgentOS assimilation issues that touch an externally-effecting domain.
Reproduction (before the fix)
Recorded against `origin/main` of `Q00/ouroboros-plugins` `67f776d` with installed `ouroboros-ai 0.39.1`:
```bash
ouroboros auto --skip-run --max-interview-rounds 8 --runtime codex \
"Add bounded retry behaviour to a network client.
non_goals: implementing a production deploy, mutating remote git state, calling external services
actors: single local CLI operator
inputs: handoff.md
outputs: an A-grade execution Seed and a verification plan
constraints: filesystem:read and filesystem:write only; no live merge or PR mutation
failure_modes: retry storm; idempotency violation; missing audit event
runtime_context: codex runtime, dry-run preferred"
```
Observed:
```
[auto] interview — interview round 8/8
[auto] interview — answered round 8/8 from conservative_default
[auto] blocked — auto interview reached max_rounds=8 without closure:
backend_done=False (ambiguity_score=0.228),
ledger_done=False (open_gaps=['actors', 'inputs', 'outputs',
'failure_modes', 'runtime_context'])
auto.interview.safe_default.unsafe_context_observed
unsafe_gaps=(
'actors: unsafe default context (ambiguous external side effect)',
'inputs: unsafe default context (ambiguous external side effect)',
'outputs: unsafe default context (ambiguous external side effect)',
'failure_modes: unsafe default context (ambiguous external side effect)',
'runtime_context: unsafe default context (ambiguous external side effect)'
)
```
`safe_defaults.py` fires on `deploy` inside the user's own `non_goals:` body. `_strip_negated_clauses` does not help here because the user did not phrase it as a negation (`No deploy`) — they declared a structural exclusion (`non_goals: deploy`), which is a different shape than what the negation stripper covers.
After this fix the same prompt closes through the normal safe-default path because the non-goals body is excluded from matcher input — matching what the ledger `NON_GOAL` exclusion already promises.
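The before/after behaviour can be illustrated with a toy gate. Everything in this sketch is a stand-in: `UNSAFE_PATTERN`, `unsafe_reason`, and `drop_inline_non_goal_lines` are hypothetical names, the keyword list is an assumption rather than the real matcher configuration, and the pre-pass handles only inline headers instead of full multi-line sections.

```python
import re
from typing import Optional

# Hypothetical stand-ins; the real matcher patterns are not shown in this PR text.
UNSAFE_PATTERN = re.compile(r"\b(?:deploy|release|publish)\b", re.IGNORECASE)
NON_GOAL_HEADER = re.compile(
    r"^[ \t]*(?:non[ _-]goals|excludes|out-of-scope)[ \t]*:", re.IGNORECASE
)


def drop_inline_non_goal_lines(goal: str) -> str:
    """Simplified pre-pass: remove only lines that open with a non-goal header."""
    return "\n".join(
        line for line in goal.splitlines() if not NON_GOAL_HEADER.match(line)
    )


def unsafe_reason(goal: str) -> Optional[str]:
    """Toy gate: report a reason when an unsafe keyword appears in the text."""
    if UNSAFE_PATTERN.search(goal):
        return "ambiguous external side effect"
    return None


goal = (
    "Add bounded retry behaviour to a network client.\n"
    "non_goals: implementing a production deploy, mutating remote git state, "
    "calling external services\n"
    "actors: single local CLI operator\n"
    "constraints: filesystem:read and filesystem:write only; "
    "no live merge or PR mutation"
)
print(unsafe_reason(goal))                              # matcher sees "deploy"
print(unsafe_reason(drop_inline_non_goal_lines(goal)))  # exclusion line removed
```

The first call reports the unsafe reason because `deploy` appears in the raw goal; the second returns `None` because the declared exclusion line never reaches the pattern.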
Scope
Out of scope
Test plan
Refs