Skip to content

fix(auto): honor prompt-declared non_goals in unsafe-context matcher#1221

Open
shaun0927 wants to merge 3 commits into
Q00:mainfrom
shaun0927:fix/safe-defaults-respect-prompt-non-goals
Open

fix(auto): honor prompt-declared non_goals in unsafe-context matcher#1221
shaun0927 wants to merge 3 commits into
Q00:mainfrom
shaun0927:fix/safe-defaults-respect-prompt-non-goals

Conversation

@shaun0927
Copy link
Copy Markdown
Collaborator

Summary

  • Apply the existing `NON_GOAL` ledger exclusion at the string level to any `non_goals:` / `excludes:` / `out-of-scope:` section in the free-form `goal` argument of `_unsafe_context_reason`.
  • Add a line-anchored `_strip_prompt_non_goal_sections` helper next to `_strip_negated_clauses`, used as a pre-pass before the existing NFKC normalization / negation stripping / regex bank.
  • Extend the `_unsafe_context_reason` docstring to document why prompt-pre-declared non-goals receive the same exclusion as ledger `NON_GOAL` entries.
  • New tests at `tests/unit/auto/test_safe_defaults_prompt_non_goals.py` (17 cases) covering header variants, bullet-list bodies, blank-line termination, inline-prose preservation, idempotency, and four `_unsafe_context_reason` integration cases.

Why

`_unsafe_context_reason` already documents that ledger `NON_GOAL` entries are excluded because confirmed non-goals are explicit exclusions, not active unsafe scope. That exclusion only fires after the interview has structured the user's exclusion text into the ledger.

Callers that pre-declare their non-goals inside the free-form goal string — handoff prompts, scripted `ooo auto --skip-run` invocations, and any caller that bundles the seven canonical interview slots in the request body — leak those exclusion words into the matcher before the interview can register them. The matcher then trips on the user's own exclusion text and blocks the run with `auto.interview.safe_default.unsafe_context_match pattern_name='ambiguous external side effect'`.

This is the structural blocker behind every L3-style downstream-handoff verification on the `ouroboros-plugins` AgentOS assimilation issues that touch an externally-effecting domain.

Reproduction (before the fix)

Recorded against `origin/main` of `Q00/ouroboros-plugins` `67f776d` with installed `ouroboros-ai 0.39.1`:

```bash
ouroboros auto --skip-run --max-interview-rounds 8 --runtime codex \
"Add bounded retry behaviour to a network client.
non_goals: implementing a production deploy, mutating remote git state, calling external services
actors: single local CLI operator
inputs: handoff.md
outputs: an A-grade execution Seed and a verification plan
constraints: filesystem:read and filesystem:write only; no live merge or PR mutation
failure_modes: retry storm; idempotency violation; missing audit event
runtime_context: codex runtime, dry-run preferred"
```

Observed:

```
[auto] interview — interview round 8/8
[auto] interview — answered round 8/8 from conservative_default
[auto] blocked — auto interview reached max_rounds=8 without closure:
backend_done=False (ambiguity_score=0.228),
ledger_done=False (open_gaps=['actors', 'inputs', 'outputs',
'failure_modes', 'runtime_context'])

auto.interview.safe_default.unsafe_context_observed
unsafe_gaps=(
'actors: unsafe default context (ambiguous external side effect)',
'inputs: unsafe default context (ambiguous external side effect)',
'outputs: unsafe default context (ambiguous external side effect)',
'failure_modes: unsafe default context (ambiguous external side effect)',
'runtime_context: unsafe default context (ambiguous external side effect)'
)
```

`safe_defaults.py` fires on `deploy` inside the user's own `non_goals:` body. `_strip_negated_clauses` does not help here because the user did not phrase it as a negation (`No deploy`) — they declared a structural exclusion (`non_goals: deploy`), which is a different shape than what the negation stripper covers.

After this fix the same prompt closes through the normal safe-default path because the non-goals body is excluded from matcher input — matching what the ledger `NON_GOAL` exclusion already promises.

Scope

  • Only the matcher input is touched. The unsafe-pattern bank, the NFKC normalization, the negation stripper, and the `_unsafe_ledger_values` / `_interview_answers` projections are unchanged.
  • Header recognition is line-anchored and requires a trailing colon (`non_goals:`, `Excludes:`, etc.), so ordinary prose that says `We will discuss non-goals later` is untouched.
  • Section termination is conservative: blank line, or another labelled `:` line (including bullet form `- constraints:`). Bullet-body lines under the non-goals header are dropped.
  • Constraints / actors / inputs / outputs / etc. sections are not stripped — only the explicit non-goals / excludes / out-of-scope class. A `constraints: must deploy to production` line still trips the matcher, as it should.

Out of scope

  • The synthesis-ack closure bug fixed by fix(auto): close safe-default synthesis ack #1220 (close `safe_default` synthesis ack even when backend ack is non-terminal). This PR addresses a different failure path: the matcher fires before the synthesis ack pathway is even reached, because `unsafe_gaps` is non-empty when the user pre-declared non-goals.
  • Adding a CLI surface (e.g. `--accept-pre-filled-slots`). The string-level exclusion is a strict superset of what such a flag would do, and it requires no opt-in.
  • Seeding the ledger from prompt-level slot markers (`actors:`, `inputs:`, …) before the interview. That is a broader change and orthogonal to closing the matcher false-positive.

Test plan

  • `uv run pytest tests/unit/auto/test_safe_defaults_prompt_non_goals.py -q` — 17 new tests pass, covering helper variants and `_unsafe_context_reason` integration.
  • `uv run pytest tests/unit/auto/test_safe_default_observability.py tests/unit/auto/test_safe_defaults_domain_profile.py tests/unit/auto/test_interview_pipeline.py -q` — 153 existing tests still pass, no regression.
  • `uv run ruff check src/ouroboros/auto/safe_defaults.py tests/unit/auto/test_safe_defaults_prompt_non_goals.py` — clean.
  • `uv run ruff format --check src/ouroboros/auto/safe_defaults.py tests/unit/auto/test_safe_defaults_prompt_non_goals.py` — both formatted.
  • `uv run mypy src/ouroboros/auto/safe_defaults.py` — no issues.

Refs

`_unsafe_context_reason` already excludes ledger NON_GOAL entries because
confirmed non-goals are explicit exclusions, but that exclusion only
fires after the interview has structured them into the ledger. Callers
that pre-declare their non-goals in the free-form goal string — handoff
prompts and scripted `ooo auto` invocations that bundle the seven
canonical interview slots in the request body — leak those exclusion
words into the matcher before the interview can register them, flipping
the gate into "ambiguous external side effect" on the user's own
exclusion text.

Add a line-anchored `_strip_prompt_non_goal_sections` pre-pass that
drops any `non_goals:` / `non-goals:` / `non goals:` / `excludes:` /
`out-of-scope:` section (including bullet-list bodies) from the goal
string before unsafe-context matching. Inline prose mentions of
"non-goals" without a trailing colon are intentionally untouched.

Repro from `ouroboros-plugins` Issue #28 (Superpowers AgentOS L3a):

    ouroboros auto --skip-run --max-interview-rounds 8 \
      "Add bounded retry to a network client.
       non_goals: implementing a production deploy, ...
       constraints: filesystem:read and filesystem:write only; ..."

Before this fix the matcher fires on "deploy" inside `non_goals:` at
round 8, marks every gap as unsafe, and blocks with
`auto.interview.safe_default.unsafe_context_match
 pattern_name='ambiguous external side effect'`. After this fix the
same prompt closes through the normal safe-default path because the
non-goals body is excluded from matcher input — matching what the
NON_GOAL ledger exclusion already promises.

Tests: 17 new cases in `tests/unit/auto/test_safe_defaults_prompt_non_goals.py`
covering header variants, bullet-list bodies, blank-line termination,
inline-prose preservation, idempotency, and four `_unsafe_context_reason`
integration cases. Existing 153 safe-defaults / interview tests still
pass.
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1221 |
| HEAD checked | 223d3a50b5200d151ff11be31c01c5073cd9c28d |
| Request ID | req_1779701717_8 |
| Review record | e37ac873-518f-4a86-85a9-a6fa19f7529b |

What Improved

  • Prevents declared non_goals: / excludes: / out-of-scope: prompt sections from being treated as active unsafe scope during safe-default finalization.

Issue Requirements

Requirement Status
PR body requirements N/A - PR body contains no material requirement text.
Keep declared prompt non-goals from tripping unsafe-context matching Partially met - declared sections are stripped, but the stripping is too broad and can remove later active unsafe scope.
Preserve unsafe-context blocking for active deploy/release/publish intent Partially met - covered when active intent is outside a stripped section, but not when it follows an inline non-goal header without a blank/header separator.

Prior Findings Status

No prior bot review findings were present in /tmp/pr_prior_bot_reviews_1221.md; nothing to maintain or withdraw.

Blockers

# File:Line Severity Finding
1 src/ouroboros/auto/safe_defaults.py:612 BLOCKING _strip_prompt_non_goal_sections() enters skipping on any non-goal header and then drops every following non-blank, non-labelled line. Because state.goal is free-form caller input and feeds the unsafe-context gate before safe-default finalization, a prompt like Goal: Build CLI\nnon_goals: no credentials\nDeploy to production has the active Deploy to production line removed before _unsafe_context_reason() runs, so the gate returns safe instead of blocking an external side effect. This should fail closed: for inline headers with a same-line body, only strip that line, or only treat clearly indented/bulleted continuation lines as part of the non-goal section.

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87 Medium High Add a regression case for an inline non-goals header followed immediately by an unlabelled active unsafe instruction, so the sanitizer cannot silently hide positive unsafe scope.

Non-blocking Suggestions

| 1 | tests/unit/auto/test_safe_defaults_prompt_non_goals.py:1 | Test coverage | The new tests cover blank-line and next-header termination, but not the fail-closed ambiguity where an inline non-goal line is followed by unindented prose. |

Test Coverage Notes

  • Reviewed the new unit tests and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest in this environment: python3 -m pytest failed because pytest is not installed; importing the package also required missing runtime dependency structlog.
  • Exercised the changed helper path with a stubbed structlog and confirmed the unsafe active line after an inline non_goals: header is stripped.

Design Notes

The direction matches the existing ledger rule that confirmed NON_GOAL entries are exclusions, but the free-form prompt sanitizer needs a tighter section boundary because it sits in a safety gate.

Design / Roadmap Gate

Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. The new sanitizer is therefore on the runtime safety path for autonomous closure. It is compatible with ledger NON_GOAL semantics, but the current continuation rule is too permissive for free-form input and can hide active unsafe scope before matching.

Directional Notes

Maintainer posture focused this review on safe-default fail-closed behavior: prompt parsing cannot be allowed to convert caller-controlled unsafe instructions into invisible text without a strong boundary.

Test Coverage

  • Reviewed the new unit tests and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest in this environment: python3 -m pytest failed because pytest is not installed; importing the package also required missing runtime dependency structlog.
  • Exercised the changed helper path with a stubbed structlog and confirmed the unsafe active line after an inline non_goals: header is stripped.

Merge Recommendation

REQUEST_CHANGES until the non-goal section stripping fails closed for unlabelled active prose after inline headers and has a regression test for that case.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 223d3a5
request_id: req_1779701717_8
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1221 |
| HEAD checked | eea294b057dde3dd9e30eadfc8ead51c50c96bde |
| Request ID | req_1779709971_22 |
| Review record | f54ac847-490d-4740-b18b-f68b5d343736 |

What Improved

  • Prevents declared inline and clearly delimited non_goals: / excludes: / out-of-scope: prompt sections from being treated as active unsafe scope during safe-default finalization.
  • Adds regression coverage for active unsafe prose after an inline non-goal header.

Issue Requirements

Requirement Status
PR body requirements N/A - PR body contains no material requirement text.
Keep declared prompt non-goals from tripping unsafe-context matching Partially met - inline bodies and simple indented/list bodies work, but indented labelled body lines are preserved and still trip the unsafe matcher.
Preserve unsafe-context blocking for active deploy/release/publish intent Met - active unsafe prose outside a non-goal section is still covered by the added tests and current matcher path.

Prior Findings Status

Prior blocker modified/withdrawn: the earlier issue where an inline non_goals: header swallowed following unindented active prose appears fixed by skipping = not line[header_match.end():].strip() at src/ouroboros/auto/safe_defaults.py:623 and the regression at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:154. The remaining blocker is a different section-boundary bug for indented labelled continuation lines.

Blockers

# File:Line Severity Finding
1 src/ouroboros/auto/safe_defaults.py:632 BLOCKING _strip_prompt_non_goal_sections() checks _PROMPT_SECTION_HEADER before checking whether the line is indented, so an indented labelled item inside a non-goals body is treated as a new active section and preserved. For example, non_goals:\n deploy: production\nactors: user leaves deploy: production in the matcher input, causing _unsafe_context_reason() to block safe-default finalization even though the line is structurally inside the declared non-goal section. This contradicts the helper’s own contract that indented continuation lines are stripped, and it keeps a common YAML-ish structured prompt shape broken for the changed runtime boundary.

Follow-up Findings

# File:Line Priority Confidence Suggestion
1 tests/unit/auto/test_safe_defaults_prompt_non_goals.py:70 Medium High Add a regression test for indented labelled non-goal body lines such as non_goals:\n deploy: production, so section-boundary detection does not preserve nested non-goal metadata as active unsafe scope.

Non-blocking Suggestions

None.

Test Coverage Notes

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest: python3 -m pytest ... failed because pytest is not installed.
  • Direct import also required missing structlog; with a stubbed structlog, I exercised the helper path and confirmed the indented labelled non-goal body remains in matcher input and returns ambiguous external side effect.

Design Notes

The design direction is sound: string-level prompt non-goals should align with ledger NON_GOAL exclusion semantics. The parser boundary needs one more tightening so nested structured non-goal content is not misclassified as active scope.

Design / Roadmap Gate

Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before safe-defaulting unresolved Seed sections. The new sanitizer is therefore on the runtime convergence path. It preserves active unindented unsafe prose, but currently misclassifies indented labelled non-goal body lines, creating a direct prompt-format compatibility failure for structured handoff/scripted auto goals.

Directional Notes

Maintainer focus was on fail-closed safety without turning explicit exclusions into false unsafe scope. Memory was advisory only; the blocker above is based on current source behavior in the changed helper.

Test Coverage

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest: python3 -m pytest ... failed because pytest is not installed.
  • Direct import also required missing structlog; with a stubbed structlog, I exercised the helper path and confirmed the indented labelled non-goal body remains in matcher input and returns ambiguous external side effect.

Merge Recommendation

REQUEST_CHANGES until indented labelled continuation lines inside declared non-goal sections are stripped or otherwise handled consistently with the documented section-body contract, with a regression test for that shape.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: eea294b
request_id: req_1779709971_22
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1221 |
| HEAD checked | e8637baa1abd9b6fcf54c7ac1f6b1f85777b99eb |
| Request ID | req_1779719262_32 |
| Review record | 8aae4ca3-809e-4e23-bd46-d7d4cb46713a |

What Improved

  • Prevents declared non_goals: / excludes: / out-of-scope: sections in free-form goals from being treated as active unsafe scope during safe-default finalization.
  • Adds regression coverage for inline non-goal bodies, multiline/bulleted bodies, indented labelled body lines, active unsafe prose after inline headers, and repeated exclusion sections.

Issue Requirements

Requirement Status
No linked issue / PR body requirement captured N/A

Prior Findings Status

Prior blocker #1 from the first review is withdrawn: current src/ouroboros/auto/safe_defaults.py:623 only keeps skipping after an empty non-goal header, so inline non-goal bodies no longer swallow following active prose; tests/unit/auto/test_safe_defaults_prompt_non_goals.py:168 covers that regression.

Prior blocker #1 from the second review is withdrawn: current src/ouroboros/auto/safe_defaults.py:632 checks indentation before section-header detection, so indented labelled body lines like deploy: production remain scoped under the non-goal block; tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87 covers that shape.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

# File:Line Priority Confidence Suggestion
None.

Non-blocking Suggestions

None.

Test Coverage Notes

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest: /usr/bin/python3: No module named pytest.
  • Direct import also required missing structlog; with a stubbed structlog, manually exercised the prior blocker cases and confirmed active deploy prose is preserved while indented labelled non-goal body lines are stripped.

Design Notes

The approach is scoped to the unsafe-context gate and preserves the existing ledger rule that NON_GOAL entries are exclusions, not active scope. The helper now fails closed for unindented active prose while allowing structurally clear exclusion sections.

Design / Roadmap Gate

Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. Current HEAD preserves active unsafe goal text, ledger-derived unsafe values, and user interview answers, while excluding confirmed non-goal sections and policy-authored defaults. The prior section-boundary regressions appear covered by source and tests.

Directional Notes

Maintainer focus was on the safe-default runtime boundary: caller-controlled prompt text must not hide active unsafe instructions, but explicit exclusions also must not block benign autonomous closure. Memory was advisory only; no blocker is raised from memory.

Test Coverage

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default tests in tests/unit/auto/test_interview_pipeline.py.
  • Could not run pytest: /usr/bin/python3: No module named pytest.
  • Direct import also required missing structlog; with a stubbed structlog, manually exercised the prior blocker cases and confirmed active deploy prose is preserved while indented labelled non-goal body lines are stripped.

Merge Recommendation

APPROVE. I found no current blocking runtime or contract issue in the changed safe-default sanitizer. Test execution is limited by missing local dependencies, but the added tests and manual smoke checks cover the previously failing boundary cases.

Review-Metadata:
verdict: APPROVE
head_sha: e8637ba
request_id: req_1779719262_32
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927
Copy link
Copy Markdown
Collaborator Author

Merge-readiness rationale (English)

This PR is ready to merge. Here is the chain of reasoning, tied back to issue #961's Track B (ooo auto self-healing) and to the bot review history on this PR.

What the PR does

AutoInterviewDriver passes the persisted free-form state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting any unresolved Seed slots. That matcher already had a ledger-level rule: confirmed NON_GOAL entries are exclusions, not active unsafe scope. But callers that pre-declare their exclusions inside the free-form goal string (handoff prompts, scripted ooo auto --skip-run, bundled seven-slot bodies) leak those exclusion words into the matcher before the interview can structure them into the ledger. The matcher then trips on the user's own exclusion text and emits auto.interview.safe_default.unsafe_context_match pattern_name='ambiguous external side effect', blocking the run.

This change introduces _strip_prompt_non_goal_sections() as a line-anchored pre-pass alongside _strip_negated_clauses and the existing NFKC normalization. It strips only the body of explicit non_goals: / excludes: / out-of-scope: sections, terminated conservatively (blank line, another labelled <name>: header, or unindented prose). Constraints / actors / inputs / outputs sections are intentionally untouched, so an active constraints: must deploy to production still trips the matcher.

Why it aligns with the SSOT direction

  • Track B's roadmap explicitly tracks post-merge / adjacent safe-default fixes as follow-ups outside Track C tier gates (issue Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961 status table records fix(auto): honor prompt-declared non_goals in unsafe-context matcher #1221 as "approved/clean/green"). This is exactly such a follow-up.
  • It is the string-level counterpart of an already-shipped ledger-level invariant — same semantics, different surface — so it does not introduce a second safety policy.
  • Scope is the matcher input only. The unsafe-pattern bank, the NFKC normalization, the negation stripper, and the _unsafe_ledger_values / _interview_answers projections are all unchanged.
  • It does not overlap fix(auto): close safe-default synthesis ack #1220 (which closes a different failure path — synthesis-ack non-closure — that only triggers after unsafe_gaps is empty). Both are needed to unblock L3-style downstream handoffs end-to-end.

Why it is not over-engineered

  • 2 files changed: src/ouroboros/auto/safe_defaults.py (+111/-4) and a single new test module (+206). No new public API, no CLI surface, no flag, no migration.
  • The two iterations the bot drove (first inline-body, then indented-labelled-body) tightened the section-boundary detection; both regressions are pinned with explicit tests.
  • Behaviour is fail-closed: anything that doesn't structurally look like a non-goal section body still flows to the matcher.

Why it is mergeable

  • reviewDecision: APPROVED on commit e8637ba.
  • Latest ouroboros-agent verdict: APPROVE with no blocking findings, no follow-up findings, no non-blocking suggestions, and clean Design / Roadmap Gate notes.
  • Both prior blockers (inline-body unsafe-prose preservation; indented-labelled non-goal continuation) are withdrawn against current HEAD with regression tests at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:168 and :87.
  • All 8 required checks green (Ruff, MyPy, Bridge TS, enforce-envelope, enforce-boundary, Tests 3.12/3.13/3.14).
  • mergeStateStatus: CLEAN, mergeable: MERGEABLE.
  • 153 existing safe-default + interview-pipeline tests still pass; 17 new tests pass.

Risk assessment

  • Runtime / persistence: untouched.
  • Replay: untouched (matcher input is recomputed per call, not persisted).
  • Backwards compatibility: a goal string that previously tripped only because of its own non-goals declaration now passes the matcher. That is the intended bugfix. A goal that contained active unsafe prose still trips.
  • Failure mode if the helper misbehaves: matcher sees the original string (helper returns input on error), so worst case is the pre-fix behaviour, not a new bypass.

Recommending merge.

@shaun0927
Copy link
Copy Markdown
Collaborator Author

PR Review Summary

Posted via /oh-my-claudecode:pr-review skill — evidence-backed verdict.

Verdict

Approve

Scope Reviewed

  • PR intent: prevent caller-declared non_goals: / excludes: / out-of-scope: sections in the free-form goal string from being treated as active unsafe scope by _unsafe_context_reason() during safe-default finalization.
  • Main changed areas: src/ouroboros/auto/safe_defaults.py (+111/-4) introducing _strip_prompt_non_goal_sections() as a pre-pass before NFKC normalization in _unsafe_context_reason; tests/unit/auto/test_safe_defaults_prompt_non_goals.py (+206, new module, 17 tests).
  • Tests reviewed: the new module plus parity sanity-check against tests/unit/auto/test_safe_default_observability.py, tests/unit/auto/test_safe_defaults_domain_profile.py, tests/unit/auto/test_interview_pipeline.py (all unchanged but co-asserted to still pass).
  • Checks considered: Ruff Lint, MyPy, Bridge TypeScript, enforce-envelope, enforce-boundary, Tests Python 3.12 / 3.13 / 3.14, ouroboros-agent verdict on e8637ba.

Blocking Issues

None.

Warnings

None.

Mutation-Test Thinking

  • Likely mutants the tests would kill:
    • Comparison/condition inversion in _strip_prompt_non_goal_sectionsskipping = not line[header_match.end():].strip() flipped to skipping = bool(...) would let inline-body headers swallow following active prose; killed by test_strip_preserves_unindented_active_scope_after_inline_header and test_active_scope_after_inline_non_goals_still_trips_unsafe_matcher.
    • Reordering the indented-body check vs _PROMPT_SECTION_HEADER.match(line) so the section-header detector runs first — would re-expose the indented-labelled-body regression that the second review iteration flagged; killed by test_strip_handles_indented_labelled_body_lines.
    • Replacing if not line.strip(): skipping = False with pass — would let a blank-separated narrative section continue to be skipped; killed by test_strip_terminates_on_blank_line.
    • Removing the trailing-colon requirement from _PROMPT_NON_GOAL_HEADER — would strip ordinary prose mentioning "non-goals"; killed by test_strip_leaves_inline_prose_mention_alone.
    • Removing the constraints: carve-out (i.e., treating the helper as stripping every labelled section) — killed by test_constraints_section_with_active_deploy_still_trips_matcher.
    • Removing the sanitized_goal substitution in _unsafe_context_reason — killed by test_prompt_non_goals_section_does_not_trip_unsafe_matcher and test_multiple_non_goal_sections_are_each_stripped.
  • Mutants tests may not catch:
    • Returning the input string unchanged when an exception is raised inside the body loop (no test injects a pathological input). Low risk because the loop has no exception sites: it only does regex .match() / .search() and string slicing.
    • A regression in _PROMPT_SECTION_HEADER's 40-char label cap is not directly asserted (e.g., a label longer than 40 chars would not be recognised as the next section). Not in scope of stated intent.
  • Additional tests recommended: none for merge gating. Optional hardening: a parametrized case for label boundaries (e.g., 40-char vs 41-char labels) if the helper is reused outside this matcher.

Complexity / CRAP-style Risk

  • High-risk functions/modules: _strip_prompt_non_goal_sections is the only new function. Cyclomatic ≈ 7 (header match, skipping branch, blank line, indented line, section header, list item, fallthrough). Linear single-pass over splitlines; no nested loops; no I/O; no mutation of shared state.
  • Complexity increase: bounded. The helper is local, dependency-free, and exclusively a string transformation. _unsafe_context_reason itself only adds one assignment (sanitized_goal = …).
  • Test coverage concern: branch coverage on the helper is high — 17 tests pin (a) seven header variants, (b) inline body, (c) bullet-list body, (d) indented labelled body, (e) blank-line termination, (f) inline-then-prose preservation, (g) ordinary prose untouched, (h) idempotency, (i) integration with _unsafe_context_reason across four shapes plus the multi-section case.
  • Refactoring recommendation: none. The helper is at the right size; extracting further would obscure the per-line state machine.

Test Quality Assessment

  • Strong tests: integration cases call the real _unsafe_context_reason and assert on the exact ambiguous external side effect string — that pins both the matcher contract and the helper's exclusion. The test_active_goal_deploy_phrase_still_trips_unsafe_matcher baseline is essential because it proves the fix does not silently weaken the gate.
  • Weak tests: none observed.
  • Missing edge cases: a single benign omission — the helper's behaviour on a CRLF-line-ending input is not asserted; Python's str.splitlines() handles it, but a future refactor could regress this without a test. Not blocking.
  • Mocking concerns: none. Tests use the real SeedDraftLedger.from_goal() fixture; no patches.

Security / Operational Risk

  • Auth/authz: untouched.
  • Data: untouched (in-memory string transformation only).
  • Side effects: none — _strip_prompt_non_goal_sections is pure.
  • Replay / persistence: matcher input is recomputed per call, not persisted. Resuming a session does not depend on this helper's output being stable across runs.
  • Failure mode: if the helper ever raised, the caller would propagate; but the loop body raises nothing. Pre-fix behaviour (no sanitization) is the safe fallback because the only effect is to less aggressively block — and the fix's whole point is that the pre-fix behaviour overblocks legitimate input. There is no path by which a helper bug introduces a new bypass of active unsafe prose; the integration tests pin that explicitly.
  • Observability: existing auto.interview.safe_default.unsafe_context_match log call is unchanged.

Looks Good

  • Aligns with the existing ledger-level NON_GOAL exclusion rule documented in _unsafe_context_reason — same semantic, different surface.
  • Two review iterations tightened section boundaries; both regressions (inline-body swallowing active prose, indented-labelled-body misclassification) are pinned with regression tests and the bot withdrew both blockers on current HEAD.
  • Scope is minimal — no new public API, no flag, no config knob, no migration.
  • All CI checks green; mergeable; clean against main.
  • Aligns with issue Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961 Track B as a narrow post-merge follow-up outside Track C tier gates.

Final Recommendation

APPROVE. The PR is a fail-closed string-level companion to an already-shipped ledger-level invariant. The helper's section-boundary logic has been hardened across two review rounds with explicit regression tests, and the integration tests both prove the new behaviour and pin that active unsafe prose still trips the gate. No blocking findings, no warnings.

Review-Metadata:
verdict: APPROVE
skill: oh-my-claudecode:pr-review
head_sha: e8637ba

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1221 |
| HEAD checked | e8637baa1abd9b6fcf54c7ac1f6b1f85777b99eb |
| Request ID | req_1779721799_1 |
| Review record | f6cbc35f-8b79-4fef-a3d1-9ddf7c71b468 |

What Improved

  • Adds string-level handling so declared non_goals: / excludes: / out-of-scope: goal sections are not treated as active unsafe scope before the interview ledger can classify them.
  • Adds regression tests for inline non-goal bodies, bulleted bodies, indented labelled body lines, active unsafe prose after inline headers, and repeated exclusion sections.

Issue Requirements

Requirement Status
Keep declared prompt non-goals from tripping unsafe-context matching Partially met
Preserve unsafe-context blocking for active deploy/release/publish intent outside non-goal sections Partially met
No material PR-body requirement text captured N/A

Prior Findings Status

Prior blocker about inline non-goal headers swallowing following unindented active prose is withdrawn: current code only keeps skipping after an empty header, and the test at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:168 covers it. Prior blocker about indented labelled body lines being preserved is withdrawn: current code strips indented body lines, with coverage at tests/unit/auto/test_safe_defaults_prompt_non_goals.py:87. The latest prior approval is modified by the new blocker above: same-indentation or uniformly indented labelled sections can still be hidden.

Blockers

# File:Line Severity Finding
1 src/ouroboros/auto/safe_defaults.py:632 BLOCKING _strip_prompt_non_goal_sections() drops every indented line inside a non-goal block before checking whether that line is actually the next labelled section. Because _PROMPT_NON_GOAL_HEADER explicitly accepts leading whitespace, uniformly indented handoff bodies are in scope, but a prompt like non_goals:\n credentials: customer secrets\n constraints: must deploy to production after merging strips the indented constraints: line and _unsafe_context_reason() returns None. This hides active unsafe scope from the safe-default gate that AutoInterviewDriver feeds with caller-controlled state.goal, allowing autonomous default finalization where it should block. Track the indentation level of the non-goal header, or otherwise preserve same/lower-indent labelled section headers, and add a regression for an indented seven-slot prompt with unsafe constraints:.

Follow-up Findings

# File:Line Priority Confidence Suggestion
None.

Non-blocking Suggestions

| None. |

Test Coverage Notes

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default/interview tests.
  • Could not run pytest: /usr/bin/python3: No module named pytest.
  • Direct import required missing structlog; with a small structlog stub and PYTHONPATH=src, I manually confirmed the indented prompt case sanitizes down to only the goal line and returns None from _unsafe_context_reason().

Design Notes

The string-level exclusion matches the existing ledger NON_GOAL policy, but the parser needs indentation-aware section boundaries because it sits directly in the autonomous safe-default safety gate.

Design / Roadmap Gate

Affected boundary: AutoInterviewDriver passes persisted state.goal into finalize_safe_defaultable_gaps(), which calls _unsafe_context_reason() before defaulting unresolved Seed sections. The new sanitizer is recomputed at runtime and not persisted, but it can remove active unsafe sections from the matcher input. Compatibility with indented handoff/bundled prompt formats is therefore part of the safety contract, not just formatting tolerance.

Directional Notes

Review focus was the runtime safety boundary for caller-controlled prompt text. Memory was advisory only; the blocker is based on current source behavior in the changed helper.

Test Coverage

  • Reviewed tests/unit/auto/test_safe_defaults_prompt_non_goals.py and adjacent safe-default/interview tests.
  • Could not run pytest: /usr/bin/python3: No module named pytest.
  • Direct import required missing structlog; with a small structlog stub and PYTHONPATH=src, I manually confirmed the indented prompt case sanitizes down to only the goal line and returns None from _unsafe_context_reason().

Merge Recommendation

REQUEST_CHANGES until the non-goal stripper preserves later labelled sections in uniformly indented prompts and covers that fail-closed case with a regression test.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: e8637ba
request_id: req_1779721799_1
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants