feat(auto): additive assumption_sources provenance surface (PR-C2)#1169
Conversation
PR-C2 of the L4 Auto Envelope v2 freeze (Q00#1157, Q00#821). ## Summary Add an auditable companion to ``AutoPipelineResult.assumptions`` so callers can distinguish *what the system assumed for them* from *what the user / repo confirmed*. The existing string-only ``assumptions`` field is unchanged — this is a strictly additive new field. Each ``AssumptionRecord`` is a frozen dataclass carrying: - ``text`` — the assumption text - ``source`` — one of ``"assumption"`` (auto-answerer fallback), ``"inference"`` (model reasoning), ``"conservative_default"`` (safe-default policy). These are the three assumption-class ``LedgerSource`` values that, per ``_EVIDENCE_BACKED_SOURCES``, do *not* count as evidence-backed and therefore land in ``assumption_only_sections``. - ``confidence`` — per-entry confidence as recorded by the ledger. Why this is bigger than ``assumptions``: ``assumptions()`` filters to ``LedgerSource.ASSUMPTION`` only (backwards-compatible). The new ``assumption_sources()`` broadens to *all three* assumption-class sources so the surface answers the actual user question ("which of these did the system make up?") rather than just the narrow auto-answerer-fallback subset. ## Scope - ``src/ouroboros/auto/ledger.py`` - New ``AssumptionRecord`` frozen dataclass. - New ``SeedDraftLedger.assumption_sources()`` method using the same inactive-status and dedupe semantics as ``_values_for_sources``. - ``src/ouroboros/auto/pipeline.py`` - Import ``AssumptionRecord``. - Add ``assumption_sources: tuple[AssumptionRecord, ...] = ()`` to ``AutoPipelineResult`` next to ``assumptions``. - Populate in ``_result()`` via ``ledger.assumption_sources()``. - ``skills/auto/SKILL.md`` - New "Assumption-source provenance" subsection documenting the field shape and source vocabulary. - Tests: - ``tests/unit/auto/test_ledger_grading_answerer.py``: three new tests covering all three source kinds, inactive/evidence-backed exclusion, and same-text dedupe across sections. - ``tests/unit/auto/test_interview_pipeline.py``: one new end-to-end pipeline test that exercises ``AutoPipelineResult`` surface, asserts the legacy ``assumptions`` field is unchanged in scope, and asserts the new ``assumption_sources`` carries the source tag intact for each of the three source kinds. - ``tests/unit/auto/test_pipeline_lateral.py`` and ``tests/unit/auto/test_pipeline_evaluate.py``: extend the ``_StubLedger`` test doubles with the new ``assumption_sources`` method so they remain compatible with the additive ``_result()`` consumer. No schema change, no manifest change, no breaking change to existing ``assumptions`` callers. CLI/MCP rendering of ``assumption_sources`` is intentionally deferred to a follow-up — this PR plumbs the envelope only. ## Test plan - ``uv run pytest tests/unit/auto/test_ledger_grading_answerer.py tests/unit/auto/test_interview_pipeline.py -k "assumption_sources or assumption_class or surfaces_assumption" -q`` → 4 passed - ``uv run pytest tests/unit/auto tests/integration/auto -q`` → 912 passed (baseline 908 + 4 new) - ``uv run ruff check`` on touched files → clean - ``uv run ruff format`` on touched files → no changes - ``uv run mypy src/ouroboros/auto/ledger.py src/ouroboros/auto/pipeline.py`` → clean Refs Q00#1157 (L4 lane), Q00#821 (autonomy acceptance matrix), Q00#1146 (PR-C ``defaulted_sections``), Q00#1148 (PR-B1 ``ledger_only``), Q00#1151 (PR-E ``stop_reason_code``), Q00#1167 (PR-B2 ``safe_default`` closure). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
9f84af1for PR #1169
Review record:
86f88141-9bff-4cb4-a3a8-42134df38f59
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to complete the review: every filesystem command failed before execution with bwrap: No permissions to create a new namespace. I could not read /tmp/pr_diff_1169.patch, the changed-files list, comments, or source files, so I cannot make a defensible code assessment.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1169
Branch: feat/auto-assumption-source-provenance | 7 files, +324/-5 | CI: Bridge TypeScript pass 13s https://github.com/Q00/ouroboros/actions/runs/26264268510/job/77304138424
Scope: architecture-level
HEAD checked: 9f84af16840cdbdec94b35054ce3ccfa4a7bbba2
What Improved
- Added
AssumptionRecordandSeedDraftLedger.assumption_sources()so ledger entries fromassumption,inference, andconservative_defaultcan retain per-entry provenance. - Populated
AutoPipelineResult.assumption_sourcesfrom the ledger while keeping the legacyassumptionsstring tuple scoped toLedgerSource.ASSUMPTION. - Added focused unit coverage for ledger filtering, dedupe behavior, and pipeline result population.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
Add auditable companion to AutoPipelineResult.assumptions |
Partially met: Python result field exists and is populated. |
| Let callers distinguish system assumptions from confirmed user/repo facts | Not met at MCP boundary: metadata callers cannot observe assumption_sources. |
Preserve legacy assumptions behavior |
Met by ledger and pipeline tests. |
| Cover newly added logic with meaningful tests | Partially met: ledger and pipeline are covered, MCP contract is not. |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — No prior review artifact was provided in this audit context; no prior concerns could be maintained, modified, or withdrawn. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/auto_handler.py:1273 | High | 92% | The new provenance field is computed on AutoPipelineResult but never crosses the MCP metadata boundary: _result_meta() emits ledger_provenance, defaulted_sections, evidence_backed_sections, and assumption_only_sections, then returns without serializing result.assumption_sources. A focused runtime check with an AutoPipelineResult(..., assumption_sources=(AssumptionRecord(...),)) produced metadata with no assumption_sources key. This leaves MCP clients unable to consume the new PR-C2 contract and defeats the stated goal of letting callers distinguish system-made assumptions from confirmed facts. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Ran
uv run pytest tests/unit/auto/test_ledger_grading_answerer.py tests/unit/auto/test_interview_pipeline.py -k "assumption_sources or surfaces_assumption" -q: 4 passed. - Ran
uv run pytest tests/unit/auto/test_surface.py -k "defaulted_sections or ledger_provenance or auto_handler_meta_exposes_auto_progress_fields" -q: 4 passed. - Coverage is missing for the affected MCP consumer contract: no test asserts
_result_meta()exposesassumption_sourcesas JSON-compatible records.
Design / Roadmap Gate
Affected-boundary reasoning: this PR changes an envelope field intended for callers, so the review boundary includes AutoPipelineResult producers and downstream consumer surfaces, not just ledger internals. The ledger and pipeline boundary now carries provenance, but the MCP result metadata boundary drops it before clients can consume it. Because MCP metadata already exposes related envelope fields unconditionally, omitting assumption_sources creates an inconsistent contract and blocks the stated provenance use case for MCP consumers.
Merge Recommendation
Post-merge audit recommendation: patch current HEAD to serialize assumption_sources through MCP metadata as a stable JSON-compatible list of {text, source, confidence} records, and add a test_surface.py contract test proving non-empty and empty cases.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: post_merge_audit
merge_eligible: false
head_sha: 9f84af1
source_read_ok: true
diff_read_ok: true
blocking_count: 0
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1169
Branch: feat/auto-assumption-source-provenance | 7 files, +324/-5 | CI: Bridge TypeScript pass 13s https://github.com/Q00/ouroboros/actions/runs/26264268510/job/77304138424
Scope: architecture-level
HEAD checked: 9f84af16840cdbdec94b35054ce3ccfa4a7bbba2
What Improved
- Added a frozen
AssumptionRecordand ledger-levelassumption_sources()surface for the three assumption-class sources. - Populated
AutoPipelineResult.assumption_sourcesfrom the ledger while preserving legacyassumptions. - Added focused unit coverage for ledger filtering/dedupe and the internal pipeline result field.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
Add auditable companion to AutoPipelineResult.assumptions |
Partially met: internal Python result field is populated. |
Include text, source, and confidence per assumption record |
Met internally via AssumptionRecord. |
Cover assumption, inference, and conservative_default sources |
Met in ledger and internal pipeline tests. |
Preserve legacy assumptions behavior |
Met by keeping assumptions() filtered to LedgerSource.ASSUMPTION. |
| Make the new envelope usable by callers/clients | Not met for MCP clients because assumption_sources is omitted from _result_meta(). |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — No prior review findings were present in the provided artifacts; no prior concerns were maintained, modified, or withdrawn. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/auto_handler.py:1270 | High | 90% | The new provenance field is not exposed through the MCP result contract. _result_meta() serializes adjacent envelope fields (defaulted_sections, evidence_backed_sections, assumption_only_sections) but never includes result.assumption_sources; AutoHandler.handle() returns only _format_result() text plus this meta, and _format_result() still emits only legacy result.assumptions. As a result, MCP clients cannot observe the newly added inference / conservative_default provenance even though the PR goal is to let callers distinguish system-made assumptions from confirmed facts and skills/auto/SKILL.md:99 documents result.assumption_sources as available. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Ledger tests cover all three assumption-class sources, inactive/evidence-backed exclusion, and same-text dedupe.
- Pipeline test covers the internal
AutoPipelineResult.assumption_sourcesfield and legacyassumptionscompatibility. - Missing coverage at the consumer boundary: no MCP
_result_meta()/AutoHandler.handle()test assertsassumption_sourcesis serialized for clients.
Design / Roadmap Gate
The ledger and pipeline internals are coherent, but the affected boundary is the auto result contract consumed by MCP clients. Existing envelope peers are explicitly projected into MCPToolResult.meta, while the new field stops at the in-process dataclass. Because MCP callers do not receive raw AutoPipelineResult objects, this leaves the advertised provenance surface inaccessible at the primary client boundary.
Merge Recommendation
Post-merge audit recommendation: treat current HEAD as still incomplete for PR-C2 until assumption_sources is serialized in MCP metadata with regression coverage.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: post_merge_audit
merge_eligible: false
head_sha: 9f84af1
source_read_ok: true
diff_read_ok: true
blocking_count: 0
Summary
PR-C2 of the L4 Auto Envelope v2 freeze (#1157, #821).
Add an auditable companion to
AutoPipelineResult.assumptionsso callers can distinguish what the system assumed for them from what the user / repo confirmed. The existing string-onlyassumptionsfield is unchanged — this is a strictly additive new field.Each
AssumptionRecordis a frozen dataclass carrying:text— the assumption textsource— one of"assumption"(auto-answerer fallback),"inference"(model reasoning),"conservative_default"(safe-default policy). These are the three assumption-classLedgerSourcevalues that, per_EVIDENCE_BACKED_SOURCES, do not count as evidence-backed and therefore land inassumption_only_sections.confidence— per-entry confidence as recorded by the ledger.Why
assumption_sourcesis broader thanassumptions:assumptions()filters toLedgerSource.ASSUMPTIONonly (backwards-compatible string surface). The newassumption_sources()broadens to all three assumption-class sources so the surface answers the actual user question ("which of these did the system make up?") rather than just the narrow auto-answerer-fallback subset.Why this scope
Closes the documented PR-C2 gap from the #1157 living SSOT. After PR-B2 (#1167) and this PR, the L4 Envelope v2 lane has all five planned envelope fields in place:
stop_reason_codeinterview_closure_modesafe_default)defaulted_sectionsassumption_sourcesWhat is NOT done here
assumptionscallers.assumption_sourcesis intentionally deferred to a follow-up — this PR plumbs the envelope only. The CLI/MCPassumptionsbullet list atsrc/ouroboros/cli/commands/auto.py:748andsrc/ouroboros/mcp/tools/auto_handler.py:1940still works unchanged.assumptions()/_values_for_sources()filter — the existingledger.assumptions().count(\"CLI user\")assertion intest_ledger_grading_answerer.py:1182remains valid (regression-guarded by the newtest_assumption_sources_returns_records_for_all_three_assumption_class_sources).Scope
src/ouroboros/auto/ledger.py: newAssumptionRecordfrozen dataclass; newSeedDraftLedger.assumption_sources()method using the same inactive-status and dedupe semantics as_values_for_sources.src/ouroboros/auto/pipeline.py: importAssumptionRecord; addassumption_sources: tuple[AssumptionRecord, ...] = ()field next toassumptions; populate in_result().skills/auto/SKILL.md: new "Assumption-source provenance" subsection documenting the field shape and source vocabulary.tests/unit/auto/test_ledger_grading_answerer.py: three new tests covering all three source kinds, inactive/evidence-backed exclusion, and same-text dedupe across sections.tests/unit/auto/test_interview_pipeline.py: one new end-to-end pipeline test that asserts the legacyassumptionsfield is unchanged in scope and the newassumption_sourcescarries the source tag intact.tests/unit/auto/test_pipeline_lateral.pyandtests/unit/auto/test_pipeline_evaluate.py: extend the existing_StubLedgertest doubles with the newassumption_sourcesmethod so they remain compatible with the additive_result()consumer.Test plan
uv run pytest tests/unit/auto/test_ledger_grading_answerer.py tests/unit/auto/test_interview_pipeline.py -k "assumption_sources or assumption_class or surfaces_assumption" -q→ 4 passeduv run pytest tests/unit/auto tests/integration/auto -q→ 912 passed (baseline 908 + 4 new)uv run ruff checkon touched files → cleanuv run ruff formaton touched files → no changesuv run mypy src/ouroboros/auto/ledger.py src/ouroboros/auto/pipeline.py→ cleanRefs #1157 (L4 lane), #821 (autonomy acceptance matrix), #1146 (PR-C
defaulted_sections), #1148 (PR-B1ledger_only), #1151 (PR-Estop_reason_code), #1167 (PR-B2safe_defaultclosure).