# Meta SSOT: `ooo auto` Vision — Autonomous Completion Engine

> Living SSOT for `ooo auto`'s direction and improvement plan. Stays OPEN
> until the engine reliably completes the canonical end-to-end test
> (e.g. `ooo auto "make me a 2D kart racing game"`) without human
> intervention. Sibling to #961 (AgentOS roadmap sequencing).
>
> **Scope note.** This SSOT is *not* about redesigning Ouroboros's Socratic
> interview / tacit-knowledge substrate — `ooo auto` inherits that as-is.
> What this SSOT owns is *everything around the interview*: domain-aware
> spec inflation, long-running resilience, runtime acceptance, and a
> typed completion contract.
>
> **Substrate honesty.** Four of the five implementation lanes (L0, L1, L3,
> L5) extend existing substrate. One lane — **L2 watchdog v1** — adds
> exactly *one* new EventStore event family (`runtime.watchdog.cancel`)
> as the minimum needed to record *why a run was cancelled at minute X*.
> Earlier drafts proposed a richer 3-timer / 4-directive vocabulary; per
> Ouroboros's minimal-substrate principle, those are deferred to a v2
> expansion path triggered only by evidence of stalls that wall-clock
> alone cannot catch.
>
> **SSOT cleanup note (2026-05-23 KST).** This issue was refreshed after the #1173/#1174/#1175/#1178/#1181/#1188/#1189/#1190/#1191 merge train. Current implementation terms are `TaskClass` / `TaskClassProfile` and `runtime.watchdog.cancel`; older “DomainProfile Catalog”, `runtime.watchdog.decision`, and directive-vocabulary wording is historical unless a section explicitly says it is describing a rejected prior draft. Open cleanup PRs #1194/#1195/#1196 are the remaining known Track-B/`ooo auto` follow-ups; no new issue is needed for those.

## TL;DR — Status at a glance

| Lane | Owner / Anchor | Status | Gate / Next |
|---|---|---|---|
| **L0 — Canonical acceptance test** | #1170 umbrella | 🟡 partially implemented | #1174 merged the canonical harness skeleton; #1191 merged opt-in `OUROBOROS_RUN_CANONICAL=1` live wiring + L1 catalog cross-validation. Remaining work is scenario/evidence cleanup only, especially #1195; no nightly/replay/cost substrate unless later evidence demands it. |
| **L1 — TaskClass Catalog** | #1171 umbrella | 🟡 partially implemented | #1173 merged `TaskClass` / `TaskClassProfile` catalog data; #1188 merged Seed AC injection + active task class envelope. “DomainProfile Catalog” wording below is historical/stale unless explicitly referring to the old recovery-hint concept. Remaining MCP metadata cleanup is #1196. |
| **L2 — Watchdog v1** *(new substrate, minimal)* | #1172 umbrella; lifts #578 | 🟡 partially implemented | #1178 merged `RuntimeControls` + wall-clock `Watchdog`; #1189 merged production `AutoPipeline`/CLI/MCP consumption via `runtime.watchdog.cancel` and typed stop reason. Do not revive the old `runtime.watchdog.decision` / directive-vocabulary design. Remaining cleanup is #1194. |
| **L3 — Runtime Acceptance Substrate** | #1176 umbrella | 🟡 partially implemented | #1181 merged `RuntimeEvidence` + `HeadlessRunProbe`; #1190 merged `AutoPipelineResult.runtime_probe_evidence` + completion-grade `probe_runner` gate. Remaining cleanup is evidence alignment (#1195) and any future real-probe binding/scenario expansion; `sim_trace`/`render_hash`/`api_smoke` stay deferred. |
| **L4 — Auto Envelope v2** | #1146/#1148/#1151/#1167/#1169 all merged | 🟢 complete | All five planned envelope fields landed (`defaulted_sections`, `interview_closure_mode`, `stop_reason_code`, `assumption_sources`, `safe_default` closure). Lane frozen. |
| **L5 — Long-running Resilience** *(minimal)* | extends `ooo unstuck` + Ralph oscillation detection | 🟢 L5-a merged | #1175 merged the `oscillation_detected` → `UNSTUCK_LATERAL` routing slice. Further resilience substrate remains evidence-driven, not prebuilt. |

## North-star

`ooo auto "<one vague line>"` — a single MCP invocation — drives the work to
one of three **typed terminal states** without further user input:

1. **`CODE_COMPLETE`** — passing tests + lint + (for libraries) a usable
   API surface, evidence captured.
2. **`PRODUCT_COMPLETE`** — code-complete *plus* the domain-class
   runtime-acceptance probe passes (headless run / sim / render hash).
3. **`BLOCKED(reason_code)`** — resumable `auto_session_id`, classified
   stop reason from the 8-code taxonomy, *no fabrication, no silent
   abandonment, no untyped failure path*.

Branching between (1) and (2) is decided by **DomainProfile
auto-classification** (L1). The user never has to know which mode applied.

## Why this is bigger than "Seed → Run"

`skills/auto/SKILL.md` currently optimizes:
`Interview → A-grade Seed → Run handoff → (optional) Ralph`.

That's the *engineering ceiling*, not the *product*. The product target
is: **deliver a verifiably-working artifact from a vague one-line goal,
without the user pre-thinking the spec, the verification, or the
recovery.**

Three invariants have to hold for that:

- The system must **infer the domain** without being told.
- The system must **probe runtime behavior**, not just unit tests.
- The system must **refuse to die quietly** on long-running work.

L1–L5 below are exactly the missing pieces that turn the current
Seed→Run pipeline into a completion engine.

## Success conditions (testable, frozen)

We close this SSOT when, against a fixed canonical test matrix
(`ooo auto "CLI todo manager"`, `"2D kart racer"`, `"webhook receiver
service"`, `"refactor src/foo into vertical slices"`), **all** of the
following hold without human intervention for at least one of the goals,
reproducible across two consecutive runs on a clean repo:

| # | Condition | Today | Target |
|---|---|---|---|
| 1 | Interview closes with `closure_mode ∈ {mutual_agreement, ledger_only, safe_default}` | partial | 100% (`genuine_deadlock = 0`) |
| 2 | Seed AC reflects the inferred domain class (e.g. game → game-loop / input / render-target / playable build) | none | 100% via L1 |
| 3 | Long-running run never stalls without a recorded `runtime.watchdog.cancel` event | ad-hoc | 100% via L2 |
| 4 | Evidence bundle contains *runtime* proof (sim trace / headless run / render hash), not only unit tests | unit-tests only | 100% via L3 |
| 5 | Result envelope exposes `closure_mode`, `defaulted_sections`, `assumptions[source]`, `stop_reason_code` (8-code taxonomy) | partial | 100% via L4 |
| 6 | Sessions resumable across days; same goal lands on same lineage | shipped (#1138 et al.) | hold |
| 7 | On stall / regression, escalation ladder (`unstuck → reframe → safe-default → BLOCKED`) runs to a terminal state | partial | 100% via L5 |

## Lanes

### L0 — Canonical acceptance test *(minimal)*

The acceptance condition in this SSOT (*one canonical goal end-to-end,
reproducible 2x*) needs *something* concrete to point at. L0 provides
the smallest possible thing: a `tests/canonical/` directory with one
fixture per canonical goal and a `pytest` entry point. **The maintainer
runs it manually** when assessing SSOT close-readiness — no CI obligation.

**Scope:**

- `tests/canonical/<slug>/` per goal: `goal.txt` + optional `env/` +
  `expected.yaml` (`domain_class`, `completion_mode`, optional
  `wall_clock_budget_seconds`).
- `tests/canonical/conftest.py` — pytest runner that invokes the
  `ouroboros_auto` MCP tool against the scenario and asserts the
  documented terminal state.
- 4 initial scenarios: `cli-todo`, `webhook-receiver`,
  `vertical-slice-refactor`, `2d-kart-racer` (last requires L3).
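A minimal sketch of how a runner might discover scenarios from that directory layout. `CanonicalScenario` and `discover_scenarios` are hypothetical names, not the contents of the real `conftest.py`; the sketch only locates `expected.yaml` and leaves parsing to whatever YAML loader the project uses.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class CanonicalScenario:
    """Hypothetical in-memory view of one tests/canonical/<slug>/ directory."""

    slug: str
    goal: str
    has_env: bool
    expected_path: Path


def discover_scenarios(root: Path) -> List[CanonicalScenario]:
    """Return one scenario per sub-directory that has goal.txt and expected.yaml."""
    scenarios = []
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        goal_file = directory / "goal.txt"
        expected = directory / "expected.yaml"
        if not (goal_file.is_file() and expected.is_file()):
            continue  # incomplete scenario directories are skipped, not failed
        scenarios.append(
            CanonicalScenario(
                slug=directory.name,
                goal=goal_file.read_text(encoding="utf-8").strip(),
                has_env=(directory / "env").is_dir(),
                expected_path=expected,
            )
        )
    return scenarios
```

A list like this could feed `pytest.mark.parametrize` so each scenario shows up as its own test id.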

**Out of scope (deliberately):** nightly CI workflow, recorded-replay
layer, hermetic-vs-live divergence detection, monthly cost budget,
refresh-rotation ownership policy, per-PR fast-subset CI. All of those
were *operational sludge added by reflex*; each gets opened as a
follow-up only if/when evidence demands it. See #1170 *Self-audit note*.

**Dependency:** none. L0-a can ship today.

**Design tracked in #1170.**

### L1 — DomainProfile Catalog

Promote `#849` Phase-3 DomainProfile from "typed recovery hint" to a
first-class **domain taxonomy** that drives default AC, completion
mode, and runtime-probe binding — *without* a separate LLM classifier.

**Design tracked in #1171.**

Frozen 7-class catalog for L1-a (deferred classes: `game-3d`,
`desktop-app`, `notebook-analysis` — each becomes its own ≤ 10-LoC
follow-up PR):

- `library`, `cli`, `web-service`, `webhook`, `data-pipeline`,
  `game-2d`, `refactor-in-place`.

Each class declares `default_completion_mode`, `default_ac_template`
(plain `tuple[str, ...]` matching `Seed.acceptance_criteria`),
`runtime_probe_kinds` (bound by L3), and the existing #849
`safe_defaults`.

**Domain inference is ledger-derived, not LLM-classified.** The
Socratic interview already extracts structured `SeedDraftLedger`
entries (`actors`, `inputs`, `outputs`, `runtime_context`, …) and
*standardizes them toward canonical vocabulary*. L1-b is a pure-Python
`derive_domain_from_ledger()` function in
`src/ouroboros/auto/domain_inference.py` that pattern-matches those
entries against per-class predicates and returns one of:

- **single match** — exactly one class predicate fires.
- **ambiguous** — multiple classes fire (e.g. CLI + WEB-SERVICE on a
  CLI that also exposes HTTP). The next interview round gets a
  disambiguation-question candidate appended (small hook in
  `interview_driver.py`); the existing ambiguity-gate loop drives
  resolution. *No new escalation system.*
- **unmatched** — no predicate fires. Falls to `library` (narrowest
  completion gate, lowest blast radius) and emits a `domain_unmatched`
  EventStore event for maintainer review.

Zero new LLM calls, zero new external API surface, zero new
substrate. Adding a new class to the catalog later is a ~10-LoC PR
(pattern function + unit test), not an eval-set re-curation.
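The three outcomes above can be sketched in a few lines. Everything here except the function name `derive_domain_from_ledger` is an assumption: the ledger shape, the `DomainInference` result type, and the three example predicates are invented for illustration and do not reflect the real per-class rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Sequence, Tuple

# Assumed ledger shape: section name -> canonical vocabulary terms.
Ledger = Mapping[str, Sequence[str]]

# Hypothetical predicates; the real catalog defines its own per-class rules.
PREDICATES: Dict[str, Callable[[Ledger], bool]] = {
    "cli": lambda ledger: "terminal" in ledger.get("runtime_context", ()),
    "web-service": lambda ledger: "http_request" in ledger.get("inputs", ()),
    "game-2d": lambda ledger: "game_loop" in ledger.get("runtime_context", ()),
}


@dataclass(frozen=True)
class DomainInference:
    outcome: str  # "single", "ambiguous", or "unmatched"
    classes: Tuple[str, ...]


def derive_domain_from_ledger(ledger: Ledger) -> DomainInference:
    """Pure pattern match over ledger entries; no LLM call involved."""
    fired = tuple(name for name, predicate in PREDICATES.items() if predicate(ledger))
    if len(fired) == 1:
        return DomainInference("single", fired)
    if len(fired) > 1:
        # The caller would append a disambiguation question to the next round.
        return DomainInference("ambiguous", fired)
    # Unmatched falls back to the narrowest completion gate.
    return DomainInference("unmatched", ("library",))
```

Under this shape, adding a class means adding one predicate entry and one unit test, which is what keeps the follow-up PRs small.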

**Why not a classifier.** An earlier draft proposed a separate Sonnet
classifier with an eval set + accuracy floor + opt-in telemetry. That
duplicated the inference the interview is already doing and violated
this SSOT's own scope note (*"`ooo auto` inherits Ouroboros's
substrate as-is"*). The redesign is documented in the Freshness sync
section under the 2026-05-22 self-audit entry.

### L2 — Watchdog v1 *(minimal new substrate)*

The smallest watchdog that satisfies *"long-running run never stalls without a recorded reason"*: one timer per session (wall-clock), one event (`runtime.watchdog.cancel`), one new `stop_reason_code` (`watchdog_wall_clock_exceeded`). When *now* exceeds the session start time plus `session_wall_clock_seconds`, the watchdog fires, the EventStore records it, and the pipeline transitions to BLOCKED with the typed code. Timer state is implicit in `AutoPipelineState.session_started_at`, so resume semantics work without separate serialization.
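A minimal sketch of the wall-clock check, assuming timezone-aware timestamps. `WatchdogCancel` and `check_wall_clock` are hypothetical names, not the merged `Watchdog` API; only the event name and stop reason code come from this issue. The check fires once the current time has passed the session start plus the budget.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class WatchdogCancel:
    """Hypothetical payload for the single v1 event family."""

    event_type: str
    stop_reason_code: str
    elapsed_seconds: float


def check_wall_clock(
    session_started_at: datetime,
    session_wall_clock_seconds: float,
    now: datetime,
) -> Optional[WatchdogCancel]:
    """Return a cancel record once the budget is exceeded, else None."""
    elapsed = (now - session_started_at).total_seconds()
    if elapsed <= session_wall_clock_seconds:
        return None  # still inside the budget; nothing to record
    return WatchdogCancel(
        event_type="runtime.watchdog.cancel",
        stop_reason_code="watchdog_wall_clock_exceeded",
        elapsed_seconds=elapsed,
    )


if __name__ == "__main__":
    start = datetime(2026, 5, 23, tzinfo=timezone.utc)
    print(check_wall_clock(start, 3600, start + timedelta(hours=2)))
```

Because the function takes only the start time and the current time, a resumed session needs nothing beyond the persisted `session_started_at` to reach the same decision.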

**Substrate addition:** exactly one new EventStore event family
(`runtime.watchdog.cancel`) and one new `aggregate_type = "runtime_control"`. Confirmed additive to projection v1 by reading `src/ouroboros/persistence/schema.py` (informational confirmation posted on #946). The other four lanes in this SSOT light up existing substrate; L2 adds this single family.

**v2 expansion (deferred, evidence-driven):** richer 3-timer config
(`idle` / `no_progress` / `safety`), 4-directive vocabulary
(`WAIT` / `RETRY` / `UNSTUCK` / `CANCEL`), `material_progress_events` vs
`activity_events` split, subscriber pattern for cooperative cancel,
ad-hoc-timeout deprecation across MCP / Ralph / evolve. Each opens as
its own slice only when a real-world stall slips past v1 wall-clock.

**Design tracked in #1172** (this issue lifts #578 to v1 minimum + documents v2 expansion path).

### L3 — Runtime Acceptance Substrate

Extend Track A fat-harness (#920 / #978) evidence schema to legally
accept **non-test evidence**:

- headless run logs (`stdout`, `exit_code`, `duration`)
- deterministic simulation traces (N-tick sim + golden-state diff)
- screenshot / DOM-hash / render-hash for UI classes
- API smoke probes (request → response shape match) for service classes

DomainProfile (L1) binds each class to one or more probes. This is the
substrate change that makes `PRODUCT_COMPLETE` mean *"the thing
actually runs"*, not *"tests pass"*.

**Minimal-substrate audit pending.** Per the L0/L1/L2/L5 self-audits
(2026-05-22), L3 has not yet been re-examined through the same
minimal-substrate lens. *Before* opening the L3 design issue, ask:
which evidence kinds does v1 *actually* need, and which are
speculative? Likely v1 collapses to **`headless_run` only** (capture
stdout/exit_code/duration); `sim_trace`, `render_hash`, and `api_smoke`
each open as their own follow-up only when a canonical scenario
demands them. This audit happens when L3's design-issue PR is drafted,
not in this lane body.
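To make the `headless_run` evidence kind concrete, here is a sketch that captures the three fields named above using only the standard library. `HeadlessRunEvidence` and `run_headless` are hypothetical names for illustration; they are not the merged `RuntimeEvidence` / `HeadlessRunProbe` types, and the 30-second default timeout is an assumption.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HeadlessRunEvidence:
    """Hypothetical evidence record with the three fields named above."""

    stdout: str
    exit_code: int
    duration: float  # seconds, measured with a monotonic clock


def run_headless(command: Sequence[str], timeout_seconds: float = 30.0) -> HeadlessRunEvidence:
    """Run a command without a TTY and capture stdout, exit code, and duration."""
    started = time.monotonic()
    completed = subprocess.run(
        list(command),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired when exceeded
        check=False,  # a non-zero exit is evidence, not an exception
    )
    return HeadlessRunEvidence(
        stdout=completed.stdout,
        exit_code=completed.returncode,
        duration=time.monotonic() - started,
    )


if __name__ == "__main__":
    print(run_headless([sys.executable, "-c", "print('hello from the probe')"]))
```

Keeping `check=False` matters for the design: a failing artifact should still yield an evidence record that a completion gate can inspect.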

**Track A collision warning.** L3's verifier-integration slice (L3-d)
modifies the same `src/ouroboros/orchestrator/` evidence-handling
surface that Track A verifier follow-ups #1165 / #1166 / #1168 are
currently active in. Sequencing rule:

1. Land #1165 / #1166 / #1168 (or their successors) on `main` first.
2. Open L3-a (evidence-kind taxonomy) only after Track A queue is
   drained — taxonomy is pure-additive and safe regardless, but it
   becomes the authoritative shape downstream verifier work conforms to.
3. Open L3-d (verifier integration) *last* in the L3 sequence so
   it conflicts with at most a single fresh `main`.

### L4 — Auto Envelope v2

A single frozen v2 contract consumed identically by CLI, MCP, and any
future UI. **🟢 Complete** — all five planned envelope fields landed:

| Field | Source | Status |
|---|---|---|
| `defaulted_sections[]` | #1146 | 🟢 merged |
| `interview_closure_mode` (`mutual_agreement` / `ledger_only` / `safe_default`) | #1148 + #1167 | 🟢 merged |
| `stop_reason_code` (8-code) | #1151 + #1167 (`interview_unsafe_gaps_remain`) | 🟢 merged |
| safe-default finalization on `ledger_done=False` (PR-B2) | #1167 | 🟢 merged |
| `assumption_sources[]` with `AssumptionRecord` provenance (PR-C2) | #1169 | 🟢 merged |

Lane frozen. Becomes the canonical shape used by L1 / L2 / L3 / L5
downstream.
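As an illustration of the envelope surface, the fields in the table could be modelled like this. The class name `AutoEnvelopeV2` and the two `AssumptionRecord` fields are assumptions made for the sketch; the real `AutoPipelineResult` and `AssumptionRecord` definitions live in the merged PRs.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

CLOSURE_MODES = ("mutual_agreement", "ledger_only", "safe_default")


@dataclass(frozen=True)
class AssumptionRecord:
    """Field names here are assumed, not taken from #1169."""

    statement: str
    source: str


@dataclass(frozen=True)
class AutoEnvelopeV2:
    """Hypothetical container for the envelope fields listed in the table."""

    interview_closure_mode: str
    stop_reason_code: Optional[str] = None
    defaulted_sections: List[str] = field(default_factory=list)
    assumption_sources: List[AssumptionRecord] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject closure modes outside the three documented values.
        if self.interview_closure_mode not in CLOSURE_MODES:
            raise ValueError(f"unknown closure mode: {self.interview_closure_mode}")


if __name__ == "__main__":
    envelope = AutoEnvelopeV2(
        interview_closure_mode="safe_default",
        defaulted_sections=["constraints"],
        assumption_sources=[AssumptionRecord("storage is SQLite", "safe_default")],
    )
    print(asdict(envelope))  # one plain dict that CLI, MCP, and a UI could all render
```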

### L5 — Long-running Resilience *(minimal, existing substrate only)*

The *"refuse to die quietly"* invariant — but built on what already exists,
not as new substrate.

What exists today:

- `ooo unstuck` (lateral persona swap) already runs when invoked.
- Ralph already emits `oscillation_detected` / `grade_regressing` as
  `stop_reason_code` values (#1151).
- Safe-default closure mode already exists for interview (#1167).
- Terminal taxonomy invariant already enforced by L4's 8-code envelope.

What L5 v1 actually adds (the only missing link):

- **L5-a** — when Ralph emits `oscillation_detected` *during a single
  `ooo auto` session*, automatically invoke `ooo unstuck` once before
  bailing. ~50 LoC + integration test.
- **L5-b** — when `ooo unstuck` exhausts its budget (default: 1 attempt),
  emit a typed `stop_reason_code="unstuck_exhausted"` (new 10th code) so
  the result envelope distinguishes *"tried unstuck and failed"* from
  *"never tried"*. ~50 LoC.

**Out of scope (v1):** new escalation-ladder state machine, new
oscillation-detector substrate, budget unification with L2, reframe
(ontologist) as a separate stage. Each can be added later if/when
evidence shows the v1 plumbing is too thin.

**Total: ~100 LoC across 2 sub-PRs.** Earlier draft was ~600 LoC of new
state-machine substrate that duplicated existing detection signals.
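The L5-a / L5-b plumbing described above is small enough to sketch in one function. `route_oscillation` is a hypothetical name and the `unstuck` callable stands in for invoking `ooo unstuck`; only the two stop reason codes come from this issue.

```python
from typing import Callable, Optional


def route_oscillation(
    stop_reason_code: str,
    unstuck: Callable[[], bool],
    unstuck_budget: int = 1,
) -> Optional[str]:
    """Return the stop reason to surface, or None if the run may continue.

    `unstuck` is an assumed callable that returns True when the lateral
    move produced a way forward.
    """
    if stop_reason_code != "oscillation_detected" or unstuck_budget < 1:
        return stop_reason_code  # nothing to route; never-tried stays distinguishable
    for _ in range(unstuck_budget):
        if unstuck():
            return None  # recovered; the session keeps going
    # Tried and failed: surface the dedicated code instead of the raw signal.
    return "unstuck_exhausted"
```

With the default budget of one attempt, the envelope ends up with `oscillation_detected` only when unstuck was never tried, and `unstuck_exhausted` when it was tried and did not help.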

## Lane dependency graph

The lanes are not independent — three real dependencies and one
collision risk constrain the ordering. Reading this graph before
opening a lane PR avoids both *paper completion* (L0 invariant) and
the Track A collision (L3 warning above).

```
                  ┌──────────────────────────────────────┐
                  │  L0 — Canonical Test Harness         │
                  │  (meta-lane: gates every other lane) │
                  └──────────────┬───────────────────────┘
                                 │ acceptance invariant
              ┌──────────────────┼──────────────────┐
              ▼                  ▼                  ▼
       ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
       │  L1         │    │  L2         │    │  L4         │
       │ DomainProf. │    │ RuntimeCtl. │    │ Envelope v2 │
       │ Catalog     │    │ v1 (new sub)│    │ 🟢 complete │
       └──────┬──────┘    └──────┬──────┘    └─────────────┘
              │                  │
              │ probe binding    │ UNSTUCK/CANCEL
              │                  │ directive hooks
              ▼                  ▼
       ┌─────────────┐    ┌─────────────┐
       │  L3         │    │  L5         │
       │ Runtime Acc.│    │ Long-running│
       │ Substrate   │    │ Resilience  │
       └──────┬──────┘    └──────┬──────┘
              │                  │
              │ runtime evidence │ typed terminals
              └────────┬─────────┘
                       ▼
              ┌─────────────────────┐
              │ Canonical matrix    │
              │ 1+ goal end-to-end  │
              │ × 2 reproducible    │
              └─────────────────────┘
                       │
                       ▼
                 SSOT #1157 close
```

**Hard dependencies (a PR opened upstream of an arrow cannot claim
done until the downstream consumer is at least design-locked):**

- **L1 → L3.** L3 evidence probes are bound per DomainProfile class.
  L1-a (catalog data) must land before L3-c (probe binding).
- **L2 → L5.** L5 escalation ladder hooks into L2's
  `UNSTUCK / CANCEL` directives. L5-a (state machine) can start
  before L2 lands; L5-c (watchdog integration) cannot.
- **L0 → everyone.** Every other lane's "complete" claim is
  validated by the canonical matrix runner. L0 doesn't block lane
  *implementation* — it blocks lane *completion claims*.

**Soft dependency / collision risk:**

- **L3 ↔ active Track A verifier follow-ups (#1165 / #1166 /
  #1168).** Same `src/ouroboros/orchestrator/` evidence-handling
  surface. See the Track A collision warning in the L3 section.

**Recommended parallelism:**

| Wave | Run in parallel |
|---|---|
| 1 | L0 design issue **+** L1-a catalog data **+** L2-a `#578` RFC promotion |
| 2 | L1-b ledger-derive inference + pattern unit tests **+** L2-b/c (controls + watchdog) **+** L5-a state machine |
| 3 | L3-a/b/c after L1-a lands and Track A queue drained **+** L5-b oscillation detector |
| 4 | L3-d (verifier integration) **+** L5-c (L2 watchdog integration) — must be sequential within their own queue |
| 5 | L0 canonical matrix run on `main`; if PASS count ≥ 1 reproducible → SSOT close |

## Positioning vs OMX `ultragoal`

[OMX `ultragoal`](https://github.com/Yeachan-Heo/oh-my-codex) (in
[`docs/ultragoal.md`](https://github.com/Yeachan-Heo/oh-my-codex/blob/main/docs/ultragoal.md)
and [`plugins/oh-my-codex/skills/ultragoal/SKILL.md`](https://github.com/Yeachan-Heo/oh-my-codex/blob/main/plugins/oh-my-codex/skills/ultragoal/SKILL.md))
is the closest in-class prior art — a durable, repo-native multi-goal
workflow layered over Codex CLI's `goals` feature. It is genuinely
excellent at *executing a known plan to completion*. Naming where it
ends is how we name where `ooo auto` must do something different — not
just "more".

### What OMX ultragoal actually does (sharp summary)

| Aspect | Behavior |
|---|---|
| Input | Free-text brief (`--brief`, `--brief-file`, `--from-stdin`) |
| Decomposition | LLM decomposes brief into `G001` / `G002` / … stories stored in `.omx/ultragoal/goals.json` |
| Durable state | `.omx/ultragoal/brief.md` + `goals.json` (plan + status + attempts + evidence) + `ledger.jsonl` (append-only event log) |
| Codex coupling | Aggregate mode (default): **one** Codex goal covers the whole run; pointer-style objective references `goals.json` rather than enumerating ids, so steering can add/split stories without weakening the end goal |
| Execution loop | `omx ultragoal complete-goals` prints a handoff; the agent calls `get_goal` → `create_goal` (only if none active) → completes the OMX story → `omx ultragoal checkpoint --status complete --evidence … --codex-goal-json <get_goal snapshot>` |
| Steering | Explicit-only structured mutations — `add_subgoal`, `split_subgoal`, `reorder_pending`, `revise_pending_wording`, `annotate_ledger`, `mark_blocked_superseded`. Prose ("make it easier") is rejected. Every accept / reject appends an audit entry. |
| Final quality gate | Mandatory on the **final story only**: targeted verification → `ai-slop-cleaner` on changed files → re-verification → `$code-review`. Clean = `recommendation:APPROVE` + `architectStatus:CLEAR`. Non-clean → `record-review-blockers` appends a new pending blocker-resolution story and the run continues. |
| Terminal contract | Ledger event kinds — `goal_completed`, `goal_failed`, `goal_blocked`, `goal_review_blocked`, `final_review_failed`, `aggregate_objective_migrated`, … |

This is a *strong* substrate. The append-only ledger, pointer-style
aggregate objective, structured-steering-with-audit, and mandatory
final quality gate are patterns we should learn from — not redo from
scratch.

### What OMX ultragoal does **not** do (the room for `ooo auto`)

OMX ultragoal assumes the brief is sufficient and the LLM
decomposition is correct. Every gap below comes from those two
assumptions.

| Capability | OMX `ultragoal` | `ooo auto` target | Lane |
|---|---|---|---|
| **Spec elicitation under ambiguity** | none — brief is accepted as-is | bounded Socratic + ledger + tacit-knowledge digging until `ambiguity ≤ 0.2`; refuses to proceed otherwise | inherits from Ouroboros |
| **Tacit knowledge / mental model surfacing** | none | Ouroboros Socratic interview + ontologist + contrarian personas already crystallize implicit domain assumptions | inherits from Ouroboros |
| **Domain-aware default AC** | none — stories are free-text decomposition | L1 DomainProfile catalog injects class-specific AC (`game-2d` → game-loop / input / playable build / render target) into the Seed before execution | **L1** |
| **Completion-mode auto-branching** | one quality gate shape (review-centric) for all classes | DomainProfile picks `CODE_COMPLETE` (library) vs `PRODUCT_COMPLETE` (game / app); user never has to choose | **L1** |
| **Long-running watchdog contract** | relies on Codex's native token/time accounting; no `WAIT / RETRY / UNSTUCK / CANCEL` directive at the OMX layer | single RuntimeControls used by MCP / evolve / auto; `runtime.watchdog.cancel` events record *why* a run was cancelled | **L2** |
| **Runtime acceptance evidence** | quality gate = ai-slop-cleaner + re-verification + code-review `APPROVE/CLEAR` — heavy on review, light on actual runtime | non-test evidence is first-class: headless run logs, deterministic N-tick sim traces, render/DOM hashes; DomainProfile (L1) binds the probe per class | **L3** |
| **Typed terminal taxonomy** | ledger event *kinds* exist, but no typed `reason_code` on the run-level result | 8-code `stop_reason_code` taxonomy on `AutoPipelineResult`; every terminal carries one | **L4** (merged) |
| **Result-envelope provenance** | evidence is free-text in `goals.json[].evidence`; no `defaulted_sections` or `assumptions[source]` surface | result envelope exposes `closure_mode`, `defaulted_sections[]`, `assumptions[].source`, `stop_reason_code` | **L4** (merged) |
| **Resilience escalation ladder** | one recovery path: `record-review-blockers` appends a new pending story, keep going | layered ladder — `unstuck-persona → reframe (ontologist) → safe-default closure → BLOCKED(reason_code)`; each rung bounded; falling off the bottom is always a typed terminal | **L5** |
| **Auto self-correction** | only explicit human/agent steering directives (`omx ultragoal steer …`) | explicit steering **and** automatic correction via Track-B oscillation detector / grade-regression / fingerprinted recovery (already shipped via #928, etc.) | inherited (Track B) |

### Where ultragoal validates existing `ooo auto` substrate

Two ultragoal patterns line up with substrate Ouroboros already has or
already plans in this SSOT — listed here as *confirmation*, not as new
behavior commitments. Per the scope note at the top of this issue, this
SSOT does **not** propose changes to Ouroboros's interview / steering /
recovery substrate; it only owns the L1–L5 lanes.

1. **Append-only durable ledger as audit-of-record.** `ooo auto`
   already has this via Track C EventStore (#946 / #956). Ultragoal's
   `.omx/ultragoal/ledger.jsonl` pattern reinforces the existing design
   decision: keep treating EventStore as SSOT for every lifecycle event
   — including L2 watchdog decisions and L5 escalation transitions when
   those lanes land — and never compute terminal status from anywhere
   else. *No behavior change implied.*
2. **Quality-gate as a single bundled evidence artifact.** Ultragoal's
   `--quality-gate-json` (`aiSlopCleaner` + `verification` +
   `codeReview` keys) is a clean evidence container shape. L3 (Runtime
   Acceptance Substrate) — which is in this SSOT's scope — should emit
   a comparable structured payload for the runtime probe, so
   `PRODUCT_COMPLETE` carries one inspectable evidence object instead
   of scattered files. *Lane-internal shape choice, within L3 scope.*

### Patterns explicitly *not* adopted

Ultragoal has two further patterns that *could* be ported, but doing so
would change Ouroboros substrate outside this SSOT's scope. They are
listed here only so future readers know they were considered and
rejected for this issue:

- **Pointer-style aggregate Ralph handoff** (Ralph references Seed +
  ledger live rather than a snapshotted AC list). Whether to adopt is
  outside L1–L5 scope; if pursued, it requires a separate design issue
  against `ooo:ralph` and `--complete-product` chaining.
- **Structured-steering mutation vocabulary** (a finite set of allowed
  plan-revision kinds with evidence + audit, à la ultragoal's
  `add_subgoal` / `split_subgoal` / etc.). `ooo unstuck` and the typed
  recovery plan (#928) already cover mid-run revision today; replacing
  them with a finite vocabulary would be a substrate redesign, which
  the scope note at the top of this issue forbids.

### One-line positioning

> OMX `ultragoal` owns **"executing a known plan to completion"** under
> Codex's goal feature. `ooo auto` owns **"deciding what the plan is,
> defending it against ambiguity, running it past a runtime gate, and
> never dying without a typed reason"** — i.e. everything *upstream* of
> execution (spec / domain / watchdog) and everything *downstream* of
> execution (runtime acceptance / typed termination), via DomainProfile
> auto-branching between `CODE_COMPLETE` and `PRODUCT_COMPLETE`.
>
> Where the two systems agree on a pattern, prefer ultragoal's
> audit-first, append-only style. Where ultragoal accepts a brief as
> truth, `ooo auto` refuses to accept ambiguity as truth.

## Double-diamond mapping

```
Discover           Define                Develop                Deliver
─────────          ──────                ───────                ───────
ooo interview      ooo seed              ooo run / evolve       ooo qa + L3 probe
  │                  │                     │                      │
  Ouroboros          L1 DomainProfile      L2 watchdog +          L4 envelope +
  Socratic +         default AC +          L5 escalation          typed reason_code
  tacit ledger       completion mode       ladder
```

`ooo auto` is the single entrypoint that drives the entire 4-step
diamond without the user ever choosing which step they are in.

## AgentOS substrate dependencies (from #961)

Mapping each lane to the `#961` track that the warden uses for triage,
plus the routing the warden has actually applied to merged L4 PRs
(#1167 / #1169 were classified as *"Track B follow-up outside Track C
tier gates"*):

- **Track A** (`ooo run` fat-harness, #920 / #978): **L3** *extends*
  the evidence schema — additive, no backwards-incompatible change.
  Warden classification likely: *"Track A follow-up outside Track C
  gates"* (mirroring #1165 / #1166 / #1168 precedent).
- **Track B** (`ooo auto` self-healing, closed #772 / #809 / #821):
  **L4** (🟢 complete via #1146 / #1148 / #1151 / #1167 / #1169)
  and **L5** are natural continuations. Warden classification:
  *"Track B follow-up outside Track C tier gates"* — same precedent
  as the merged L4 PRs.
- **Track C** (AgentOS substrate, #946 / #956 / #939): L2 watchdog
  decisions ride the EventStore + projection vocabulary; L4 envelope
  fields live on the lifecycle event family.
- **Outside #961 tracks**: **L0** is a test-harness lane with no
  Track parent; classify as a peer follow-up. **L2** anchors on
  `#578`, which #961's *"Scope of the Tier system"* paragraph
  explicitly excludes from Track C tier gating — `#578` keeps its
  own design-issue lifecycle. **L1** extends `#849`, a merged
  Track B Phase-3 artifact, so its follow-ups inherit Track B
  routing.

Four of the five implementation lanes (L0 / L1 / L3 / L5) light up
existing substrate. **L2 adds one new substrate family**, the
`runtime.watchdog.cancel` event, which is the only deliberately new
building block in this SSOT; the richer directive vocabulary is
deferred to v2. See the *Substrate honesty* note at the top of this issue.

## Anyone asking "what can I do right now"

1. ~~Review and merge L4 in-flight PRs **#1146** / **#1148** / **#1151**~~
   — **merged 2026-05-21** (squash to main, in that order after
   sequential rebases on `AutoPipelineResult` / `_result()`).
2. ~~Merge adjacent unblocker **#1156** (Windows checkpoint sanitize)~~
   — **merged 2026-05-21**; `ooo auto` run phase now reachable on
   Windows.
3. ~~Open **PR-B2** for `ledger_done=False` safe-default finalization~~
   — **merged 2026-05-22 as #1167** (`safe_default` closure_mode +
   `interview_unsafe_gaps_remain` 8th stop_reason_code).
4. ~~Open **PR-C2** for `assumptions[].source` provenance promotion~~
   — **merged 2026-05-22 as #1169** (`AssumptionRecord` +
   `AutoPipelineResult.assumption_sources`, additive surface).
5. ~~Open design issues for L0 / L1 / L2~~ — **opened 2026-05-22**
   as #1170 / #1171 / #1172, then redesigned to minimal-substrate v1
   (see freshness sync entries below). All three are now ready for
   their respective `*-a` PR slices to start in parallel.
6. **Start L0-a** — minimal `tests/canonical/cli-todo/` scenario +
   `pytest` runner skeleton. See #1170. No CI, no replay, no budget.
7. **Start L1-a** — 7-class catalog data + `DomainProfile` dataclass
   per-class fields + unit test per class. See #1171.
8. **Start L2-a** — #578 RFC promotion to v1 minimum (single
   wall-clock timer, single cancel event, single new stop_reason_code).
   v2 expansion path documented for evidence-driven future. See #1172.
9. **Open L3 design issue** — but first apply the same minimal-substrate
   audit (likely outcome: v1 ships `headless_run` evidence kind only;
   sim_trace / render_hash / api_smoke each as their own follow-up
   when a canonical scenario demands them).
10. **Start L5-a** — plumb existing `oscillation_detected` Ralph signal
    into existing `ooo unstuck`. ~50 LoC. Does not need its own design
    issue if scope stays this small.
11. *(deferred)* When L3 opens, adopt OMX `ultragoal`'s bundled
    quality-gate JSON shape (single evidence object with named sub-keys)
    for the runtime-probe payload — only if v1 ships more than one
    evidence kind.

## Anyone asking "what's blocked"

- **L0**: not blocked — #1170 open with minimal design. L0-a ready.
- **L1**: not blocked — #1171 open with ledger-derive design locked.
  L1-a ready.
- **L2**: not blocked — #1172 open with minimal v1 design. L2-a ready
  (#578 body promotion). No BLOCK questions; v2 expansion deferred to
  evidence-driven follow-ups.
- **L3**: pending design issue + minimal-substrate audit. Implementation
  also waits on the active Track A verifier queue (#1165 / #1166 /
  #1168) draining.
- **L4**: 🟢 complete.
- **L5**: not blocked — L5-a (plumb `oscillation_detected` →
  `ooo unstuck`) ready. ~50 LoC.

## Acceptance gate (when this SSOT closes)

This SSOT closes when **all** of the following are true:

- L0, L1, L2, L3, L4, L5 are each either 🟢 merged or 🟢 explicitly
  superseded by a documented alternative.
- The **L0 canonical test matrix** runs end-to-end on a clean repo
  with **zero human intervention** for at least one canonical goal
  (e.g. `ooo auto "2D kart racer"`).
- The result is **reproducible across two consecutive runs** of the
  L0 canonical harness.

Until then this issue stays **OPEN** and serves as the living
guideline. Warden-style freshness syncs append below as lanes progress.
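
As a rough illustration, the three gate conditions above can be read as a single predicate. The sketch below is not part of the harness: `CanonicalRun`, `ssot_can_close`, and the status strings `"merged"` / `"superseded"` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

LANES = ("L0", "L1", "L2", "L3", "L4", "L5")


@dataclass(frozen=True)
class CanonicalRun:
    """One end-to-end run of a canonical goal (hypothetical record shape)."""
    goal: str
    passed: bool
    human_interventions: int


def _clean(run: CanonicalRun) -> bool:
    """A run counts only if it passed with zero human intervention."""
    return run.passed and run.human_interventions == 0


def ssot_can_close(lane_status: dict[str, str], runs: list[CanonicalRun]) -> bool:
    """Return True only when every gate condition holds."""
    # Condition 1: every lane is merged or explicitly superseded.
    lanes_done = all(
        lane_status.get(lane) in {"merged", "superseded"} for lane in LANES
    )
    # Conditions 2 and 3: some goal has two consecutive clean runs.
    reproducible = any(
        a.goal == b.goal and _clean(a) and _clean(b)
        for a, b in zip(runs, runs[1:])
    )
    return lanes_done and reproducible
```

A single clean run is deliberately not enough here: the predicate needs two adjacent clean runs of the same goal before it reports the gate as met.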

## Freshness sync



**2026-05-21 (initial post).** Issue opened as living SSOT.

**2026-05-21 (L4 partial 🟢).** L4 Envelope v2 lane advanced from 🟡 to
🟢 partial — squash-merged #1156 (Windows checkpoint sanitize, the
adjacent prerequisite), then #1151 (`stop_reason_code` 7-code
taxonomy), then #1148 (`interview_closure_mode` ledger-only closure;
required one `src/ouroboros/auto/state.py` rebase conflict
resolution — both PRs added a `payload.setdefault` line, kept both),
then #1146 (`defaulted_sections` surface). Three of the five planned
L4 envelope fields landed; PR-B2 (`ledger_done=False` safe-default
finalization) and PR-C2 (`assumptions[].source` provenance) remain
deferred, and the L4 freeze waits on both.

**2026-05-21 (Positioning revised).** Original "Positioning vs OMC
autopilot / ultragoal-style flows" section was inaccurate — it
conflated OMC autopilot with OMX `ultragoal`. Rewrote against the
actual [`Yeachan-Heo/oh-my-codex` ultragoal contract](https://github.com/Yeachan-Heo/oh-my-codex/blob/main/docs/ultragoal.md)
(durable `.omx/ultragoal/` ledger, pointer-style aggregate Codex goal,
six structured-steering mutation kinds, mandatory final
ai-slop-cleaner + code-review APPROVE/CLEAR gate).

**2026-05-21 (Positioning scope-tightened).** First draft of the
absorb-these-patterns subsection proposed two ultragoal patterns —
pointer-style aggregate handoff for `--complete-product` Ralph, and a
finite structured-steering mutation vocabulary to replace `ooo unstuck`
+ #928 typed recovery — that would change Ouroboros substrate outside
the L1–L5 lanes this SSOT owns. That contradicted the scope note at
the top of the issue ("ooo auto inherits Ouroboros's substrate as-is").
Removed both from the absorb list; moved them into a new "Patterns
explicitly *not* adopted" subsection that documents the consideration
+ rejection. Remaining absorbed patterns (append-only ledger
confirmation, L3 bundled quality-gate JSON shape) are scope-clean —
the first restates existing Track C behavior, the second is a
lane-internal output schema choice.

**2026-05-22 (L4 lane 🟢 complete).** Squash-merged the two deferred
L4 follow-ups: **#1167** (PR-B2 — `safe_default` closure_mode + new
`interview_unsafe_gaps_remain` stop_reason_code, taxonomy now 8 codes)
and **#1169** (PR-C2 — `AssumptionRecord` frozen dataclass + additive
`AutoPipelineResult.assumption_sources` surface broadening from
`LedgerSource.ASSUMPTION` only to all three assumption-class sources:
`ASSUMPTION`, `INFERENCE`, `CONSERVATIVE_DEFAULT`). All five planned
L4 envelope fields are now live on `main`. L4 lane status moves from
🟡 partial to 🟢 complete; envelope v2 is frozen.
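
To make the #1169 broadening concrete, here is a minimal sketch of a frozen provenance record filtered across the three assumption-class sources. `AssumptionRecord`, `LedgerSource`, and the three member names come from this issue; the record's fields, the `USER_STATED` member, and `collect_assumption_sources` are assumptions made for illustration and may not match the code on `main`.

```python
from dataclasses import dataclass
from enum import Enum


class LedgerSource(Enum):
    # The three assumption-class members are named in this issue.
    ASSUMPTION = "assumption"
    INFERENCE = "inference"
    CONSERVATIVE_DEFAULT = "conservative_default"
    # Hypothetical non-assumption source, included only for contrast.
    USER_STATED = "user_stated"


ASSUMPTION_CLASS = frozenset(
    {LedgerSource.ASSUMPTION, LedgerSource.INFERENCE, LedgerSource.CONSERVATIVE_DEFAULT}
)


@dataclass(frozen=True)
class AssumptionRecord:
    """Immutable provenance record; the field names are illustrative."""
    section: str
    text: str
    source: LedgerSource


def collect_assumption_sources(entries):
    """Keep every (section, text, source) entry whose source is assumption-class."""
    return [
        AssumptionRecord(section, text, source)
        for section, text, source in entries
        if source in ASSUMPTION_CLASS
    ]


records = collect_assumption_sources([
    ("runtime", "assume a desktop target", LedgerSource.ASSUMPTION),
    ("ui", "user asked for keyboard controls", LedgerSource.USER_STATED),
    ("storage", "default to local files", LedgerSource.CONSERVATIVE_DEFAULT),
])
```

Filtering on `LedgerSource.ASSUMPTION` alone would have dropped the `CONSERVATIVE_DEFAULT` entry above, which is the gap the broadening closes.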

**2026-05-22 (SSOT self-audit corrections, 5 items).** A scope-and-design
audit against `#961` flagged five issues in the prior draft. Fixed in
this revision:

1. *L4 status* updated to 🟢 complete and the L4 lane body table
   regenerated against landed PR numbers / shas.
2. *Substrate honesty* — the original *"This SSOT introduces no new
   substrate"* sentence at the bottom of the AgentOS-dependencies
   section conflicted with **L2 RuntimeControls v1**, which is
   genuinely new substrate (new EventStore event family
   `runtime.watchdog.decision`, new directive vocabulary). Added a
   *Substrate honesty* paragraph to the issue header and rewrote the
   AgentOS-dependencies section to name L2 explicitly as the single
   new-substrate lane (the other four lanes light up existing
   substrate).
3. *New L0 — Canonical Test Harness lane* added as the meta-lane that
   gates every other lane's completion claim against actual canonical
   matrix behavior. Without L0, lane PRs can paper-merge while the
   integration stays broken.
4. ~~*L1 classifier acceptance gate* — L1-b cannot claim done without a
   frozen eval set (≥ 5 examples × 10 classes), ≥ 90% top-1 accuracy,
   and 100% confidence-floor escalation behavior. Added to the L1
   lane body.~~ **Superseded by the 2026-05-22 ledger-derive
   redesign (entry below). The classifier acceptance gate is no
   longer in scope.**
5. *Lane dependency graph + Track A collision warning* — new section
   between L5 and Positioning showing hard dependencies
   (L1 → L3, L2 → L5, L0 → everyone) and the soft collision risk
   between L3 and the active Track A verifier follow-ups
   (#1165 / #1166 / #1168). The L3 lane body now carries the explicit
   sequencing rule.

**2026-05-22 (Design issues opened: #1170 L0, #1171 L1, #1172 L2).**
Three design slices opened on `Q00/ouroboros` for the design-stage
lanes. Each issue body locks the formerly-open design questions with
recommended defaults under a *Decisions awaiting maintainer triage*
section, so a maintainer-only triage pass can answer the remaining
4 BLOCK questions in one round: L0-2 cost ceiling, L0-4 replay
refresh ownership, and L1-5 / L1-10 (both later retired in the
redesign below). #1172 has no remaining BLOCK questions: the
`runtime_control` aggregate was verified to be additive to projection
v1, and an informational confirmation was posted on #946.

**2026-05-22 (L1 self-audit: classifier → ledger-derive redesign).**
Reviewer feedback on the L1 design called out that introducing a
separate Sonnet *classifier* with an eval set, accuracy floor, and
opt-in telemetry pipeline duplicates the work the Socratic interview
already does (structured-spec extraction into the ledger) and
violates this SSOT's own scope note (*"`ooo auto` inherits
Ouroboros's substrate as-is"*). The audit was correct. L1's design
was rewritten:

- Removed: classifier LLM, eval set, accuracy floor (`≥ 90%`),
  confidence threshold knob, model-routing decision (`haiku` vs
  `sonnet` vs `opus`), opt-in telemetry pipeline, `Q00/ouroboros-eval-data`
  private dataset proposal.
- Kept: 7-class catalog, per-class `default_completion_mode` /
  `default_ac_template` / `runtime_probe_kinds`, the
  `domain_unmatched` audit event for catalog gaps.
- Added: `derive_domain_from_ledger()` pure-Python pattern matcher
  in `src/ouroboros/auto/domain_inference.py` (~150 LoC), small
  hook in `interview_driver.py` to feed disambiguation question
  candidates back into the existing ambiguity-gate loop when the
  inference is ambiguous.
- BLOCK questions retired: L1-5 (classifier model) and L1-10
  (opt-in telemetry) are both moot under the redesign. Maintainer
  triage now needs to answer 2 BLOCK questions total (L0-2 cost,
  L0-4 replay refresh ownership).

The TL;DR and L1 lane body in this SSOT have been rewritten to
reflect the redesign. #1171 carries the full design.
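
For readers who want a feel for the ledger-derive approach, the sketch below shows one way a pure-Python pattern matcher could return a class, a set of disambiguation candidates, or an unmatched signal. Only the function name `derive_domain_from_ledger` and the `game-2d` class name appear in this issue; the other class names, the regex patterns, and the `DomainInference` return type are placeholders, and the real design lives in #1171.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder patterns: these are not the catalog's actual matching rules.
PATTERNS = {
    "game-2d": re.compile(r"\b(game|sprite|kart|level)\b", re.I),
    "cli-tool": re.compile(r"\b(cli|command[- ]line|flag|argv)\b", re.I),
    "library": re.compile(r"\b(library|sdk|package)\b", re.I),
}


@dataclass(frozen=True)
class DomainInference:
    task_class: Optional[str]    # None when unmatched or ambiguous
    candidates: tuple[str, ...]  # classes to disambiguate in the interview
    unmatched: bool              # True -> caller emits a `domain_unmatched` event


def derive_domain_from_ledger(ledger_entries: list[str]) -> DomainInference:
    """Score each class by how many ledger entries match its pattern."""
    scores = {
        name: sum(1 for entry in ledger_entries if pattern.search(entry))
        for name, pattern in PATTERNS.items()
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return DomainInference(None, (), unmatched=True)
    top = tuple(sorted(name for name, score in scores.items() if score == best))
    if len(top) > 1:
        # Tie: hand the candidates back to the ambiguity-gate loop.
        return DomainInference(None, top, unmatched=False)
    return DomainInference(top[0], (), unmatched=False)
```

No model call is involved: a tie produces candidates for the existing interview loop, and a zero score produces the audit signal for a catalog gap.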

**2026-05-22 (Minimal-substrate audit: L0 / L2 / L5 redesigned, L3 flagged).**
The same pattern that produced the L1 classifier mistake was found in L0
(nightly CI + replay layer + cost budget + ownership policy) and L2
(3-timer config + 4-directive vocabulary + subscriber pattern), with
suspicions in L5 (new state-machine substrate) and L3 (4-probe-kind
substrate). All were redesigned through the minimal-substrate lens
(*"add substrate only when evidence demands it"*):

- **L0** (#1170): rewritten to a manual `pytest` harness with 4
  scenario fixtures and no CI/replay/budget/ownership infrastructure.
  ~330 LoC across 4 sub-PRs. The 2 BLOCK questions (L0-2 cost ceiling,
  L0-4 replay refresh ownership) are now both retired — they were
  decisions about substrate that no longer exists.
- **L2** (#1172): rewritten to a single wall-clock timer + a single
  `runtime.watchdog.cancel` event + a single new `watchdog_wall_clock_exceeded`
  stop_reason_code. ~150 LoC across 3 sub-PRs. v2 expansion path
  (3 timers, 4 directives, subscriber pattern, ad-hoc timeout
  deprecation) documented inside #1172 as evidence-driven follow-ups.
- **L5** (#1157 lane body): rewritten to plumb the *existing*
  `oscillation_detected` Ralph signal into the *existing* `ooo unstuck`,
  plus one new typed `unstuck_exhausted` stop_reason_code. ~100 LoC
  across 2 sub-PRs. No new state-machine, no new oscillation-detector
  substrate.
- **L3** (no design issue yet): a minimal-substrate audit note has
  been added to the L3 lane body so the same scope-tightening happens
  *before* the L3 design issue is drafted (likely outcome: v1 ships
  `headless_run` evidence kind only; sim/render/api each as their own
  follow-up when a canonical scenario demands them).

Net result: **0 BLOCK questions** across all open design issues; L0-a,
L1-a, L2-a, L5-a all ready to start in parallel; total estimated
implementation across L0/L1/L2/L5 minimal v1 is ~730 LoC versus the
~2,000+ LoC pre-audit plan.
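
The L2 v1 shape described above (one wall-clock timer, one `runtime.watchdog.cancel` event, one `watchdog_wall_clock_exceeded` stop_reason_code) is small enough to sketch in full. This is an illustration under assumptions, not the merged implementation: the constructor arguments, the event payload keys, and the in-memory `events` list standing in for the EventStore are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Watchdog:
    """Single wall-clock timer; the constructor and payload are illustrative."""
    limit_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    events: list = field(default_factory=list)   # stand-in for the EventStore
    started_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def check(self) -> Optional[str]:
        """Return a stop_reason_code once the limit is exceeded, else None."""
        elapsed = self.clock() - self.started_at
        if elapsed <= self.limit_seconds:
            return None
        if not self.events:  # record the cancel event exactly once
            self.events.append({
                "type": "runtime.watchdog.cancel",
                "elapsed_seconds": elapsed,
                "limit_seconds": self.limit_seconds,
            })
        return "watchdog_wall_clock_exceeded"
```

A caller would poll `check()` between pipeline steps and stop on the first non-`None` result; the recorded event is what later answers "why was this run cancelled at minute X".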

The meta-lesson: Ouroboros's minimal-substrate principle is *"add
substrate only when evidence demands it."* Twice in one session I
defaulted to standard-engineering patterns (ML classifier, CI
infrastructure, state-machine substrate) and twice the maintainer
caught it. The pattern to internalize: *if a lane body lists
infrastructure that solves a class of problems we have not yet
observed, that infrastructure does not belong in v1.*

| Wave | Run in parallel |
|---|---|
| 1 | L0 design issue + L1-a catalog data + L2-a `#578` RFC promotion |
| 2 | L1-b ledger-derive inference + pattern unit tests + L2-b/c (controls + watchdog) + L5-a state machine |
| 3 | L3-a/b/c after L1-a lands and Track A queue drained + L5-b oscillation detector |
| 4 | L3-d (verifier integration) + L5-c (L2 watchdog integration); must be sequential within their own queue |
| 5 | L0 canonical matrix run on `main`; if PASS count ≥ 1 reproducible → SSOT close |

| Capability | OMX `ultragoal` | `ooo auto` target | Lane |
|---|---|---|---|
| Spec elicitation under ambiguity | none; brief is accepted as-is | bounded Socratic + ledger + tacit-knowledge digging until `ambiguity ≤ 0.2`; refuses to proceed otherwise | inherits from Ouroboros |
| Tacit knowledge / mental model surfacing | none | Ouroboros Socratic interview + ontologist + contrarian personas already crystallize implicit domain assumptions | inherits from Ouroboros |
| Domain-aware default AC | none; stories are free-text decomposition | L1 DomainProfile catalog injects class-specific AC (`game-2d` → game-loop / input / playable build / render target) into the Seed before execution | L1 |
| Completion-mode auto-branching | one quality gate shape (review-centric) for all classes | DomainProfile picks `CODE_COMPLETE` (library) vs `PRODUCT_COMPLETE` (game / app); user never has to choose | L1 |
| Long-running watchdog contract | relies on Codex's native token/time accounting; no `WAIT / RETRY / UNSTUCK / CANCEL` directive at the OMX layer | single RuntimeControls used by MCP / evolve / auto; `runtime.watchdog.decision` events replay why a run paused or died | L2 |
| Runtime acceptance evidence | quality gate = ai-slop-cleaner + re-verification + code-review `APPROVE/CLEAR` (heavy on review, light on actual runtime) | non-test evidence is first-class: headless run logs, deterministic N-tick sim traces, render/DOM hashes; DomainProfile (L1) binds the probe per class | L3 |
| Typed terminal taxonomy | ledger event kinds exist, but no typed `reason_code` on the run-level result | 7-code `stop_reason_code` taxonomy on `AutoPipelineResult`; every terminal carries one | L4 (merged) |
| Result-envelope provenance | evidence is free-text in `goals.json[].evidence`; no `defaulted_sections` or `assumptions[source]` surface | result envelope exposes `closure_mode`, `defaulted_sections[]`, `assumptions[].source`, `stop_reason_code` | L4 (partial, 3/5 merged) |
| Resilience escalation ladder | one recovery path: `record-review-blockers` appends a new pending story, keep going | layered ladder: `unstuck-persona → reframe (ontologist) → safe-default closure → BLOCKED(reason_code)`; each rung bounded; falling off the bottom is always a typed terminal | L5 |
| Auto self-correction | only explicit human/agent steering directives (`omx ultragoal steer …`) | explicit steering and automatic correction via Track-B oscillation detector / grade-regression / fingerprinted recovery (already shipped via #928, etc.) | inherited (Track B) |



| # | Condition | Today | Target |
|---|---|---|---|
| 1 | Interview closes with `closure_mode ∈ {mutual_agreement, ledger_only, safe_default}` | partial | 100% (`genuine_deadlock = 0`) |
| 2 | Seed AC reflects the inferred domain class (e.g. game → game-loop / input / render-target / playable build) | none | 100% via L1 |
| 3 | Long-running run never stalls without a `runtime.watchdog.decision` Directive | ad-hoc | 100% via L2 |
| 4 | Evidence bundle contains runtime proof (sim trace / headless run / render hash), not only unit tests | unit-tests only | 100% via L3 |
| 5 | Result envelope exposes `closure_mode`, `defaulted_sections`, `assumptions[source]`, `stop_reason_code` (7-code taxonomy) | partial | 100% via L4 |
| 6 | Sessions resumable across days; same goal lands on same lineage | shipped (#1138 et al.) | hold |
| 7 | On stall / regression, escalation ladder (`unstuck → reframe → safe-default → BLOCKED`) runs to a terminal state | partial | 100% via L5 |

| Field | Source | Status |
|---|---|---|
| `defaulted_sections[]` | #1146 | 🟢 merged |
| `interview_closure_mode` (`mutual_agreement` / `ledger_only` / `safe_default`) | #1148 + #1167 | 🟢 merged |
| `stop_reason_code` (8-code) | #1151 + #1167 (`interview_unsafe_gaps_remain`) | 🟢 merged |
| safe-default finalization on `ledger_done=False` (PR-B2) | #1167 | 🟢 merged |
| `assumption_sources[]` with `AssumptionRecord` provenance (PR-C2) | #1169 | 🟢 merged |

| Aspect | Behavior |
|---|---|
| Input | Free-text brief (`--brief`, `--brief-file`, `--from-stdin`) |
| Decomposition | LLM decomposes brief into `G001` / `G002` / … stories stored in `.omx/ultragoal/goals.json` |
| Durable state | `.omx/ultragoal/brief.md` + `goals.json` (plan + status + attempts + evidence) + `ledger.jsonl` (append-only event log) |
| Codex coupling | Aggregate mode (default): one Codex goal covers the whole run; pointer-style objective references `goals.json` rather than enumerating ids, so steering can add/split stories without weakening the end goal |
| Execution loop | `omx ultragoal complete-goals` prints a handoff; the agent calls `get_goal` → `create_goal` (only if none active) → completes the OMX story → `omx ultragoal checkpoint --status complete --evidence … --codex-goal-json <get_goal snapshot>` |
| Steering | Explicit-only structured mutations: `add_subgoal`, `split_subgoal`, `reorder_pending`, `revise_pending_wording`, `annotate_ledger`, `mark_blocked_superseded`. Prose ("make it easier") is rejected. Every accept / reject appends an audit entry. |
| Final quality gate | Mandatory on the final story only: targeted verification → `ai-slop-cleaner` on changed files → re-verification → `$code-review`. Clean = `recommendation:APPROVE` + `architectStatus:CLEAR`. Non-clean → `record-review-blockers` appends a new pending blocker-resolution story and the run continues. |
| Terminal contract | Ledger event kinds: `goal_completed`, `goal_failed`, `goal_blocked`, `goal_review_blocked`, `final_review_failed`, `aggregate_objective_migrated`, … |
