Meta SSOT slice: L1 — DomainProfile Catalog (ledger-derived domain inference + default AC injection)
Terminology cleanup (2026-05-23 KST). Implementation has standardized on TaskClass / TaskClassProfile. Older DomainProfile Catalog wording in this issue is historical and should be read as superseded unless it explicitly refers to #849's prior recovery-hint concept.
This issue is the L1 design slice of #1157. It promotes #849's DomainProfile from a typed recovery hint to a first-class domain taxonomy that drives default AC and runtime-probe binding — without introducing a separate LLM classifier.
Self-audit note (2026-05-22)
An earlier draft of this issue proposed a separate LLM-based classifier (Sonnet) at the Big Bang phase, plus an eval set, accuracy floor, confidence threshold, and opt-in telemetry. That design was wrong for Ouroboros: it duplicated the work the Socratic interview already does (extracting structured spec into a ledger) and violated the SSOT's own promise to inherit Ouroboros's interview substrate as-is. This issue replaces that design with ledger-derived domain inference — a deterministic pattern-match against the entries the existing interview already populates. Zero new LLM calls, zero new external API surface, zero new substrate. See the Why ledger-derive and Ledger-derived domain inference sections below.
Why now
The current ooo auto pipeline implicitly treats every task as if it were a library with unit-test acceptance:
The Seed Architect emits identical AC templates regardless of domain.
The verifier evaluates the same way for ooo auto "habit tracker CLI" and ooo auto "2D kart racer".
The user has to manually say things like "and also include a game loop" / "and also exit code 0" — defeating the SSOT north-star ("one vague line, no follow-up needed").
L1 closes this by deriving the domain class from the structured ledger the Socratic interview already produces and injecting class-appropriate default AC + completion mode + runtime-probe binding before the Seed is sealed.
Why ledger-derive, not classifier
The interview already runs an LLM that converts the user's goal + interview answers into structured SeedDraftLedger entries: actors, inputs, outputs, runtime_context, constraints, non_goals, acceptance_criteria, verification_plan, failure_modes. Domain class is a derived property of those entries, not a separate inference:
outputs says "HTTP response" + inputs says "webhook payload" → webhook
outputs says "stdout, exit code 0" + runtime_context says "shell" → cli
outputs says "render frames" → game-2d
goal contains "refactor" + constraints says "preserve behavior" → refactor-in-place
These are pattern matches on the entries the interview already populated and standardized (the interview's other implicit job is to coerce raw user prose toward canonical vocabulary — "do you mean stdout, stderr, or both?"). A separate LLM classifier on top would:
Duplicate the inference the interview is already doing.
Introduce a parallel confidence signal that can disagree with the interview's own ambiguity_score.
Require an eval set, accuracy floor, telemetry pipeline, and model-cost decision — all unnecessary if we reuse what the interview already produced.
Ledger-derive keeps L1 inside Ouroboros's substrate. Pattern matching is deterministic, auditable, and reproducible; growing the catalog is a ~10 LoC PR per new class (pattern function + unit tests), not an eval-set re-curation.
Frozen class taxonomy for L1-a (7 classes)
After design review, L1-a ships seven classes — the smallest set that covers every canonical L0 scenario without over-claiming distinctions we cannot defend. Three classes (game-3d, desktop-app, notebook-analysis) are deferred to follow-up PRs after the L1-a enum lands; the schema is forward-extensible so additions do not require migration.
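| Class | Ledger signals (illustrative) | Default completion mode | Probe binding (anticipated) |
| --- | --- | --- | --- |
| library | outputs: API surface / importable symbols; no shell/HTTP runtime | CODE_COMPLETE | |
| cli | outputs: stdout / exit code / printed text; runtime_context: shell / terminal | PRODUCT_COMPLETE | |
| web-service | outputs: REST/HTTP endpoints, JSON body, multiple routes | PRODUCT_COMPLETE | |
| webhook | inputs: webhook payload / POST event; outputs: side effect (DB row / file / external call) | PRODUCT_COMPLETE | |
| data-pipeline | inputs: dataset / CSV / log file; outputs: aggregated / transformed / Parquet | PRODUCT_COMPLETE | |
| game-2d | outputs: render / frame / screen / canvas; (provisional: also browser-based interactive frontend) | PRODUCT_COMPLETE | |
| refactor-in-place | goal: refactor / rewrite / restructure; constraints: preserve behavior / same tests | CODE_COMPLETE | |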
Deferred (each becomes its own follow-up PR after L1-a lands):
game-3d — render-hash probe is meaningfully harder; defer until L3 ships the render-hash kind.
desktop-app — Electron / native / PWA tri-furcation is too broad to lock in one class. Browser-based interactive frontends are provisionally absorbed by game-2d.
notebook-analysis — outlier in completion semantics; defer until at least one real-world notebook scenario hits the test matrix.
Resolved taxonomy decisions
webhook vs web-service → keep separate. The runtime probe genuinely differs (side-effect vs request-shape). User goals are linguistically distinct.
game-3d deferral → deferred (see above).
desktop-app deferral → deferred (see above).
refactor-in-place as a class → kept. vertical-slice-refactor in the L0 canonical matrix needs this class.
Per-class schema (frozen by L1-a)
    @dataclass(frozen=True, slots=True)
    class DomainProfile:
        name: str  # canonical class name
        default_completion_mode: CompletionMode  # CODE_COMPLETE | PRODUCT_COMPLETE
        default_ac_template: tuple[str, ...]  # matches Seed.acceptance_criteria
                                              # (plain strings, not AcceptanceCriterion objects)
        runtime_probe_kinds: tuple[ProbeKind, ...]  # bound by L3 when ready
        safe_defaults: Mapping[str, str]  # existing #849 field
        # Other existing typed-recovery fields preserved
Ledger-derived domain inference (L1-b)
A pure-Python function in src/ouroboros/auto/domain_inference.py:
    @dataclass(frozen=True, slots=True)
    class DomainInference:
        """Outcome of pattern-matching the ledger against the L1 catalog."""

        single: DomainProfile | None
        candidates: frozenset[DomainProfile]  # populated only when ambiguous
        reason: str  # which pattern(s) matched, for audit


    def derive_domain_from_ledger(ledger: SeedDraftLedger) -> DomainInference:
        """
        Pattern-match the ledger's structured entries against the L1 catalog.

        Returns one of:
          - single = <DomainProfile>                (exactly one class matched)
          - candidates = {A, B, ...}                (ambiguous; interview must disambiguate)
          - single = LIBRARY, reason = "unmatched"  (no pattern matched; safe default)

        Zero LLM calls. All matching is keyword/regex/substring against the
        *interview-standardized* ledger entries (which are already coerced toward
        canonical vocabulary by the Socratic loop).
        """
Pattern functions are registered per-class in the same module. Each pattern is one function that returns bool against a SeedDraftLedger:
Adding a new class = new pattern function + unit test. ~10 LoC PR.
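As a rough illustration of the per-class pattern functions and the three-way result, the sketch below uses a simplified `Ledger` stand-in for `SeedDraftLedger` and returns class names rather than `DomainProfile` objects. The `_has` helper, the `PATTERNS` registry, the keyword lists, and the `Inference` and `derive` names are assumptions made for this example; the keywords loosely follow the illustrative ledger signals in the taxonomy.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Ledger:  # simplified stand-in for SeedDraftLedger
    goal: str = ""
    inputs: str = ""
    outputs: str = ""
    runtime_context: str = ""
    constraints: str = ""


def _has(text: str, *keywords: str) -> bool:
    """Case-insensitive substring match against any keyword."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def _matches_cli(ledger: Ledger) -> bool:
    return _has(ledger.outputs, "stdout", "exit code") and _has(
        ledger.runtime_context, "shell", "terminal"
    )


def _matches_webhook(ledger: Ledger) -> bool:
    return _has(ledger.inputs, "webhook payload", "post event")


def _matches_game_2d(ledger: Ledger) -> bool:
    return _has(ledger.outputs, "render", "frame", "canvas")


# One entry per class; adding a class means adding one function and one line here.
PATTERNS: dict[str, Callable[[Ledger], bool]] = {
    "cli": _matches_cli,
    "webhook": _matches_webhook,
    "game-2d": _matches_game_2d,
}


@dataclass(frozen=True)
class Inference:
    single: str | None
    candidates: frozenset[str]
    reason: str


def derive(ledger: Ledger) -> Inference:
    # Sorting keeps the audit string stable across runs.
    matched = sorted(name for name, fn in PATTERNS.items() if fn(ledger))
    if len(matched) == 1:
        return Inference(matched[0], frozenset(), f"matched: {matched[0]}")
    if matched:
        return Inference(None, frozenset(matched), "ambiguous: " + ", ".join(matched))
    return Inference("library", frozenset(), "unmatched")
```

For example, `derive(Ledger(outputs="stdout, exit code 0", runtime_context="shell"))` yields a single `cli` match, while an empty `Ledger()` falls through to `library` with reason `"unmatched"`.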
Ambiguity handling
When derive_domain_from_ledger() returns multiple candidates, the interview driver gets a next-round question candidate (small hook in interview_driver.py):
    if inference.candidates and len(inference.candidates) > 1:
        # Sort the names: frozenset iteration order is not stable across runs.
        names = ", ".join(sorted(c.name for c in inference.candidates))
        next_round_questions.append(
            f"This goal could be interpreted as: {names}. "
            "Which is the primary surface?"
        )
The interview's existing ambiguity-gate loop drives the next round. No new escalation system. The disambiguation question gets answered → ledger updates → derive runs again → one class wins (or remains ambiguous, in which case the loop continues until either max_rounds or convergence).
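The answer → ledger update → re-derive cycle can be sketched as a small loop. This is an assumption-laden toy, not the interview driver: `run_disambiguation`, `toy_derive`, and `toy_ask` are hypothetical names, the ledger is reduced to a plain dict, and candidates are class names.

```python
from collections.abc import Callable


def run_disambiguation(
    derive: Callable[[dict[str, str]], frozenset[str]],
    ask: Callable[[str], dict[str, str]],
    ledger: dict[str, str],
    max_rounds: int,
) -> tuple[frozenset[str], int]:
    """Re-derive after each answered round until one class wins or rounds run out."""
    rounds = 0
    candidates = derive(ledger)
    while len(candidates) > 1 and rounds < max_rounds:
        question = (
            "This goal could be interpreted as: "
            f"{', '.join(sorted(candidates))}. Which is the primary surface?"
        )
        # The answer is merged into the ledger, then inference runs again.
        ledger = {**ledger, **ask(question)}
        candidates = derive(ledger)
        rounds += 1
    return candidates, rounds


def toy_derive(ledger: dict[str, str]) -> frozenset[str]:
    outputs = ledger.get("outputs", "")
    hits = set()
    if "stdout" in outputs:
        hits.add("cli")
    if "frame" in outputs:
        hits.add("game-2d")
    return frozenset(hits)


def toy_ask(question: str) -> dict[str, str]:
    # Stand-in for the user's answer narrowing the outputs entry.
    return {"outputs": "stdout only"}


final, used = run_disambiguation(
    toy_derive, toy_ask, {"outputs": "stdout and frames"}, max_rounds=3
)
```

If the answers never narrow the ledger, the loop stops at `max_rounds` with more than one candidate still present, which mirrors the "remains ambiguous" branch described above.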
Unmatched handling
When no pattern matches, the inference returns single = LIBRARY with reason = "unmatched" and emits a domain_unmatched event into the EventStore. Reasoning:
LIBRARY has the narrowest completion gate (CODE_COMPLETE with unit tests + API import smoke). Mis-classification has the lowest blast radius — the system makes the least PRODUCT claim, so no irreversible mistakes follow.
The domain_unmatched event lets maintainers spot patterns of unmatched goals and add new catalog classes when justified.
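A minimal sketch of the unmatched fallback, assuming an in-memory stand-in for the EventStore. `InMemoryEventStore`, its `append` signature, and `resolve_single` are hypothetical and do not reflect the real EventStore API; only the `domain_unmatched` event name, the `library` default, and the `"unmatched"` reason come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryEventStore:  # stand-in; the real EventStore API is not shown here
    events: list[dict[str, str]] = field(default_factory=list)

    def append(self, event_type: str, **payload: str) -> None:
        self.events.append({"type": event_type, **payload})


def resolve_single(
    matched: frozenset[str], goal: str, store: InMemoryEventStore
) -> tuple[str | None, str]:
    """Return (class name or None, reason); fall back to library when nothing matched."""
    if not matched:
        # Record the miss so maintainers can spot recurring unmatched goals.
        store.append("domain_unmatched", goal=goal)
        return "library", "unmatched"
    if len(matched) == 1:
        (name,) = matched
        return name, f"matched: {name}"
    return None, "ambiguous"


store = InMemoryEventStore()
result = resolve_single(frozenset(), "build me something", store)
```

The fallback never raises: the caller always receives a usable class name or an explicit ambiguity signal.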
Sub-PR breakdown
L1-a — Catalog data only. Frozen 7-class enum + per-class fields (default_completion_mode, default_ac_template, runtime_probe_kinds, safe_defaults). ~100 LoC + 1 unit test per class. Unblocks L3-c probe binding. No inference logic; uses are explicit-only at this stage.
L1-b — Pattern-matching derive_domain_from_ledger in src/ouroboros/auto/domain_inference.py + per-class _matches_* pattern functions + unit tests covering positive / negative / ambiguous / unmatched cases. ~150 LoC. No new LLM, no eval set, no accuracy floor — just deterministic pattern unit tests.
L1-c — Interview-driver integration: between rounds (after each _step settles), call derive_domain_from_ledger; if ambiguous, append a disambiguation question to next-round candidates; if unmatched after max_rounds, emit domain_unmatched and proceed with LIBRARY default. ~50 LoC + integration tests.
L1-d — Seed AC injection hook in seed_architect: when the Seed is assembled, look up the active DomainProfile.default_ac_template and prepend each entry to Seed.acceptance_criteria (user-supplied AC takes precedence on conflict, with domain_default_ac_overridden audit event). ~80 LoC + unit tests.
L1-e: Surface the active profile on the pipeline result (AutoPipelineResult.active_domain_profile: str | None, already populated in state.active_domain_profile_name from #849, feat(auto): DomainProfile and VerifiablePredicate contracts (#809 P3, PR 1/6); just plumb it through _result()). ~10 LoC.
L1-a is the smallest first slice. L1-b/c/d/e each ≤ 1 PR.
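The L1-d injection step can be sketched as follows. The issue does not define what counts as a "conflict", so this example assumes the simplest rule: a default entry conflicts when a user AC has the same text after case and whitespace normalization. The function name `inject_default_ac` and the returned event dicts are hypothetical.

```python
def inject_default_ac(
    default_template: tuple[str, ...],
    user_ac: tuple[str, ...],
) -> tuple[tuple[str, ...], list[dict[str, str]]]:
    """Prepend domain defaults to user AC; user entries win on conflict."""

    def key(text: str) -> str:
        # Assumed conflict rule: same text ignoring case and extra whitespace.
        return " ".join(text.lower().split())

    user_keys = {key(ac) for ac in user_ac}
    kept: list[str] = []
    events: list[dict[str, str]] = []
    for entry in default_template:
        if key(entry) in user_keys:
            # The user-supplied AC takes precedence; record the override for audit.
            events.append({"type": "domain_default_ac_overridden", "entry": entry})
        else:
            kept.append(entry)
    return (*kept, *user_ac), events


merged, audit = inject_default_ac(
    ("Exits with code 0.", "Prints to stdout."),
    ("exits with  code 0.", "Supports --json."),
)
```

Here the first default is dropped in favor of the user's equivalent entry, the second default is prepended, and one `domain_default_ac_overridden` event is recorded. A real implementation would likely need a richer conflict test than string equality.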
Acceptance criteria
L1-a: 7-class enum + per-class fields + ≥ 1 unit test per class — 🟢 acceptance.
L1-b: pattern unit tests cover positive / negative / ambiguous (≥ 2 patterns matching) / unmatched cases. Adding a class requires ≤ 10 LoC + 1 test.
L1-c: an ambiguous derive_domain_from_ledger result appends exactly one disambiguation question to the next interview round; the question is resolved within max_interview_rounds.
L1-d: a canonical L0 scenario (e.g. cli-todo) Seed contains the CLI default AC template entries prepended to user AC.
L1-e: result.active_domain_profile populated on every ooo auto run.
An unmatched goal (e.g. constructed test fixture) emits domain_unmatched and falls through to LIBRARY completion mode without raising.
Out of scope
Adding new classes after L1-a freezes — each becomes its own ~10-LoC follow-up PR.
Runtime probes themselves (L3).
Watchdog / resilience (L2 / L5).
Multi-language ledger normalization (assumed: interview standardizes ledger entries to English-leaning canonical vocabulary; verified by reading existing tests/unit/auto/test_ledger_grading_answerer.py fixtures).
Decisions awaiting maintainer triage
None. The earlier draft listed L1-5 (classifier model) and L1-10 (opt-in telemetry) as BLOCK questions. The ledger-derive redesign retires both: there is no classifier model to choose, and no eval set / telemetry pipeline to set up.
Known residual risks (documented, not blockers)
R1 — sparse ledger after short interview. If the user converges the interview in 2 rounds (highly confident user), the ledger may be sparse and pattern matching may fail more often. Mitigation: pattern functions return unmatched rather than guessing; LIBRARY default keeps blast radius low.
R2 — non-English ledger entries. Existing test fixtures show ledger entries are model-generated in English-leaning canonical vocabulary. If a Korean-only interview thread emerges (the auto-answerer's from-auto synthesis is also English-tagged), a separate normalization pass would be needed. Add only if observed in the wild.
R3 — pattern catalog conflicts as the catalog grows. Two domains' patterns matching the same ledger configuration. Mitigation: DomainInference.candidates makes this explicit; no silent precedence. New patterns ship with a positive-set and a disambiguator-set test, so adding a pattern that causes a regression is caught by CI.