Problem
Ouroboros already has strong internal primitives (Seed, AC tree, EventStore, checkpoint, orchestrator session, evaluation results), but the public harness vocabulary is still too fragmented for AgentOS-level debugging and plugin integration. A maintainer or plugin author cannot consistently answer:
- What is the user-visible execution unit?
- Which model/tool/subagent action happened as one step?
- Which artifacts were produced by that step?
- Which acceptance criterion or workflow phase does a step belong to?
- Which event stream entries prove the final verdict?
This makes the harness harder to inspect, replay, evaluate, and extend through ouroboros-plugins.
Why now
Ouroboros is moving toward a “thin skill, fat harness” design and a separate UserLevel plugin ecosystem. Before adding more plugins or an AgentOS UI, the core needs a stable projection vocabulary over existing events. Letta's Run / Step model shows why this matters: long-running agent behavior becomes understandable when each invocation and model/tool pass has a first-class record.
This issue should be implemented before Run Capsules, eval-suite replicas, and harness inspector work, because those features should consume the same projection contract instead of inventing separate schemas.
User / persona
- Maintainers debugging orchestrator/runtime behavior.
- Plugin authors who need a stable contract for reporting work.
- CI/evaluation jobs that need reproducible evidence.
- Users asking “why did Ouroboros do this?” after a long run.
Current behavior
- Events and sessions exist, but there is no single public RunRecord / StepRecord / ArtifactRecord projection contract.
- Different subsystems can describe work using different names: session, execution, generation, phase, task, event, AC result.
- Plugins have no stable target object to attach produced artifacts, permission use, or verification evidence to.
Desired behavior
Introduce a minimal, event-sourced Run / Step / Artifact projection layer as the canonical harness vocabulary.
Definitions:
- RunRecord: one user-goal or Seed execution envelope.
- StageRecord: a named harness phase such as interview, seed, execute, evaluate, evolve, plugin.
- StepRecord: one bounded unit of work such as model call, tool call, shell command, subagent dispatch, plugin command, or evaluation check.
- ArtifactRecord: a file, structured result, patch, verdict, log excerpt, run capsule, or evidence object produced by a step.
- VerdictRecord: run-level or AC-level result with evidence links.
These records must be projections over EventStore / existing state, not a replacement for the event journal.
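The definitions above could be sketched as plain dataclasses. This is a minimal illustration, not the final schema: only the ID fields (run_id, stage_id, step_id, artifact_id, event_ids) come from this proposal, while kind, name, inferred, passed, ac_id, and seed_id are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArtifactRecord:
    artifact_id: str
    step_id: str                      # the step that produced this artifact
    kind: str                         # hypothetical: "file", "patch", "verdict", ...
    event_ids: tuple[str, ...] = ()   # source events backing this record


@dataclass(frozen=True)
class StepRecord:
    step_id: str
    stage_id: str
    kind: str                         # hypothetical: "model_call", "tool_call", ...
    event_ids: tuple[str, ...] = ()
    inferred: bool = False            # hypothetical flag for legacy/inferred steps


@dataclass(frozen=True)
class StageRecord:
    stage_id: str
    run_id: str
    name: str                         # e.g. "interview", "seed", "execute"


@dataclass(frozen=True)
class VerdictRecord:
    run_id: str
    passed: bool
    ac_id: str | None = None          # None is taken here to mean a run-level verdict
    evidence_event_ids: tuple[str, ...] = ()


@dataclass
class RunRecord:
    run_id: str
    seed_id: str | None = None
    stages: list[StageRecord] = field(default_factory=list)
    steps: list[StepRecord] = field(default_factory=list)
    artifacts: list[ArtifactRecord] = field(default_factory=list)
    verdicts: list[VerdictRecord] = field(default_factory=list)
```

The leaf records are frozen because they are read-only views derived from events; RunRecord stays mutable only so a builder can append to it while folding the journal.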
Proposed solution
- Add Pydantic/dataclass models under a harness/projection namespace, for example src/ouroboros/harness/projection.py, or another name aligned with existing architecture terminology.
- Add a projection builder that can construct records from existing EventStore/session data.
- Include stable IDs: run_id, stage_id, step_id, artifact_id, event_ids[].
- Add a minimal CLI/MCP query surface: ouroboros status run <run_id> --json, or extend existing status/query handlers if that is the established path.
- Preserve compatibility with existing event names and docs.
- Document mapping from existing terms to the new projection vocabulary.
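To illustrate what a projection builder might look like, here is a rough sketch that folds a flat list of events into a run-level view. The event shape (dicts with id, run_id, stage, step_id, type, artifact_id), the "artifact.created" event type, and the function name project_run are all assumptions made for the example; the real EventStore schema may differ.

```python
from __future__ import annotations

from typing import Any, Iterable


def project_run(events: Iterable[dict[str, Any]], run_id: str) -> dict[str, Any]:
    """Fold a flat event stream into a run/stage/step/artifact projection."""
    stages: list[str] = []
    steps: dict[str, dict[str, Any]] = {}
    artifacts: list[dict[str, Any]] = []

    for index, event in enumerate(events):
        if event.get("run_id") != run_id:
            continue  # ignore events that belong to other runs

        stage = event.get("stage", "unknown")
        if stage not in stages:
            stages.append(stage)

        # Legacy events may lack a step_id: synthesize one and flag it,
        # rather than dropping the event or failing the projection.
        step_id = event.get("step_id")
        inferred = step_id is None
        if inferred:
            step_id = f"inferred-{index}"

        step = steps.setdefault(
            step_id,
            {"step_id": step_id, "stage": stage, "event_ids": [], "inferred": inferred},
        )
        step["event_ids"].append(event["id"])

        # Hypothetical event type marking that a step produced an artifact.
        if event.get("type") == "artifact.created":
            artifacts.append(
                {
                    "artifact_id": event.get("artifact_id", f"artifact-{event['id']}"),
                    "step_id": step_id,
                    "event_ids": [event["id"]],
                }
            )

    return {
        "run_id": run_id,
        "stages": stages,
        "steps": list(steps.values()),
        "artifacts": artifacts,
    }
```

The builder only reads events and never writes back, which keeps the event journal as the source of truth and the projection disposable.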
Repository direction fit
This is core harness work, not a new user workflow. It preserves the thin-skill model because skills can continue to be small entrypoints while the harness owns execution evidence, projection, and replay vocabulary. It also prepares ouroboros-plugins to attach work to stable steps instead of inventing plugin-local status formats.
Dependency / sequencing
This should be the first issue in the sequence. Run Capsule, isolated eval suites, plugin audit, context inspection, and Harness Inspector should consume this projection rather than defining competing models.
Constraints
- Do not replace EventStore.
- Do not require a new server process.
- Do not make web UI or ADE a dependency.
- Keep schemas small and append-only where possible.
- Projection must tolerate missing legacy events.
- Must preserve local-first operation.
- Must not move domain/plugin workflows into core.
Non-goals
- No full Letta-style agent server.
- No full database migration unless absolutely needed.
- No self-editing agent memory.
- No plugin marketplace implementation.
- No visual inspector UI in this issue.
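Implementation decisions required before coding
- Which namespace hosts the projection: harness, observability, persistence, or orchestrator.
- Whether RunRecord corresponds to the current orchestrator session ID, execution ID, Seed ID, or a new generated ID with backreferences.
Acceptance criteria
- A RunRecord / StageRecord / StepRecord / ArtifactRecord schema exists.
- Each StepRecord links to one or more source event IDs or explicitly marks itself as legacy/inferred.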
Ouroboros real-run verification items after implementation
Run these after code implementation and before merge:
uv run pytest tests/ -q
uv run pytest tests/test_*projection* tests/test_*event* tests/test_*status* -q
# Create or reuse a minimal seed fixture that performs a harmless local task.
uv run ouroboros run tests/fixtures/seeds/minimal-local.yaml --runtime codex
# The command name may follow the final CLI decision, but it must return JSON projection data.
uv run ouroboros status run <RUN_ID> --json | jq '.run_id, .stages, .steps, .artifacts'
Manual verification using Ouroboros itself:
- Start a normal ooo run / ouroboros run flow.
- Confirm the final status can answer: goal, Seed, stage sequence, step sequence, produced artifacts, final verdict, and source events.
- Confirm no plugin or skill prompt has to parse raw logs to reconstruct these facts.
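Part of the manual check above could be automated with a small script run against the JSON output. The sketch below assumes the top-level keys used in the jq command (run_id, stages, steps, artifacts) and a per-step event_ids list; the inferred flag and the function name check_projection are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# Top-level keys taken from the jq filter in the verification commands.
REQUIRED_KEYS = ("run_id", "stages", "steps", "artifacts")


def check_projection(raw_json: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    data: dict[str, Any] = json.loads(raw_json)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]

    for step in data.get("steps", []):
        # Each step should cite source events or admit that it was inferred.
        if not step.get("event_ids") and not step.get("inferred", False):
            problems.append(f"step {step.get('step_id', '?')} has no evidence")

    return problems
```

A CI job could pipe the status output into this function and fail when the returned list is non-empty.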
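References
- docs/architecture.md
- docs/events.md
- docs/contributing/agent-os-kernel-terminology.md
- docs/guides/evaluation-pipeline.md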
Checklist