[Feature] Define Run/Step/Artifact projections as the canonical harness vocabulary

## Problem

Ouroboros already has strong internal primitives (`Seed`, AC tree, EventStore, checkpoint, orchestrator session, evaluation results), but the public harness vocabulary is still too fragmented for AgentOS-level debugging and plugin integration. A maintainer or plugin author cannot consistently answer:

- What is the user-visible execution unit?
- Which model/tool/subagent action happened as one step?
- Which artifacts were produced by that step?
- Which acceptance criterion or workflow phase does a step belong to?
- Which event stream entries prove the final verdict?

This makes the harness harder to inspect, replay, evaluate, and extend through `ouroboros-plugins`.

## Why now

Ouroboros is moving toward **thin skill, fat harness** and a separate UserLevel plugin ecosystem. Before adding more plugins or AgentOS UI, the core needs a stable projection vocabulary over existing events. Letta's `Run` / `Step` model shows why this matters: long-running agent behavior becomes understandable when each invocation and model/tool pass has a first-class record.

This issue should be implemented before Run Capsules, eval-suite replicas, and harness inspector work, because those features should consume the same projection contract instead of inventing separate schemas.

## User / persona

- Maintainers debugging orchestrator/runtime behavior.
- Plugin authors who need a stable contract for reporting work.
- CI/evaluation jobs that need reproducible evidence.
- Users asking “why did Ouroboros do this?” after a long run.

## Current behavior

- Events and sessions exist, but there is no single public `RunRecord` / `StepRecord` / `ArtifactRecord` projection contract.
- Different subsystems can describe work using different names: session, execution, generation, phase, task, event, AC result.
- Plugins have no stable target object to attach produced artifacts, permission use, or verification evidence to.

## Desired behavior

Introduce a minimal, event-sourced **Run / Step / Artifact projection layer** as the canonical harness vocabulary.

Definitions:

- `RunRecord`: one user-goal or Seed execution envelope.
- `StageRecord`: a named harness phase such as `interview`, `seed`, `execute`, `evaluate`, `evolve`, `plugin`.
- `StepRecord`: one bounded unit of work such as model call, tool call, shell command, subagent dispatch, plugin command, or evaluation check.
- `ArtifactRecord`: a file, structured result, patch, verdict, log excerpt, run capsule, or evidence object produced by a step.
- `VerdictRecord`: run-level or AC-level result with evidence links.

These records must be **projections over EventStore / existing state**, not a replacement for the event journal.
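As a starting point for discussion, the definitions above could be sketched as plain dataclasses. Every field name below is an assumption made for illustration (the minimal v1 field set is still an open decision), and nothing here reflects existing Ouroboros code.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArtifactRecord:
    artifact_id: str
    step_id: str            # the step that produced this artifact
    kind: str               # e.g. "file", "patch", "verdict", "log_excerpt"
    uri: str | None = None  # hypothetical locator; may be absent


@dataclass(frozen=True)
class StepRecord:
    step_id: str
    stage_id: str
    kind: str                        # e.g. "model_call", "tool_call"
    event_ids: tuple[str, ...] = ()  # source events backing this step
    inferred: bool = False           # True when rebuilt from legacy data


@dataclass(frozen=True)
class StageRecord:
    stage_id: str
    run_id: str
    name: str  # e.g. "interview", "seed", "execute", "evaluate"


@dataclass(frozen=True)
class VerdictRecord:
    run_id: str
    passed: bool
    ac_id: str | None = None  # None for a run-level verdict
    evidence_event_ids: tuple[str, ...] = ()


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    goal: str
    stages: tuple[StageRecord, ...] = ()
    steps: tuple[StepRecord, ...] = ()
    artifacts: tuple[ArtifactRecord, ...] = ()
    verdicts: tuple[VerdictRecord, ...] = ()


# Build a tiny example run and round-trip it through JSON.
example_run = RunRecord(
    run_id="run-1",
    goal="demo goal",
    stages=(StageRecord("stage-1", "run-1", "execute"),),
    steps=(StepRecord("step-1", "stage-1", "tool_call", event_ids=("evt-1",)),),
    artifacts=(ArtifactRecord("art-1", "step-1", "file", uri="out.txt"),),
)
example_payload = json.loads(json.dumps(asdict(example_run)))
```

Frozen dataclasses are used here only to signal that a projection is a read-only view over the journal; Pydantic models would serve equally well if that matches the existing codebase better.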

## Proposed solution

1. Add Pydantic/dataclass models under a harness/projection namespace, for example:
   - `src/ouroboros/harness/projection.py`
   - or another name aligned with existing architecture terminology.
2. Add a projection builder that can construct records from existing EventStore/session data.
3. Include stable IDs:
   - `run_id`
   - `stage_id`
   - `step_id`
   - `artifact_id`
   - `event_ids[]`
4. Add minimal CLI/MCP query surface:
   - `ouroboros status run <run_id> --json`, or
   - extend existing status/query handlers if that is the established path.
5. Preserve compatibility with existing event names and docs.
6. Document mapping from existing terms to the new projection vocabulary.
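To make step 2 concrete, here is a rough sketch of a projection builder that groups events into steps. It assumes events are available as dictionaries with hypothetical `id`, `step_id`, and `stage` keys; the real EventStore event shape may differ, so treat the key names and the `project_steps` function as placeholders.

```python
from __future__ import annotations

from typing import Any, Iterable


def project_steps(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group events into step projections, preserving first-seen order.

    Events without a `step_id` (for example, legacy events) are not dropped.
    Each one becomes its own step that is explicitly marked as inferred.
    """
    steps: dict[str, dict[str, Any]] = {}
    for index, event in enumerate(events):
        step_id = event.get("step_id")
        inferred = step_id is None
        if inferred:
            # Fall back to the event id, or the position if even that is missing.
            step_id = f"inferred-{event.get('id', index)}"
        entry = steps.setdefault(
            step_id,
            {
                "step_id": step_id,
                "stage": event.get("stage", "unknown"),
                "event_ids": [],
                "inferred": inferred,
            },
        )
        if "id" in event:
            entry["event_ids"].append(event["id"])
    return list(steps.values())


# Hypothetical event stream: two events for one step, plus a legacy event.
sample_events = [
    {"id": "e1", "step_id": "s1", "stage": "execute"},
    {"id": "e2", "step_id": "s1", "stage": "execute"},
    {"id": "e3"},
]
projected = project_steps(sample_events)
```

The builder only reads events and never writes back, which keeps the event journal as the source of truth.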

## Repository direction fit

This is core harness work, not a new user workflow. It preserves the thin-skill model because skills can continue to be small entrypoints while the harness owns execution evidence, projection, and replay vocabulary. It also prepares `ouroboros-plugins` to attach work to stable steps instead of inventing plugin-local status formats.

## Dependency / sequencing

This should be the first issue in the sequence. Run Capsule, isolated eval suites, plugin audit, context inspection, and Harness Inspector should consume this projection rather than defining competing models.

## Constraints

- Do not replace EventStore.
- Do not require a new server process.
- Do not make web UI or ADE a dependency.
- Keep schemas small and append-only where possible.
- Projection must tolerate missing legacy events.
- Must preserve local-first operation.
- Must not move domain/plugin workflows into core.

## Non-goals

- No full Letta-style agent server.
- No full database migration unless absolutely needed.
- No self-editing agent memory.
- No plugin marketplace implementation.
- No visual inspector UI in this issue.

## Implementation decisions required before coding

- [ ] Name and module boundary: `harness`, `observability`, `persistence`, or `orchestrator` projection namespace.
- [ ] Whether `RunRecord` corresponds to current orchestrator session ID, execution ID, Seed ID, or a new generated ID with backreferences.
- [ ] Minimal required fields for v1 records.
- [ ] How to map old events that do not contain enough metadata.
- [ ] Whether projection is computed on demand only or cached as a checkpoint.
- [ ] JSON schema versioning strategy.
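For the schema versioning decision, one possible option (not a decided design) is a small envelope that carries a `schema_version` string, with readers rejecting an unknown major version. The `SCHEMA_VERSION` constant, the envelope keys, and both function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

SCHEMA_VERSION = "1.0"  # hypothetical "major.minor" version string


def wrap_projection(projection: dict[str, Any]) -> str:
    """Serialize a projection inside a versioned envelope."""
    envelope = {"schema_version": SCHEMA_VERSION, "projection": projection}
    return json.dumps(envelope, sort_keys=True)


def unwrap_projection(text: str) -> dict[str, Any]:
    """Parse an envelope, refusing payloads with a different major version."""
    envelope = json.loads(text)
    found = str(envelope.get("schema_version", ""))
    if found.split(".")[0] != SCHEMA_VERSION.split(".")[0]:
        raise ValueError(f"unsupported projection schema version: {found!r}")
    return envelope["projection"]
```

Under this scheme, appending optional fields would bump only the minor version, which fits the "small and append-only" constraint, while a breaking change would bump the major version.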

## Acceptance criteria

- [ ] A documented `RunRecord` / `StageRecord` / `StepRecord` / `ArtifactRecord` / `VerdictRecord` schema exists.
- [ ] A projection builder can reconstruct these records from a normal Ouroboros run's persisted state/events.
- [ ] Every projected `StepRecord` links to one or more source event IDs or explicitly marks itself as legacy/inferred.
- [ ] Artifacts can be attached to steps without plugin-specific code paths.
- [ ] Projection output is available through a machine-readable CLI or MCP query path.
- [ ] Existing tests for event persistence and status continue passing.
- [ ] At least one fixture demonstrates projection for an execution with mechanical evaluation.
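
The step-linkage criterion could be checked mechanically in a test. This is a minimal sketch that assumes steps are serialized as dictionaries with hypothetical `step_id`, `event_ids`, and `inferred` keys; `unlinked_steps` is an illustrative helper, not an existing function.

```python
from __future__ import annotations

from typing import Any, Iterable


def unlinked_steps(steps: Iterable[dict[str, Any]]) -> list[str]:
    """Return IDs of steps that have no source events and are not marked inferred."""
    return [
        step["step_id"]
        for step in steps
        if not step.get("event_ids") and not step.get("inferred", False)
    ]


# Hypothetical projected steps: linked, legacy/inferred, and one violation.
sample_steps = [
    {"step_id": "s1", "event_ids": ["e1"]},
    {"step_id": "s2", "event_ids": [], "inferred": True},
    {"step_id": "s3", "event_ids": []},
]
violations = unlinked_steps(sample_steps)
```

A fixture-based test would then assert that `unlinked_steps(...)` returns an empty list for a real projected run.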

## Ouroboros real-run verification items after implementation

Run these after code implementation and before merge:

```bash
uv run pytest tests/ -q
uv run pytest tests/test_*projection* tests/test_*event* tests/test_*status* -q

# Create or reuse a minimal seed fixture that performs a harmless local task.
uv run ouroboros run tests/fixtures/seeds/minimal-local.yaml --runtime codex

# The command name may follow the final CLI decision, but it must return JSON projection data.
uv run ouroboros status run <RUN_ID> --json | jq '.run_id, .stages, .steps, .artifacts'
```

Manual verification using Ouroboros itself:

- Start a normal `ooo run` / `ouroboros run` flow.
- Confirm the final status can answer: goal, Seed, stage sequence, step sequence, produced artifacts, final verdict, and source events.
- Confirm no plugin or skill prompt has to parse raw logs to reconstruct these facts.

## References

- `docs/architecture.md`
- `docs/events.md`
- `docs/contributing/agent-os-kernel-terminology.md`
- `docs/guides/evaluation-pipeline.md`
- Letta Run/Step concept: https://docs.letta.com/guides/agents/overview

## Checklist

- [x] I searched existing issues and discussions first.
- [x] I explained the problem, not just the solution.
- [x] I included clear scope boundaries and non-goals.
- [x] I listed concrete acceptance criteria.

