Agent OS: introduce typed Workflow IR for fat-harness execution planning #956

@shaun0927

Summary

Introduce a small, repo-native Agent OS Workflow IR for Ouroboros: typed workflow nodes, typed edges, barriers, and run-state metadata that the harness can validate before dispatching work.

This is inspired by Microsoft Agent Framework's executor/edge workflow model, but it should not add Microsoft Agent Framework as a core dependency. Ouroboros should learn the harness pattern, not import the framework. The goal is to strengthen the existing thin-skill / fat-harness direction and make execution plans mechanically inspectable before agents run.

Related context:

Why this belongs in Ouroboros

Ouroboros already has strong spec-first primitives: Interview, Seed, AC tree, EventStore, ControlContract, runtime adapters, and the plugin firewall. However, execution still needs a clearer harness-level representation of what is about to run:

  • what node owns a unit of work;
  • what input shape it accepts;
  • what evidence/output shape it must produce;
  • which edges may be traversed next;
  • whether fan-out/fan-in or a barrier is expected;
  • which runtime/profile/capability constraints apply.

Without a typed Workflow IR, these facts are distributed across phase code, prompts, profile metadata, and runtime-specific control flow. That makes replay, resume, plugin interop, UI inspection, and conformance testing harder than they need to be.

This should be implemented because an Agent OS needs a process graph / execution graph equivalent. The graph does not replace Seed or the evolutionary loop; it gives the harness a durable, validated control surface for executing a Seed.

Proposed implementation direction

Add a lightweight Python/Pydantic model layer, likely under one of:

  • src/ouroboros/orchestrator/workflow_ir.py, or
  • src/ouroboros/execution/workflow_ir.py

The initial IR should be intentionally small and additive.

Minimum concepts:

  • WorkflowSpec
    • stable id, schema version, source (seed, plugin, first_party_program, etc.)
    • list of nodes and edges
    • optional metadata for source Seed / AC / plugin command
  • WorkflowNode
    • node id, node kind, owner (harness, agent, plugin, human_gate, verifier)
    • input schema reference or inline JSON Schema/Pydantic shape
    • output/evidence schema reference
    • allowed capability envelope
    • runtime hints, not hard-coded runtime decisions
  • WorkflowEdge
    • source node, target node
    • edge kind: direct, conditional, fan-out, fan-in/barrier, terminal
    • condition metadata that can be evaluated by harness code, not by agent prose
  • WorkflowValidationResult
    • invalid node refs
    • incompatible input/output shapes
    • unreachable terminal state
    • duplicate ids
    • missing evidence schema
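The minimum concepts above could be sketched roughly as follows. This is an illustration only: it uses stdlib dataclasses in place of the Pydantic models the issue proposes so that it runs without dependencies, and every field name (`input_schema_ref`, `evidence_schema_ref`, `runtime_hints`, and so on) and the `fan_out`/`fan_in` spellings are assumptions, not settled names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Owner(str, Enum):
    HARNESS = "harness"
    AGENT = "agent"
    PLUGIN = "plugin"
    HUMAN_GATE = "human_gate"
    VERIFIER = "verifier"


class EdgeKind(str, Enum):
    DIRECT = "direct"
    CONDITIONAL = "conditional"
    FAN_OUT = "fan_out"
    FAN_IN = "fan_in"  # assumption: fan-in doubles as the barrier kind here
    TERMINAL = "terminal"


@dataclass
class WorkflowNode:
    id: str
    kind: str
    owner: Owner
    input_schema_ref: Optional[str] = None
    evidence_schema_ref: Optional[str] = None
    capabilities: tuple = ()                            # allowed capability envelope
    runtime_hints: dict = field(default_factory=dict)   # hints, not decisions


@dataclass
class WorkflowEdge:
    source: str
    target: str
    kind: EdgeKind = EdgeKind.DIRECT
    condition: dict = field(default_factory=dict)  # data for harness code, not prose


@dataclass
class WorkflowSpec:
    id: str
    schema_version: str
    source: str  # "seed", "plugin", "first_party_program", ...
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def _node(node_id: str, owner: Owner) -> WorkflowNode:
    # Hypothetical helper: every example node declares an evidence schema ref.
    return WorkflowNode(id=node_id, kind="task", owner=owner,
                        evidence_schema_ref=f"evidence/{node_id}")


# A fan-out from "plan" to two agents, joined again at "verify".
example = WorkflowSpec(
    id="wf-example", schema_version="1.0", source="seed",
    nodes=[_node("plan", Owner.HARNESS), _node("impl_a", Owner.AGENT),
           _node("impl_b", Owner.AGENT), _node("verify", Owner.VERIFIER)],
    edges=[
        WorkflowEdge("plan", "impl_a", EdgeKind.FAN_OUT),
        WorkflowEdge("plan", "impl_b", EdgeKind.FAN_OUT),
        WorkflowEdge("impl_a", "verify", EdgeKind.FAN_IN),
        WorkflowEdge("impl_b", "verify", EdgeKind.FAN_IN),
    ],
)
```

The example graph only represents fan-out/fan-in as data; nothing here executes it, which matches the intent of shipping the substrate before wiring execution paths.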

Do not wire this into all execution paths in the first PR. First ship the substrate and tests, then add adapters from current Seed/AC execution paths.

Required decisions before implementation

  1. Location and ownership

    • Should the IR live under orchestrator because it controls runtime dispatch, or under execution because it models executable work?
    • Recommendation: place the IR where current execution strategy code can consume it without importing plugin-specific modules.
  2. Schema technology

    • Pydantic models only, JSON Schema only, or Pydantic models that export JSON Schema?
    • Recommendation: Pydantic for in-process type safety, JSON Schema export for plugins and future ouroboros-plugins validation.
  3. Relationship to AC tree

    • Does each AC become one node, or can an AC expand into a subgraph?
    • Recommendation: allow both. Initial adapter can map each AC to one node; future decomposition can expand ACs into subgraphs.
  4. Relationship to the #830 profile YAML (RFC v2: Thin Skill (YAML) + Fat Harness, Composable Execution Invariants)

    • Does execution_profile become node metadata or a separate referenced object?
    • Recommendation: keep profile metadata referenced, not duplicated, so thin profile configs remain the source of execution invariants.
  5. Versioning and compatibility

    • Define schema_version and support additive changes. Do not make stored Workflow IR unversioned.
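One way the versioning decision might look at load time is sketched below. The `major.minor` version format, the `SUPPORTED_MAJOR` constant, the `KNOWN_FIELDS` set, and the policy of dropping unknown top-level fields are all assumptions made for illustration; preserving unknown fields instead of dropping them would be an equally plausible reading of "additive".

```python
SUPPORTED_MAJOR = 1  # assumption: one supported major version
KNOWN_FIELDS = {"id", "schema_version", "source", "nodes", "edges", "metadata"}


class UnsupportedSchemaVersion(ValueError):
    """Raised when a stored Workflow IR payload cannot be read safely."""


def load_workflow_payload(payload: dict) -> dict:
    """Reject unversioned or incompatible payloads; tolerate additive fields."""
    raw = payload.get("schema_version")
    if not isinstance(raw, str) or not raw:
        raise UnsupportedSchemaVersion("stored Workflow IR must carry schema_version")

    major_text = raw.split(".", 1)[0]
    if not major_text.isdigit() or int(major_text) != SUPPORTED_MAJOR:
        raise UnsupportedSchemaVersion(f"unsupported schema_version: {raw}")

    # Additive change: fields introduced by a newer minor version are ignored
    # by this reader rather than treated as errors.
    return {key: value for key, value in payload.items() if key in KNOWN_FIELDS}
```

A payload written by a hypothetical `1.3` writer with an extra field would load under this reader, while a payload with no `schema_version` or with major version `2` would be rejected.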

Non-goals

  • Do not import or depend on Microsoft Agent Framework.
  • Do not redesign Interview, Seed, or the AC tree.
  • Do not replace current execution in one large PR.
  • Do not expose a user-facing workflow builder as a product surface yet.
  • Do not add Azure/DurableTask/MAF hosting dependencies.

Implementation breakdown

  1. Add Workflow IR models and validation helpers.
  2. Add unit tests for valid/invalid graphs.
  3. Add a read-only adapter that can project a simple Seed/AC execution plan into Workflow IR without changing behavior.
  4. Add docs explaining how Workflow IR supports thin skill / fat harness and differs from UserLevel plugins.
  5. Add one CLI/debug/read-only surface, if appropriate, to inspect projected IR for a seed or sample fixture.
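The read-only adapter in step 3 might start out as a pure function like the sketch below. The seed shape (`id`, `acceptance_criteria`), the `ac:` id prefix, the `evidence/ac_result` reference, and the sequential ordering of ACs are assumptions for illustration; the real Seed and AC tree structures may differ, and a tree-shaped AC set would need a richer mapping.

```python
def project_seed_to_ir(seed: dict) -> dict:
    """Project a seed into a Workflow IR dict without mutating the seed."""
    criteria = seed.get("acceptance_criteria", [])

    # Initial mapping from the recommendation above: one node per AC.
    nodes = [
        {
            "id": f"ac:{ac['id']}",
            "kind": "acceptance_criterion",
            "owner": "agent",
            "evidence_schema": "evidence/ac_result",  # hypothetical reference
        }
        for ac in criteria
    ]

    # Assumption: ACs run in listed order, so consecutive nodes get direct edges.
    edges = [
        {"source": left["id"], "target": right["id"], "kind": "direct"}
        for left, right in zip(nodes, nodes[1:])
    ]

    return {
        "id": f"wf:{seed['id']}",
        "schema_version": "1.0",
        "source": "seed",
        "nodes": nodes,
        "edges": edges,
        "metadata": {"seed_id": seed["id"]},
    }
```

Because the function only reads from `seed` and returns a new dict, it can be called from a debug or CLI surface without affecting how the seed is executed.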

Acceptance criteria

  • Workflow IR models exist with a versioned schema.
  • Validation rejects dangling edges, duplicate node ids, missing terminal paths, and missing required evidence/output schema metadata.
  • A current Seed/AC fixture can be projected into a Workflow IR without changing execution behavior.
  • At least one test demonstrates fan-out/fan-in or barrier metadata can be represented, even if not fully executed yet.
  • Docs explicitly state this is a harness substrate, not a new user-facing workflow product.
  • No Microsoft Agent Framework dependency is added to core runtime dependencies.
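The validation criteria above could be exercised with a helper along these lines. It is a sketch over plain dicts, not the proposed model layer: the `entry` and `terminal` keys, the `evidence_schema` field name, and the error strings are hypothetical, and the input/output shape compatibility check is left out.

```python
from collections import Counter, deque


def validate_workflow(spec: dict) -> list:
    """Return validation errors as strings; an empty list means valid."""
    errors = []
    nodes = spec.get("nodes", [])
    edges = spec.get("edges", [])

    # Duplicate node ids.
    counts = Counter(node["id"] for node in nodes)
    for node_id, count in sorted(counts.items()):
        if count > 1:
            errors.append(f"duplicate node id: {node_id}")

    # Dangling edges: either endpoint names an unknown node.
    known = set(counts)
    for edge in edges:
        for end in ("source", "target"):
            if edge[end] not in known:
                errors.append(
                    f"dangling edge {edge['source']}->{edge['target']}: unknown {end}"
                )

    # Missing evidence/output schema metadata.
    for node in nodes:
        if not node.get("evidence_schema"):
            errors.append(f"missing evidence schema: {node['id']}")

    # Terminal reachability: breadth-first search from the entry node.
    entry, terminal = spec.get("entry"), spec.get("terminal")
    if entry not in known or terminal not in known:
        errors.append("entry or terminal is not a known node")
        return errors
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], []).append(edge["target"])
    seen, queue = {entry}, deque([entry])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    if terminal not in seen:
        errors.append("terminal state unreachable from entry")
    return errors
```

Returning a list of errors rather than raising on the first one lets a debug surface show every wiring problem in a single pass.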

Ouroboros dogfood verification after implementation

After code lands, verify using Ouroboros itself, not only raw unit tests:

  1. Run a small spec-first coding task through the normal Ouroboros path and capture the generated/projected Workflow IR.
  2. Confirm the IR names each executable unit and its required evidence/output schema.
  3. Confirm a deliberately broken IR fixture fails validation before any agent/plugin execution starts.
  4. Confirm normal execution behavior remains unchanged for an existing small seed.
  5. Record the verification evidence in the PR:
    • command(s) run;
    • seed or fixture used;
    • IR validation output;
    • tests passed;
    • any known gaps.

Suggested command evidence should include the project's standard test command plus an Ouroboros CLI/MCP flow that exercises the projection path.

Success criteria

This issue is complete when a maintainer can inspect a planned execution as typed Workflow IR before dispatch, and invalid graph/schema wiring fails fast in the harness rather than later as agent prose drift.
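The fail-fast ordering described here can be illustrated with a small gate in front of dispatch. The names `dispatch_validated`, `WorkflowValidationError`, `validate`, and `run_node` are hypothetical, and running nodes in list order is a placeholder for whatever scheduling the harness actually uses.

```python
from typing import Callable


class WorkflowValidationError(Exception):
    """Raised before dispatch when the Workflow IR fails validation."""

    def __init__(self, errors):
        super().__init__("; ".join(errors))
        self.errors = list(errors)


def dispatch_validated(
    spec: dict,
    validate: Callable[[dict], list],
    run_node: Callable[[dict], None],
) -> int:
    """Validate first; only touch run_node if validation returned no errors."""
    errors = validate(spec)
    if errors:
        # No node has run at this point, so nothing needs to be rolled back.
        raise WorkflowValidationError(errors)

    dispatched = 0
    for node in spec["nodes"]:  # placeholder ordering, not real scheduling
        run_node(node)
        dispatched += 1
    return dispatched
```

A test for the dogfood step "broken IR fails before any agent/plugin execution starts" could pass a recording `run_node` and assert that it was never called when validation fails.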

Absorbed downstream scope from #957 and #959

To reduce AgentOS roadmap issue sprawl, #957 and #959 are folded into this canonical Workflow IR issue instead of remaining separate work items. Treat them as downstream phases / acceptance surfaces, not parallel substrate proposals.

Phase 2 — durable node lifecycle events, after the IR shape is stable

The first IR implementation should leave room for durable lifecycle events, but it does not need to wire every runtime path immediately. The later lifecycle slice should define bounded, versioned events such as:

  • workflow.run.created
  • workflow.node.scheduled
  • workflow.node.started
  • workflow.node.completed
  • workflow.node.failed
  • workflow.node.retried
  • workflow.edge.traversed
  • workflow.checkpoint.saved
  • workflow.run.completed|failed|cancelled

Required lifecycle constraints:

  • lifecycle payloads must be bounded and replay-safe; raw prompts, stdout, stderr, credentials, and free-form secrets do not belong in these events;
  • failed attempts remain visible in raw history even when an effective state view shows a later success;
  • lifecycle events should link to ControlContract, CheckpointStore, IOJournal, and plugin audit events instead of duplicating their payloads;
  • resume tests must prove the harness identifies the last completed node and the next runnable node without replaying side effects.
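The resume constraint could be checked with a helper like the following sketch. The event shape (`type`, `node_id`) and the function name `resume_point` are assumptions; a node counts as runnable only when all of its predecessors have completed, which also gives barrier behaviour at fan-in nodes. The sketch does not decide whether a node that started but never completed is safe to rerun; that would depend on the side-effect journal.

```python
def resume_point(node_ids, edges, events):
    """Return (last_completed_node, next_runnable_nodes) from lifecycle events.

    node_ids: node ids in a stable order
    edges:    iterable of (source, target) pairs
    events:   lifecycle events in append order
    """
    completed = [
        event["node_id"]
        for event in events
        if event["type"] == "workflow.node.completed"
    ]
    done = set(completed)

    # Map each node to the set of nodes that must complete before it.
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)

    runnable = [
        node_id
        for node_id in node_ids
        if node_id not in done and predecessors.get(node_id, set()) <= done
    ]
    last_completed = completed[-1] if completed else None
    return last_completed, runnable
```

With a `plan -> (impl_a, impl_b) -> verify` graph where `plan` and `impl_a` have completed, this returns `impl_a` as the last completed node and only `impl_b` as runnable, since `verify` is still waiting on `impl_b`.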

Phase 3 — conformance harness for the IR/lifecycle contract

The conformance work from #959 is also absorbed here as the verification backbone for this canonical surface. The MVP should be local and deterministic:

  • invalid Workflow IR fails before execution;
  • lifecycle/projector fixtures reconstruct current state;
  • resume fixtures do not rerun completed nodes;
  • plugin firewall contract tests continue to live in the conformance suite where they validate cross-module AgentOS boundaries;
  • HITL fixtures may be added after #960 (Agent OS HITL: standardize WAIT/RESUME ask-user and approval contract) lands, but should not block the first IR substrate PR;
  • default CI must not require live credentials, cloud services, or nondeterministic network calls.

Consolidated acceptance additions

  • The Workflow IR design states how node ids map to future lifecycle events.
  • At least one fixture demonstrates the intended node.failed -> retry -> effective success projection shape, even if full runtime instrumentation is follow-up.
  • A conformance test location and marker strategy are documented before lifecycle/HITL/projector tests are added there.
  • Future PRs that claim durability or replayability include EventStore/projector evidence, not only unit-test evidence.
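The node.failed -> retry -> effective success fixture could be backed by a projection like the sketch below. The event shape and the returned structure (`status`, `failed_attempts`, `history`) are assumptions for illustration; the point is that the effective view reports the latest status while the raw attempt history stays visible.

```python
NODE_PREFIX = "workflow.node."


def project_effective_state(events):
    """Fold node lifecycle events into a per-node effective state view."""
    state = {}
    for event in events:
        event_type = event["type"]
        if not event_type.startswith(NODE_PREFIX):
            continue  # run-level, edge, and checkpoint events are ignored here

        status = event_type[len(NODE_PREFIX):]  # e.g. "failed", "completed"
        entry = state.setdefault(
            event["node_id"],
            {"status": None, "failed_attempts": 0, "history": []},
        )
        entry["history"].append(status)  # raw history is never rewritten
        entry["status"] = status         # effective view: latest status wins
        if status == "failed":
            entry["failed_attempts"] += 1
    return state


fixture = [
    {"type": "workflow.node.scheduled", "node_id": "impl_a"},
    {"type": "workflow.node.started", "node_id": "impl_a"},
    {"type": "workflow.node.failed", "node_id": "impl_a"},
    {"type": "workflow.node.retried", "node_id": "impl_a"},
    {"type": "workflow.node.started", "node_id": "impl_a"},
    {"type": "workflow.node.completed", "node_id": "impl_a"},
    {"type": "workflow.edge.traversed", "source": "impl_a", "target": "verify"},
]
view = project_effective_state(fixture)
```

For this fixture, `view["impl_a"]` reports `completed` as the effective status while still recording one failed attempt in its history.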

Self-review refinement

This issue is intentionally narrower than #830. #830 focuses on execution invariants for leaf quality; this issue focuses on a typed graph/plan contract that the harness can validate before execution. It should not become a broad runtime rewrite. The first implementation is successful if it can validate and inspect a projected plan while preserving current behavior.

Merge reviewers should reject implementations that add a heavyweight external workflow SDK, move domain logic back into skills, or turn Workflow IR into a user-facing product surface before the harness contract is proven.

Metadata

    Labels

      • enhancement: New feature or meaningful improvement
      • needs-design: Multi-PR epic or architectural change, needs human planning
      • tier-2-unblocked: Post-wiring Tier 2 work (agentos-substrate-wiring is closed; actionable with #961 sequencing)
