Epic: Assimilate SWE-agent into AgentOS as a seamless issue-to-patch harness #43

@shaun0927

Epic: Assimilate SWE-agent into the AgentOS ecosystem as a seamless issue-to-patch execution harness

Goal

Assimilate SWE-agent/SWE-agent into the AgentOS/Ouroboros ecosystem as a first-class, contract-aware, permissioned, auditable software-engineering execution harness delivered through the ouroboros-plugins repository and the plugin contract described in #27.

The product goal is not merely to wrap the sweagent binary. The goal is to make SWE-agent run smoothly from Ouroboros while preserving as much of the upstream SWE-agent user experience as possible. An upstream invocation such as:

sweagent run --config config/default.yaml \
  --agent.model.name gpt-4o \
  --env.repo.github_url https://github.com/org/repo \
  --problem_statement.github_url https://github.com/org/repo/issues/123

should have an AgentOS-native equivalent such as:

ooo swe-agent run --config config/default.yaml \
  --agent.model.name gpt-4o \
  --env.repo.github_url https://github.com/org/repo \
  --problem_statement.github_url https://github.com/org/repo/issues/123

while adding Ouroboros-native semantics:

SWE-agent issue-to-patch harness
  + plugin manifest
  + declared capabilities and permissions
  + scoped trust and risk classification
  + sandbox/runtime policy
  + ledger/provenance/audit events
  + Seed-compatible problem/handoff artifacts
  + resumable state/progress
  + patch/trajectory/prediction artifact attachment
  + downstream handoff to ooo auto / reviewers / verifiers

The strategic objective is to make AgentOS feel like the operating system for external software-engineering agents: users should be able to run SWE-agent from Ouroboros without losing the SWE-agent mental model, CLI shape, config-driven workflow, trajectories, patches, replay/inspection tools, or research ergonomics.

This is a concrete implementation candidate for the #27 thesis:

The plugin layer exists to keep core small while allowing the outside world to become Ouroboros-native.

In this issue, "Ouroboros-native" means SWE-agent remains recognizably SWE-agent, but its authority, artifacts, lifecycle, audit trail, and handoffs are governed by the AgentOS plugin contract instead of escaping into an unbounded command wrapper.


Source capability summary

SWE-agent is an open-source autonomous software-engineering harness that takes a GitHub issue or custom problem statement and attempts to produce a patch using a language model and a tool-enabled execution environment.

Current upstream facts observed while drafting this issue:

  • Repository: https://github.com/SWE-agent/SWE-agent
  • Default branch: main
  • License: MIT
  • Latest release observed: v1.1.0 (2025-05-22)
  • Recent repository activity observed: pushed on 2026-05-18
  • Upstream README notes that much current development effort is on SWE-agent/mini-swe-agent, which upstream describes as simpler and generally recommended going forward.
  • SWE-agent remains a large, documented, config-rich harness with trajectories, run replay, batch mode, inspector tooling, SWE-ReX/Docker runtime support, GitHub issue ingestion, patch generation, and optional PR-opening hooks.

Primary upstream command families:

sweagent run           # run on a single issue/problem
sweagent run-batch     # batch/SWE-bench style execution
sweagent run-replay    # replay a trajectory/demo
sweagent inspect       # terminal trajectory inspector
sweagent inspector     # web trajectory inspector
sweagent quick-stats   # summarize trajectory directories
sweagent merge-preds   # merge prediction files
sweagent traj-to-demo  # convert trajectory to demo
sweagent remove-unfinished
sweagent shell

Important upstream execution surfaces:

  • RunSingleConfig: combines environment config, agent/model config, problem statement config, output directory, env var path, and action options.
  • EnvironmentConfig: controls deployment, repository source, startup commands, and shell environment.
  • SWEEnv: wraps SWE-ReX deployment/runtime, starts a shell session, copies/resets repositories, executes commands, reads/writes files, and closes the runtime.
  • ProblemStatementConfig: supports GitHub issues, text, files, and SWE-bench/multimodal problem statements.
  • SaveApplyPatchHook: saves patches and can optionally apply them to a local repository.
  • OpenPRHook: can push a branch and create a draft PR when enabled.
  • Trajectory and prediction artifacts: .traj, .pred, .patch, logs, replay config, model stats, exit status, and edited-file context.

Why this belongs in ouroboros-plugins

This is not about moving SWE-agent into Ouroboros core. This repository should remain the contract/reference/plugin layer, not a marketplace and not a dumping ground for arbitrary wrappers.

SWE-agent is a reference-quality assimilation case because it exercises exactly the boundaries #27 is trying to make explicit:

  1. External autonomous execution harness

    • SWE-agent is not just a library call; it is an agent runtime that can clone/copy repos, execute shell commands, edit code, produce patches, and optionally create PRs.
    • This makes it an ideal stress test for the difference between a trivial command wrapper and an Ouroboros-native capability.
  2. User-experience preservation requirement

    • The upstream value is tied to the sweagent CLI, YAML config model, trajectory format, replay tooling, inspectors, and research workflow.
    • The AgentOS plugin must preserve that experience rather than forcing users into a completely different abstraction prematurely.
  3. Permission and risk boundary

    • SWE-agent can read repos, write patches, run arbitrary shell commands in a sandbox, call LLM APIs, use Docker/SWE-ReX, read GitHub issues, and optionally push branches/open PRs.
    • These authorities must be declared and audited through plugin capabilities/permissions.
  4. Artifact and handoff richness

    • SWE-agent produces exactly the kind of artifacts AgentOS should understand: patches, trajectories, predictions, logs, replay configs, model stats, and edited-file context.
    • The plugin should translate these into ledger/provenance/handoff artifacts for downstream review, verification, or ooo auto continuation.
  5. AgentOS as the execution substrate

    • The long-term vision is that AgentOS can run external agents as supervised capabilities, not that core knows every external agent’s internal branches.
    • SWE-agent should become a canonical example of “external agent harness → AgentOS-native supervised capability.”

Product principle: preserve SWE-agent UX first

The first plugin version should deliberately preserve the SWE-agent user experience:

  • Keep upstream sweagent subcommands recognizable.
  • Preserve upstream config files and dotted CLI override style.
  • Preserve upstream output artifacts and directory conventions where practical.
  • Preserve .traj, .pred, .patch, logs, replay config, and inspector compatibility.
  • Preserve existing SWE-agent docs/tutorial compatibility by making command translation obvious.
  • Add AgentOS semantics around the run rather than replacing the run with a new workflow vocabulary.

A user who already knows SWE-agent should be able to predict the AgentOS command surface.

Preferred mapping:

sweagent run ...               → ooo swe-agent run ...
sweagent run-batch ...         → ooo swe-agent run-batch ...
sweagent run-replay ...        → ooo swe-agent run-replay ...
sweagent inspect ...           → ooo swe-agent inspect ...
sweagent inspector ...         → ooo swe-agent inspector ...
sweagent quick-stats ...       → ooo swe-agent quick-stats ...
sweagent merge-preds ...       → ooo swe-agent merge-preds ...
sweagent traj-to-demo ...      → ooo swe-agent traj-to-demo ...
sweagent remove-unfinished ... → ooo swe-agent remove-unfinished ...
sweagent shell ...             → ooo swe-agent shell ...

The plugin adapter may add AgentOS-specific flags, but should avoid breaking upstream CLI expectations:

ooo swe-agent run ... \
  --agentos-artifact-dir .ouroboros/artifacts/swe-agent/<run-id> \
  --agentos-handoff \
  --agentos-audit \
  --agentos-no-open-pr

If additional flags are needed, prefer namespaced flags such as --agentos-* or a separate wrapper mode rather than silently changing upstream SWE-agent semantics.
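
One way the adapter could keep namespaced flags from leaking into the upstream command line is to strip them before pass-through. The sketch below is a minimal illustration, not the plugin's implementation: the function name `split_agentos_flags` and the `AGENTOS_FLAGS` registry are hypothetical, and only the four flags shown in the example above are assumed to exist.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical registry of adapter-only flags; True means the flag takes a value.
AGENTOS_FLAGS: Dict[str, bool] = {
    "--agentos-artifact-dir": True,
    "--agentos-handoff": False,
    "--agentos-audit": False,
    "--agentos-no-open-pr": False,
}


def split_agentos_flags(
    argv: List[str],
) -> Tuple[Dict[str, Union[str, bool]], List[str]]:
    """Separate --agentos-* flags from arguments meant for upstream sweagent."""
    agentos: Dict[str, Union[str, bool]] = {}
    upstream: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        name, sep, inline_value = arg.partition("=")
        if name.startswith("--agentos-"):
            if name not in AGENTOS_FLAGS:
                # Fail loudly instead of forwarding an unknown adapter flag upstream.
                raise ValueError(f"unknown AgentOS flag: {name}")
            if AGENTOS_FLAGS[name]:
                if sep:
                    agentos[name] = inline_value  # --flag=value form
                else:
                    if i + 1 >= len(argv):
                        raise ValueError(f"{name} requires a value")
                    agentos[name] = argv[i + 1]  # --flag value form
                    i += 1
            else:
                agentos[name] = True
        else:
            # Everything else is passed to upstream untouched, in order.
            upstream.append(arg)
        i += 1
    return agentos, upstream
```

Because upstream dotted overrides such as --agent.model.name do not share the --agentos- prefix, they pass through unchanged and in their original order.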


Proposed plugin identity

plugins/swe-agent-harness/
  ouroboros.plugin.json
  README.md
  swe_agent_harness/
    __init__.py
    __main__.py
    adapter.py
    artifacts.py
    audit.py
    handoff.py
    manifest.py
    run_spec.py
    permissions.py

Suggested plugin name:

swe-agent-harness

Rationale: the plugin assimilates an external harness. It does not reimplement SWE-agent and should not imply that SWE-agent itself has been absorbed into core.

Entrypoint pattern:

{
  "entrypoint": {
    "type": "command",
    "command": "python -m swe_agent_harness"
  }
}

The adapter should invoke upstream SWE-agent as an external dependency or local executable when available. It should not vendor the entire SWE-agent repository into Ouroboros core.


Proposed command surface

Baseline commands

ooo swe-agent run ...

Run SWE-agent on a single problem while preserving upstream CLI compatibility.

Must support upstream-style inputs:

ooo swe-agent run --config config/default.yaml \
  --agent.model.name gpt-4o \
  --env.repo.github_url https://github.com/org/repo \
  --problem_statement.github_url https://github.com/org/repo/issues/123

ooo swe-agent run --config config/default.yaml \
  --env.repo.path /path/to/repo \
  --problem_statement.path ./problem.md

AgentOS additions:

  • validate declared permissions before invocation
  • create run id and artifact directory
  • record normalized run spec
  • capture patch/prediction/trajectory/logs
  • emit audit/provenance summary
  • create handoff.md and handoff.json
  • expose next-step recommendations: inspect, apply patch, verify, hand off to ooo auto, or open PR if separately trusted
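
The "create run id and artifact directory" and "record normalized run spec" steps could look roughly like the sketch below. It assumes the .agentos/swe-agent/<run-id>/ layout proposed in the artifact contract; the run id format, the function name `start_run`, and the two run-spec fields are illustrative placeholders rather than a settled schema.

```python
import json
import uuid
from pathlib import Path
from typing import List


def start_run(root: Path, upstream_args: List[str]) -> Path:
    """Create a per-run artifact directory and record a minimal run spec."""
    run_id = uuid.uuid4().hex  # hypothetical run id format
    run_dir = root / ".agentos" / "swe-agent" / run_id
    # exist_ok=False so two runs can never share an artifact directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    spec = {
        "run_id": run_id,
        "upstream_args": upstream_args,  # illustrative field, not a fixed schema
    }
    (run_dir / "run-spec.json").write_text(
        json.dumps(spec, indent=2), encoding="utf-8"
    )
    return run_dir
```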

ooo swe-agent run-batch ...

Preserve upstream batch mode, but classify it as a higher-risk / higher-cost command.

The MVP can defer full batch support until single-run artifacts and trust semantics are complete.

ooo swe-agent run-replay ...

Replay an existing trajectory/demo and attach the replay result as a new AgentOS artifact.

This should be one of the safest and most valuable early commands because it helps audit and reproduce prior runs.

ooo swe-agent inspect ...

Open or summarize a trajectory using the upstream-compatible inspector path.

Risk should be read_only when it only reads existing artifacts.

ooo swe-agent quick-stats ...

Read trajectory directories and summarize exit statuses/model stats.

Risk should be read_only.

ooo swe-agent merge-preds ...

Merge prediction files into a derived artifact.

Risk should be write because it writes a new local file, but it should not need external authority.

ooo swe-agent traj-to-demo ...

Convert trajectory files to editable demos.

Risk should be write.

AgentOS-native helper commands

These may be plugin-specific additions that do not exist upstream:

ooo swe-agent prepare ...

Create a bounded run spec / handoff without executing SWE-agent.

Purpose:

  • inspect inputs
  • normalize repo/problem/config references
  • declare required permissions
  • estimate risk
  • create Seed-compatible problem/handoff artifacts

ooo swe-agent collect-artifacts <output-dir>

Read an existing SWE-agent run output and attach it to the AgentOS ledger/provenance/handoff system.

This is useful for retroactive assimilation of runs performed outside Ouroboros.

ooo swe-agent handoff <run-id-or-output-dir>

Generate or regenerate handoff.md / handoff.json from SWE-agent artifacts.

ooo swe-agent verify-artifacts <run-id-or-output-dir>

Validate that expected artifacts exist and are internally consistent.
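
A minimal sketch of what verify-artifacts might check, assuming the five AgentOS files named in the acceptance criteria. The function name `check_bundle` and the report shape are hypothetical, and "internally consistent" is reduced here to "JSON files parse"; a real check would likely go further.

```python
import json
from pathlib import Path
from typing import Dict, List

# AgentOS-side files the epic expects in every managed run bundle.
EXPECTED_FILES = (
    "run-spec.json",
    "provenance.json",
    "audit-summary.json",
    "handoff.json",
    "handoff.md",
)


def check_bundle(bundle_dir: Path) -> Dict[str, List[str]]:
    """Report missing files and JSON files that fail to parse."""
    missing: List[str] = []
    invalid_json: List[str] = []
    for name in EXPECTED_FILES:
        path = bundle_dir / name
        if not path.is_file():
            missing.append(name)
        elif path.suffix == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                invalid_json.append(name)
    return {"missing": missing, "invalid_json": invalid_json}
```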


Proposed capabilities

The plugin should declare only the Ouroboros substrate capabilities it actually needs.

Likely baseline capabilities:

seed:read/write          # consume or generate problem/run specs when used as Seed handoff
ledger:write             # record evidence, run status, patch artifacts, decisions
state:write              # persist run/progress/resume state
provenance:write         # record repo, issue, config, model, artifact, and source metadata
runtime:execute          # invoke SWE-agent and sandbox runtime
handoff:attach           # attach patch/trajectory/prediction/handoff artifacts
progress:write           # stream run status and summaries
mcp:call                 # optional/future only, if delegated through MCP surfaces

Do not request mcp:call unless the implementation actually uses it.


Proposed permissions and risk taxonomy

| Authority | Scope | Risk | Required? | Notes |
|---|---|---|---|---|
| Read local repo/problem/config/artifacts | filesystem:read | read_only | yes | Needed for almost all commands. |
| Write output artifacts/patch/handoff | filesystem:write | write | yes for run/prepare | Must be output-dir bounded. |
| Execute SWE-agent and sandbox commands | shell:execute | write | yes for run | Should be sandboxed and audited. |
| Start runtime / container / SWE-ReX deployment | shell:execute + runtime:execute | write or policy-high | yes for sandboxed run | Docker/socket access must be treated carefully. |
| Read GitHub issue/repo metadata | github:read, network:read | read_only | optional | Needed for problem_statement.github_url or env.repo.github_url. |
| Call LLM provider APIs | network:write | write | optional | Cost/data egress must be visible. |
| Push branch / open PR | github:pull_request:write, network:write | destructive or high write | no / deferred | Should not be enabled by default. |
| Apply patch to host local repo | filesystem:write | write | optional | Requires explicit command/confirmation. |
| Offensive cybersecurity/CTF modes | separate explicit scope | high risk | no / deferred | Must not be silently bundled with normal issue-fixing UX. |

Initial risk recommendation

  • inspect, quick-stats: read_only
  • prepare, handoff, collect-artifacts, traj-to-demo, merge-preds: write
  • run, run-replay: write with confirmation depending on sandbox authority
  • apply-patch: write with confirmation
  • open-pr: defer, or classify as destructive/high-write with explicit trust
  • any security/offensive mode: defer until a separate policy issue exists
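
The recommendation above could be encoded as a small lookup table. This is a sketch under simplifying assumptions: the names `COMMAND_RISK`, `NEEDS_CONFIRMATION`, `DEFERRED`, and `classify` are hypothetical, and the "depending on sandbox authority" nuance for run and run-replay is flattened into "always confirm".

```python
from typing import Dict, FrozenSet, Tuple

# Risk tier per command, following the initial recommendation in this issue.
COMMAND_RISK: Dict[str, str] = {
    "inspect": "read_only",
    "quick-stats": "read_only",
    "prepare": "write",
    "handoff": "write",
    "collect-artifacts": "write",
    "traj-to-demo": "write",
    "merge-preds": "write",
    "run": "write",
    "run-replay": "write",
    "apply-patch": "write",
}

# Simplification: the issue makes confirmation for run/run-replay depend on
# sandbox authority; this sketch always asks.
NEEDS_CONFIRMATION: FrozenSet[str] = frozenset({"run", "run-replay", "apply-patch"})

# Commands the issue proposes deferring until explicit trust exists.
DEFERRED: FrozenSet[str] = frozenset({"open-pr"})


def classify(command: str) -> Tuple[str, bool]:
    """Return (risk tier, needs confirmation) for a plugin command."""
    if command in DEFERRED:
        raise ValueError(f"{command} is deferred and has no default risk tier")
    if command not in COMMAND_RISK:
        raise ValueError(f"unknown command: {command}")
    return COMMAND_RISK[command], command in NEEDS_CONFIRMATION
```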

Artifact contract

Every AgentOS-managed SWE-agent run should produce an artifact bundle. Preserve upstream artifacts and add AgentOS metadata rather than replacing the upstream layout.

Suggested bundle:

.agentos/swe-agent/<run-id>/
  run-spec.json              # normalized AgentOS + SWE-agent invocation spec
  problem.md                 # resolved problem statement when available
  upstream-command.txt       # exact sweagent command equivalent
  stdout.log
  stderr.log
  swe-agent-output/          # upstream output directory, preserved
    <instance-id>/
      <instance-id>.traj
      <instance-id>.pred
      <instance-id>.patch
      *.trace.log
      *.debug.log
      *.info.log
      config.yaml
  patch.diff                 # normalized pointer/copy of selected patch, if any
  prediction.pred            # normalized pointer/copy of selected prediction, if any
  trajectory.traj            # normalized pointer/copy of selected trajectory, if any
  audit-summary.json
  provenance.json
  handoff.json
  handoff.md
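
Artifact discovery over the preserved upstream output directory could be as simple as the sketch below. It assumes only the .traj, .pred, and .patch suffixes listed above; the function name `discover_upstream_artifacts` is hypothetical, and choosing the "selected" patch or trajectory among several instances is left out.

```python
from pathlib import Path
from typing import Dict, List

# Upstream artifact suffixes named in this issue.
UPSTREAM_SUFFIXES = (".traj", ".pred", ".patch")


def discover_upstream_artifacts(output_dir: Path) -> Dict[str, List[Path]]:
    """Group upstream artifacts under output_dir by file suffix."""
    found: Dict[str, List[Path]] = {suffix: [] for suffix in UPSTREAM_SUFFIXES}
    # sorted() keeps the result deterministic across filesystems.
    for path in sorted(output_dir.rglob("*")):
        if path.is_file() and path.suffix in found:
            found[path.suffix].append(path)
    return found
```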

handoff.md should answer

  • What problem was attempted?
  • Which repo/base commit/branch was used?
  • Which SWE-agent config/model was used?
  • Which command was run?
  • Which permissions were exercised?
  • Which artifacts were produced?
  • Did the run complete, fail, block, submit a patch, or exit early?
  • What files were edited according to the patch/trajectory metadata?
  • What should happen next?
    • inspect patch
    • run tests
    • apply patch
    • hand off to ooo auto
    • open PR only if explicitly trusted

provenance.json should record bounded metadata only

Allowed examples:

{
  "source_repo": "https://github.com/org/repo",
  "base_commit": "abc123",
  "problem_statement_source": "github_issue",
  "problem_statement_url": "https://github.com/org/repo/issues/123",
  "swe_agent_repo": "https://github.com/SWE-agent/SWE-agent",
  "swe_agent_version": "v1.1.0",
  "config_files": ["config/default.yaml"],
  "model_name": "gpt-4o",
  "output_dir": ".agentos/swe-agent/<run-id>/swe-agent-output",
  "artifact_paths": ["trajectory.traj", "patch.diff", "prediction.pred"]
}

Forbidden examples:

  • raw API keys
  • raw OAuth tokens
  • unredacted environment dumps
  • full private issue bodies unless explicitly allowed
  • arbitrary shell history outside the run
  • unbounded model prompts if they contain secrets
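
A key-based redaction pass is one possible guard against the forbidden items above ending up in provenance.json. The sketch is illustrative only: the pattern, the `<redacted>` placeholder, and the function name `redact` are assumptions, and key matching does not catch secrets embedded inside otherwise innocuous values such as prompts or issue bodies.

```python
import re
from typing import Any

# Hypothetical pattern for key names that usually hold credentials.
SECRET_KEY_PATTERN = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)
REDACTED = "<redacted>"


def redact(value: Any) -> Any:
    """Recursively replace values whose key name looks like a credential."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SECRET_KEY_PATTERN.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars are returned unchanged; value-level scanning is out of scope here.
    return value
```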

Execution semantics

Preserve upstream execution path

The adapter should be able to run the upstream CLI directly:

sweagent run <original args>

or a configured local checkout/module when needed.
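
As a rough sketch of the pass-through path, the adapter could build the upstream argument vector and render the same string for upstream-command.txt. The function names and the `executable` parameter are hypothetical; the subcommand set is the upstream command list quoted earlier in this issue.

```python
import shlex
from typing import List, Sequence

# Upstream subcommands as listed in this issue.
UPSTREAM_SUBCOMMANDS = frozenset({
    "run", "run-batch", "run-replay", "inspect", "inspector",
    "quick-stats", "merge-preds", "traj-to-demo", "remove-unfinished", "shell",
})


def build_upstream_command(
    subcommand: str, args: Sequence[str], executable: str = "sweagent"
) -> List[str]:
    """Build the argv for upstream SWE-agent without altering the arguments."""
    if subcommand not in UPSTREAM_SUBCOMMANDS:
        raise ValueError(f"not an upstream subcommand: {subcommand}")
    return [executable, subcommand, *args]


def render_command(argv: Sequence[str]) -> str:
    """Shell-quoted form suitable for recording in upstream-command.txt."""
    return shlex.join(argv)
```

Keeping the command as a list until the last moment means it can be handed to subprocess.run without shell=True, while the rendered string stays copy-pasteable for users who want to reproduce the run with plain sweagent.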

The plugin should not fork upstream behavior unless required for AgentOS safety. Prefer these layers:

  1. preflight validation and permission checks
  2. command construction / pass-through compatibility
  3. sandbox/runtime guardrails
  4. artifact collection
  5. provenance/audit/handoff conversion

Sandboxing requirements

The plugin should make sandbox policy explicit:

  • local repo path must be bounded and resolved
  • output dir must be controlled by the plugin or explicitly supplied
  • host patch application must be a separate action from sandboxed patch generation
  • Docker/SWE-ReX access must be documented as runtime authority
  • startup commands must be captured in run spec
  • network/API use must be visible before invocation
  • timeout/budget/cost-limit values must be captured when provided
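
The "bounded and resolved" requirement for local paths could be enforced with a check like the one below. This is a sketch: `resolve_bounded` and `allowed_root` are hypothetical names, and what counts as the allowed root (workspace, plugin output directory, or something else) is a policy decision this issue leaves open.

```python
from pathlib import Path


def resolve_bounded(path: str, allowed_root: Path) -> Path:
    """Resolve path and refuse anything that escapes allowed_root."""
    root = allowed_root.resolve()
    candidate = Path(path)
    # Relative paths are interpreted against the allowed root, then resolved
    # so that ".." segments and symlinks cannot hide the real target.
    resolved = (candidate if candidate.is_absolute() else root / candidate).resolve()
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"{resolved} is outside {root}")
    return resolved
```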

Failure semantics

The plugin should distinguish:

blocked      permission/trust/sandbox policy prevented run
failed       adapter or SWE-agent execution failed
completed    run completed, no patch necessarily produced
submitted    SWE-agent produced a patch/submission
partial      artifacts exist but run ended early or with uncertain status
cancelled    user/runtime cancelled run

Map these to the standard plugin audit vocabulary where possible:

plugin.invoked
plugin.permission_used
plugin.completed
plugin.failed

Use plugin.failed with status=blocked for firewall/trust denials, consistent with the existing plugin contract.
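
The status-to-event mapping might be sketched as follows. Only the blocked case is specified above; mapping completed and submitted to plugin.completed, and partial and cancelled to plugin.failed, is an assumption made for illustration, as are the names `RunStatus` and `audit_event`.

```python
from enum import Enum
from typing import Dict


class RunStatus(Enum):
    BLOCKED = "blocked"
    FAILED = "failed"
    COMPLETED = "completed"
    SUBMITTED = "submitted"
    PARTIAL = "partial"
    CANCELLED = "cancelled"


# Assumption: only these two statuses count as a successful terminal event.
_COMPLETED_STATUSES = frozenset({RunStatus.COMPLETED, RunStatus.SUBMITTED})


def audit_event(status: RunStatus) -> Dict[str, str]:
    """Map a run status to a terminal audit event carrying the finer status."""
    event = "plugin.completed" if status in _COMPLETED_STATUSES else "plugin.failed"
    return {"event": event, "status": status.value}
```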


Manifest / schema pressure discovered by this epic

This plugin should start within the existing v0.1 manifest contract. However, SWE-agent is likely to expose future schema needs. Do not expand the schema speculatively; document pressure and open follow-up issues only when implementation proves the need.

Likely future pressure points:

  1. Command-level permissions

    • inspect needs only read permissions.
    • run needs shell/runtime/filesystem/network.
    • open-pr needs GitHub write/destructive authority.
    • Current v0.1 permissions are plugin-level, so command-level mapping may become necessary.
  2. Artifact declarations

    • The plugin produces patch, trajectory, prediction, logs, replay config, and handoff bundles.
    • A future schema could declare artifact types and paths.
  3. Secret/environment declarations

    • SWE-agent often relies on OPENAI_API_KEY, ANTHROPIC_API_KEY, GITHUB_TOKEN, or provider-specific variables.
    • A future schema may need bounded secret requirements without storing secret values.
  4. Network endpoint declarations

    • GitHub, LLM providers, Modal/AWS, or other deployment targets may be involved.
    • A future schema may need endpoint categories or allowlists.
  5. Long-running progress/resume metadata

    • SWE-agent runs can be long-running and expensive.
    • Better progress, cancellation, and resume semantics may be needed.

Non-goals

  • Do not vendor the full SWE-agent repository into Ouroboros core.
  • Do not teach ooo auto SWE-agent-specific branches.
  • Do not turn Q00/ouroboros-plugins into a marketplace listing for SWE-agent.
  • Do not hide SWE-agent’s CLI/config model behind an incompatible abstraction.
  • Do not silently apply patches to the host repository after a run.
  • Do not silently push branches or open PRs.
  • Do not grant network:write, shell:execute, or GitHub write permissions implicitly.
  • Do not store raw secrets or unbounded private prompts in provenance.
  • Do not enable offensive cybersecurity workflows under the same default trust path as ordinary issue fixing.
  • Do not expand the plugin manifest schema until this reference plugin proves a real contract need.

Suggested implementation phases

Phase 1 — RFC/design and UX parity spec

Deliverables:

  • Add a design note or RFC section documenting SWE-agent as an external agent harness assimilation case.
  • Define exact command parity goals.
  • Define the adapter-vs-vendoring boundary.
  • Define the artifact bundle contract.
  • Define permission/risk classification for each command family.
  • Decide whether baseline targets upstream SWE-agent, mini-SWE-agent, or both.

Recommendation: start with SWE-agent parity because this epic is scoped to SWE-agent/SWE-agent, but explicitly leave room for a future mini-swe-agent adapter or compatibility mode.

Phase 2 — Plugin skeleton

Deliverables:

plugins/swe-agent-harness/ouroboros.plugin.json
plugins/swe-agent-harness/README.md
plugins/swe-agent-harness/swe_agent_harness/__main__.py

Commands to declare first:

  • run
  • run-replay
  • inspect
  • quick-stats
  • collect-artifacts
  • handoff

Defer:

  • run-batch
  • apply-patch
  • open-pr
  • offensive/security modes

Phase 3 — Pass-through runner with artifact collection

Deliverables:

  • command pass-through preserving upstream argument shape
  • run id generation
  • output dir normalization
  • stdout/stderr capture
  • upstream command recording
  • artifact discovery for .traj, .pred, .patch, logs, config
  • handoff.md / handoff.json
  • provenance.json
  • audit-summary.json

Phase 4 — Permission and trust integration

Deliverables:

  • command-risk classification in manifest/docs
  • preflight checks for filesystem/network/shell/GitHub authorities
  • blocked status semantics when permissions are missing
  • confirmation behavior for run if local write/shell/runtime authority is present
  • separate trust path for any host patch application or GitHub mutation
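
A preflight check for these deliverables could compare required scopes against granted ones before anything is invoked. The sketch below reuses scope names from the permissions table, but the per-command requirements in `REQUIRED_SCOPES` and the result shape are hypothetical; optional network and GitHub scopes are omitted.

```python
from typing import Dict, FrozenSet, List, Set, Union

# Hypothetical per-command requirements using scope names from the table above.
REQUIRED_SCOPES: Dict[str, FrozenSet[str]] = {
    "inspect": frozenset({"filesystem:read"}),
    "quick-stats": frozenset({"filesystem:read"}),
    "run": frozenset({"filesystem:read", "filesystem:write", "shell:execute"}),
}


def preflight(command: str, granted: Set[str]) -> Dict[str, Union[str, List[str]]]:
    """Return a blocked result when a required scope has not been granted."""
    if command not in REQUIRED_SCOPES:
        raise ValueError(f"no scope requirements declared for: {command}")
    missing = sorted(REQUIRED_SCOPES[command] - granted)
    if missing:
        # Blocked runs are reported, never silently skipped or faked.
        return {"event": "plugin.failed", "status": "blocked", "missing": missing}
    return {"status": "allowed", "missing": []}
```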

Phase 5 — Replay/inspect/read-only tooling

Deliverables:

  • inspect over existing trajectories
  • quick-stats over trajectory directories
  • run-replay with artifact attachment
  • retroactive collect-artifacts for SWE-agent runs performed outside AgentOS

Phase 6 — Controlled mutation commands

Only after the baseline is safe:

  • apply-patch with explicit confirmation
  • open-pr with explicit GitHub write/destructive trust
  • optional run-batch with budget/cost guardrails

Acceptance criteria

This epic is complete when:

  • A SWE-agent assimilation design/RFC note exists and links back to #27 (SSOT: UserLevel plugin authoring and capability assimilation).
  • The design clearly states that the goal is UX-preserving AgentOS assimilation, not a trivial wrapper and not core vendoring.
  • A plugin skeleton exists under plugins/swe-agent-harness/.
  • ouroboros.plugin.json validates against the current schema.
  • The plugin README documents command parity with upstream sweagent.
  • The plugin README documents capabilities, permissions, risk tiers, trust expectations, and non-goals.
  • ooo swe-agent run ... can pass through a normal upstream-style sweagent run ... invocation in a bounded way.
  • The plugin preserves upstream artifacts: trajectory, prediction, patch, logs, and config where produced.
  • The plugin creates AgentOS artifacts: run-spec.json, provenance.json, audit-summary.json, handoff.json, and handoff.md.
  • The plugin does not apply patches to the host repository by default.
  • The plugin does not open PRs by default.
  • Missing permissions produce blocked/failure semantics instead of pretending the plugin ran.
  • inspect and/or quick-stats can operate as read-only commands over existing SWE-agent artifacts.
  • At least one fixture or smoke test validates manifest conformance and basic artifact conversion.
  • Any schema gaps discovered during implementation are documented as follow-up issues rather than silently expanding the manifest.
  • README or docs link this plugin as a reference example of assimilating an external autonomous agent harness into AgentOS.
