Epic: Assimilate Semgrep into Ouroboros AgentOS while preserving Semgrep UX

## Summary

Assimilate [`semgrep/semgrep`](https://github.com/semgrep/semgrep) into the Ouroboros plugin ecosystem as a first-class AgentOS capability while preserving the Semgrep user experience.

This epic is not about replacing Semgrep, reimplementing Semgrep, or hiding Semgrep behind a generic wrapper. The goal is to let users keep the Semgrep mental model they already know — local-first scans, familiar configs, familiar flags, JSON/SARIF output, rule testing, CI-style behavior, and optional autofix flows — while Ouroboros adds the missing AgentOS layer around it:

- explicit permissions,
- capability declarations,
- risk classification,
- audit events,
- provenance,
- normalized evidence artifacts,
- Seed / Ledger / State / Handoff compatibility,
- and resumable agent workflows.

In short:

> Preserve the Semgrep experience, but make every Semgrep capability executable, inspectable, permissioned, and handoff-capable inside Ouroboros.

This directly exercises the thesis from #27:

> Ouroboros plugins are not merely command wrappers. They are the capability assimilation layer that turns external tools, open-source libraries, and domain workflows into structured, auditable, permissioned, Seed-compatible Ouroboros capabilities.

Semgrep is explicitly in scope for that RFC class: a static-analysis engine that should remain outside core while becoming usable as an Ouroboros-native AgentOS capability.

## Source capability

Repository: https://github.com/semgrep/semgrep

Semgrep is a mature static-analysis engine that provides:

- local code scanning,
- rule-based code search,
- security and quality guardrails,
- Semgrep YAML rules,
- JSON and SARIF outputs,
- CI-oriented scan modes,
- rule testing,
- optional registry / remote configuration,
- optional metrics,
- optional autofix / dry-run flows,
- MCP / AI-assistant integration surfaces,
- and broad language support.

Important upstream properties to preserve:

- Semgrep users expect to run scans against local repositories.
- Local scanning should remain local-first.
- Familiar Semgrep concepts such as `--config`, local rule files, registry configs, `--json`, `--sarif`, `--metrics=off`, `--autofix`, `--dryrun`, `.semgrepignore`, and exit-code behavior should remain recognizable.
- Semgrep output should remain available in raw form when requested.
- Ouroboros should add structured artifacts rather than erase Semgrep's native output model.

## Product goal

Make Semgrep feel like a native AgentOS capability without breaking Semgrep muscle memory.

A user should be able to say, in effect:

```bash
ooo semgrep scan . --config rules/ci.yml
```

and get the same kind of Semgrep experience they expect, plus Ouroboros-native execution products:

```text
Semgrep scan ran locally
Raw Semgrep JSON/SARIF was preserved
Findings were normalized
Ledger/provenance/audit records were emitted
A handoff artifact was attached
A downstream agent can now triage, fix, suppress, or gate on the results
```

## Non-goals

This epic must not turn `ouroboros-plugins` into a marketplace or a dumping ground for scanner wrappers.

Out of scope for the first implementation:

- Reimplementing Semgrep's parser, rule engine, or CLI.
- Vendoring Semgrep source into this repository.
- Making Semgrep part of Ouroboros core.
- Requiring Semgrep AppSec Platform or proprietary Semgrep services.
- Uploading source code by default.
- Enabling registry/network behavior by default without explicit permission modeling.
- Enabling autofix in the v0 read-only reference path.
- Treating raw `semgrep ...` execution as sufficient plugin compliance.
- Adding schema fields speculatively before proving that the existing contract cannot represent the capability.

## Desired plugin shape

Initial plugin candidate:

```text
plugins/semgrep-static-analysis/
  ouroboros.plugin.json
  README.md
  semgrep_static_analysis/
    __init__.py
    __main__.py
    cli.py
    runner.py
    normalize.py
    artifacts.py
    audit.py
  tests/
    fixtures/
      semgrep-output-empty.json
      semgrep-output-findings.json
    test_normalize.py
    test_manifest.py
```

The name is intentionally capability-oriented rather than marketplace-oriented. Alternative names are acceptable if they preserve the same boundary:

- `semgrep-static-analysis`
- `semgrep-code-scan`
- `semgrep-agentos`

## UX preservation contract

The plugin must preserve Semgrep UX as a hard constraint.

### Preserve familiar Semgrep inputs

The plugin should accept a bounded subset of familiar Semgrep concepts first:

- target path / scanning root,
- `--config` with local rule files or directories,
- optional registry config only when network permission is modeled,
- include/exclude behavior where feasible,
- JSON output preservation,
- SARIF output preservation,
- metrics control,
- dry-run behavior for future autofix paths,
- Semgrep exit-code semantics where they matter for CI.

### Preserve familiar Semgrep outputs

The plugin should not replace Semgrep output with only an Ouroboros summary.

It should produce:

- raw Semgrep JSON artifact,
- optional raw Semgrep SARIF artifact,
- normalized Ouroboros findings artifact,
- human-readable Markdown summary,
- audit/provenance records that point to artifact paths and hashes.

### Preserve local-first behavior

The default invocation should be local and read-only:

```bash
semgrep scan --json --metrics=off --disable-version-check --config <local-config> <target>
```

Exact flags may vary by installed Semgrep version, but the intent must hold:

- no source upload by default,
- no registry fetch by default,
- metrics disabled by default,
- bounded repo-relative target paths,
- bounded artifact writes only into an Ouroboros-controlled output directory.
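
The local-first defaults above could be captured in a small argv builder. This is a minimal sketch, not the plugin's actual runner: `build_scan_argv` is a hypothetical helper name, and the flags are the ones already shown in the invocation above, which may need adjusting for the installed Semgrep version.

```python
from typing import List


def build_scan_argv(target: str, config: str) -> List[str]:
    """Return an argv list for a local, read-only Semgrep scan.

    Hypothetical helper: the name and signature are illustrative only.
    """
    for value in (target, config):
        # Reject values that Semgrep could parse as options (e.g. "--autofix").
        if not value or value.startswith("-"):
            raise ValueError(f"refusing option-like or empty argument: {value!r}")
    return [
        "semgrep", "scan",
        "--json",                   # keep raw JSON output available
        "--metrics=off",            # metrics disabled by default
        "--disable-version-check",  # avoid the version-check network call
        "--config", config,
        target,
    ]
```

Returning a list rather than a command string means the result can be passed to `subprocess.run` without `shell=True`, so user-supplied paths are never interpreted by a shell.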

## AgentOS-native translation

The plugin must translate Semgrep into Ouroboros primitives rather than merely execute it.

### Core capabilities

Expected manifest capabilities for the read-only scan path:

- `ledger:write` — record invocation, policy inputs, scan summary, and decision-relevant facts.
- `provenance:write` — record Semgrep version, config source, target paths, command shape, output artifact hashes, and environment facts.
- `handoff:attach` — attach normalized findings and summary for downstream agents.
- `progress:write` — report scan progress and completion.

Optional future capabilities:

- `state:write` — persist scan state across resumable long-running scans or multi-stage triage.
- `seed:write` — generate remediation Seeds from findings when this becomes a deliberate workflow.
- `runtime:execute` — only if delegated agent execution becomes part of the plugin itself.

### External permissions

Expected baseline permissions:

- `filesystem:read` / `read_only` / required — read target source files and local rule files.
- `shell:execute` / `read_only` / required — invoke the installed Semgrep CLI with bounded arguments.

Expected optional permissions:

- `network:read` / `read_only` / optional — only for registry configs, remote configs, version checks, or other remote Semgrep flows.
- `filesystem:write` / `write` / optional — only for future autofix or explicit artifact output outside the controlled handoff directory.

The plugin must keep capabilities and permissions distinct per #27 and `docs/contract.md`.

## Command plan

### v0 command: read-only scan

```bash
ooo semgrep scan <target-path> --config <local-config>
```

Responsibilities:

- validate target path is repo-relative / bounded,
- validate local config path is bounded,
- invoke Semgrep with local-first safe defaults,
- preserve raw JSON output,
- optionally preserve SARIF output if requested,
- normalize findings,
- emit audit/provenance data,
- attach handoff artifacts,
- return a clear status code / summary.

Risk: `read_only`

Required external permissions:

- `filesystem:read`
- `shell:execute`

Required core capabilities:

- `ledger:write`
- `provenance:write`
- `handoff:attach`
- `progress:write`
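
The "repo-relative / bounded" validation in the responsibilities above might look like the following. It is a sketch under the assumption that bounding means "resolves to a location inside the repository root"; `resolve_bounded` is a hypothetical name, and the real plugin may apply a stricter allow-list.

```python
from pathlib import Path


def resolve_bounded(repo_root: Path, user_path: str) -> Path:
    """Resolve user_path against repo_root and reject anything that escapes it.

    Hypothetical helper: symlinks and ".." segments are resolved before the check.
    """
    root = repo_root.resolve()
    # Joining with an absolute user_path discards root, so the check below
    # also rejects absolute paths that point outside the repository.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes repository root: {user_path!r}")
    return candidate
```

The same check can be applied to both the target path and the local config path before any argv is built.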

### v0.1 or v1 command: CI-style scan

```bash
ooo semgrep ci-scan <target-path> --config <config> --baseline <ref>
```

Responsibilities:

- preserve Semgrep CI mental model,
- optionally honor baseline / changed-findings semantics,
- emit gate result suitable for `ooo auto`, PR review, or policy workflows,
- keep raw Semgrep output available.

Risk: usually `read_only`; if a registry or remote config is used, `network:read` must be declared separately.

### Future command: rule test

```bash
ooo semgrep rule-test <rules-path>
```

Responsibilities:

- run Semgrep rule tests,
- normalize pass/fail results,
- attach rule-test evidence for plugin / policy development.

Risk: `read_only`

### Future command: autofix dry run

```bash
ooo semgrep autofix-dryrun <target-path> --config <config>
```

Responsibilities:

- run Semgrep autofix in dry-run / preview mode,
- produce patch preview artifact,
- do not modify files.

Risk: `read_only`

### Future command: autofix apply

```bash
ooo semgrep autofix <target-path> --config <config>
```

Responsibilities:

- apply deterministic Semgrep rule-defined fixes,
- emit patch provenance,
- attach before/after evidence,
- require explicit confirmation and `filesystem:write`.

Risk: `write`

This should not ship until the read-only scan path proves the boundary.

## Manifest draft

The v0 read-only manifest should fit the current `0.1` schema without expanding the manifest contract:

```json
{
  "schema_version": "0.1",
  "name": "semgrep-static-analysis",
  "version": "0.1.0",
  "description": "Assimilates Semgrep local static-analysis scans into Ouroboros audit, provenance, and handoff artifacts while preserving Semgrep CLI UX.",
  "source": {
    "type": "local_path",
    "path": "plugins/semgrep-static-analysis"
  },
  "commands": [
    {
      "namespace": "semgrep",
      "name": "scan",
      "summary": "Run a bounded read-only Semgrep scan and attach normalized findings as Ouroboros evidence.",
      "usage": "ooo semgrep scan <target-path> --config <local-rule-or-pack>",
      "risk": "read_only",
      "requires_confirmation": false,
      "arguments": [
        {
          "name": "target_path",
          "type": "path",
          "required": true,
          "description": "Repo-relative file or directory to scan."
        },
        {
          "name": "config",
          "type": "string",
          "required": true,
          "description": "Local Semgrep config path or explicitly approved registry config."
        }
      ]
    }
  ],
  "capabilities": [
    {
      "name": "ledger",
      "access": "write",
      "reason": "Record scan invocation, policy inputs, and summary verdict."
    },
    {
      "name": "provenance",
      "access": "write",
      "reason": "Record Semgrep version, config source, target paths, and output hashes."
    },
    {
      "name": "handoff",
      "access": "attach",
      "reason": "Attach normalized findings for downstream review or automated remediation."
    },
    {
      "name": "progress",
      "access": "write",
      "reason": "Report scan progress and completion status."
    }
  ],
  "permissions": [
    {
      "scope": "filesystem:read",
      "risk": "read_only",
      "required": true,
      "reason": "Read target source files and local Semgrep rule files."
    },
    {
      "scope": "shell:execute",
      "risk": "read_only",
      "required": true,
      "reason": "Invoke the installed Semgrep CLI with bounded arguments."
    },
    {
      "scope": "network:read",
      "risk": "read_only",
      "required": false,
      "reason": "Only needed when using Semgrep registry or remote configs."
    }
  ],
  "entrypoint": {
    "type": "command",
    "command": "python -m semgrep_static_analysis"
  },
  "audit": {
    "events": [
      "plugin.invoked",
      "plugin.permission_used",
      "plugin.completed",
      "plugin.failed"
    ]
  }
}
```

## Artifact contract

Each successful scan should attach a handoff bundle similar to:

```text
.omx/artifacts/semgrep/<run-id>/
  semgrep.raw.json
  semgrep.raw.sarif          # optional
  semgrep.findings.json      # normalized Ouroboros finding model
  semgrep.summary.md         # human-readable summary
  semgrep.provenance.json    # bounded provenance fields and hashes
```

Suggested normalized finding shape:

```json
{
  "schema_version": "0.1",
  "tool": "semgrep",
  "tool_version": "1.x",
  "rule_id": "python.lang.security.audit...",
  "severity": "ERROR",
  "message": "...",
  "path": "src/example.py",
  "start": { "line": 10, "col": 5 },
  "end": { "line": 10, "col": 25 },
  "metadata": {
    "cwe": "...",
    "owasp": "..."
  },
  "fix_available": false,
  "fingerprint": "stable-or-derived-id",
  "raw_result_ref": "semgrep.raw.json#/results/0"
}
```

The normalized model should preserve enough information for downstream agents to:

- summarize risk,
- decide whether a finding blocks a workflow,
- generate remediation Seeds,
- open follow-up tasks,
- compare scan runs,
- and attach evidence to PR / review workflows.
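
A normalizer for the suggested shape could be sketched as follows. The input field names (`check_id`, `path`, `start`, `end`, `extra.message`, `extra.severity`, `extra.metadata`, `extra.fingerprint`, `extra.fix`) reflect Semgrep's JSON output as commonly documented, but they should be verified against the installed version. `normalize_result` and the derived-fingerprint fallback are assumptions made for illustration.

```python
import hashlib
from typing import Any, Dict


def normalize_result(result: Dict[str, Any], index: int, tool_version: str) -> Dict[str, Any]:
    """Map one entry of Semgrep's `results` array onto the suggested finding shape."""
    extra = result.get("extra") or {}
    metadata = extra.get("metadata") or {}
    start = result.get("start") or {}
    end = result.get("end") or {}
    rule_id = result.get("check_id", "")
    path = result.get("path", "")
    # Assumption: when Semgrep supplies no fingerprint, derive a stable one
    # from rule id, path, and start position.
    derived = hashlib.sha256(
        f"{rule_id}|{path}|{start.get('line')}|{start.get('col')}".encode("utf-8")
    ).hexdigest()
    return {
        "schema_version": "0.1",
        "tool": "semgrep",
        "tool_version": tool_version,
        "rule_id": rule_id,
        "severity": extra.get("severity", "UNKNOWN"),
        "message": extra.get("message", ""),
        "path": path,
        "start": {"line": start.get("line"), "col": start.get("col")},
        "end": {"line": end.get("line"), "col": end.get("col")},
        # Keep only the metadata keys named in the suggested shape.
        "metadata": {k: metadata[k] for k in ("cwe", "owasp") if k in metadata},
        "fix_available": "fix" in extra,
        "fingerprint": extra.get("fingerprint") or derived,
        "raw_result_ref": f"semgrep.raw.json#/results/{index}",
    }
```

The `raw_result_ref` pointer keeps every normalized finding traceable to the untouched raw artifact, so normalization never becomes the only copy of the evidence.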

## Audit and provenance requirements

The plugin should emit / prepare audit-compatible data for:

- `plugin.invoked`
- `plugin.permission_used`
- `plugin.completed`
- `plugin.failed`

Provenance should include bounded, redacted facts only:

- Semgrep version,
- plugin version,
- command namespace/name,
- target path(s),
- config path or config identifier,
- whether config was local or remote,
- metrics mode,
- network mode,
- output artifact paths,
- artifact hashes,
- result counts by severity,
- exit code,
- run duration.

Provenance must not include:

- raw source code,
- access tokens,
- unbounded Semgrep output blobs,
- raw user prompts,
- secret values found by scans.
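
One way to keep provenance bounded is to record artifact hashes rather than artifact contents. The sketch below assumes SHA-256 as the hash and uses hypothetical field names; the actual provenance schema is defined by the Ouroboros contract, not by this example.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large raw outputs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(
    artifact_dir: Path,
    semgrep_version: str,
    target_paths: List[str],
    config: str,
    config_is_remote: bool,
    exit_code: int,
) -> Dict[str, Any]:
    """Collect bounded facts only: names, hashes, and modes, never file contents."""
    artifacts = {
        p.name: sha256_file(p) for p in sorted(artifact_dir.iterdir()) if p.is_file()
    }
    return {
        "semgrep_version": semgrep_version,
        "target_paths": target_paths,
        "config": config,
        "config_source": "remote" if config_is_remote else "local",
        "metrics_mode": "off",
        "exit_code": exit_code,
        "artifacts": artifacts,  # hypothetical layout: file name -> SHA-256 hex digest
    }
```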

## Privacy and network behavior

The default path must be privacy-preserving:

- prefer local config,
- set metrics off by default,
- disable version checks where feasible,
- do not fetch registry rules unless the user explicitly chooses a registry / remote config path,
- require `network:read` for registry or remote configuration.

If the user requests a Semgrep Registry config such as `p/ci`, `auto`, or a URL config, the plugin must surface that this is no longer a purely local invocation and requires the optional network permission path.
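
The local-versus-remote decision could be approximated with a conservative check like the one below. The prefix list is an assumption and is probably incomplete: `p/ci`, `auto`, and URL configs come from the paragraph above, while `r/` is added on the understanding that it names individual registry rules. `requires_network` and `check_config_permission` are hypothetical names.

```python
from typing import Iterable

# Assumption: a partial list of config forms that make Semgrep contact the network.
REMOTE_PREFIXES = ("p/", "r/", "http://", "https://")


def requires_network(config: str) -> bool:
    """Heuristically decide whether a --config value is a registry or remote config."""
    return config == "auto" or config.startswith(REMOTE_PREFIXES)


def check_config_permission(config: str, granted_scopes: Iterable[str]) -> None:
    """Raise before Semgrep runs if a remote config lacks the network:read grant."""
    if requires_network(config) and "network:read" not in set(granted_scopes):
        raise PermissionError(
            f"config {config!r} is not purely local and requires network:read"
        )
```

A local directory that happens to be named `p/` would be classified as remote by this heuristic. That errs on the side of asking for permission rather than silently reaching the network.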

## Dependency and license policy

Semgrep is LGPL-2.1. This repository should not vendor Semgrep source as part of the plugin.

Preferred dependency model:

1. Require an installed `semgrep` executable and inspect it with `semgrep --version`.
2. Document installation options but do not silently install Semgrep in v0.
3. Optionally support a future setup helper that installs Semgrep only after explicit user action.
4. Preserve Semgrep license notices in plugin README / docs.

This keeps the Ouroboros plugin from becoming a Semgrep fork or a redistribution of Semgrep.
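
Step 1 of the dependency model could be sketched as a detection helper that never installs anything. `detect_semgrep` is a hypothetical name, and the sketch assumes `semgrep --version` prints the version string on stdout.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def detect_semgrep(executable: str = "semgrep") -> Optional[Tuple[str, str]]:
    """Return (resolved_path, version_text) or None when Semgrep is unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not installed: report it, do not install silently
    try:
        proc = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    return path, proc.stdout.strip()
```

A `None` result would map naturally onto an explicit failure state with installation guidance, rather than a fallback install.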

## Implementation phases

### Phase 0 — Contract analysis and RFC alignment

- [ ] Confirm the plugin fits the current manifest schema without schema expansion.
- [ ] Document why this is an assimilation plugin, not a trivial wrapper.
- [ ] Link this epic to #27 as a concrete static-analysis reference case.
- [ ] Decide final plugin name and namespace.

### Phase 1 — Read-only reference plugin skeleton

- [ ] Add `plugins/semgrep-static-analysis/ouroboros.plugin.json`.
- [ ] Add plugin README with product boundary, non-goals, privacy behavior, and dependency expectations.
- [ ] Add Python entrypoint with `scan` command.
- [ ] Validate manifest with `scripts/validate_contract.py`.
- [ ] Add catalog entry if this repository is meant to host it as a reference plugin.

### Phase 2 — Safe Semgrep runner

- [ ] Detect installed Semgrep executable.
- [ ] Capture `semgrep --version`.
- [ ] Build bounded argv instead of shell string interpolation.
- [ ] Enforce repo-relative / allowed target paths.
- [ ] Enforce local config by default.
- [ ] Add explicit branch for remote / registry configs requiring `network:read`.
- [ ] Run with metrics disabled by default.
- [ ] Preserve Semgrep exit code and stderr in bounded artifact form.

### Phase 3 — Output normalization and artifacts

- [ ] Save raw Semgrep JSON.
- [ ] Optionally save raw SARIF.
- [ ] Normalize findings into an Ouroboros-friendly JSON artifact.
- [ ] Generate Markdown summary.
- [ ] Hash artifacts for provenance.
- [ ] Include result counts by severity / rule / path.
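
The Markdown summary and severity counts from this phase could start as small as the sketch below. It assumes findings have already been normalized into dictionaries with a `severity` key; the heading text and layout are illustrative, not a required format.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(findings: List[Dict[str, Any]]) -> str:
    """Render a short Markdown summary with result counts by severity."""
    counts = Counter(f.get("severity", "UNKNOWN") for f in findings)
    lines = ["# Semgrep scan summary", "", f"Total findings: {len(findings)}", ""]
    # Sort severities alphabetically so repeated runs produce stable output.
    for severity, count in sorted(counts.items()):
        lines.append(f"- {severity}: {count}")
    return "\n".join(lines) + "\n"
```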

### Phase 4 — Audit, provenance, and handoff

- [ ] Emit or prepare audit event payloads matching `schemas/0.1/audit-event.schema.json`.
- [ ] Record capabilities used.
- [ ] Record permissions used.
- [ ] Attach handoff bundle path / metadata.
- [ ] Ensure failure modes are represented as `blocked` or `failed`, not silent success.
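
To avoid silent success, the runner could map every run onto an explicit outcome before emitting audit events. This sketch assumes Semgrep's documented convention that exit code 0 means the scan ran and 1 means it ran and reported blocking findings, with other codes indicating errors. The outcome strings mirror the `plugin.completed` / `plugin.failed` event names plus `blocked`, and `classify_outcome` is a hypothetical helper.

```python
def classify_outcome(
    exit_code: int,
    raw_json_valid: bool,
    permission_denied: bool = False,
) -> str:
    """Return "blocked", "failed", or "completed" for a single scan attempt."""
    if permission_denied:
        # A required permission was not granted, so Semgrep was never invoked.
        return "blocked"
    if not raw_json_valid:
        # Malformed or missing output is a failure even when the exit code is 0.
        return "failed"
    if exit_code in (0, 1):
        # Assumption: 0 = scan ran, 1 = scan ran and reported blocking findings.
        return "completed"
    return "failed"
```

Whether findings should gate a workflow is a separate policy decision made on the normalized findings, not on this outcome value.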

### Phase 5 — Tests and validation

- [ ] Unit-test Semgrep JSON normalization from fixtures.
- [ ] Unit-test empty findings.
- [ ] Unit-test malformed Semgrep output.
- [ ] Unit-test path bounding.
- [ ] Unit-test argv construction.
- [ ] Unit-test manifest validation.
- [ ] Run repository validator and test suite.

### Phase 6 — Future UX parity expansion

- [ ] Add CI-style scan command if v0 proves the boundary.
- [ ] Add rule-test command.
- [ ] Add autofix dry-run command.
- [ ] Add autofix apply command only with `filesystem:write`, explicit confirmation, and patch provenance.
- [ ] Consider Semgrep MCP integration only after the CLI path is stable.

## Acceptance criteria

This epic is complete when:

- [ ] A Semgrep plugin exists under `plugins/<name>/` with a valid `ouroboros.plugin.json`.
- [ ] The plugin preserves Semgrep CLI mental model for the initial read-only scan path.
- [ ] The plugin does not vendor or reimplement Semgrep.
- [ ] The default scan path is local-first, read-only, and metrics-off.
- [ ] The manifest declares capabilities and permissions separately.
- [ ] The command risk is `read_only` for scan.
- [ ] Network behavior is optional and explicitly permissioned.
- [ ] Raw Semgrep JSON is preserved as an artifact.
- [ ] Normalized Ouroboros findings are produced.
- [ ] A human-readable summary is produced.
- [ ] Provenance records include Semgrep version, config source, target paths, artifact hashes, and result counts.
- [ ] Audit-compatible event payloads are generated or prepared.
- [ ] The scan result can be attached as a handoff artifact for downstream agents.
- [ ] Failure and blocked states are explicit.
- [ ] `python3 scripts/validate_contract.py` passes.
- [ ] Relevant unit tests pass.
- [ ] The README explains why this is an AgentOS capability assimilation plugin rather than a command wrapper.

## Why this matters for AgentOS

Ouroboros becomes a true AgentOS when external tools do not merely run beside it, but become structured capabilities inside it.

Semgrep is an ideal proof case because it already has a strong developer experience and a strong CLI identity. The challenge is therefore not to redesign Semgrep. The challenge is to preserve Semgrep's UX while adding the operating-system layer that Semgrep alone does not own:

- explicit authority,
- durable state,
- auditability,
- provenance,
- normalized artifacts,
- policy-aware risk handling,
- and downstream agent handoff.

If this succeeds, the same assimilation pattern can guide future static-analysis engines, test tools, security scanners, CI gates, and remediation loops.

## References

- https://github.com/semgrep/semgrep
- https://semgrep.dev/docs/getting-started/cli
- https://semgrep.dev/docs/cli-reference
- https://semgrep.dev/docs/writing-rules/rule-syntax
- https://semgrep.dev/docs/writing-rules/rule-defined-fix
- https://semgrep.dev/docs/metrics
- #27
