Epic: Assimilate Semgrep into Ouroboros AgentOS while preserving Semgrep UX #47

@shaun0927

Summary

Assimilate semgrep/semgrep into the Ouroboros plugin ecosystem as a first-class AgentOS capability while preserving the Semgrep user experience.

This epic is not about replacing Semgrep, reimplementing Semgrep, or hiding Semgrep behind a generic wrapper. The goal is to let users keep the Semgrep mental model they already know — local-first scans, familiar configs, familiar flags, JSON/SARIF output, rule testing, CI-style behavior, and optional autofix flows — while Ouroboros adds the missing AgentOS layer around it:

  • explicit permissions,
  • capability declarations,
  • risk classification,
  • audit events,
  • provenance,
  • normalized evidence artifacts,
  • Seed / Ledger / State / Handoff compatibility,
  • and resumable agent workflows.

In short:

Preserve the Semgrep experience, but make every Semgrep capability executable, inspectable, permissioned, and handoff-capable inside Ouroboros.

This directly exercises the thesis from #27:

Ouroboros plugins are not merely command wrappers. They are the capability assimilation layer that turns external tools, open-source libraries, and domain workflows into structured, auditable, permissioned, Seed-compatible Ouroboros capabilities.

Semgrep is explicitly in scope for that RFC class: a static-analysis engine that should remain outside core while becoming usable as an Ouroboros-native AgentOS capability.

Source capability

Repository: https://github.com/semgrep/semgrep

Semgrep is a mature static-analysis engine that provides:

  • local code scanning,
  • rule-based code search,
  • security and quality guardrails,
  • Semgrep YAML rules,
  • JSON and SARIF outputs,
  • CI-oriented scan modes,
  • rule testing,
  • optional registry / remote configuration,
  • optional metrics,
  • optional autofix / dry-run flows,
  • MCP / AI-assistant integration surfaces,
  • and broad language support.

Important upstream properties to preserve:

  • Semgrep users expect to run scans against local repositories.
  • Local scanning should remain local-first.
  • Familiar Semgrep concepts such as --config, local rule files, registry configs, --json, --sarif, --metrics=off, --autofix, --dryrun, .semgrepignore, and exit-code behavior should remain recognizable.
  • Semgrep output should remain available in raw form when requested.
  • Ouroboros should add structured artifacts rather than erase Semgrep's native output model.

Product goal

Make Semgrep feel like a native AgentOS capability without breaking Semgrep muscle memory.

A user should be able to say, in effect:

ooo semgrep scan . --config rules/ci.yml

and get the same kind of Semgrep experience they expect, plus Ouroboros-native execution products:

Semgrep scan ran locally
Raw Semgrep JSON/SARIF was preserved
Findings were normalized
Ledger/provenance/audit records were emitted
A handoff artifact was attached
A downstream agent can now triage, fix, suppress, or gate on the results

Non-goals

This epic must not turn ouroboros-plugins into a marketplace or a dumping ground for scanner wrappers.

Out of scope for the first implementation:

  • Reimplementing Semgrep's parser, rule engine, or CLI.
  • Vendoring Semgrep source into this repository.
  • Making Semgrep part of Ouroboros core.
  • Requiring Semgrep AppSec Platform or proprietary Semgrep services.
  • Uploading source code by default.
  • Enabling registry/network behavior by default without explicit permission modeling.
  • Enabling autofix in the v0 read-only reference path.
  • Treating raw semgrep ... execution as sufficient plugin compliance.
  • Adding schema fields speculatively before proving that the existing contract cannot represent the capability.

Desired plugin shape

Initial plugin candidate:

plugins/semgrep-static-analysis/
  ouroboros.plugin.json
  README.md
  semgrep_static_analysis/
    __init__.py
    __main__.py
    cli.py
    runner.py
    normalize.py
    artifacts.py
    audit.py
  tests/
    fixtures/
      semgrep-output-empty.json
      semgrep-output-findings.json
    test_normalize.py
    test_manifest.py

The name is intentionally capability-oriented rather than marketplace-oriented. Alternative names are acceptable if they preserve the same boundary:

  • semgrep-static-analysis
  • semgrep-code-scan
  • semgrep-agentos

UX preservation contract

The plugin must preserve Semgrep UX as a hard constraint.

Preserve familiar Semgrep inputs

The plugin should accept a bounded subset of familiar Semgrep concepts first:

  • target path / scanning root,
  • --config with local rule files or directories,
  • optional registry config only when network permission is modeled,
  • include/exclude behavior where feasible,
  • JSON output preservation,
  • SARIF output preservation,
  • metrics control,
  • dry-run behavior for future autofix paths,
  • Semgrep exit-code semantics where they matter for CI.

Preserve familiar Semgrep outputs

The plugin should not replace Semgrep output with only an Ouroboros summary.

It should produce:

  • raw Semgrep JSON artifact,
  • optional raw Semgrep SARIF artifact,
  • normalized Ouroboros findings artifact,
  • human-readable Markdown summary,
  • audit/provenance records that point to artifact paths and hashes.

Preserve local-first behavior

The default invocation should be local and read-only:

semgrep scan --json --metrics=off --disable-version-check --config <local-config> <target>

Exact flags may vary by installed Semgrep version, but the intent must hold:

  • no source upload by default,
  • no registry fetch by default,
  • metrics disabled by default,
  • bounded repo-relative target paths,
  • bounded artifact writes only into an Ouroboros-controlled output directory.
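A minimal sketch of how the default invocation above could be assembled as an argv list rather than a shell string. The function name `build_scan_argv` is hypothetical; the flags are the ones listed in this issue and may need adjusting for the installed Semgrep version.

```python
from typing import List


def build_scan_argv(config: str, target: str) -> List[str]:
    """Build the argv for a local, read-only Semgrep scan.

    Assumption: `config` and `target` have already been path-bounded
    by the caller. This helper only guards against option injection.
    """
    for value in (config, target):
        # A value starting with "-" would be parsed by Semgrep as a flag.
        if value.startswith("-"):
            raise ValueError(f"refusing option-like argument: {value!r}")
    return [
        "semgrep", "scan",
        "--json",
        "--metrics=off",
        "--disable-version-check",
        "--config", config,
        target,
    ]
```

Passing this list to `subprocess.run` without `shell=True` keeps user-supplied paths out of shell interpolation.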

AgentOS-native translation

The plugin must translate Semgrep into Ouroboros primitives rather than merely execute it.

Core capabilities

Expected manifest capabilities for the read-only scan path:

  • ledger:write — record invocation, policy inputs, scan summary, and decision-relevant facts.
  • provenance:write — record Semgrep version, config source, target paths, command shape, output artifact hashes, and environment facts.
  • handoff:attach — attach normalized findings and summary for downstream agents.
  • progress:write — report scan progress and completion.

Optional future capabilities:

  • state:write — persist scan state across resumable long-running scans or multi-stage triage.
  • seed:write — generate remediation Seeds from findings when this becomes a deliberate workflow.
  • runtime:execute — only if delegated agent execution becomes part of the plugin itself.

External permissions

Expected baseline permissions:

  • filesystem:read / read_only / required — read target source files and local rule files.
  • shell:execute / read_only / required — invoke the installed Semgrep CLI with bounded arguments.

Expected optional permissions:

  • network:read / read_only / optional — only for registry configs, remote configs, version checks, or other remote Semgrep flows.
  • filesystem:write / write / optional — only for future autofix or explicit artifact output outside the controlled handoff directory.

The plugin must keep capabilities and permissions distinct per #27 and docs/contract.md.

Command plan

v0 command: read-only scan

ooo semgrep scan <target-path> --config <local-config>

Responsibilities:

  • validate target path is repo-relative / bounded,
  • validate local config path is bounded,
  • invoke Semgrep with local-first safe defaults,
  • preserve raw JSON output,
  • optionally preserve SARIF output if requested,
  • normalize findings,
  • emit audit/provenance data,
  • attach handoff artifacts,
  • return a clear status code / summary.

Risk: read_only

Required external permissions:

  • filesystem:read
  • shell:execute

Required core capabilities:

  • ledger:write
  • provenance:write
  • handoff:attach
  • progress:write
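The path-bounding responsibility could be sketched as follows. `resolve_bounded` is a hypothetical helper, and treating the repository root as the only allowed boundary is an assumption; the real policy may allow additional roots.

```python
from pathlib import Path


def resolve_bounded(repo_root: str, candidate: str) -> Path:
    """Resolve `candidate` against `repo_root` and reject escapes.

    Absolute paths and `..` segments are resolved first, so a value
    such as "../secrets" or "/etc" fails the containment check.
    """
    root = Path(repo_root).resolve()
    resolved = (root / candidate).resolve()
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"path escapes repository root: {candidate!r}")
    return resolved
```

The same check would apply to both the target path and the local config path before any Semgrep process is started.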

v0.1 or v1 command: CI-style scan

ooo semgrep ci-scan <target-path> --config <config> --baseline <ref>

Responsibilities:

  • preserve the Semgrep CI mental model,
  • optionally honor baseline / changed-findings semantics,
  • emit gate result suitable for ooo auto, PR review, or policy workflows,
  • keep raw Semgrep output available.

Risk: usually read_only; if a registry or remote config is used, network:read must be declared separately.

Future command: rule test

ooo semgrep rule-test <rules-path>

Responsibilities:

  • run Semgrep rule tests,
  • normalize pass/fail results,
  • attach rule-test evidence for plugin / policy development.

Risk: read_only

Future command: autofix dry run

ooo semgrep autofix-dryrun <target-path> --config <config>

Responsibilities:

  • run Semgrep autofix in dry-run / preview mode,
  • produce patch preview artifact,
  • do not modify files.

Risk: read_only

Future command: autofix apply

ooo semgrep autofix <target-path> --config <config>

Responsibilities:

  • apply deterministic Semgrep rule-defined fixes,
  • emit patch provenance,
  • attach before/after evidence,
  • require explicit confirmation and filesystem:write.

Risk: write

This should not ship until the read-only scan path proves the boundary.

Manifest draft

The v0 read-only manifest should fit the current 0.1 schema without expanding the manifest contract:

{
  "schema_version": "0.1",
  "name": "semgrep-static-analysis",
  "version": "0.1.0",
  "description": "Assimilates Semgrep local static-analysis scans into Ouroboros audit, provenance, and handoff artifacts while preserving Semgrep CLI UX.",
  "source": {
    "type": "local_path",
    "path": "plugins/semgrep-static-analysis"
  },
  "commands": [
    {
      "namespace": "semgrep",
      "name": "scan",
      "summary": "Run a bounded read-only Semgrep scan and attach normalized findings as Ouroboros evidence.",
      "usage": "ooo semgrep scan <target-path> --config <local-rule-or-pack>",
      "risk": "read_only",
      "requires_confirmation": false,
      "arguments": [
        {
          "name": "target_path",
          "type": "path",
          "required": true,
          "description": "Repo-relative file or directory to scan."
        },
        {
          "name": "config",
          "type": "string",
          "required": true,
          "description": "Local Semgrep config path or explicitly approved registry config."
        }
      ]
    }
  ],
  "capabilities": [
    {
      "name": "ledger",
      "access": "write",
      "reason": "Record scan invocation, policy inputs, and summary verdict."
    },
    {
      "name": "provenance",
      "access": "write",
      "reason": "Record Semgrep version, config source, target paths, and output hashes."
    },
    {
      "name": "handoff",
      "access": "attach",
      "reason": "Attach normalized findings for downstream review or automated remediation."
    },
    {
      "name": "progress",
      "access": "write",
      "reason": "Report scan progress and completion status."
    }
  ],
  "permissions": [
    {
      "scope": "filesystem:read",
      "risk": "read_only",
      "required": true,
      "reason": "Read target source files and local Semgrep rule files."
    },
    {
      "scope": "shell:execute",
      "risk": "read_only",
      "required": true,
      "reason": "Invoke the installed Semgrep CLI with bounded arguments."
    },
    {
      "scope": "network:read",
      "risk": "read_only",
      "required": false,
      "reason": "Only needed when using Semgrep registry or remote configs."
    }
  ],
  "entrypoint": {
    "type": "command",
    "command": "python -m semgrep_static_analysis"
  },
  "audit": {
    "events": [
      "plugin.invoked",
      "plugin.permission_used",
      "plugin.completed",
      "plugin.failed"
    ]
  }
}
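As a rough illustration of the invariants this draft is meant to satisfy, the sketch below checks that capabilities and permissions are declared as separate lists and that nothing required exceeds read_only. It is not a substitute for scripts/validate_contract.py; `check_read_only_manifest` and its rules are hypothetical.

```python
from typing import Any, Dict, List


def _items(manifest: Dict[str, Any], key: str) -> List[Dict[str, Any]]:
    """Return manifest[key] if it is a list, otherwise an empty list."""
    value = manifest.get(key)
    return value if isinstance(value, list) else []


def check_read_only_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    for key in ("commands", "capabilities", "permissions"):
        if not isinstance(manifest.get(key), list):
            problems.append(f"missing list field: {key}")
    for command in _items(manifest, "commands"):
        if command.get("risk") != "read_only":
            problems.append(f"command is not read_only: {command.get('name')}")
    for permission in _items(manifest, "permissions"):
        # Optional permissions may exist, but required ones must stay read-only in v0.
        if permission.get("required") and permission.get("risk") != "read_only":
            problems.append(f"required permission is not read_only: {permission.get('scope')}")
    return problems
```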

Artifact contract

Each successful scan should attach a handoff bundle similar to:

.omx/artifacts/semgrep/<run-id>/
  semgrep.raw.json
  semgrep.raw.sarif          # optional
  semgrep.findings.json      # normalized Ouroboros finding model
  semgrep.summary.md         # human-readable summary
  semgrep.provenance.json    # bounded provenance fields and hashes

Suggested normalized finding shape:

{
  "schema_version": "0.1",
  "tool": "semgrep",
  "tool_version": "1.x",
  "rule_id": "python.lang.security.audit...",
  "severity": "ERROR",
  "message": "...",
  "path": "src/example.py",
  "start": { "line": 10, "col": 5 },
  "end": { "line": 10, "col": 25 },
  "metadata": {
    "cwe": "...",
    "owasp": "..."
  },
  "fix_available": false,
  "fingerprint": "stable-or-derived-id",
  "raw_result_ref": "semgrep.raw.json#/results/0"
}

The normalized model should preserve enough information for downstream agents to:

  • summarize risk,
  • decide whether a finding blocks a workflow,
  • generate remediation Seeds,
  • open follow-up tasks,
  • compare scan runs,
  • and attach evidence to PR / review workflows.
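One possible mapping from raw Semgrep JSON to the suggested shape is sketched below. The raw field names (`check_id`, `path`, `start`, `end`, and `extra` with `message`, `severity`, `metadata`, `fix`, `fingerprint`) reflect Semgrep's JSON output as commonly documented and should be verified against the installed version. The function names are hypothetical.

```python
from typing import Any, Dict, List


def normalize_result(result: Dict[str, Any], index: int, tool_version: str) -> Dict[str, Any]:
    """Map one entry of Semgrep's `results` array to the normalized shape."""
    extra = result.get("extra", {})
    start = result.get("start", {})
    end = result.get("end", {})
    metadata = extra.get("metadata", {})
    return {
        "schema_version": "0.1",
        "tool": "semgrep",
        "tool_version": tool_version,
        "rule_id": result.get("check_id"),
        "severity": extra.get("severity"),
        "message": extra.get("message"),
        "path": result.get("path"),
        "start": {"line": start.get("line"), "col": start.get("col")},
        "end": {"line": end.get("line"), "col": end.get("col")},
        # Keep only the bounded metadata keys named in the suggested shape.
        "metadata": {k: metadata[k] for k in ("cwe", "owasp") if k in metadata},
        "fix_available": "fix" in extra,
        "fingerprint": extra.get("fingerprint"),
        # JSON Pointer back into the preserved raw artifact.
        "raw_result_ref": f"semgrep.raw.json#/results/{index}",
    }


def normalize_output(raw: Dict[str, Any], tool_version: str) -> List[Dict[str, Any]]:
    """Normalize every result in a parsed Semgrep JSON document."""
    return [
        normalize_result(result, index, tool_version)
        for index, result in enumerate(raw.get("results", []))
    ]
```

Because each finding carries a `raw_result_ref`, downstream agents can fall back to the raw artifact when they need fields that the normalized model omits.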

Audit and provenance requirements

The plugin should emit / prepare audit-compatible data for:

  • plugin.invoked
  • plugin.permission_used
  • plugin.completed
  • plugin.failed

Provenance should include bounded, redacted facts only:

  • Semgrep version,
  • plugin version,
  • command namespace/name,
  • target path(s),
  • config path or config identifier,
  • whether config was local or remote,
  • metrics mode,
  • network mode,
  • output artifact paths,
  • artifact hashes,
  • result counts by severity,
  • exit code,
  • run duration.

Provenance must not include:

  • raw source code,
  • access tokens,
  • unbounded Semgrep output blobs,
  • raw user prompts,
  • secret values found by scans.
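A sketch of a provenance record that stays within these bounds by storing only artifact names and SHA-256 hashes, never artifact contents. The function names and the exact key names in the returned record are assumptions, not a fixed schema.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large raw outputs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(
    bundle_dir: Path,
    semgrep_version: str,
    config: str,
    config_is_remote: bool,
    targets: List[str],
    exit_code: int,
) -> Dict[str, Any]:
    """Collect bounded facts about a run; file contents are never embedded."""
    artifacts = {
        path.name: sha256_file(path)
        for path in sorted(bundle_dir.iterdir())
        if path.is_file()
    }
    return {
        "semgrep_version": semgrep_version,
        "config": config,
        "config_is_remote": config_is_remote,
        "metrics_mode": "off",
        "targets": targets,
        "artifact_hashes": artifacts,
        "exit_code": exit_code,
    }
```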

Privacy and network behavior

The default path must be privacy-preserving:

  • prefer local config,
  • set metrics off by default,
  • disable version checks where feasible,
  • do not fetch registry rules unless the user explicitly chooses a registry / remote config path,
  • require network:read for registry or remote configuration.

If the user requests a Semgrep Registry config such as p/ci, auto, or a URL config, the plugin must surface that this is no longer a purely local invocation and requires the optional network permission path.
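The local-versus-remote decision could start from a conservative heuristic like the one below. `config_requires_network` is a hypothetical name, and the check covers only the forms named in this issue (`p/...`, `auto`, and URL configs); other registry shorthands would need to be added.

```python
def config_requires_network(config: str) -> bool:
    """Return True when a --config value should take the network:read path.

    The heuristic errs toward requiring permission: a local directory
    that happens to be named "p" would also be flagged.
    """
    if config == "auto":
        return True
    if config.startswith(("http://", "https://")):
        return True
    if config.startswith("p/"):
        return True
    return False
```

When this returns True and network:read has not been granted, the plugin would stop before invoking Semgrep and report a blocked state rather than silently falling back.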

Dependency and license policy

Semgrep is LGPL-2.1. This repository should not vendor Semgrep source as part of the plugin.

Preferred dependency model:

  1. Require an installed semgrep executable and inspect it with semgrep --version.
  2. Document installation options but do not silently install Semgrep in v0.
  3. Optionally support a future setup helper that installs Semgrep only after explicit user action.
  4. Preserve Semgrep license notices in plugin README / docs.

This avoids turning the Ouroboros plugin into a Semgrep fork or derivative distribution problem.
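Step 1 of the dependency model might look like the following sketch, which locates an existing executable and never installs anything. `detect_semgrep` is a hypothetical helper, and the 30-second timeout is an arbitrary choice.

```python
import shutil
import subprocess
from typing import Optional


def detect_semgrep(executable: str = "semgrep") -> Optional[str]:
    """Return the reported Semgrep version, or None if it is unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        completed = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if completed.returncode != 0:
        return None
    return completed.stdout.strip() or None
```

A `None` result would be surfaced as a blocked run with installation guidance, in keeping with the rule that v0 does not silently install Semgrep.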

Implementation phases

Phase 0 — Contract analysis and RFC alignment

Phase 1 — Read-only reference plugin skeleton

  • Add plugins/semgrep-static-analysis/ouroboros.plugin.json.
  • Add plugin README with product boundary, non-goals, privacy behavior, and dependency expectations.
  • Add Python entrypoint with scan command.
  • Validate manifest with scripts/validate_contract.py.
  • Add catalog entry if this repository is meant to host it as a reference plugin.

Phase 2 — Safe Semgrep runner

  • Detect installed Semgrep executable.
  • Capture semgrep --version.
  • Build bounded argv instead of shell string interpolation.
  • Enforce repo-relative / allowed target paths.
  • Enforce local config by default.
  • Add explicit branch for remote / registry configs requiring network:read.
  • Run with metrics disabled by default.
  • Preserve Semgrep exit code and stderr in bounded artifact form.

Phase 3 — Output normalization and artifacts

  • Save raw Semgrep JSON.
  • Optionally save raw SARIF.
  • Normalize findings into an Ouroboros-friendly JSON artifact.
  • Generate Markdown summary.
  • Hash artifacts for provenance.
  • Include result counts by severity / rule / path.
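The counting and summary steps in this phase could be sketched as below, operating on normalized findings. The helper names and the Markdown layout are illustrative assumptions only.

```python
from collections import Counter
from typing import Any, Dict, List


def count_by(findings: List[Dict[str, Any]], key: str) -> Dict[str, int]:
    """Count findings grouped by one field, e.g. severity, rule_id, or path."""
    return dict(Counter(str(finding.get(key)) for finding in findings))


def render_summary(findings: List[Dict[str, Any]]) -> str:
    """Render a short Markdown summary with totals by severity."""
    lines = ["# Semgrep scan summary", "", f"Total findings: {len(findings)}", ""]
    for severity, count in sorted(count_by(findings, "severity").items()):
        lines.append(f"- {severity}: {count}")
    return "\n".join(lines) + "\n"
```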

Phase 4 — Audit, provenance, and handoff

  • Emit or prepare audit event payloads matching schemas/0.1/audit-event.schema.json.
  • Record capabilities used.
  • Record permissions used.
  • Attach handoff bundle path / metadata.
  • Ensure failure modes are represented as blocked or failed, not silent success.
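One way to keep failure modes explicit is a small classifier like the sketch below. The status names follow the bullet above; `classify_outcome` is hypothetical, and treating exit codes 0 and 1 as completed runs (1 meaning findings were reported) is an assumption that should be checked against the Semgrep version in use.

```python
from typing import Dict, Optional


def classify_outcome(
    exit_code: Optional[int],
    output_parsed: bool,
    blocked_reason: Optional[str] = None,
) -> Dict[str, str]:
    """Map a run to completed, failed, or blocked; never default to success."""
    if blocked_reason is not None:
        # Example: a registry config was requested without network:read.
        return {"status": "blocked", "reason": blocked_reason}
    if exit_code is None:
        return {"status": "failed", "reason": "semgrep did not run"}
    if not output_parsed:
        # A zero exit code with unreadable output is still a failure.
        return {"status": "failed", "reason": "semgrep output could not be parsed"}
    if exit_code in (0, 1):
        return {"status": "completed", "reason": f"exit code {exit_code}"}
    return {"status": "failed", "reason": f"exit code {exit_code}"}
```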

Phase 5 — Tests and validation

  • Unit-test Semgrep JSON normalization from fixtures.
  • Unit-test empty findings.
  • Unit-test malformed Semgrep output.
  • Unit-test path bounding.
  • Unit-test argv construction.
  • Unit-test manifest validation.
  • Run repository validator and test suite.

Phase 6 — Future UX parity expansion

  • Add CI-style scan command if v0 proves the boundary.
  • Add rule-test command.
  • Add autofix dry-run command.
  • Add autofix apply command only with filesystem:write, explicit confirmation, and patch provenance.
  • Consider Semgrep MCP integration only after the CLI path is stable.

Acceptance criteria

This epic is complete when:

  • A Semgrep plugin exists under plugins/<name>/ with a valid ouroboros.plugin.json.
  • The plugin preserves the Semgrep CLI mental model for the initial read-only scan path.
  • The plugin does not vendor or reimplement Semgrep.
  • The default scan path is local-first, read-only, and metrics-off.
  • The manifest declares capabilities and permissions separately.
  • The command risk is read_only for scan.
  • Network behavior is optional and explicitly permissioned.
  • Raw Semgrep JSON is preserved as an artifact.
  • Normalized Ouroboros findings are produced.
  • A human-readable summary is produced.
  • Provenance records include Semgrep version, config source, target paths, artifact hashes, and result counts.
  • Audit-compatible event payloads are generated or prepared.
  • The scan result can be attached as a handoff artifact for downstream agents.
  • Failure and blocked states are explicit.
  • python3 scripts/validate_contract.py passes.
  • Relevant unit tests pass.
  • The README explains why this is an AgentOS capability assimilation plugin rather than a command wrapper.

Why this matters for AgentOS

Ouroboros becomes a true AgentOS when external tools do not merely run beside it, but become structured capabilities inside it.

Semgrep is an ideal proof case because it already has a strong developer experience and a strong CLI identity. The challenge is therefore not to redesign Semgrep. The challenge is to preserve Semgrep's UX while adding the operating-system layer that Semgrep alone does not own:

  • explicit authority,
  • durable state,
  • auditability,
  • provenance,
  • normalized artifacts,
  • policy-aware risk handling,
  • and downstream agent handoff.

If this succeeds, the same assimilation pattern can guide future static-analysis engines, test tools, security scanners, CI gates, and remediation loops.

References
