ametel01/orgs-ai-harness

orgs-ai-harness

orgs-ai-harness is a CLI-first path toward a complete AI harness: a working agent runtime that lets an LLM act through tools, feedback, memory, context management, permissions, skills, delegation, and iteration. The canonical target architecture is defined in HARNESS_SPEC.md.

The current implemented CLI focuses on the skill-pack lifecycle inside that runtime: generating, validating, approving, caching, and exporting organization and repository agent skill packs. Those packs give the runtime durable operating knowledge while the rest of the runtime loop is built out.

Setup

Install uv, then sync the locked development environment:

uv sync --frozen

Run the CLI through the installed script:

uv run harness --help

The raw unittest command remains a fallback/debugging reference:

PYTHONPATH=src python3 -m unittest tests.test_org_pack_foundation

Core CLI Workflows

Initialize or attach an org skill pack:

uv run harness org init --name <org-name>
uv run harness org init --repo <path-or-git-url>

Register repositories and inspect coverage:

uv run harness repo add <path-or-url>
uv run harness repo discover <github-owner>
uv run harness repo list
uv run harness validate
uv run harness validate <repo-id>
uv run harness explain <repo-id>

Generate, review, and promote repository knowledge:

uv run harness onboard <repo-id>
uv run harness approve <repo-id> --all
uv run harness reject <repo-id> --reason "<reason>"
uv run harness eval <repo-id>
uv run harness eval <repo-id> --ci --summary-path .agent-harness/ci-eval/<repo-id>.json
uv run harness review changed-files --repo-id <repo-id> --files src/app.py --json-path .agent-harness/pr-review/<repo-id>.json --markdown-path .agent-harness/pr-review/<repo-id>.md
uv run harness release readiness --repo-id <repo-id> --version v1.2.3 --files CHANGELOG.md --json-path .agent-harness/release-readiness/<repo-id>.json --markdown-path .agent-harness/release-readiness/<repo-id>.md
uv run harness dependency campaign --name dependency-campaign --json-path .agent-harness/dependency-campaign/campaign.json --markdown-path .agent-harness/dependency-campaign/campaign.md
uv run harness cache refresh <repo-id>
uv run harness export codex <repo-id>

Start the current deterministic runtime slice:

uv run harness run "summarize this repo state"
uv run harness run "edit then validate" --permission workspace-write --adapter codex-local
uv run harness run --resume --session-id <session-id>

Create and review proposed updates after source changes:

uv run harness improve <repo-id>
uv run harness refresh <repo-id>
uv run harness proposals list
uv run harness proposals show <proposal-id>
uv run harness proposals apply <proposal-id> --yes

Artifact Lifecycle

Repository entries start as selected coverage in harness.yml. Onboarding scans source repositories, writes draft repo artifacts under org-agent-skills/repos/, and records scan evidence, unknowns, generated skills, eval fixtures, and pack reports.

Approval is explicit. harness approve <repo-id> --all writes approval.yml, protects approved artifact hashes, records an approval trace, and moves the repo to approved-unverified. harness eval <repo-id> can promote an approved pack to verified when replay checks pass, or keep it approved-unverified with warnings when human-approved guidance has not been fully verified. harness eval <repo-id> --ci is the workflow-safe replay path: it uses the deterministic fixture adapter, emits a stable JSON summary, writes eval report artifacts, and does not promote or rewrite approval lifecycle metadata.
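
The promotion rules above can be sketched as a small state function. This is a minimal illustration, not the harness implementation: the `PackState` enum, the `approve` and `evaluate` helpers, and the choice to leave non-approved states untouched are all assumptions. Only the state names `approved-unverified` and `verified` and the behavior of `--ci` come from the text.

```python
from enum import Enum


class PackState(str, Enum):
    """Hypothetical lifecycle states; names follow the README where it gives them."""

    DRAFT = "draft"
    APPROVED_UNVERIFIED = "approved-unverified"
    VERIFIED = "verified"


def approve(state: PackState) -> PackState:
    """Explicit approval moves a pack to approved-unverified.

    The sketch does not model which source states are valid for approval.
    """
    return PackState.APPROVED_UNVERIFIED


def evaluate(state: PackState, replay_passed: bool, ci: bool = False) -> PackState:
    """Return the lifecycle state after an eval run.

    With ci=True the state is never changed, mirroring the statement that
    --ci does not promote or rewrite approval lifecycle metadata.
    """
    if ci:
        return state
    if state is PackState.APPROVED_UNVERIFIED and replay_passed:
        return PackState.VERIFIED
    # Assumption: every other combination keeps the current state.
    return state
```

Keeping the CI path a pure no-op on lifecycle state is what makes it safe to run in workflows: a replay can fail or pass without ever changing what humans approved.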

harness review changed-files is the artifact-only PR/change review path. It accepts explicit repo-relative files with --files or --files-from, or a local git diff with --base <ref> --head <ref>, then writes deterministic JSON and Markdown when --json-path and --markdown-path are provided. Review artifacts include changed files, risk items, suggested local checks, suggested eval ids, matched generated skills/resolver context, missing coverage, and warnings. The GitHub Actions PR Review Artifacts job runs this command for eligible approved or verified local repo packs and uploads .agent-harness/pr-review/ as pr-review-artifacts. It is artifact-only: it does not post PR comments, request reviewers, mutate GitHub state, or block merges based on risk classification.
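
One common way to make a JSON artifact deterministic is to normalize ordering before serializing. The sketch below illustrates that idea only. The function name `render_review_artifact` and the `path` and `risk` fields are hypothetical and do not describe the real artifact schema.

```python
import json


def render_review_artifact(changed_files, risk_items):
    """Serialize review data so equal inputs give byte-identical output.

    changed_files: iterable of repo-relative paths (duplicates allowed).
    risk_items: list of dicts with hypothetical "path" and "risk" keys.
    """
    payload = {
        # Deduplicate and sort so input order never affects the artifact.
        "changed_files": sorted(set(changed_files)),
        "risk_items": sorted(
            risk_items, key=lambda item: (item["path"], item["risk"])
        ),
    }
    # sort_keys fixes key order; the trailing newline keeps diffs clean.
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"
```

With this shape, two runs over the same change set in a different file order produce the same text, so uploaded artifacts can be compared directly.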

harness release readiness is the artifact-only release review path. It requires one registered, active, non-external repo with a local path. Optional release inputs include --version, explicit changed files with --files or --files-from, and a local git range with --base <ref> --head <ref>. When --json-path and --markdown-path are provided, it writes deterministic artifacts with schema version 1. Artifacts include release inputs, lifecycle status, local release evidence, missing evidence, risk items, suggested local checks, suggested eval ids, and warnings. Suggested checks and evals are not executed. The GitHub Actions Release Readiness Artifacts job is workflow_dispatch only, uploads .agent-harness/release-readiness/ as release-readiness-artifacts, and skips ineligible repo states with discovery.json plus SKIPPED.md. This first workflow does not tag, publish, deploy, create GitHub Releases, post comments, request reviewers, or block merges.
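
The eligibility rule (one registered, active, non-external repo with a local path) can be pictured as a single check that returns a skip reason. This is a sketch under assumptions: `RepoEntry`, its field names, and the reason strings are hypothetical and are not taken from the harness source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepoEntry:
    """Hypothetical view of a registered repo entry."""

    repo_id: str
    active: bool
    external: bool
    local_path: Optional[str]


def readiness_skip_reason(entry: Optional[RepoEntry]) -> Optional[str]:
    """Return why a repo is ineligible, or None when it is eligible."""
    if entry is None:
        return "repo is not registered"
    if not entry.active:
        return "repo is not active"
    if entry.external:
        return "repo is external"
    if not entry.local_path:
        return "repo has no local path"
    return None
```

Returning a reason instead of a boolean gives a skipped run something concrete to record.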

harness dependency campaign is the artifact-only cross-repo dependency campaign path. It requires a campaign --name, resolves active non-external local repos from the org pack, collects dependency manifests and lockfiles from local files, classifies risk conservatively, and writes schema-versioned JSON and Markdown when --json-path and --markdown-path are provided. Suggested commands and eval ids are derived from known local evidence and are not executed. The Dependency Campaign Artifacts workflow runs only from workflow_dispatch, uploads .agent-harness/dependency-campaign/ as dependency-campaign-artifacts, and skips with discovery.json plus SKIPPED.md when the org pack is missing, local paths are missing, or no dependency manifest evidence is found. This first workflow does not edit manifests, run package-manager upgrades, open PRs, post comments, mutate approvals, publish, deploy, or block merges.
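
Collecting manifest evidence from local files only might look like the following sketch. Everything in it is an assumption made for illustration: the file-name lists, the `collect_dependency_evidence` and `classify_risk` helpers, and the risk labels. The real classification rules are not described in this README.

```python
import tempfile
from pathlib import Path

# Assumed file names; the harness may recognize a different set.
MANIFEST_NAMES = ("pyproject.toml", "package.json", "Cargo.toml", "go.mod")
LOCKFILE_NAMES = ("uv.lock", "package-lock.json", "Cargo.lock", "go.sum")


def collect_dependency_evidence(repo_root: Path) -> dict:
    """List manifests and lockfiles found directly under repo_root."""
    manifests = sorted(n for n in MANIFEST_NAMES if (repo_root / n).is_file())
    lockfiles = sorted(n for n in LOCKFILE_NAMES if (repo_root / n).is_file())
    return {"manifests": manifests, "lockfiles": lockfiles}


def classify_risk(evidence: dict) -> str:
    """Hypothetical conservative rule: missing evidence never lowers risk."""
    if not evidence["manifests"]:
        return "skipped"  # nothing to assess for this repo
    if not evidence["lockfiles"]:
        return "elevated"  # a manifest without a lockfile is treated as riskier
    return "baseline"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pyproject.toml").write_text("[project]\n", encoding="utf-8")
        evidence = collect_dependency_evidence(root)
        print(evidence, classify_risk(evidence))
```

The sketch only reads the filesystem. It never invokes a package manager, which matches the statement that suggested commands are not executed.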

Approved or verified packs can be refreshed into a repo-local .agent-harness/cache/ directory and exported for a runtime target such as Codex. Draft and investigation states require explicit development flags before export.

Target Harness Architecture

The full harness architecture is broader than skill generation. The runtime must eventually own:

  • an outer act/observe/adjust loop
  • context management and compression
  • tool and skill registries
  • sub-agent delegation with scoped permissions
  • session persistence and recovery
  • system prompt assembly and project context injection
  • lifecycle hooks around tool execution
  • permission and safety enforcement

The current CLI implements the skill, validation, trace, cache, export, proposal, and safety-policy foundation that this runtime will use. It also includes a first runtime vertical slice. harness run <goal> starts a read-only session by default, assembles bounded workspace context, and enforces tool permissions. It asks either the default deterministic fixture adapter or the subprocess-backed codex-local adapter for tool-call or final-response decisions. It writes adapter decisions, observations, tool results, errors, changed-file metadata, and final responses to an append-only session JSONL log under .agent-harness/sessions/, and it can inspect or resume an existing session log. --permission workspace-write is an explicit opt-in for bounded local file writes and known validation commands. Destructive, network, deployment, unknown, and full-access requests are still denied and surfaced as diagnostics rather than approval prompts.
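
The combination of a permission gate and an append-only JSONL log can be sketched as below. This is an illustration under stated assumptions, not the harness code: the `ALLOWED` table, the risk labels `read`, `write`, and `validate`, the event field names, and all function names are hypothetical. Only the permission names `read-only` and `workspace-write` and the deny-by-default behavior follow the description above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mapping from permission level to the risk classes it allows.
# Anything not listed (network, deployment, unknown, ...) is denied.
ALLOWED = {
    "read-only": {"read"},
    "workspace-write": {"read", "write", "validate"},
}


def is_allowed(permission: str, risk: str) -> bool:
    """Deny by default: unknown permissions and risks are rejected."""
    return risk in ALLOWED.get(permission, set())


def append_event(log_path: Path, event: dict) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def request_tool(log_path: Path, permission: str, tool: str, risk: str) -> bool:
    """Check the gate, record the outcome, and return whether the call may run."""
    allowed = is_allowed(permission, risk)
    append_event(
        log_path,
        {
            "type": "tool_result" if allowed else "error",  # assumed event names
            "tool": tool,
            "risk": risk,
            "allowed": allowed,
        },
    )
    return allowed


def load_events(log_path: Path) -> list:
    """Read the session log back, for example to inspect or resume a session."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "sessions" / "demo.jsonl"
        request_tool(log, "read-only", "read_file", "read")
        request_tool(log, "read-only", "write_file", "write")
        for event in load_events(log):
            print(event)
```

Because denied requests are logged as events rather than raised as prompts, the log alone is enough to reconstruct what a session attempted and what was refused.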

Runtime Progress

Progress is tracked against the core runtime roadmap in org-skill-harness-advanced-paths.md and the skill format contract in AGENTS_SKILLS_SPEC.md.

| Area | Implemented | Deferred |
| --- | --- | --- |
| Skill lifecycle | Repo/org pack generation, validation, approval, eval replay, CI eval replay, cache, export, proposal flow | Hosted dashboard, autonomous improvement |
| Agent Skills contract | Generated SKILL.md frontmatter checks, directory-name matching, reference-link validation, bounded exported skill packs | Full external spec refresh automation and richer optional metadata policy |
| Runtime loop | Adapter-driven harness run <goal> sessions with read-only default mode, explicit --permission workspace-write opt-in, deterministic fixture/default adapter decisions, optional subprocess-backed codex-local decisions, context assembly, tool calls, observations, max-step/error safeguards, and final response events | Approval prompts, broad autonomous operation, context compression |
| Runtime persistence | Append-only session JSONL events for adapter decisions, observations, tool calls/results, errors, final responses, and recovery inspection | Durable memory model, compaction checkpoints, write-session repair |
| Runtime tools | Typed tool registry, structured results, read/list/search inspection tools, safe argv shell tool for known validation commands, and workspace-write file writes with changed-file audit metadata | Broad shell/network/deployment tools, approval-backed risky dispatch, patch transactions, rollback |
| Safety and hooks | Permission levels, command risk classification, pre-tool denial hooks, post-tool warnings, protected artifact write rejection | Interactive approval model, policy plugins, sub-agent permission scopes |

Deeper workflow and boundary notes live in:

Directory Map

  • src/orgs_ai_harness/: first-party harness source.
  • tests/: unittest-based regression tests, run through pytest by default.
  • org-agent-skills/: tracked harness-managed org pack and generated artifacts.
  • .agent-harness/: tracked repo-local cache/export artifacts for this harness repo, plus CI-generated eval, PR review, and release readiness artifacts.
  • .github/workflows/: CI gates for verification and security.
  • local-docs/: ignored local planning and alignment notes.
  • .venv/, .coverage*, .pytest_cache/, .ruff_cache/, *.egg-info/: local tool output excluded from normal source review.

Normal Python gates target src/ and tests/. Generated pack directories and local docs are excluded from Ruff and Pyright.

Quality Gates

The Makefile is a thin wrapper around canonical uv commands:

make sync       # uv sync --frozen
make format     # Ruff format
make lint       # Ruff format check and lint
make typecheck  # Pyright basic mode
make test       # pytest
make coverage   # pytest coverage with subprocess tracing, fail_under=81
make verify     # lint, typecheck, coverage
make security   # pip-audit, Bandit, and detect-secrets baseline check
make build      # uv build

CI runs make verify on Python 3.11, 3.12, and 3.13, then runs make security once after the verify matrix passes.

Pre-commit is an optional contributor convenience, not the source of truth:

make pre-commit
uv run pre-commit install

The committed .secrets.baseline covers known generated-artifact findings so new secret-like values fail the security gate without rewriting the baseline.
