diff --git a/.github/ISSUE_TEMPLATE/00-blank.yml b/.github/ISSUE_TEMPLATE/00-blank.yml new file mode 100644 index 0000000..a2b939c --- /dev/null +++ b/.github/ISSUE_TEMPLATE/00-blank.yml @@ -0,0 +1,35 @@ +name: "Blank issue" +description: "Open an issue that doesn't fit any of the structured templates below." +body: + - type: textarea + id: body + attributes: + label: Issue + description: "Describe what's going on. Include the bundle file (`SKILL.md`, `getting-started.md`, `templates/`, `references/`), the host runtime (Claude Code / Cursor / Codex / OpenCode), and any specific invocation or content that the issue touches." + validations: + required: true + - type: textarea + id: ai-disclosure + attributes: + label: AI disclosure + description: "One sentence: what part of this issue was AI-written, what was human-written?" + placeholder: "Entirely human-written." + validations: + required: true + - type: textarea + id: agent-instructions + attributes: + label: For agents + description: "Agents: expand this section before filing." + value: | +
+ Agent filing instructions + + 1. **Pick a structured template first.** Bundle bugs and bundle proposals have dedicated forms — use them when they fit. + 2. **Search first.** Run `gh search issues --repo brettdavies/agentnative-skill ""` to check for duplicates. + 3. **AI disclosure is required.** Fill the field above honestly. + 4. **Wrong repo?** Spec questions, principle edits, and grading findings live on [brettdavies/agentnative](https://github.com/brettdavies/agentnative). `anc` auditor bugs and tool-registry submissions live on [brettdavies/agentnative-cli](https://github.com/brettdavies/agentnative-cli). Site bugs live on [brettdavies/agentnative-site](https://github.com/brettdavies/agentnative-site). + 5. See [CONTRIBUTING.md](https://github.com/brettdavies/agentnative-skill/blob/main/CONTRIBUTING.md) for full guidelines. +
+ validations: + required: false diff --git a/.github/ISSUE_TEMPLATE/bug-report.yml b/.github/ISSUE_TEMPLATE/bug-report.yml new file mode 100644 index 0000000..f517865 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug-report.yml @@ -0,0 +1,95 @@ +name: "Bundle bug" +description: "Report a bug in the skill bundle (stale example, broken link, contradiction with the spec, drift in `getting-started.md`)." +title: "bug: " +labels: ["bug"] +body: + - type: markdown + attributes: + value: | + **Route check before filing:** + + - For bugs in `anc` (the auditor itself — wrong scorecard, missing check, false positive), file on the + [linter repo](https://github.com/brettdavies/agentnative-cli/issues/new/choose). + - For substantive principle changes (new principles, MUST/SHOULD/MAY tier changes), file on the + [spec repo](https://github.com/brettdavies/agentnative/issues/new/choose). + - This tracker is for skill-bundle bugs: stale templates, broken links, wrong invocations in + `getting-started.md`, drift between vendored spec and other bundle docs, etc. + - type: textarea + id: what-happened + attributes: + label: What happened + description: "One or two sentences. Include the file, the cited example, and the unexpected behavior." + validations: + required: true + - type: textarea + id: what-expected + attributes: + label: What you expected + description: "One sentence — what should the bundle have said or done instead?" + validations: + required: true + - type: textarea + id: reproduction + attributes: + label: How to reproduce + description: "Numbered steps + exact commands or quoted bundle content. Redact paths and credentials." + placeholder: | + 1. + 2. + 3. + + ```bash + # exact commands + ``` + validations: + required: true + - type: textarea + id: environment + attributes: + label: Environment + description: "Bundle version, pinned spec version, host runtime, `anc --version` if relevant, OS and shell." + placeholder: | + - Bundle version (`cat VERSION` if cloned, or the tag you installed): vX.Y.Z + - Pinned spec version (`cat spec/VERSION`): vX.Y.Z + - Host (Claude Code / Cursor / Codex / OpenCode / other): + - `anc --version` (if a workflow involving anc is at issue): + - OS and shell: + validations: + required: true + - type: textarea + id: why-bundle-bug + attributes: + label: Why this is a bundle bug (not spec or anc) + description: | + Optional but useful. If a getting-started flow doesn't work, explain whether the breakage is in this bundle's + prose (fixable here) vs. in `anc`'s behavior (file in agentnative-cli) vs. in the principle text itself + (file in agentnative-spec). + validations: + required: false + - type: textarea + id: ai-disclosure + attributes: + label: AI disclosure + description: "One sentence: what part of this report was AI-written, what was human-written?" + placeholder: "Reproduction observed by hand; analysis section was AI-assisted." + validations: + required: true + - type: textarea + id: agent-instructions + attributes: + label: For agents + description: "Agents: expand this section before filing." + value: | +
+ Agent filing instructions + + 1. **Search first.** Run `gh search issues --repo brettdavies/agentnative-skill ""` to check for + duplicates. + 2. **Route check.** Verify this is a skill-bundle bug, not an `anc` auditor bug (file in agentnative-cli) or + a spec issue (file in agentnative). + 3. **AI disclosure is required.** Fill the field above honestly. + 4. See [CONTRIBUTING.md](https://github.com/brettdavies/agentnative-skill/blob/main/CONTRIBUTING.md) for full + guidelines. +
+ validations: + required: false diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md deleted file mode 100644 index 58b33b1..0000000 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -name: Bug report -about: A bundle file has a wrong example, a stale path, a broken cross-reference, or contradicts the spec. -title: "bug: " -labels: bug ---- - - - -## What happened - - - -## What you expected - - - -## How to reproduce - -1. 1. 1. - -```bash -# exact commands or quoted bundle content; redact paths/credentials -``` - -## Environment - -- Bundle version (`cat VERSION` if cloned, or the tag you installed): vX.Y.Z -- Pinned spec version (`cat spec/VERSION`): vX.Y.Z -- Host (Claude Code / Cursor / Codex / other): -- `anc --version` (if a workflow involving anc is at issue): -- OS and shell: - -## Why this is a bundle bug, not a spec or anc bug - - diff --git a/.github/ISSUE_TEMPLATE/bundle-proposal.yml b/.github/ISSUE_TEMPLATE/bundle-proposal.yml new file mode 100644 index 0000000..90fd630 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bundle-proposal.yml @@ -0,0 +1,108 @@ +name: "Bundle proposal" +description: "Propose a new template, reference doc, getting-started flow, or other change to the skill bundle." +title: "proposal: " +labels: ["proposal"] +body: + - type: markdown + attributes: + value: | + **Route check before filing:** + + - New principle, MUST/SHOULD/MAY tier change, or any substantive change to the standard itself → file on + the [spec repo](https://github.com/brettdavies/agentnative/issues/new/choose). This skill bundle vendors + the spec; principle changes happen there first, then arrive here via `scripts/sync-spec.sh`. + - New compliance audit, change to scorecard semantics, or anything `anc audit` does → file on the + [linter repo](https://github.com/brettdavies/agentnative-cli/issues/new/choose). + - New starter template, reference doc, getting-started flow, idiom for a new language/framework, or any + change to how this bundle teaches the existing principles → keep filing here. + - type: textarea + id: problem + attributes: + label: Problem statement + description: | + What is the problem this proposal addresses? Be specific about which agent's workflow it would improve. + Cite real tools, real PRs, or real `anc` findings where relevant. + validations: + required: true + - type: textarea + id: proposal + attributes: + label: Proposal + description: | + One paragraph: what should the bundle do that it does not do today? Frame as a concrete change to one or + more of: `SKILL.md`, `getting-started.md`, `references/`, `templates/`. + validations: + required: true + - type: checkboxes + id: change-type + attributes: + label: Type of change + options: + - label: "New starter template under `templates/`" + - label: "New reference doc under `references/`" + - label: "Update to `SKILL.md` (entry-point structure or routing)" + - label: "Update to `getting-started.md` (new flow, new invocation)" + - label: "Idioms for a new language/framework in `references/framework-idioms-other-languages.md`" + - label: "Other (describe in the Proposal section)" + - type: textarea + id: prior-art + attributes: + label: Prior art + description: "Existing tools, docs, or articles that demonstrate the problem or the proposed solution. Two or three is fine." + validations: + required: false + - type: textarea + id: draft + attributes: + label: Draft of the change + description: | + Sketch what the relevant bundle file(s) would look like after the change. A diff against the current state + is ideal; an outline is acceptable for early-stage proposals. + placeholder: | + ```diff + + ``` + validations: + required: false + - type: checkboxes + id: compatibility + attributes: + label: Compatibility + options: + - label: "Additive — no existing bundle content needs to change" + - label: "Replaces existing content (list what gets removed/superseded in the Open questions section)" + - label: "Coordinated with a spec or anc change (link the upstream issue/PR in the Open questions section)" + - type: textarea + id: open-questions + attributes: + label: Open questions + description: "Anything you're unsure about. Decisions to make in the issue thread before any PR." + validations: + required: false + - type: textarea + id: ai-disclosure + attributes: + label: AI disclosure + description: "One sentence: what part of this proposal was AI-written, what was human-written?" + placeholder: "Problem statement drafted by hand; example diff sketch was AI-assisted." + validations: + required: true + - type: textarea + id: agent-instructions + attributes: + label: For agents + description: "Agents: expand this section before filing." + value: | +
+ Agent filing instructions + + 1. **Search first.** Run `gh search issues --repo brettdavies/agentnative-skill ""` to check for + duplicates. + 2. **Route check.** Verify this is a bundle proposal, not a spec change (file in agentnative) or an + `anc` feature request (file in agentnative-cli). + 3. **AI disclosure is required.** Fill the field above honestly. + 4. See [CONTRIBUTING.md](https://github.com/brettdavies/agentnative-skill/blob/main/CONTRIBUTING.md) for full + guidelines. +
+ validations: + required: false diff --git a/.github/ISSUE_TEMPLATE/bundle_proposal.md b/.github/ISSUE_TEMPLATE/bundle_proposal.md deleted file mode 100644 index 2fa2eaa..0000000 --- a/.github/ISSUE_TEMPLATE/bundle_proposal.md +++ /dev/null @@ -1,66 +0,0 @@ ---- -name: Bundle proposal -about: Propose a new template, reference doc, getting-started flow, or other change to the skill bundle. -title: "proposal: " -labels: proposal ---- - - - -## Problem statement - - - -## Proposal - - - -## Type of change - -- [ ] New starter template under `templates/` -- [ ] New reference doc under `references/` -- [ ] Update to `SKILL.md` (entry-point structure or routing) -- [ ] Update to `getting-started.md` (new flow, new invocation) -- [ ] Idioms for a new language/framework in `references/framework-idioms-other-languages.md` -- [ ] Other (describe) - -## Prior art - - - -- - - -## Draft of the change - - - -```diff - -``` - -## Compatibility - -- [ ] Additive — no existing bundle content needs to change -- [ ] Replaces existing content — list what gets removed/superseded -- [ ] Coordinated with a spec or anc change — link the upstream issue/PR - -## Open questions - - - -- diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000..6db7b9f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,11 @@ +blank_issues_enabled: false +contact_links: + - name: "Spec questions, principle edits, or grading findings" + url: "https://github.com/brettdavies/agentnative/issues/new/choose" + about: "For anything about the standard itself — propose changes, submit a grading finding, ask questions — file on the spec repo." + - name: "Auditor bugs, features, or tool-registry submissions" + url: "https://github.com/brettdavies/agentnative-cli/issues/new/choose" + about: "For bugs in the `anc` auditor, feature requests, or proposing a tool for the leaderboard, file on the linter repo." + - name: "Site bugs (rendering, performance, deployment)" + url: "https://github.com/brettdavies/agentnative-site/issues/new/choose" + about: "For bugs on anc.dev, file on the site repo." diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index fdead33..4f42d08 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,6 +1,20 @@ ## Summary - + ## Changelog diff --git a/.github/workflows/guard-main-docs.yml b/.github/workflows/guard-main-docs.yml index eeb3b61..92c8872 100644 --- a/.github/workflows/guard-main-docs.yml +++ b/.github/workflows/guard-main-docs.yml @@ -15,3 +15,5 @@ permissions: jobs: guard-docs: uses: brettdavies/.github/.github/workflows/guard-main-docs.yml@main + with: + extra_paths: 'scripts/sync-prose-tooling.sh' diff --git a/AGENTS.md b/AGENTS.md index 3361655..43f087a 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,11 +1,11 @@ # AGENTS.md -Project-level agent instructions for `agentnative-skill` — the producer repo for the `agent-native-cli` skill. +Project-level agent instructions for `agentnative-skill`, the producer repo for the `agent-native-cli` skill. -This repo is **not** a Rust CLI tool and **not** a compliance checker. It is the agent-facing guide that pairs with -[`anc`](https://github.com/brettdavies/agentnative-cli) (the canonical checker) and +This repo is **not** a Rust CLI tool and **not** a compliance auditor. It is the agent-facing guide that pairs with +[`anc`](https://github.com/brettdavies/agentnative-cli) (the canonical auditor) and [`agentnative-spec`](https://github.com/brettdavies/agentnative) (the canonical principle text, vendored at `spec/`). -The skill teaches agents how to use `anc` and supplies the surrounding context — spec, idioms, templates — that `anc` +The skill teaches agents how to use `anc` and supplies the surrounding context (spec, idioms, templates) that `anc` findings reference. ## Layout @@ -14,10 +14,12 @@ The repo ships to consumers via plain `git clone`. After install, the host (Clau auto-discovers `SKILL.md` at the install root and ignores everything else. Producer-side files (`scripts/`, `docs/`, `.github/`, `cliff.toml`, etc.) clone alongside the skill content but are inert at runtime. + + | Path | Read at runtime by host? | Purpose | | ---------------------------------------------------------------------------------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- | | `SKILL.md` | ✓ | Skill metadata + entry-point pointer to `getting-started.md`. The host's first read. | -| `getting-started.md` | ✓ | Three working loops (existing CLI / new Rust / other language); canonical `anc check` invocations. | +| `getting-started.md` | ✓ | Three working loops (existing CLI / new Rust / other language); canonical `anc audit` invocations. | | `bin/check-update` | ✓ | Consumer-side update-check script. Compares local `VERSION` to GitHub `main`; emits `UPGRADE_AVAILABLE` for the SKILL.md preamble. | | `spec/` | ✓ | Vendored copy of `agentnative-spec`. Canonical principle text + machine-readable `requirements[]`. | | `references/` | ✓ | Implementation guidance: framework idioms (Rust + others), project structure, Rust/clap patterns. | @@ -32,6 +34,8 @@ auto-discovers `SKILL.md` at the install root and ignores everything else. Produ | `docs/plans/` | — | Engineering plans (`dev`-only — guarded out of `main`). | | `.markdownlint-cli2.yaml`, `.shellcheckrc`, `.gitattributes`, `.gitignore`, `cliff.toml` | — | Local lint configs, git-cliff config, and repo metadata. | + + ## Lint & Format ```bash @@ -43,6 +47,12 @@ actionlint .github/workflows/*.yml The repo ships a local `.markdownlint-cli2.yaml` (canonical 120-char line length) and `.shellcheckrc` so CI and local tooling agree. +## Voice and prose rules + +Channel-specific design context lives in [`PRODUCT.md`](PRODUCT.md). It inherits from [`BRAND.md`](BRAND.md), which is +vendored from `agentnative-spec` by a dev-only sync script (kept off `main` by the workflow guard). Read both before +authoring skill-bundle prose (`SKILL.md`, `getting-started.md`, `references/`, `templates/`). + ## Spec sync The canonical principle text lives in [`brettdavies/agentnative`](https://github.com/brettdavies/agentnative). This repo @@ -75,35 +85,32 @@ table. - Edit anything under `spec/` by hand. It is vendored from `agentnative-spec`. Any required change is a PR against the spec repo, then a `scripts/sync-spec.sh` bump here. - Reimplement `anc`. The skill does not contain shell-script duplicates of `anc`'s checks. If you find yourself writing - `rg`-based grep checks, you're rebuilding what `anc` already does — use `anc check --output json` instead. + `rg`-based grep checks, you're rebuilding what `anc` already does; use `anc audit --output json` instead. - Commit anything under `docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, or `docs/reviews/` directly to a - `release/*` branch — those paths are filtered by the cherry-pick pattern. Add to `dev` instead. -- Modify `SKILL.md`'s `name` or `description` frontmatter without coordinating with consumers — those fields drive skill + `release/*` branch. Those paths are filtered by the cherry-pick pattern; add to `dev` instead. +- Modify `SKILL.md`'s `name` or `description` frontmatter without coordinating with consumers; those fields drive skill discovery on every host. - Re-tag a published version. Tags are immutable historical anchors for released versions. -- Add Rust/Cargo scaffolding. There is no Rust code in this repo and there should be none — the standard is +- Add Rust/Cargo scaffolding. There is no Rust code in this repo and there should be none; the standard is language-prescriptive but the skill itself is markdown + a tiny bash update-check. ## Common pitfalls - The skill's `templates/agents-md-template.md` is for downstream Rust CLI tools (`cargo build`, `cargo test`, etc.). This top-level `AGENTS.md` describes the producer repo and is intentionally different. -- `markdownlint-cli2` does NOT consult a global config — every repo needs its own `.markdownlint-cli2.yaml`. If line +- `markdownlint-cli2` does NOT consult a global config; every repo needs its own `.markdownlint-cli2.yaml`. If line wrapping looks wrong, the local copy has drifted from `~/.markdownlint-cli2.yaml`. - `spec/` is a vendored copy, not a symlink or submodule. Stale orphan files can appear if the upstream spec renames or removes a principle. `git status` after `scripts/sync-spec.sh` surfaces them; resolve by deletion. -- `CHANGELOG.md` is generated by `scripts/generate-changelog.sh` (git-cliff + PR-body extraction). Never hand-edit it — - fix the input (PR body's `## Changelog` section) and re-run. +- `CHANGELOG.md` is generated by `scripts/generate-changelog.sh` (git-cliff + PR-body extraction). Never hand-edit it. + Fix the input (PR body's `## Changelog` section) and re-run. ## References -- [`SKILL.md`](./SKILL.md) — skill entry point -- [`getting-started.md`](./getting-started.md) — agent's three working loops -- [`spec/README.md`](./spec/README.md) — vendored-spec resync procedure -- [`README.md`](./README.md) — what this repo is, repo layout, install pointer -- [`SECURITY.md`](./SECURITY.md) — vulnerability disclosure -- [`RELEASES.md`](./RELEASES.md) — release procedure -- [`CONTRIBUTING.md`](./CONTRIBUTING.md) — how to propose changes - - - +- [`SKILL.md`](./SKILL.md): skill entry point +- [`getting-started.md`](./getting-started.md): agent's three working loops +- [`spec/README.md`](./spec/README.md): vendored-spec resync procedure +- [`README.md`](./README.md): what this repo is, repo layout, install pointer +- [`SECURITY.md`](./SECURITY.md): vulnerability disclosure +- [`RELEASES.md`](./RELEASES.md): release procedure +- [`CONTRIBUTING.md`](./CONTRIBUTING.md): how to propose changes diff --git a/BRAND.md b/BRAND.md new file mode 100644 index 0000000..5844d8f --- /dev/null +++ b/BRAND.md @@ -0,0 +1,105 @@ +# BRAND.md: agentnative voice and identity + +Source of truth for the voice and identity of the agentnative standard. Shared across the spec, the website, the linter, +the skill bundle, and any future channel. Each channel inherits from this document and adds channel-specific register +and artifacts in its own `PRODUCT.md`. + +> **Source of truth: `agentnative-spec/BRAND.md`.** This file is vendored into each channel repo (`agentnative-site`, +> `agentnative-cli`, `agentnative-skill`) via `scripts/sync-prose-tooling.sh`. Edits in +> a consumer repo will be overwritten on the next sync. File issues and PRs against this repo. + +## Brand identity + +**Three words: opinionated, precise, inviting.** + +- **Opinionated.** The standard has a point of view. It does not enumerate tradeoffs and shrug; it states "MUST do X, + here is the failure mode if you don't, here is the canonical fix." The point of view is what makes the standard worth + citing. +- **Precise.** RFC 2119 language. Anchors stable and citable. Numbers measured, not asserted. Where a contract has a + canonical realization (a flag spelling, an exit code, a path), it is named explicitly. +- **Inviting.** The reader (or agent handler) keeps reading by design. That comes from details: typography that rewards + a slow read, prose that rewards a fast scan, code blocks that read like reference material a reader can trust. + Inviting is not "friendly" and it is not "marketing." It rewards engagement. + +## Voice anchor + +Concrete before abstract. Show then tell. No filler adjectives. The standard speaks as a standard, not a person ( +first-person singular is out), but every channel inherits the same sequence: state the contract, show the failure mode, +name the canonical fix. + +## Audiences + +Two first-class consumers across all channels: + +- **Humans** evaluating, adopting, implementing, or extending the standard. Spec-channel readers are technically deep + and arrive with skepticism; site-channel readers are time-pressured and decide in 60 seconds whether to take the + standard seriously; linter users invoke at the terminal. Each channel narrows further in its own `PRODUCT.md`. +- **AI agents** consuming the standard programmatically: markdown via `Accept: text/markdown`, requirement IDs via + frontmatter parsing, skill bundles via `SKILL.md`/`AGENTS.md` discovery, linter findings via JSON. Their UX is "do + anchors stay stable, do IDs survive reorganizations, does the channel render cleanly across versions." This is not a + nice-to-have. The agent audience is first-class. Decisions that improve a channel for humans at the cost of agent + legibility are regressions. + +## Universal anti-patterns + +These bans apply across every channel. The narrative below explains *why* each category is banned; the executable +contract for *what* is banned lives in [`styles/brand/README.md`](styles/brand/README.md), generated from the Vale rule +pack at `styles/brand/*.yml`. + +- **No marketing register.** First-person belief and recommendation framings are out. The standard speaks in the third + person about contracts, not in the first person about beliefs. +- **No hedge words.** Probabilistic softeners undercut MUST and SHOULD. The contract is the contract. +- **No filler adjectives.** Marketing modifiers do no work. Concrete before abstract; the noun carries the meaning. +- **No verbatim quotation from any single source.** Where multiple sources converge on a claim, the standard's wording + sounds like triangulation, not citation. Lineage belongs in the README's `Acknowledgements` section, not in the + contract. + +## Voice anchors: concrete examples + +The ✓ column shows the contract voice. The ✗ column names the category of failure rather than reproducing literal banned +phrases. Those live in [`styles/brand/README.md`](styles/brand/README.md). The category labels describe the shape of the +failure each ✓ phrasing replaces. + +| ✓ | ✗ | +| ------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- | +| "CLI tools MUST run without human input." | First-person belief framing with lowercase RFC keyword and audience speculation. | +| "Authentication failed: token expired (`expires_at: 2026-03-25T00:00:00Z`). Run `tool auth refresh` or set `TOOL_TOKEN`." | Apologetic register with vague remediation and no actionable diagnostic. | +| "Numeric output is locale-independent: `.` decimal, no thousands grouping, regardless of `LC_NUMERIC`." | First-person recommendation with hedge word and unspecific "potential issues" framing. | + +## Channels + +The shared identity above applies to every channel. Each channel adds register and artifacts in its own `PRODUCT.md`: + +- **Spec** (`agentnative-spec/PRODUCT.md`): RFC 2119 register, third-person standards voice, present tense, no + first-person plural, no implementation leakage in MUSTs. +- **Site** (`agentnative-site/PRODUCT.md`): visual system (palette, typography, code-block treatment, OG image), + tech-stack decisions (SSG, Worker, content negotiation), JS budget, dark-mode design. +- **Skill bundle**: instructional voice, second-person imperative is allowed, agent-loadable. +- **Linter (`anc`)**: terse error messages, ≤80-column help text, four-part error rubric (offending value, constraint, + valid example, remediation). + +## Channel artifacts + +Each channel's repo carries its own narrow stack on top of this universal `BRAND.md`. The canonical layout: + +| Channel | `PRODUCT.md` location | Deep tier-3 | Vale rule pack | How `BRAND.md` arrives | +| ------------ | ----------------------------------------------- | ------------------------------------------------------ | -------------- | ------------------------------- | +| Spec | `agentnative-spec/PRODUCT.md` | `principles/`, `docs/architecture/`, `docs/decisions/` | `styles/spec/` | (origin — this repo) | +| Site | `agentnative-site/PRODUCT.md` | `DESIGN.md` (root) | (none yet) | `scripts/sync-prose-tooling.sh` | +| CLI (`anc`) | `agentnative-cli/PRODUCT.md` (when warranted) | `src/` (Rust source IS the artifact) | (planned) | `scripts/sync-prose-tooling.sh` | +| Skill bundle | `agentnative-skill/PRODUCT.md` (when warranted) | `bundle/` | (planned) | `scripts/sync-prose-tooling.sh` | + +A channel earns its `PRODUCT.md` when channel-specific decisions (visual system, error rubric, instructional voice, +etc.) accumulate enough that the universal `BRAND.md` cannot carry them. The spec and site channels have crossed that +threshold today. + +**Convention: deep tier-3 artifacts live at the repo root, not in `docs/`.** The site channel's `DESIGN.md` sits at +`agentnative-site/DESIGN.md` (not `docs/DESIGN.md`) so the `/impeccable` skill loader and human readers find it without +traversal. Future deep companions (e.g., a hypothetical `GOVERNANCE.md`) follow the same pattern. Only research +artifacts and historical plans live under `docs/`. + +## Sync + +This document is the source of truth. The site syncs it via `scripts/sync-spec.sh` alongside `principles/*.md`, +`VERSION`, and `CHANGELOG.md`. The skill bundle and linter sync similarly when they grow brand-aware artifacts. A PR +that changes `BRAND.md` flags whether channel sync is needed; channel repos pick up the change in a follow-on PR. diff --git a/CHANGELOG.md b/CHANGELOG.md index fe87cfe..1eeaf38 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,69 +2,280 @@ All notable changes to this project will be documented in this file. +## [0.5.0] - 2026-06-01 + +### Added + +- `references/update-check.md`: pulled-out operational detail for the consumer-side update-check script (prompt copy, + snooze ladder, state-dir layout). by @brettdavies in [#14](https://github.com/brettdavies/agentnative-skill/pull/14) +- New "The anc loop" section in `SKILL.md` documenting scorecard schema 0.5 fields (`coverage_summary.must.verified`, + `badge.eligible`, `badge.score_pct`, `badge.embed_markdown`), the 80% badge eligibility floor, and the four + `--audit-profile` categories (`human-tui`, `file-traversal`, `posix-utility`, `diagnostic-only`). +- `anc skill install ` documented in `getting-started.md` § "Installing anc and this skill bundle" with + `--dry-run`, `eval $(...)` capture, and `--output json` envelope. +- `docs/SYNCS.md`: cross-repo sync map covering inbound (`agentnative` spec → this repo via `scripts/sync-spec.sh`) and + outbound (this repo → consumer hosts; `agentnative-site` daily probe) edges, with manifest-vs-bundle ownership + diagrams. +- `--ref ` flag and matching `SPEC_REF` environment variable on `scripts/sync-spec.sh` for vendoring + `agentnative-spec` from an explicit branch, tag, or commit SHA. Default behavior (no `--ref`) still resolves the + latest `v*` tag. by @brettdavies in [#15](https://github.com/brettdavies/agentnative-skill/pull/15) +- `scripts/hooks/pre-push`: local CI mirror that runs markdownlint-cli2 and shellcheck against the same surfaces CI + checks, gating pushes before they reach GitHub. by @brettdavies in + [#16](https://github.com/brettdavies/agentnative-skill/pull/16) +- New skill-bundle channel-context layer: `PRODUCT.md` (channel design context), `BRAND.md` (universal voice, vendored + from `agentnative-spec`), and `scripts/sync-prose-tooling.sh` (vendoring vehicle, decoupled from + `scripts/sync-spec.sh`). by @brettdavies in [#17](https://github.com/brettdavies/agentnative-skill/pull/17) +- `RELEASES-RATIONALE.md` companion to `RELEASES.md` documents the rationale behind branching, PR conventions, CHANGELOG + generation, spec-vendor pipeline, and branch protection. +- GitHub issue forms: `bug-report.yml`, `bundle-proposal.yml`, `00-blank.yml`, and `config.yml`. +- `scripts/sync-dev-after-release.sh`: release-backport tool that overwrites `VERSION` with the released number and + copies `CHANGELOG.md` verbatim from `origin/main` as one signed commit on `dev`. Idempotent on re-run. by @brettdavies + in [#18](https://github.com/brettdavies/agentnative-skill/pull/18) +- P8 (Discoverable Through Agent Skill Bundles) principle, vendored from agentnative-spec v0.4.0. by @brettdavies in + [#19](https://github.com/brettdavies/agentnative-skill/pull/19) +- `principles/scoring.md` (leaderboard formula, badge eligibility floor, color bands) is now vendored into `spec/`; + `scripts/sync-spec.sh` fetches it alongside the principle files. +- Add `evals/` with three self-contained prompts covering greenfield Rust, remediate-existing-Rust, and multi-language + Python (Click) workflows. by @brettdavies in [#21](https://github.com/brettdavies/agentnative-skill/pull/21) +- Document `anc skill install --all` and `anc skill update [host|--all]` in the install section. +- Document `anc emit schema` for extracting the scorecard JSON Schema embedded in the binary. +- `scripts/generate-changelog.sh`: `--dry-run` flag prints a unified diff of what regeneration would change without + modifying `CHANGELOG.md`. Exits 0 when the file is idempotent vs current PR bodies, exits 1 on drift. by @brettdavies + in [#25](https://github.com/brettdavies/agentnative-skill/pull/25) +- `scripts/sync-dev-after-release.sh`: GitHub Release published-state precondition via `gh release view --json isDraft`. + Exits 67 when the release is missing or draft. +- `scripts/sync-dev-after-release.sh`: post-sync regen-idempotency check via `generate-changelog.sh --dry-run`. Warns + (does not fail) when PR bodies have drifted from main's `CHANGELOG.md`. + +### Changed + +- Vendored-spec prose reference in `SKILL.md` bumped `v0.2.0 → v0.3.0` to match `spec/VERSION`. by @brettdavies in + [#14](https://github.com/brettdavies/agentnative-skill/pull/14) +- `SKILL.md` description expanded with Rust/clap, scorecard, audit-profile, agent-native badge, and `anc skill install` + keywords plus a SKIP clause that routes TUI builders to `--audit-profile human-tui` instead of this skill. +- `SKILL.md` "Update check" block compressed from 35 lines (which buried the first-action intent) to a 6-line "First + action: update check" stub; details moved to `references/update-check.md`. +- `RELEASES.md` § "Releasing dev to main" step 4: single guarded-paths grep replaced with a triple-diff verification + block (A: main→release, B: release→dev, C: dev→main) plus a `git cherry HEAD origin/dev` patch-id check with + squash-merge triage guidance. Mirrors the same step that landed on `agentnative-cli` during v0.3.0 prep. +- `scripts/sync-spec.sh` now uses `gh api` (raw content endpoint) instead of `git clone` for the primary fetch path. All + ref types share one code path; the local-fallback path against `SPEC_ROOT` is preserved for offline runs. by + @brettdavies in [#15](https://github.com/brettdavies/agentnative-skill/pull/15) +- PR template, `RELEASES.md`, and `RELEASES-RATIONALE.md` codify the net-diff PR-body rule: Summary describes the + merged-state diff and excludes verification artifacts. by @brettdavies in + [#17](https://github.com/brettdavies/agentnative-skill/pull/17) +- `RELEASES.md` "Apply" section for branch-protection rulesets past-tensed (all three rulesets installed; apply commands + re-runnable). +- `CONTRIBUTING.md` widens the sibling-repo list to four, adds a Contribution Tiers table (Signal / Proposal / Code), + and points at the spec's AI-disclosure policy. +- `README.md` repo-layout block lists `BRAND.md`, `PRODUCT.md`, `RELEASES-RATIONALE.md`, and + `scripts/sync-prose-tooling.sh`; principle-range link covers `/p1` through `/p8`. +- `AGENTS.md` adds a "Voice and prose rules" pointer to `PRODUCT.md` and `BRAND.md`. +- `RELEASES.md` documents the post-publish backport step under "Releasing dev to main." by @brettdavies in + [#18](https://github.com/brettdavies/agentnative-skill/pull/18) +- `RELEASES-RATIONALE.md` documents the rationale for landing the backport as a direct-to-dev commit (rather than + through a PR) and the load-bearing consequences of skipping it. +- The canonical audit command is now `anc audit` (was `anc check`), matching the renamed `anc` subcommand. Skill docs, + the four-step loop, and all `anc`-compliance prose now read "audit" and "auditor". by @brettdavies in + [#19](https://github.com/brettdavies/agentnative-skill/pull/19) +- Bundled spec bumped 0.3.0 to 0.4.0; the skill now teaches eight principles. +- Re-vendor `spec/` to `agentnative-spec` v0.5.0. by @brettdavies in + [#21](https://github.com/brettdavies/agentnative-skill/pull/21) +- Track `anc` v0.5.0 scorecard surface: schema 0.7, per-row `id` / `audit_id` / `tier` fields, `opt_out` and `n_a` + statuses, 70% badge floor. +- Surface new top-level flags: `--examples`, `--json`, `--raw`, `--color`, `--verbose`. +- `.github/workflows/guard-main-docs.yml`: pass `extra_paths: 'scripts/sync-prose-tooling.sh'` to the reusable guard + workflow. Future PRs to `main` that add or modify the script fail the check. by @brettdavies in + [#24](https://github.com/brettdavies/agentnative-skill/pull/24) +- `RELEASES.md`: add a `### Dev-direct exception` subsection under `## Daily development` that names engineering docs + and the prose-tooling vendoring vehicle as the two categories that commit directly to `dev` without the feature-branch + + PR flow. +- `PRODUCT.md`: reframe the `BRAND.md` inheritance text to name a "dev-only sync script" rather than linking the in-tree + path twice. +- `AGENTS.md`: align the Voice-and-prose-rules section with the same framing. +- `README.md`: annotate the repo-layout entry for the script as `(dev-only; guarded off main)`. + +### Fixed + +- Strip leaked `` / `` XML trailers from `README.md`, `AGENTS.md`, and `CONTRIBUTING.md`. by + @brettdavies in [#20](https://github.com/brettdavies/agentnative-skill/pull/20) +- Correct the "no MUST violations" check: `coverage_summary.must.verified` counts any verdict (including `fail`), so the + right bar is no `results[]` row where `tier == "must" && status == "fail"`. by @brettdavies in + [#21](https://github.com/brettdavies/agentnative-skill/pull/21) +- Clarify that `badge.score_pct` is computed from behavioral-layer rows only. Source- and project-layer audits do not + affect the score. +- `scripts/generate-changelog.sh` no longer prepends a duplicate section when `CHANGELOG.md` already has one for the + current tag. Mirrors `agentnative-cli` PR #68. by @brettdavies in + [#25](https://github.com/brettdavies/agentnative-skill/pull/25) + +### Documentation + +- `docs/SYNCS.md` spec-row mechanism column updated to describe `--ref` / `SPEC_REF`, the cross-repo coordination + workflow, and the `gh api` resolution semantics. by @brettdavies in + [#15](https://github.com/brettdavies/agentnative-skill/pull/15) +- `spec/README.md` now links to the upstream spec landing page (leaderboard, badge convention, acknowledgements) and + documents `scoring.md` in the layout table. by @brettdavies in + [#19](https://github.com/brettdavies/agentnative-skill/pull/19) +- Tighten prose in `SKILL.md`, `AGENTS.md`, `PRODUCT.md`, and `SECURITY.md`. Term-definition bullets switch to colon + style; asides move into parens or commas; strong-contrast sentences split where it reads better. The Layout table in + `AGENTS.md` is wrapped in scoring-skip comment markers because its column indicator is data, not prose. by + @brettdavies in [#22](https://github.com/brettdavies/agentnative-skill/pull/22) + +### Removed + +- Legacy markdown issue templates (`bug_report.md`, `bundle_proposal.md`), replaced by YAML forms. by @brettdavies in + [#17](https://github.com/brettdavies/agentnative-skill/pull/17) + +**Full Changelog**: [v0.2.0...v0.5.0](https://github.com/brettdavies/agentnative-skill/compare/v0.2.0...v0.5.0) + +## [0.4.0] - skipped + +Version skipped. The skill bundle version is now pinned to the `anc` CLI version; the previous skill release was v0.2.0, +and the next is v0.5.0 to match `anc` v0.5.0. See [0.5.0] for the changes that span this range. + +## [0.3.0] - skipped + +Version skipped. The skill bundle version is now pinned to the `anc` CLI version; the previous skill release was v0.2.0, +and the next is v0.5.0 to match `anc` v0.5.0. See [0.5.0] for the changes that span this range. + ## [0.2.0] - 2026-04-29 ### Added -- Version-controlled GitHub repository rulesets for `main`, `dev`, and release tags (`v*`). Apply procedure documented in `.github/rulesets/README.md`. by @brettdavies in [#1](https://github.com/brettdavies/agentnative-skill/pull/1) -- `AGENTS.md` (root) describing the bundle layout, lint commands, branch model, and hard rules for agents working in this producer repo. by @brettdavies in [#2](https://github.com/brettdavies/agentnative-skill/pull/2) -- `RELEASES.md` (root) documenting a release procedure for this repo (later rewritten in #3 to the canonical full `release/*` pattern). +- Version-controlled GitHub repository rulesets for `main`, `dev`, and release tags (`v*`). Apply procedure documented + in `.github/rulesets/README.md`. by @brettdavies in [#1](https://github.com/brettdavies/agentnative-skill/pull/1) +- `AGENTS.md` (root) describing the bundle layout, lint commands, branch model, and hard rules for agents working in + this producer repo. by @brettdavies in [#2](https://github.com/brettdavies/agentnative-skill/pull/2) +- `RELEASES.md` (root) documenting a release procedure for this repo (later rewritten in #3 to the canonical full + `release/*` pattern). - `.github/pull_request_template.md` (canonical PR template). -- `.github/workflows/guard-main-docs.yml` caller for the `brettdavies/.github` reusable workflow that blocks `docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/` from PRs targeting `main`. -- `cliff.toml` — git-cliff configuration mirroring sibling repos. by @brettdavies in [#3](https://github.com/brettdavies/agentnative-skill/pull/3) -- `scripts/generate-changelog.sh` — release-time CHANGELOG generator. Reads PR-body `## Changelog` sections and prepends a curated, attributed `[X.Y.Z]` section. Authoritative; never hand-edit `CHANGELOG.md`. -- `CONTRIBUTING.md` — how to propose changes, link to release procedure. -- `.github/ISSUE_TEMPLATE/bug_report.md` — bug report template. -- `.github/ISSUE_TEMPLATE/principle_proposal.md` — substantive standards-change template. -- `**Renamed:**` subsection in `.github/pull_request_template.md` (sync of the canonical update at `~/dotfiles/stow/github/dot-config/github/pull_request_template.md`). Sister sync PRs landing in agentnative-cli (#30 there) and agentnative-site (already on dev as commit 4437435). -- Add vendored `bundle/spec/` tree (agentnative-spec @ v0.2.0): `VERSION`, `CHANGELOG.md`, `README.md`, and seven `principles/p*.md` files with machine-readable `requirements[]` frontmatter — canonical principle text the skill now points at instead of paraphrasing. by @brettdavies in [#4](https://github.com/brettdavies/agentnative-skill/pull/4) -- Add `bundle/getting-started.md` covering three working agent loops (existing CLI / new Rust / other language), canonical `anc check --output json` invocations, and a "where things live" map. +- `.github/workflows/guard-main-docs.yml` caller for the `brettdavies/.github` reusable workflow that blocks + `docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/` from PRs targeting `main`. +- `cliff.toml`: git-cliff configuration mirroring sibling repos. by @brettdavies in + [#3](https://github.com/brettdavies/agentnative-skill/pull/3) +- `scripts/generate-changelog.sh`: release-time CHANGELOG generator. Reads PR-body `## Changelog` sections and prepends + a curated, attributed `[X.Y.Z]` section. Authoritative; never hand-edit `CHANGELOG.md`. +- `CONTRIBUTING.md`: how to propose changes, link to release procedure. +- `.github/ISSUE_TEMPLATE/bug_report.md`: bug report template. +- `.github/ISSUE_TEMPLATE/principle_proposal.md`: substantive standards-change template. +- `**Renamed:**` subsection in `.github/pull_request_template.md` (sync of the canonical update at + `~/dotfiles/stow/github/dot-config/github/pull_request_template.md`). Sister sync PRs landing in agentnative-cli (#30 + there) and agentnative-site (already on dev as commit 4437435). +- Add vendored `bundle/spec/` tree (agentnative-spec @ v0.2.0): `VERSION`, `CHANGELOG.md`, `README.md`, and seven + `principles/p*.md` files with machine-readable `requirements[]` frontmatter. This is the canonical principle text the + skill now points at instead of paraphrasing. by @brettdavies in + [#4](https://github.com/brettdavies/agentnative-skill/pull/4) +- Add `bundle/getting-started.md` covering three working agent loops (existing CLI / new Rust / other language), + canonical `anc check --output json` invocations, and a "where things live" map. - Add `scripts/sync-spec.sh` so the bundle can re-vendor agentnative-spec on demand. -- `LICENSE-APACHE` — Apache 2.0 boilerplate, identical to the file in `agentnative-cli`. by @brettdavies in [#6](https://github.com/brettdavies/agentnative-skill/pull/6) -- `bundle/bin/check-update` — script that compares the consumer's local `VERSION` against the producer repo's `main` and emits `UPGRADE_AVAILABLE ` (or empty when up-to-date / snoozed / disabled). Adapts the gstack pattern with cache TTL (60min UP_TO_DATE / 720min UPGRADE_AVAILABLE) and a 3-level snooze (24h / 48h / 7d). State directory: `$HOME/.cache/agent-native-cli/`. by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) -- `bundle/SKILL.md` `## Update check` section — first non-frontmatter section after the intro. Documents how to invoke the script and inlines the AskUserQuestion-driven upgrade flow with three options ("Yes, upgrade now" / "Not now" / "Never ask again"). +- `LICENSE-APACHE`: Apache 2.0 boilerplate, identical to the file in `agentnative-cli`. by @brettdavies in + [#6](https://github.com/brettdavies/agentnative-skill/pull/6) +- `bundle/bin/check-update`: script that compares the consumer's local `VERSION` against the producer repo's `main` and + emits `UPGRADE_AVAILABLE ` (or empty when up-to-date / snoozed / disabled). Adapts the gstack pattern + with cache TTL (60min UP_TO_DATE / 720min UPGRADE_AVAILABLE) and a 3-level snooze (24h / 48h / 7d). State directory: + `$HOME/.cache/agent-native-cli/`. by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) +- `bundle/SKILL.md` `## Update check` section, the first non-frontmatter section after the intro. Documents how to + invoke the script and inlines the AskUserQuestion-driven upgrade flow with three options ("Yes, upgrade now" / "Not + now" / "Never ask again"). ### Changed -- `.gitignore` adds `!AGENTS.md` to override the global `**/AGENTS.md` ignore for this repo only. Other repos remain unaffected. by @brettdavies in [#2](https://github.com/brettdavies/agentnative-skill/pull/2) -- **Breaking (install layout):** Skill bundle moved into `bundle/` subdirectory. Installers must fetch `bundle/` rather than the entire repo. Consumer's installed skill directory shape is unchanged (`SKILL.md` at the root). by @brettdavies in [#3](https://github.com/brettdavies/agentnative-skill/pull/3) -- Adopted the full `release/*` cherry-pick release pattern (was lightweight `dev → main`). Plans on `dev` no longer conflict with release PRs because release branches cherry-pick only non-docs commits. +- `.gitignore` adds `!AGENTS.md` to override the global `**/AGENTS.md` ignore for this repo only. Other repos remain + unaffected. by @brettdavies in [#2](https://github.com/brettdavies/agentnative-skill/pull/2) +- **Breaking (install layout):** Skill bundle moved into `bundle/` subdirectory. Installers must fetch `bundle/` rather + than the entire repo. Consumer's installed skill directory shape is unchanged (`SKILL.md` at the root). by + @brettdavies in [#3](https://github.com/brettdavies/agentnative-skill/pull/3) +- Adopted the full `release/*` cherry-pick release pattern (was lightweight `dev → main`). Plans on `dev` no longer + conflict with release PRs because release branches cherry-pick only non-docs commits. - `RELEASES.md` rewritten to the canonical pattern; broken `../../.claude/...` link removed. -- **Breaking (install layout):** Skill bundle no longer ships `bundle/scripts/` or `bundle/checklists/`. Installers and consumers should fetch only the surviving directories: `SKILL.md`, `getting-started.md`, `spec/`, `references/`, `templates/`. The consumer's installed skill-directory shape (`SKILL.md` at the root) is unchanged. by @brettdavies in [#4](https://github.com/brettdavies/agentnative-skill/pull/4) -- Rewrite `bundle/SKILL.md` to drop inline principle prose, link `bundle/getting-started.md` and `bundle/spec/principles/` for progressive disclosure, and frame the spec / `anc` / skill three-artifact ecosystem. -- Reframe `RELEASES.md` SemVer guidance around the bundle's actual surface (markdown + templates + vendored spec) rather than deleted shell-script exit codes; document the spec-bump-vs-skill-version distinction. -- License changed from MIT-only to dual MIT or Apache-2.0 (consumer's choice). The skill bundle, top-level scripts, and all repo content are now dual-licensed; no MIT compatibility regression. by @brettdavies in [#6](https://github.com/brettdavies/agentnative-skill/pull/6) -- Documentation now points at `https://anc.dev/skill` instead of `https://anc.dev/install` for skill installation instructions, the cross-repo re-pin process, and the `bundle/` consumer description. by @brettdavies in [#7](https://github.com/brettdavies/agentnative-skill/pull/7) -- `bundle/SKILL.md`, `bundle/getting-started.md`, `bundle/spec/README.md` — drop "pinned ref" / "pinned upstream tag" / "pinned SPEC_VERSION" framing in favor of "vendored snapshot, refreshed each release". The bundle's behavior is unchanged; the language was misleading because the install command never actually pinned at the consumer side. by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) -- **BREAKING (install layout):** Skill content moved out of `bundle/` to the repo root. After install, hosts find `SKILL.md` at the skill root (where Claude Code expects it), not at `/bundle/SKILL.md`. Plain `git clone --depth 1` and `git pull --ff-only` are now the load-bearing install + update commands; no sparse-checkout magic, no post-install scripts. by @brettdavies in [#9](https://github.com/brettdavies/agentnative-skill/pull/9) +- **Breaking (install layout):** Skill bundle no longer ships `bundle/scripts/` or `bundle/checklists/`. Installers and + consumers should fetch only the surviving directories: `SKILL.md`, `getting-started.md`, `spec/`, `references/`, + `templates/`. The consumer's installed skill-directory shape (`SKILL.md` at the root) is unchanged. by @brettdavies in + [#4](https://github.com/brettdavies/agentnative-skill/pull/4) +- Rewrite `bundle/SKILL.md` to drop inline principle prose, link `bundle/getting-started.md` and + `bundle/spec/principles/` for progressive disclosure, and frame the spec / `anc` / skill three-artifact ecosystem. +- Reframe `RELEASES.md` SemVer guidance around the bundle's actual surface (markdown + templates + vendored spec) rather + than deleted shell-script exit codes; document the spec-bump-vs-skill-version distinction. +- License changed from MIT-only to dual MIT or Apache-2.0 (consumer's choice). The skill bundle, top-level scripts, and + all repo content are now dual-licensed; no MIT compatibility regression. by @brettdavies in + [#6](https://github.com/brettdavies/agentnative-skill/pull/6) +- Documentation now points at `https://anc.dev/skill` instead of `https://anc.dev/install` for skill installation + instructions, the cross-repo re-pin process, and the `bundle/` consumer description. by @brettdavies in + [#7](https://github.com/brettdavies/agentnative-skill/pull/7) +- `bundle/SKILL.md`, `bundle/getting-started.md`, `bundle/spec/README.md`: drop "pinned ref" / "pinned upstream tag" / + "pinned SPEC_VERSION" framing in favor of "vendored snapshot, refreshed each release". The bundle's behavior is + unchanged; the language was misleading because the install command never actually pinned at the consumer side. by + @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) +- **BREAKING (install layout):** Skill content moved out of `bundle/` to the repo root. After install, hosts find + `SKILL.md` at the skill root (where Claude Code expects it), not at `/bundle/SKILL.md`. Plain `git clone + --depth 1` and `git pull --ff-only` are now the load-bearing install + update commands; no sparse-checkout magic, no + post-install scripts. by @brettdavies in [#9](https://github.com/brettdavies/agentnative-skill/pull/9) - `bin/check-update`: `SKILL_DIR` is now one dir up from the script (was two), since there's no `bundle/` layer. - `scripts/sync-spec.sh` writes to `spec/` (was `bundle/spec/`). -- README, AGENTS, CONTRIBUTING reframe the consumer/producer split from a directory boundary (`bundle/` vs everything else) to an audience boundary (host reads `SKILL.md` + `bin/` + `spec/` + `references/` + `templates/` + `VERSION`; ignores everything else). -- Spec content vendored under `spec/` re-vendored from `agentnative-spec` v0.2.0 to v0.3.0. All 7 principles flip `status: draft` → `status: active` (P1–P7 are now the shipped baseline); prose tightened across P1 (TUI parenthetical), P2 (sysexits acknowledgment), P4 (dependency-gating cleanup), P5 (`--dry-run` write-gate + retry hedge), P6 (SIGPIPE language-neutral + global-flags behavioral lead), P7 (LLM-vs-non-LLM cost generalization). No requirement IDs added/removed/renamed; no level changes. Full upstream context: agentnative `v0.3.0` CHANGELOG. by @brettdavies in [#10](https://github.com/brettdavies/agentnative-skill/pull/10) -- `scripts/sync-spec.sh` no longer accepts `SPEC_REF`. The script always vendors the latest `v*` tag, queried from `SPEC_REMOTE_URL` (default `https://github.com/brettdavies/agentnative.git`) via `git ls-remote --tags --sort=-version:refname` and shallow-cloned for extraction. On any remote failure, falls back to the existing `SPEC_ROOT`-based logic (default `$HOME/dev/agentnative-spec`). New env var `SPEC_REMOTE_URL` overrides the remote; the temp clone is auto-cleaned on script exit via trap. by @brettdavies in [#11](https://github.com/brettdavies/agentnative-skill/pull/11) -- `.markdownlint-cli2.yaml` excludes `CHANGELOG.md` from linting. Aligns its treatment with `spec/CHANGELOG.md` and reflects that the file is regenerated by `scripts/generate-changelog.sh`, not hand-edited. Per-line content is governed by PR-body bullets in source PRs, not by this repo's MD013 line-length rule. by @brettdavies in [#13](https://github.com/brettdavies/agentnative-skill/pull/13) +- README, AGENTS, CONTRIBUTING reframe the consumer/producer split from a directory boundary (`bundle/` vs everything + else) to an audience boundary (host reads `SKILL.md` + `bin/` + `spec/` + `references/` + `templates/` + `VERSION`; + ignores everything else). +- Spec content vendored under `spec/` re-vendored from `agentnative-spec` v0.2.0 to v0.3.0. All 7 principles flip + `status: draft` → `status: active` (P1–P7 are now the shipped baseline); prose tightened across P1 (TUI + parenthetical), P2 (sysexits acknowledgment), P4 (dependency-gating cleanup), P5 (`--dry-run` write-gate + retry + hedge), P6 (SIGPIPE language-neutral + global-flags behavioral lead), P7 (LLM-vs-non-LLM cost generalization). No + requirement IDs added/removed/renamed; no level changes. Full upstream context: agentnative `v0.3.0` CHANGELOG. by + @brettdavies in [#10](https://github.com/brettdavies/agentnative-skill/pull/10) +- `scripts/sync-spec.sh` no longer accepts `SPEC_REF`. The script always vendors the latest `v*` tag, queried from + `SPEC_REMOTE_URL` (default `https://github.com/brettdavies/agentnative.git`) via `git ls-remote --tags + --sort=-version:refname` and shallow-cloned for extraction. On any remote failure, falls back to the existing + `SPEC_ROOT`-based logic (default `$HOME/dev/agentnative-spec`). New env var `SPEC_REMOTE_URL` overrides the remote; + the temp clone is auto-cleaned on script exit via trap. by @brettdavies in + [#11](https://github.com/brettdavies/agentnative-skill/pull/11) +- `.markdownlint-cli2.yaml` excludes `CHANGELOG.md` from linting. Aligns its treatment with `spec/CHANGELOG.md` and + reflects that the file is regenerated by `scripts/generate-changelog.sh`, not hand-edited. Per-line content is + governed by PR-body bullets in source PRs, not by this repo's MD013 line-length rule. by @brettdavies in + [#13](https://github.com/brettdavies/agentnative-skill/pull/13) ### Fixed -- Harden `bundle/bin/check-update` against malformed local `VERSION` (apply SemVer regex; malformed → silent exit) and against curl failure being cached as UP_TO_DATE (skip cache write on network failure so the next invocation retries). by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) -- Align table pipes in `SKILL.md` and `getting-started.md` after the `bundle/` path strip (markdownlint MD060). MD060 isn't auto-fixable, so violations slipped past the local PostToolUse hook and surfaced in CI. by @brettdavies in [#9](https://github.com/brettdavies/agentnative-skill/pull/9) +- Harden `bundle/bin/check-update` against malformed local `VERSION` (apply SemVer regex; malformed → silent exit) and + against curl failure being cached as UP_TO_DATE (skip cache write on network failure so the next invocation retries). + by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) +- Align table pipes in `SKILL.md` and `getting-started.md` after the `bundle/` path strip (markdownlint MD060). MD060 + isn't auto-fixable, so violations slipped past the local PostToolUse hook and surfaced in CI. by @brettdavies in + [#9](https://github.com/brettdavies/agentnative-skill/pull/9) ### Documentation -- `README.md` — License section rewritten to reflect dual licensing and link both LICENSE files; tree row updated. by @brettdavies in [#6](https://github.com/brettdavies/agentnative-skill/pull/6) -- `CONTRIBUTING.md` — License section rewritten: contributions are dual-licensed at the consumer's option, no CLA, with an explicit pointer to the Apache §3 patent grant. -- `bundle/spec/README.md` licensing reference catches drift from PR #6: was "MIT-licensed", now reflects the actual dual MIT/Apache-2.0 posture introduced in `18836d8`. by @brettdavies in [#8](https://github.com/brettdavies/agentnative-skill/pull/8) -- `RELEASES.md` gains a `## Spec re-vendoring` section between `## Why branch from main, not dev` and `## Version bump procedure`, documenting the `scripts/sync-spec.sh` re-vendor step. The script auto-resolves the latest upstream tag from the remote, so no manual version selection is needed at re-vendor time. by @brettdavies in [#10](https://github.com/brettdavies/agentnative-skill/pull/10) -- `AGENTS.md` `## Spec sync` section: rewritten — single-step recipe (`scripts/sync-spec.sh` then review); notes the remote-first / local-fallback behavior and the `SPEC_REMOTE_URL` / `SPEC_ROOT` overrides. Commit-message example uses `` placeholder instead of a hard-coded version. by @brettdavies in [#11](https://github.com/brettdavies/agentnative-skill/pull/11) -- `spec/README.md` `## Resync` section: rewritten similarly; drops the manually-maintained `**Current snapshot:**` line and points readers at `spec/VERSION` (which `sync-spec.sh` writes verbatim from upstream). -- `RELEASES.md` post-merge sequence ends at the GitHub Release; replaces deleted step 5 with a one-liner pointing consumers at `bin/check-update`. +- `README.md`: License section rewritten to reflect dual licensing and link both LICENSE files; tree row updated. by + @brettdavies in [#6](https://github.com/brettdavies/agentnative-skill/pull/6) +- `CONTRIBUTING.md`: License section rewritten. Contributions are dual-licensed at the consumer's option, no CLA, with + an explicit pointer to the Apache §3 patent grant. +- `bundle/spec/README.md` licensing reference catches drift from PR #6: was "MIT-licensed", now reflects the actual dual + MIT/Apache-2.0 posture introduced in `18836d8`. by @brettdavies in + [#8](https://github.com/brettdavies/agentnative-skill/pull/8) +- `RELEASES.md` gains a `## Spec re-vendoring` section between `## Why branch from main, not dev` and `## Version bump + procedure`, documenting the `scripts/sync-spec.sh` re-vendor step. The script auto-resolves the latest upstream tag + from the remote, so no manual version selection is needed at re-vendor time. by @brettdavies in + [#10](https://github.com/brettdavies/agentnative-skill/pull/10) +- `AGENTS.md` `## Spec sync` section: rewritten as a single-step recipe (`scripts/sync-spec.sh` then review). Notes the + remote-first / local-fallback behavior and the `SPEC_REMOTE_URL` / `SPEC_ROOT` overrides. Commit-message example uses + `` placeholder instead of a hard-coded version. by @brettdavies in + [#11](https://github.com/brettdavies/agentnative-skill/pull/11) +- `spec/README.md` `## Resync` section: rewritten similarly; drops the manually-maintained `**Current snapshot:**` line + and points readers at `spec/VERSION` (which `sync-spec.sh` writes verbatim from upstream). +- `RELEASES.md` post-merge sequence ends at the GitHub Release; replaces deleted step 5 with a one-liner pointing + consumers at `bin/check-update`. ### Removed -- Remove `bundle/scripts/check-compliance.sh` and 24 `bundle/scripts/checks/check-*.sh` files (plus `_helpers.sh`). `anc check --output json` is the canonical replacement. by @brettdavies in [#4](https://github.com/brettdavies/agentnative-skill/pull/4) -- Remove `bundle/references/principles-deep-dive.md` (419-line hand-typed paraphrase of the spec; canonical text now lives at `bundle/spec/principles/`). +- Remove `bundle/scripts/check-compliance.sh` and 24 `bundle/scripts/checks/check-*.sh` files (plus `_helpers.sh`). `anc + check --output json` is the canonical replacement. by @brettdavies in + [#4](https://github.com/brettdavies/agentnative-skill/pull/4) +- Remove `bundle/references/principles-deep-dive.md` (419-line hand-typed paraphrase of the spec; canonical text now + lives at `bundle/spec/principles/`). - Remove `bundle/checklists/new-tool.md` (pre-anc manual checklist; replaced by `bundle/getting-started.md`). -- All SHA-pin claims from public-facing markdown (`RELEASES.md`, `AGENTS.md`, `README.md`, `spec/README.md`, `CONTRIBUTING.md`): pipeline diagram's "site re-pins to commit SHA" step, the post-merge "site re-pins via its own PR" step, the `protect-tags.json` / `install endpoints` claims that tags are pinned to install endpoints, and the spec-vendor "pinned ref" / "pinned `SPEC_REF`" / "current pin is recorded" vocabulary across all docs. by @brettdavies in [#11](https://github.com/brettdavies/agentnative-skill/pull/11) +- All SHA-pin claims from public-facing markdown (`RELEASES.md`, `AGENTS.md`, `README.md`, `spec/README.md`, + `CONTRIBUTING.md`): pipeline diagram's "site re-pins to commit SHA" step, the post-merge "site re-pins via its own PR" + step, the `protect-tags.json` / `install endpoints` claims that tags are pinned to install endpoints, and the + spec-vendor "pinned ref" / "pinned `SPEC_REF`" / "current pin is recorded" vocabulary across all docs. by @brettdavies + in [#11](https://github.com/brettdavies/agentnative-skill/pull/11) **Full Changelog**: [v0.1.0...v0.2.0](https://github.com/brettdavies/agentnative-skill/compare/v0.1.0...v0.2.0) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4bb642e..f0e0a9f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,23 +1,47 @@ # Contributing to `agentnative-skill` -Thanks for your interest. This repo is the agent-facing skill that pairs with two siblings: +Thanks for your interest. This repo is the agent-facing skill that pairs with three siblings: -- [`agentnative`](https://github.com/brettdavies/agentnative) (the spec) — canonical principle text. Vendored here at +- [`agentnative`](https://github.com/brettdavies/agentnative) (the spec): canonical principle text. Vendored here at `spec/`. -- [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli) (`anc`) — the canonical compliance checker. +- [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli) (`anc`): the canonical compliance auditor. +- [`agentnative-site`](https://github.com/brettdavies/agentnative-site) (`anc.dev`): the public site, leaderboard + renderer, live-scoring loop, and skill-distribution endpoint. This skill does **not** define principles (the spec does) and does **not** check compliance (`anc` does). It teaches agents how to use them and supplies surrounding context (idioms, templates, getting-started). Route contributions -accordingly. +accordingly. For cross-repo visitor-facing navigation, see [`anc.dev/contribute`](https://anc.dev/contribute). + +## Contribution tiers + +The skill bundle accepts three shapes of contribution. All three are welcome; none is required. Skill work skews toward +Tier 3 because most improvements are concrete bundle changes; Tier 2 proposals matter when the change spans bundle +structure or host-runtime support. + +| Tier | Shape | Intake | Effort | +| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | +| **1. Signal** | Bundle content issue, install path bug, host-runtime detection failure, instructional content critique | [`bug-report`](https://github.com/brettdavies/agentnative-skill/issues/new?template=bug-report.yml) / [`bundle-proposal`](https://github.com/brettdavies/agentnative-skill/issues/new?template=bundle-proposal.yml) | ~5 min | +| **2. Proposal** | A new host runtime to support in `anc skill install`, bundle reorganization, instructional content rework | Issue with the design before opening a PR | ~1-2 hrs | +| **3. Code** | Bundle content improvements, host-runtime additions, `bin/check-update` work, `SKILL.md` / `getting-started.md` prose, template additions | PR against `dev` (branch model below) | Variable | + +For principle-level discussion (the spec's MUST/SHOULD/MAY tiers, including P8 on discoverability), file a +`pressure-test` issue in the +[spec repo](https://github.com/brettdavies/agentnative/issues/new?template=pressure-test.yml). For scoring engine or +`anc audit` work, file in the [CLI repo](https://github.com/brettdavies/agentnative-cli). Those discussions don't belong +here. + +**Response expectations:** Tier 1 and Tier 2 are welcome and get a substantive reply when time allows. Tier 3 PRs are +reviewed when scope and time permit. Real PRs land; no merge-window promise. ## Where to file what | You want to… | Where | | --------------------------------------------------------------- | ----------------------------------------------------------------------------------- | | Propose a new principle, change MUST/SHOULD/MAY tiers, etc. | [`brettdavies/agentnative`](https://github.com/brettdavies/agentnative) (the spec) | -| Report an `anc check` bug, or propose a new checker feature | [`brettdavies/agentnative-cli`](https://github.com/brettdavies/agentnative-cli) | -| Improve a starter template, add a language idiom, fix the guide | This repo — issue + PR. Templates: **Bug report** or **Bundle proposal**. | -| Bump the vendored spec to a newer tag | This repo — run `scripts/sync-spec.sh` and PR the diff. | +| Report an `anc audit` bug, or propose a new auditor feature | [`brettdavies/agentnative-cli`](https://github.com/brettdavies/agentnative-cli) | +| Report a site bug (rendering, deployment, anc.dev surface) | [`brettdavies/agentnative-site`](https://github.com/brettdavies/agentnative-site) | +| Improve a starter template, add a language idiom, fix the guide | This repo. Issue + PR. Templates: **Bug report** or **Bundle proposal**. | +| Bump the vendored spec to a newer tag | This repo. Run `scripts/sync-spec.sh` and PR the diff. | | Operate as an agent in this repo | Read [`AGENTS.md`](./AGENTS.md) first (lint commands, hard rules, common pitfalls). | ## Branch model (TL;DR) @@ -35,9 +59,9 @@ procedure in [`RELEASES.md`](./RELEASES.md). ## Pull requests -- **Title format**: [Conventional Commits](https://www.conventionalcommits.org/) — `type(scope): description`. +- **Title format**: [Conventional Commits](https://www.conventionalcommits.org/) (`type(scope): description`). - **Body**: follow [`.github/pull_request_template.md`](.github/pull_request_template.md). The `## Changelog` section is - the source of truth for `CHANGELOG.md` entries — write for users, not implementers. Never hand-edit `CHANGELOG.md`; + the source of truth for `CHANGELOG.md` entries. Write for users, not implementers. Never hand-edit `CHANGELOG.md`; it's generated by `scripts/generate-changelog.sh` from PR bodies at release time. - **Scope**: keep PRs small and single-purpose where possible. - **Tests**: there is no test runner in this repo, but every PR must pass `markdownlint`, `shellcheck`, and (when @@ -65,33 +89,45 @@ auto-discovers `SKILL.md` at the install root and ignores everything else. Produ - **`spec/`** is vendored. Do not edit by hand. Substantive principle changes happen in `brettdavies/agentnative`; bring them here by re-running `scripts/sync-spec.sh` after a new upstream tag lands. - **`SKILL.md`** is the host-discovered entry point. Changes to its `name` or `description` frontmatter affect skill - discovery on every host — coordinate before changing. + discovery on every host. Coordinate before changing. - **`getting-started.md`** is the agent's first read after `SKILL.md`. Keep it short and concrete; cite spec paths and `anc` invocations rather than restating the principles. - **`bin/check-update`** is the consumer-side update-check script. It compares local `VERSION` to GitHub `main` and - emits `UPGRADE_AVAILABLE` for the SKILL.md preamble. Treat as load-bearing — agents rely on it to detect staleness. + emits `UPGRADE_AVAILABLE` for the SKILL.md preamble. Treat as load-bearing. Agents rely on it to detect staleness. - **`references/`** holds implementation guidance (Rust/clap patterns, framework idioms, project structure). When `anc - --fix` lands upstream, these may shrink — they exist today because the agent has to apply remediations by hand. + --fix` lands upstream, these may shrink. They exist today because the agent has to apply remediations by hand. - **`templates/`** are starter files. They encode principles by construction. Changes here should be informed by `agentnative-cli`'s reference patterns to avoid drift; the cross-repo alignment story is documented in the spec repo's `AGENTS.md`. +## AI disclosure + +Inherits from the spec's AI disclosure policy. See +[agentnative/CONTRIBUTING.md § AI disclosure policy](https://github.com/brettdavies/agentnative/blob/main/CONTRIBUTING.md#ai-disclosure-policy). + ## Security -See [`SECURITY.md`](./SECURITY.md). Do not file security issues in the public tracker — use the GitHub private security +See [`SECURITY.md`](./SECURITY.md). Do not file security issues in the public tracker. Use the GitHub private security advisories channel. ## License By contributing, you agree your contributions are dual-licensed under the same terms as the rest of this repository: -- MIT — see [`LICENSE-MIT`](./LICENSE-MIT) -- Apache License, Version 2.0 — see [`LICENSE-APACHE`](./LICENSE-APACHE) +- MIT: see [`LICENSE-MIT`](./LICENSE-MIT) +- Apache License, Version 2.0: see [`LICENSE-APACHE`](./LICENSE-APACHE) at the consumer's option. No CLA. The Apache-2.0 side carries the standard contributor patent grant under §3 of the license. Vendored content under `spec/` is CC BY 4.0 (upstream); contributions to that directory should happen upstream in [`agentnative-spec`](https://github.com/brettdavies/agentnative). - - + +## Cross-repo navigation + +The full visitor-facing menu lives at [`anc.dev/contribute`](https://anc.dev/contribute). Per-repo intakes: + +- [Spec](https://github.com/brettdavies/agentnative): principle text, pressure-tests, versioning policy +- [Linter](https://github.com/brettdavies/agentnative-cli): `anc`, the scoring engine, the registry +- [Site](https://github.com/brettdavies/agentnative-site): anc.dev source, leaderboard renderer, live-scoring +- This repo: the agent-facing skill bundle, install paths, host-runtime detection diff --git a/PRODUCT.md b/PRODUCT.md new file mode 100644 index 0000000..c2ffc9c --- /dev/null +++ b/PRODUCT.md @@ -0,0 +1,83 @@ +# PRODUCT.md: skill bundle channel design context + +Channel-specific design context for the **skill bundle channel** of agentnative. Inherits the shared identity, voice +anchor, audiences, and universal anti-patterns from [`BRAND.md`](BRAND.md), vendored from +[`agentnative-spec`](https://github.com/brettdavies/agentnative) via a dev-only vendoring script (kept off `main` by the +workflow guard). Read `BRAND.md` first. + +## Inheritance + +The skill bundle channel sits in a three-tier waterfall. Each tier owns a different concern; nothing duplicates. + +1. **Universal layer: [`BRAND.md`](BRAND.md).** Shared identity, voice anchor, audiences, universal anti-patterns. + Vendored from `agentnative-spec` by a dev-only sync script (kept off `main` by the workflow guard; parallel to + [`scripts/sync-spec.sh`](scripts/sync-spec.sh), which vendors `spec/principles/` and does ship to `main`). Read + `BRAND.md` first. +2. **Channel delta: this file (`PRODUCT.md`).** Instructional voice, second-person imperative allowed (the bundle + teaches agents how to invoke `anc`), terse and agent-loadable. The narrative companion to the shipped bundle content + at the repo root. +3. **Bundle layer: repo root.** The shipped skill: [`SKILL.md`](SKILL.md), [`getting-started.md`](getting-started.md), + [`references/`](references/), [`templates/`](templates/), [`bin/`](bin/), and the vendored [`spec/`](spec/). What an + agent loads into runtime via `~/.claude/skills/agent-native-cli/`, `~/.cursor/skills/agent-native-cli/`, and + equivalent paths under Codex / OpenCode / Factory / Kiro. Built from the same commit as `PRODUCT.md`. + +## Channel: skill bundle + +The skill bundle teaches agents how to use `anc` and the spec. The spec describes the contract; the linter verifies it; +the skill bundle teaches the workflow that connects the two. + +## Audience (narrowed) + +- **AI agents** loading the skill via Claude Code's skill discovery (frontmatter triggers), Cursor's skill registry, + Codex's `~/.codex/skills/`, OpenCode's `.opencode/skills/`, and equivalent runtimes. The agent reads `SKILL.md` first, + drills into [`getting-started.md`](./getting-started.md) for one of three loops (existing CLI / new Rust CLI / other + language), and consults `references/` only when remediating a specific `anc` finding. +- **Humans** running `anc skill install ` for the first time, or maintaining the bundle. They expect commands to + be runnable as written, paths to resolve, and the bundle to track what `anc` produces. + +## Register + +- **Instructional voice.** Second-person imperative is the default. "Run `anc audit . --output json` to score the repo." + Not "the skill recommends running `anc audit`." +- **Every action is a runnable command.** Code blocks contain commands the reader can paste. Pseudo-commands and + prose-only "how to think about it" are out. +- **Reference-shaped tables.** Where the artifact is a list of options, files, or trade-offs, render it as a table with + stable column headers. The three-artifact table at the top of `SKILL.md` and the useful-flags paragraph in + `getting-started.md` are the patterns. +- **Triggers and keywords are first-class.** The skill's `description` frontmatter determines which agents discover the + skill. Keep it dense in the canonical vocabulary (`agentic CLI`, `agent-native`, `anc audit`, `audit-profile`, …). The + opposite of marketing prose; closer to a search-index line. +- **Cross-link to the spec, the linter, and the templates.** A skill that paraphrases what the spec says drifts. Link + instead, and quote the requirement ID, not the prose. + +## Skill-specific anti-patterns + +These extend the universal bans in `BRAND.md`: + +- **No "in this guide we will…" preamble.** The agent skips it. Open with the action. +- **No narrative scaffolding.** "First, we'll cover X. Then, Y. Finally, Z." Out. The TOC and section headings carry the + structure; prose that recapitulates them is noise. +- **No "consider doing X" hedges.** State the action. "Run `anc audit`." Not "you might want to consider running `anc + check`." +- **No paraphrased spec content.** When the contract matters, link to `spec/principles/p-*.md` and quote the + requirement ID (`p1-must-no-interactive`), not the prose. Spec drift is silent and expensive when paraphrased. +- **No screenshots, no images.** The skill is consumed by agents; images are tokens-for-no-signal. +- **No version numbers in prose.** "Currently `v0.3.0`" goes stale. Reference [`VERSION`](./VERSION) and + [`spec/VERSION`](./spec/VERSION) directly, or link to the canonical source. Frontmatter aside. + +## Voice anchor application + +The pattern in `BRAND.md`, specialized for the skill channel: + +- **`SKILL.md`** opens with the three-artifact table (spec / linter / skill), the first action (update check), then the + launch link to `getting-started.md`. No preamble. +- **`getting-started.md`** opens with the three loops, each as a runnable command sequence with inline annotation. +- **`references/*.md`** are exhaustive references for one technical area each (Rust/clap patterns, framework idioms, + project structure, update check). Each section stands alone; readers arrive via deep links from `anc` findings. +- **`templates/*`** are the smallest functional starting points, not "examples." A copied template runs. + +## Status + +This file is the skill-channel `PRODUCT.md`. The spec channel's equivalent lives at +[`agentnative-spec/PRODUCT.md`](https://github.com/brettdavies/agentnative/blob/main/PRODUCT.md); the site channel's at +`agentnative-site/PRODUCT.md`. Cross-channel content is in `BRAND.md`, synced from the spec repo. diff --git a/README.md b/README.md index fa72072..e404444 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,15 @@ # agentnative-skill -The producer repo for the [`agent-native-cli`](./SKILL.md) skill — an agent-facing guide to designing, building, and +The producer repo for the [`agent-native-cli`](./SKILL.md) skill: an agent-facing guide to designing, building, and auditing CLI tools for use by AI agents. -This skill is the third artifact in a three-repo ecosystem: +This skill is the fourth artifact in a four-repo ecosystem: | Repo | Role | | --------------------------------------------------------------------------- | -------------------------------------------------------------- | -| [`agentnative`](https://github.com/brettdavies/agentnative) (the spec) | Canonical text of the seven principles. CC BY 4.0. | -| [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli) (`anc`) | The compliance checker. MIT / Apache-2.0. | +| [`agentnative`](https://github.com/brettdavies/agentnative) (the spec) | Canonical text of the eight principles. CC BY 4.0. | +| [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli) (`anc`) | The compliance auditor. MIT / Apache-2.0. | +| [`agentnative-site`](https://github.com/brettdavies/agentnative-site) | `anc.dev`: leaderboard, scorecards, live scoring. | | **This repo** (`agentnative-skill`) | The agent-facing guide. Vendors the spec; teaches `anc` usage. | ## Repository layout @@ -24,13 +25,20 @@ agentnative-skill/ ├── templates/ drop-in starter files (clap-main, error-types, output-format, agents-md-template) ├── VERSION single-line current version (read by bin/check-update) ├── scripts/ -│ ├── sync-spec.sh vendor the latest agentnative-spec v* tag into spec/ -│ └── generate-changelog.sh release-time CHANGELOG generator (git-cliff + PR-body extraction) +│ ├── sync-spec.sh vendor the latest agentnative-spec v* tag into spec/ +│ ├── sync-prose-tooling.sh vendor BRAND.md from agentnative-spec main HEAD (dev-only; guarded off main) +│ ├── sync-dev-after-release.sh post-release backport: replay release/* artifacts onto dev +│ ├── generate-changelog.sh release-time CHANGELOG generator (git-cliff + PR-body extraction) +│ └── hooks/pre-push local CI mirror (markdownlint + shellcheck), installed via core.hooksPath +├── evals/ self-contained eval prompts for fresh-agent dispatch (producer-side; not loaded by hosts) ├── docs/plans/ engineering plans (dev-only — guarded out of main) ├── .github/ workflows, rulesets, issue templates, PR template ├── AGENTS.md project-level agent instructions FOR THIS REPO (producer-side) +├── BRAND.md universal voice and identity (vendored from agentnative-spec) +├── PRODUCT.md skill-bundle channel design context (inherits from BRAND.md) ├── CONTRIBUTING.md how to propose changes ├── RELEASES.md release procedure (cherry-pick from dev → release/* → main) +├── RELEASES-RATIONALE.md rationale companion to RELEASES.md (the WHY) ├── SECURITY.md vulnerability disclosure ├── CHANGELOG.md released versions (generated, never hand-edited) ├── cliff.toml git-cliff configuration @@ -40,34 +48,34 @@ agentnative-skill/ ``` Consumer-facing files (`SKILL.md`, `getting-started.md`, `bin/`, `spec/`, `references/`, `templates/`, `VERSION`, -`LICENSE-*`) are read by the agent at runtime. Producer-side files (`scripts/`, `docs/`, `.github/`, `AGENTS.md`, -`CONTRIBUTING.md`, `RELEASES.md`, `cliff.toml`) ship to consumers via `git clone` but are inert at runtime — the host -discovers `SKILL.md` and ignores everything else. +`LICENSE-*`) are read by the agent at runtime. Producer-side files (`scripts/`, `evals/`, `docs/`, `.github/`, +`AGENTS.md`, `BRAND.md`, `PRODUCT.md`, `CONTRIBUTING.md`, `RELEASES.md`, `RELEASES-RATIONALE.md`, `cliff.toml`) ship to +consumers via `git clone` but are inert at runtime. The host discovers `SKILL.md` and ignores everything else. ## Install See [anc.dev/skill](https://anc.dev/skill) for the supported hosts (Claude Code, Cursor, Codex, etc.) and the exact -install commands. The install model is plain `git clone --depth 1` into the host's skills directory — for example +install commands. The install model is plain `git clone --depth 1` into the host's skills directory: for example `~/.claude/skills/agent-native-cli/`. The host auto-discovers `SKILL.md` at the install root; `SKILL.md` then points the agent at `getting-started.md` for progressive disclosure. Updates are `git pull --ff-only` from inside the install dir, prompted by `bin/check-update`. ## Skill contents -- [`SKILL.md`](./SKILL.md) — skill metadata + entry-point pointer. -- [`getting-started.md`](./getting-started.md) — three working loops (existing CLI / new Rust / other language); - canonical `anc check` invocations; "where things live" map. -- [`bin/check-update`](./bin/check-update) — periodic version check. Compares local `VERSION` to GitHub `main`, emits +- [`SKILL.md`](./SKILL.md): skill metadata + entry-point pointer. +- [`getting-started.md`](./getting-started.md): three working loops (existing CLI / new Rust / other language); + canonical `anc audit` invocations; "where things live" map. +- [`bin/check-update`](./bin/check-update): periodic version check. Compares local `VERSION` to GitHub `main`, emits `UPGRADE_AVAILABLE` so the agent can offer to `git pull`. -- [`spec/`](./spec/) — vendored canonical principle text from +- [`spec/`](./spec/): vendored canonical principle text from [`agentnative-spec`](https://github.com/brettdavies/agentnative). See [`spec/README.md`](./spec/README.md) for the resync procedure. **Do not edit by hand.** -- [`references/`](./references/) — implementation guidance: framework idioms (Rust + others), project structure, +- [`references/`](./references/): implementation guidance: framework idioms (Rust + others), project structure, Rust/clap patterns. Used when remediating `anc` findings. -- [`templates/`](./templates/) — drop-in starting points for greenfield Rust CLIs (`clap-main.rs`, `error-types.rs`, +- [`templates/`](./templates/): drop-in starting points for greenfield Rust CLIs (`clap-main.rs`, `error-types.rs`, `output-format.rs`, `agents-md-template.md`). -The principles are also published as a stable web reference at [anc.dev/p1](https://anc.dev/p1) through `/p7`. +The principles are also published as a stable web reference at [anc.dev/p1](https://anc.dev/p1) through `/p8`. ## Versioning @@ -79,14 +87,14 @@ The skill's own version is independent of the spec it vendors. The currently-ven ## Contributing -Issues and PRs welcome — see [`CONTRIBUTING.md`](./CONTRIBUTING.md). Routing: +Issues and PRs welcome. See [`CONTRIBUTING.md`](./CONTRIBUTING.md). Routing: - **Spec questions or principle proposals** → file in [`brettdavies/agentnative`](https://github.com/brettdavies/agentnative) (the spec repo). This skill vendors the spec; substantive principle changes happen there first. - **`anc` bugs or feature requests** → file in [`brettdavies/agentnative-cli`](https://github.com/brettdavies/agentnative-cli). The skill teaches `anc` usage but - doesn't implement the checker. + doesn't implement the auditor. - **Skill issues** (templates, references, getting-started, layout) → file here. Branch + release model documented in [`RELEASES.md`](./RELEASES.md). @@ -99,8 +107,8 @@ See [`SECURITY.md`](./SECURITY.md) for vulnerability disclosure. Dual-licensed under either of: -- MIT — see [`LICENSE-MIT`](./LICENSE-MIT) -- Apache License, Version 2.0 — see [`LICENSE-APACHE`](./LICENSE-APACHE) +- MIT: see [`LICENSE-MIT`](./LICENSE-MIT) +- Apache License, Version 2.0: see [`LICENSE-APACHE`](./LICENSE-APACHE) at your option. Matches the licensing on `agentnative-cli` so producers can adapt the skill's content into their own tooling without re-licensing friction. @@ -108,5 +116,3 @@ tooling without re-licensing friction. Vendored spec content under `spec/` is CC BY 4.0 (upstream from [`brettdavies/agentnative`](https://github.com/brettdavies/agentnative)); attribution is in [`spec/README.md`](./spec/README.md). - - diff --git a/RELEASES-RATIONALE.md b/RELEASES-RATIONALE.md new file mode 100644 index 0000000..6bee33c --- /dev/null +++ b/RELEASES-RATIONALE.md @@ -0,0 +1,228 @@ +# Releases rationale + +Companion to [`RELEASES.md`](./RELEASES.md). RELEASES.md is the runbook (commands, paths, decision tables). This file +holds the WHY behind those rules: branching model, PR conventions, CHANGELOG generation, spec-vendor pipeline, +flat-bundle convention, `bin/check-update` semantics, cross-host install considerations, branch-protection pitfalls. + +Read this when: + +- A rule in RELEASES.md doesn't make sense and you're tempted to change it. +- A new contributor asks "why do we do X this way". +- You're adding a new release-flow rule and need to know where it fits the existing model. + +## Branching model + +### Forever `dev`, ephemeral release branches + +`dev` is never deleted, even after a release. The next release cycle reuses the same `dev`. The repo's +`delete_branch_on_merge: true` setting doesn't touch `dev` because `dev` is never the head of a PR. Using a short-lived +`release/*` head is what keeps the setting compatible with a forever integration branch. + +Engineering docs (`docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/`) live on `dev` only. They never +reach `main`. `guard-main-docs.yml` blocks any `added` or `modified` engineering-doc files in PRs targeting `main`. The +release-branch cherry-pick handles this cleanly: docs commits stay on `dev`, only feature/fix/chore commits go onto +`release/*`. + +### Why cherry-pick from `main`, not branch from `dev` + +Branching from `dev` and then `gio trash`-ing the guarded paths seems simpler but produces `add/add` merge conflicts +whenever `dev` and `main` have diverged. The file appears as "added" on both sides with different content. Always branch +from `origin/main` and cherry-pick onto it. + +### Why `main` is the default branch + release pointer + +Consumers install the bundle via `git clone --depth 1` (see `SKILL.md` install commands), which lands on the default +branch. `main` is therefore the published-release pointer; `dev` is the integration branch. Each release requires the +skill maintainer to fast-forward `main` to the new tag (the `release/* → main` PR squash-merge does this). + +## PR body conventions + +### No explainer prose in the body + +Every section of a PR body is user-facing substance only: what is changing for the consumer that was not already there — +the **net diff**, not the commit history or intermediate state that produced it. Workflow mechanics (cherry-pick, +regenerate, pre-push gate, CI behavior) is documented in RELEASES.md and `.github/`, NOT in the PR body. Triple-diff +output ("A: 12 files, B: none, C: clean"), leak-check narration ("`guard-main-docs` runs clean", "no guarded paths +leaked"), patch-id cherry-check counts, pre-push gate results, CI check status, exclusion rationale, and other +verification artifacts stay local; anomalies get fixed before push, not audit-trailed in the body. + +The PR body is read by humans reviewing what shipped. Workflow mechanics and tool-fix provenance are noise from that +perspective; they belong in this file, the script outputs, and the commit history respectively. + +## Triple-diff verification + +The release-PR procedure runs three diffs (A: main→release, B: release→dev for non-doc paths, C: dev→main) plus a +patch-id cherry check. This is belt-and-suspenders because missed cherry-picks have shipped to `main` on this and +sibling repos before, and the file-level diff in B alone doesn't catch the patch-id false-negative class. + +### Why patch-id cherry-check output is noisy + +In a squash-merge workflow, `git cherry HEAD origin/dev` produces many `+` lines that need human triage. They do NOT +auto-block the release. Expected sources of false positives: + +1. **Historical commits squash-merged in prior releases.** The squash commit on main has a different patch-id than the + dev commits it consolidates, so old commits show as `+` forever. Anything older than the previous release tag is + almost always this. +2. **Cherry-picks where conflict resolution stripped guarded paths** (`docs/plans/`, `docs/brainstorms/`, etc.) or + otherwise altered the tree. Same source-code intent, different patch-id. +3. **Intentionally skipped commits** (docs-only commits, release-prep backports, revert-and-redo prep steps). + +A real miss looks like: a recent feat/fix/chore commit on dev whose *file content* is not yet on main. To triage a `+` +line: + +```bash +git show --stat # what did it touch? +git diff origin/main..HEAD -- # already on release? +``` + +If every touched file is guarded (`docs/plans/`, `docs/brainstorms/`, etc.) OR the content is already on main via a +prior squash, it's a false positive (no action). Otherwise cherry-pick the commit and re-run the triple-diff. + +## CHANGELOG generation + +### Generated, never hand-written + +`scripts/generate-changelog.sh` (with `cliff.toml`) is the only sanctioned way to update `CHANGELOG.md`. The script: + +- Runs `git-cliff` to prepend a versioned entry for commits since the last tag. +- Walks each squash-merged PR's body, extracts the `## Changelog → ### Added / Changed / Fixed / Documentation` + subsections, and replaces the auto-generated bullets with the curated PR-body content (with author and PR-link + attribution). + +If a PR's `## Changelog` section is empty, that PR's entry is omitted from the changelog (empty section = no user-facing +change). To fix a wrong CHANGELOG entry, fix the input: edit the squash-merged PR body, then re-run the script. Do +**not** edit `CHANGELOG.md` directly. + +`scripts/generate-changelog.sh --check` verifies that `CHANGELOG.md` has a versioned section (not just `[Unreleased]`): +wire this into the release-branch CI if/when one is added. + +### Why `cliff.toml` skips chore/style/test/ci/build + +These commit types do not produce user-facing content. If a cherry-picked PR has user-facing `## Changelog` content but +its commit subject starts with one of those types, its bullets get silently dropped. After running the script, +cross-check the generated section against `gh pr view --json body` for each cherry-picked PR; correct mistyped PR +titles (e.g. `chore` → `feat`) and re-amend the cherry-pick subject before re-running. See "Prefer `feat`/`fix` over +`chore`" in global CLAUDE.md for prevention. + +## Spec-vendor pipeline + +The bundle vendors a snapshot of [`agentnative-spec`](https://github.com/brettdavies/agentnative) under `spec/`. When +the spec ships a new tag (e.g., `v0.3.0`), this skill re-vendors via `scripts/sync-spec.sh` on the `release/v` +branch (same commit as the version bump, message `chore(spec): re-vendor spec to `). The script auto-resolves +the latest upstream tag from the remote, so no manual version selection is needed. + +Without re-vendoring, the bundle ships stale spec content while consumers see the new version on `anc.dev`. Re-vendoring +on the release branch keeps the on-disk snapshot in lockstep with the published version that consumers will detect via +`bin/check-update`. + +### Skill version is independent of spec version + +The skill's version is independent of the spec it vendors. A spec bump that doesn't affect the skill's surface (e.g., +prose-only edits) can ship as a patch even when the spec went minor. SemVer guidance applies to the *skill's* observable +behaviour, not the spec's: + +- **Patch** (doc updates, internal cleanups, non-substantive template edits, vendoring a patch-level spec bump). +- **Minor** (new templates, new reference docs, new bundle files (backward-compatible additions), vendoring a + minor-level spec bump that adds requirements without tightening existing tiers). +- **Major** (breaking changes to the bundle's contract: renaming `SKILL.md` frontmatter fields, restructuring directory + layout in ways that break existing skill installations, moving content between subdirectories and the producer-ops + root, or vendoring a major-level spec bump (renamed/removed principles or tightened MUSTs that would regress existing + consumers)). + +## `bin/check-update` semantics + +Update detection at install sites is delegated to the bundle's `bin/check-update`, which compares the local bundle's +`VERSION` against `main` on GitHub. This is a pull-side mechanism: there is no push or notification. Consumers detect +the new release on their next `bin/check-update` run. + +This is why `main` (not `dev`) must be the published-release pointer: a `git clone --depth 1` lands on `main`, and +`bin/check-update` compares against `main`. Cutting a release without fast-forwarding `main` would mean consumers never +see the new VERSION. + +## Why backport `main` → `dev` after publish + +Once a release tag publishes, `scripts/sync-dev-after-release.sh` backports the release-bookkeeping files from `main` to +`dev` so future feature branches inherit the correct baseline. Two files move: `VERSION` (overwritten with the released +number) and `CHANGELOG.md` (copied verbatim from `origin/main`, which is authoritative for the changelog). + +Without the backport, `dev` keeps the pre-release `VERSION` indefinitely (the release bump lives only on the `release/*` +branch that was squash-merged to `main` and never touched `dev`). Feature branches cut from `dev` then carry a stale +baseline — confusing during review, and load-bearing in two places: (a) `bin/check-update` compares the caller's local +`VERSION` against the producer repo's `main`, so a stale local `VERSION` from a `dev` clone would falsely report +`UPGRADE_AVAILABLE` on a current main; (b) the `chore(spec)` re-vendor commit message references the current bundle +version. + +The backport lands directly on `dev` (one signed commit, no PR). This is a deliberate exception to the PR-only norm on +`dev` documented in [`RELEASES.md`](./RELEASES.md) — the change is mechanical (no design content) and the script's +idempotency makes it safe to re-run. The exception holds only for this script's output; everything else still lands on +`dev` via PR. + +The script is idempotent: it exits 0 with no commit when `VERSION` and `CHANGELOG.md` already match `main`. Safe to +re-run, safe to invoke from automation that doesn't track whether the last release was already backported. + +Mirror of `~/dev/agentnative-cli/scripts/sync-dev-after-release.sh`. The cli variant additionally regenerates +`Cargo.lock` via `cargo build --release` after surgically updating `Cargo.toml`'s `[package].version`; the skill bundle +is markdown-only and ships no lock file, so those steps drop. + +## Prose scrubbing scope + +Three release-flow artifacts live outside any automated prose check and need a manual scrub before they ship: + +- **PR bodies.** `gh pr create` and `gh pr edit` send body text directly to GitHub; no automated prose check has reach + there. +- **`CHANGELOG.md`.** A generated artifact built from upstream PR bodies. Findings inherit whatever prose those PR + bodies carry. +- **Release-PR bodies.** The `release/* → main` PR carries contributor-authored wrap-up text composed after + `CHANGELOG.md` has been generated, and the same out-of-repo gap applies. + +The canonical Vale + LanguageTool rule packs and orchestrator behaviour live in the spec repo at +[`~/dev/agentnative-spec/docs/architecture/voice-enforcement.md`](https://github.com/brettdavies/agentnative/blob/dev/docs/architecture/voice-enforcement.md). +Until those packs are vendored into this repo via a `scripts/sync-spec.sh` extension (a deferred follow-up tracked in +the spec plan), the scrub commands point at the spec checkout directly. + +Scrub-before-submit (author in `/tmp/`, scrub there, submit via `--body-file`) avoids the round-trip of "submit, scrub, +edit, scrub again". Every fix lands locally and the public PR sees only clean text. The auto-format hook skips `/tmp/` +paths so the body keeps its authored shape and no soft-wrapping is injected. + +For a `CHANGELOG.md` finding, fix the upstream PR body (which `generate-changelog.sh` re-fetches every run) and +regenerate. Hand-editing `CHANGELOG.md` directly produces drift the next regeneration overwrites. + +## Branch protection + +### Why three rulesets + +This repo ships three rulesets (`protect-main.json`, `protect-dev.json`, `protect-tags.json`) where peer repos ship two. +The third (`protect-tags.json`) treats `v*` tags as immutable historical anchors for released versions: deletion, +force-push (re-tag), and updates are all blocked. The bundle's `bin/check-update` and the `git clone --depth 1` install +path both rely on `main` and on tag identity; a re-tagged release would lie to consumers about what they're installing. + +### Why the apply step is re-runnable + +The three rulesets ship in `.github/rulesets/` and are applied via the GitHub API. The apply commands in RELEASES.md are +deliberately idempotent so they survive: (a) the original public-flip from the bootstrap window when the repo was +private (GitHub's free tier does not allow rulesets on private repos, so rulesets could not be applied until visibility +flipped); (b) any future ruleset reset; (c) the same procedure being copied into a new repo's bootstrap. + +### Status-check context strings + +The `required_status_checks[].context` strings in `protect-main.json` MUST match exactly what GitHub publishes for each +check: + +- **Inline job** (with `name:` field): published as just `` (no workflow-name prefix). +- **Reusable-workflow caller** (`uses: .../foo.yml@ref`): published as ` / `. + +Mixing these produces a stuck-but-green PR: all actual checks report green, but the ruleset waits forever on a context +that will never appear. Confirm the real contexts after a first CI run with: + +```bash +gh api repos/brettdavies/agentnative-skill/commits//check-runs --jq '.check_runs[].name' +``` + +## Related docs + +- [`RELEASES.md`](./RELEASES.md) (operational runbook: commands, paths, decision tables) +- [`AGENTS.md`](./AGENTS.md) (repo layout, lint commands, what agents must not do) +- [`CONTRIBUTING.md`](./CONTRIBUTING.md) (how to propose changes) +- [`.github/pull_request_template.md`](.github/pull_request_template.md) (PR body structure with changelog sections) +- [`.github/rulesets/README.md`](.github/rulesets/README.md) (ruleset apply + verify procedure) +- [`CHANGELOG.md`](./CHANGELOG.md) (released versions and their notes) diff --git a/RELEASES.md b/RELEASES.md index e9f7616..bc320ee 100644 --- a/RELEASES.md +++ b/RELEASES.md @@ -1,7 +1,6 @@ # Releasing `agentnative-skill` -Every change reaches `main` via this pipeline. Direct commits to `dev` or `main` are not permitted — every change has a -PR number in its squash commit message, which keeps the history scannable, attributable, and changelog-ready. +Operational runbook. Rationale lives in [`RELEASES-RATIONALE.md`](./RELEASES-RATIONALE.md). ```text feature branch (feat/*, fix/*, chore/*, docs/*) → PR to dev (squash merge) @@ -10,9 +9,7 @@ feature branch (feat/*, fix/*, chore/*, docs/*) → PR to dev (squash merge) → tag v* on main → GitHub Release ``` -This is the canonical brettdavies release pattern with `release/*` cherry-pick branches. Plans live on `dev` forever and -`guard-main-docs.yml` blocks any `added` or `modified` engineering-doc files in PRs targeting `main`. The release-branch -cherry-pick handles this cleanly: docs commits stay on `dev`, only feature/fix/chore commits go onto `release/*`. +Direct commits to `dev` or `main` are not permitted: every change has a PR number in its squash commit message. ## Branches @@ -20,13 +17,10 @@ cherry-pick handles this cleanly: docs commits stay on `dev`, only feature/fix/c | -------------------------------------- | ------------------------------------------------------- | ------------------------------------------- | ------------------------------------ | | `main` | Released bundle. Only release-merged commits. | Forever. | `.github/rulesets/protect-main.json` | | `dev` | Integration. All feature PRs land here. Default branch. | Forever. Never delete. | `.github/rulesets/protect-dev.json` | -| `feat/*`, `fix/*`, `chore/*`, `docs/*` | Feature work. | One PR's worth. Auto-deleted on merge. | None — squash into `dev` freely. | +| `feat/*`, `fix/*`, `chore/*`, `docs/*` | Feature work. | One PR's worth. Auto-deleted on merge. | None. Squash into `dev` freely. | | `release/*` | Head of a `release/* → main` PR. | One release's worth. Auto-deleted on merge. | None. | -`dev` is a **forever branch**. Never delete it locally or remotely, even after a `release/* → main` merge. The next -release cycle reuses the same `dev`. The repo's `delete_branch_on_merge: true` setting doesn't touch `dev` because `dev` -is never the head of a PR — using a short-lived `release/*` head is what keeps the setting compatible with a forever -integration branch. +→ Rationale: [`RELEASES-RATIONALE.md` § Branching model](./RELEASES-RATIONALE.md#branching-model). ## Daily development (feature → dev) @@ -41,59 +35,108 @@ gh pr create --base dev --title "feat(scope): what changed" - **Commit style**: [Conventional Commits](https://www.conventionalcommits.org/). - **PR body**: follow `.github/pull_request_template.md`. The `## Changelog` section is the source of truth for - user-facing release notes — `CHANGELOG.md` entries derive from it directly. + user-facing release notes; `CHANGELOG.md` entries derive from it directly. See [§ PR body](#pr-body). +- **PR body prose scrub**: see [§ Prose scrubbing](#prose-scrubbing). + +### Dev-direct exception + +Two categories of change commit directly to `dev` without going through the feature-branch + PR flow: + +- **Engineering docs**: `docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/`. These live on `dev` + only; `guard-main-docs.yml` blocks them from reaching `main` so a release branch's cherry-pick from `dev` naturally + excludes them. +- **Prose-tooling vendoring vehicle**: `scripts/sync-prose-tooling.sh`. The script vendors `BRAND.md` from + `agentnative-spec` and is a producer-side dev convenience, not part of the shipped bundle. The workflow guard's + `extra_paths` list keeps it off `main`; future cherry-picks that try to bring it onto a `release/*` branch will fail + the guard. `BRAND.md` itself still ships to `main` (consumers read it), but the script that vendors it does not. + +Everything else (consumer-facing markdown like `README`, `AGENTS`, `CONTRIBUTING`, `CHANGELOG`, the skill bundle content +under `SKILL.md` / `getting-started.md` / `spec/` / `references/` / `templates/`, and any in-repo runbook) goes through +the standard feature-branch + PR flow. + +## PR body + +Every PR (feature, fix, docs, release) uses `.github/pull_request_template.md` verbatim. + +- **No explainer prose anywhere in the body.** User-facing substance only. +- **Summary describes the net diff only** — what merged `main` looks like vs the base branch. Not commit history, + intermediate state, or cherry-pick mechanics. +- **Zero verification artifacts in the body.** No triple-diff stats, leak-check output ("`guard-main-docs` runs clean"), + patch-id cherry-check counts, pre-push gate results, CI status, or prose-scrub findings. Anomalies get fixed before + push, not audit-trailed. +- **Changelog** subsections (`### Added` / `### Changed` / `### Fixed` / `### Removed` / `### Security`): 1-5 bullets + each, delete empty subsections, each bullet starts with a verb. +- A PR with no user-facing impact (pure refactor, test-only, CI-only) leaves `## Changelog` empty or omits it. + +→ Rationale: [`RELEASES-RATIONALE.md` § PR body conventions](./RELEASES-RATIONALE.md#pr-body-conventions). ## Releasing dev to main Engineering docs (`docs/plans/`, `docs/solutions/`, `docs/brainstorms/`, `docs/reviews/`) live on `dev` only. -`guard-main-docs.yml` blocks any `added` or `modified` files under those paths from reaching `main`. Branching from -`dev` and deleting docs on the way produces `add/add` merge conflicts whenever `dev` and `main` have diverged (the norm -after the first squash merge). The cherry-pick pattern avoids this. +`guard-main-docs.yml` blocks any `added` or `modified` files under those paths from reaching `main`. **Branch naming**: `release/v` (preferred) or `release/-`. Keep the slug short and descriptive. ```bash -# 1. Cut release/* from main, NOT dev. Branching from dev causes add/add -# conflicts when dev and main have divergent histories. +# 1. Cut release/* from main, NOT dev. git fetch origin git checkout -b release/v origin/main -# 2. List the dev commits not yet on main: +# 2. List the dev commits not yet on main. git log --oneline dev --not origin/main -# 3. Cherry-pick non-docs commits onto release/v. Docs commits -# (anything that touched only docs/plans/, docs/solutions/, -# docs/brainstorms/, or docs/reviews/) stay on dev. +# 3. Cherry-pick non-docs commits onto release/v. Docs commits stay on dev. git cherry-pick ... -# 4. Verify no guarded paths leaked through: -git diff origin/main --name-only | grep -E '^docs/(plans|solutions|brainstorms|reviews)/' && \ - echo 'FAIL: guarded docs in release branch' || echo 'OK: no guarded docs' +# 4. Triple-diff verification. +git diff origin/main..HEAD --stat # A: ship surface +git diff HEAD..origin/dev --name-only | grep -v '^docs/' || echo "(none)" # B: no missed picks +git diff origin/dev..origin/main --stat | tail -5 # C: phantom-commits sanity + +# Re-confirm no guarded paths leaked. +git diff origin/main..HEAD --name-only \ + | grep -E '^(docs/plans|docs/brainstorms|docs/ideation|docs/reviews|docs/solutions|\.context)' \ + && echo "LEAKED — reset and redo" || echo "(clean)" + +# Patch-id cherry check (noisy in squash-merge workflow; triage per-line). +git cherry HEAD origin/dev | grep '^+' || echo "(none)" # 5. Bump VERSION on the release branch. echo '' > VERSION -# 6. Generate CHANGELOG entries from PR bodies. NEVER hand-edit CHANGELOG.md — -# the script is authoritative. It reads cliff.toml + each cherry-picked PR's -# ## Changelog section and prepends a versioned [] entry. +# 6. Re-vendor the spec if a new tag has shipped upstream. +scripts/sync-spec.sh +git add spec/ && git commit -m "chore(spec): re-vendor spec to " || true + +# 7. Generate CHANGELOG entries from PR bodies. scripts/generate-changelog.sh # (the script extracts from the branch name release/v) -# 7. Commit the version bump and generated changelog. +# 8. Scrub CHANGELOG.md via Vale + LanguageTool + unslop. See § Prose scrubbing. +# Fix findings on upstream PR bodies, never by hand-editing CHANGELOG.md. + +# 9. Commit the version bump and generated changelog. git add VERSION CHANGELOG.md git commit -m "chore(release): v" -# 8. Push and open the PR: +# 10. Push and open the PR. Scrub body in /tmp/ first. git push -u origin release/v -gh pr create --base main --head release/v --title "release: v" +gh pr create --base main --head release/v \ + --title "release: v" --body-file /tmp/body.md ``` When the PR merges: 1. The squash commit lands on `main` with the PR body as its message. 2. `release/v` is auto-deleted. -3. Tag the new `main` HEAD: `git checkout main && git pull && git tag -a v -m "v" && git push origin - v`. +3. Tag the new `main` HEAD: + + ```bash + git checkout main && git pull + git tag -a v -m "v" + git push origin v + ``` + 4. Create the GitHub Release using the generated CHANGELOG section: ```bash @@ -103,87 +146,104 @@ When the PR merges: Consumers detect the new release on their next `bin/check-update` run; nothing else to do here. -`dev` keeps moving forward. Never reset or rebase `dev` after a release — it is forever. - -### CHANGELOG is generated, never hand-written - -`scripts/generate-changelog.sh` (with `cliff.toml`) is the only sanctioned way to update `CHANGELOG.md`. The script: +`dev` keeps moving forward. Never reset or rebase `dev` after a release: it is forever. -- Runs `git-cliff` to prepend a versioned entry for commits since the last tag. -- Walks each squash-merged PR's body, extracts the `## Changelog` section's `### Added` / `### Changed` / `### Fixed` / - `### Documentation` subsections, and replaces the auto-generated bullets with the curated PR-body content (with author - and PR-link attribution). +→ Rationale + triple-diff false-positive triage: +[`RELEASES-RATIONALE.md` § Triple-diff verification](./RELEASES-RATIONALE.md#triple-diff-verification). CHANGELOG +mechanics: [`RELEASES-RATIONALE.md` § CHANGELOG generation](./RELEASES-RATIONALE.md#changelog-generation). Spec +re-vendoring: [`RELEASES-RATIONALE.md` § Spec-vendor pipeline](./RELEASES-RATIONALE.md#spec-vendor-pipeline). -If a PR's `## Changelog` section is empty, that PR's entry is omitted from the changelog (the convention in -[`.github/pull_request_template.md`](.github/pull_request_template.md): empty section = no user-facing change). To fix a -wrong CHANGELOG entry, fix the input — edit the squash-merged PR body, then re-run the script. Do **not** edit -`CHANGELOG.md` directly. +### After publish — sync `dev` with the release -`scripts/generate-changelog.sh --check` verifies that `CHANGELOG.md` has a versioned section (not just `[Unreleased]`) — -wire this into the release-branch CI if/when one is added. +Once the release tag is published, backport the release-bookkeeping files from `main` to `dev`: -### Why branch from main, not dev +```bash +./scripts/sync-dev-after-release.sh v +git push origin dev +``` -Branching from `dev` and then `gio trash`-ing the guarded paths seems simpler but produces `add/add` merge conflicts -whenever `dev` and `main` have diverged. The file appears as "added" on both sides with different content. Always branch -from `origin/main` and cherry-pick onto it. +The script overwrites `VERSION` with the released number and copies `CHANGELOG.md` verbatim from `origin/main`, then +commits the result directly to `dev` as one signed commit (no PR). Without this step `dev`'s `VERSION` and +`CHANGELOG.md` stay frozen at the pre-release state, and future feature branches inherit the wrong baseline. -## Spec re-vendoring +The backport is idempotent: re-running on a `dev` already in sync exits 0 with no commit. -The bundle vendors a snapshot of [`agentnative-spec`](https://github.com/brettdavies/agentnative) under `spec/`. When -the spec ships a new tag (e.g., `v0.3.0`), this skill re-vendors via `scripts/sync-spec.sh` on the `release/v` -branch — same commit as the version bump, message `chore(spec): re-vendor spec to `. The script auto-resolves -the latest upstream tag from the remote, so no manual version selection is needed. Without re-vendoring, the bundle -ships stale spec content while consumers see the new version on `anc.dev`. +→ Rationale: +[`RELEASES-RATIONALE.md` § Why backport `main` → `dev` after publish](./RELEASES-RATIONALE.md#why-backport-main--dev-after-publish). ## Version bump procedure -The version bump and CHANGELOG generation both happen on the `release/v` branch (steps 5–6 of the cherry-pick +The version bump and CHANGELOG generation both happen on the `release/v` branch (steps 5-7 of the cherry-pick flow above). There is no separate version-bump PR to `dev`. Picking the version is the only manual decision: -- **Patch** — doc updates, internal cleanups, non-substantive template edits, vendoring a patch-level spec bump. -- **Minor** — new templates, new reference docs, new bundle files (backward-compatible additions), vendoring a +- **Patch**: doc updates, internal cleanups, non-substantive template edits, vendoring a patch-level spec bump. +- **Minor**: new templates, new reference docs, new bundle files (backward-compatible additions), vendoring a minor-level spec bump that adds requirements without tightening existing tiers. -- **Major** — breaking changes to the bundle's contract: renaming `SKILL.md` frontmatter fields, restructuring directory - layout in ways that break existing skill installations, moving content between `` and the producer-ops root, or - vendoring a major-level spec bump (renamed/removed principles or tightened MUSTs that would regress existing - consumers). +- **Major**: breaking changes to the bundle's contract (renaming `SKILL.md` frontmatter fields, restructuring directory + layout in ways that break existing skill installations, moving content between subdirectories and the producer-ops + root, or vendoring a major-level spec bump). + +→ Rationale: [`RELEASES-RATIONALE.md` § Spec-vendor pipeline](./RELEASES-RATIONALE.md#spec-vendor-pipeline) (skill +version is independent of spec version). + +## Prose scrubbing + +Three release-flow artifacts live outside any automated prose check and need a manual scrub before they ship: + +- PR bodies (`gh pr create` / `gh pr edit` send body text directly to GitHub). +- `CHANGELOG.md` (a generated artifact built from upstream PR bodies). +- Release-PR bodies (composed after `CHANGELOG.md` has been generated). + +The canonical Vale + LanguageTool rule packs and orchestrator behaviour live in the spec repo at +[`~/dev/agentnative-spec/docs/architecture/voice-enforcement.md`](https://github.com/brettdavies/agentnative/blob/dev/docs/architecture/voice-enforcement.md). +Until those packs are vendored into this repo via a `scripts/sync-spec.sh` extension (a deferred follow-up), the scrub +commands point at the spec checkout directly. + +```bash +# 1. Save the artifact to /tmp/. +gh pr view --json body --jq .body > /tmp/body.md # for PR body edits +# cp CHANGELOG.md /tmp/body.md # for changelog scrub -The skill's version is independent of the spec it vendors. A spec bump that doesn't affect the skill's surface (e.g., -prose-only edits) can ship as a patch even when the spec went minor. Use the SemVer guidance above against the *skill's* -observable behaviour, not the spec's. +# 2. Vale (against the spec's rule packs). +vale --no-global --config ~/dev/agentnative-spec/.vale.ini --output=line --minAlertLevel=error /tmp/body.md -## PRs and changelog generation +# 3. LanguageTool grammar check via lt_check (~/dotfiles/config/shell/languagetool.sh). +# Skips cleanly if LT is unreachable. Inspect: `lt_rules`, `lt_info`. See +# ~/dev/agentnative-spec/CONTRIBUTING.md § Voice enforcement for the +# install-vs-required nuance. +lt_check /tmp/body.md -Every PR **must** follow `.github/pull_request_template.md`. The template's `## Changelog` section has these -subsections: +# 4. unslop (em-dash density and AI-unique structural patterns). +~/.claude/skills/unslop/scripts/score.py /tmp/body.md -- `### Added` — new user-visible features or capabilities (new principles, new checks, new templates). -- `### Changed` — changes to existing behavior (e.g., a check's pass/fail criteria tightens). -- `### Fixed` — bug fixes (e.g., a check produces false positives). -- `### Removed` — removed features or APIs. -- `### Security` — security-relevant changes (e.g., a script that ran on user machines now refuses an unsafe path). +# 5. Apply fixes per finding. Re-run until 0 blocking and unslop score is 0. -A PR that lands with an empty or missing `## Changelog` section silently drops its user-facing notes from the next -release. If a PR truly has no user-facing impact (pure refactor, test-only, CI-only), leave the section empty — the PR -still appears in git history. +# 6. Apply the cleaned version. +gh pr edit --body-file /tmp/body.md # for PR body edits +# scripts/generate-changelog.sh # for CHANGELOG.md (re-runs the PR-body fetch from GitHub) +``` + +For a `CHANGELOG.md` finding, fix the upstream PR body and regenerate. Hand-editing `CHANGELOG.md` directly produces +drift the next regeneration overwrites. + +→ Rationale + which artifacts need this: +[`RELEASES-RATIONALE.md` § Prose scrubbing scope](./RELEASES-RATIONALE.md#prose-scrubbing-scope). ## Branch protection Three rulesets are committed under `.github/rulesets/` and applied to the repo via the GitHub API: -- **`protect-main.json`** — required signatures, linear history, squash-only merges via PR with CODEOWNERS review, +- **`protect-main.json`**: required signatures, linear history, squash-only merges via PR with CODEOWNERS review, required status checks (`markdownlint`, `shellcheck`, `guard-docs / check-forbidden-docs`), creation/deletion blocked, non-fast-forward blocked. -- **`protect-dev.json`** — required signatures, deletion blocked, non-fast-forward blocked. No PR-requirement at the - ruleset level; the PR-only norm is enforced by convention. -- **`protect-tags.json`** — `v*` tags: deletion, force-push (re-tag), and updates all blocked. Tags are immutable +- **`protect-dev.json`**: required signatures, deletion blocked, non-fast-forward blocked. PR-only norm is enforced by + convention. +- **`protect-tags.json`**: `v*` tags. Deletion, force-push (re-tag), and updates all blocked. Tags are immutable historical anchors for released versions. -### Apply (post-public-flip) +### Apply -The repo ships **PRIVATE** through the bootstrap window. GitHub's free tier does not allow rulesets on private repos. -After visibility flips to public from the agentnative-site session: +All three rulesets are already installed on this repo. Re-runnable for new repos or after a ruleset reset: ```bash gh api repos/brettdavies/agentnative-skill/rulesets -X POST --input .github/rulesets/protect-main.json @@ -191,6 +251,12 @@ gh api repos/brettdavies/agentnative-skill/rulesets -X POST --input .github/rule gh api repos/brettdavies/agentnative-skill/rulesets -X POST --input .github/rulesets/protect-tags.json ``` +Verify installed rulesets: + +```bash +gh api repos/brettdavies/agentnative-skill/rulesets --jq '.[] | "\(.id)\t\(.name)\t\(.target)"' +``` + See [`.github/rulesets/README.md`](.github/rulesets/README.md) for verification + negative tests. ### Updating a ruleset @@ -205,9 +271,7 @@ gh api repos/brettdavies/agentnative-skill/rulesets --jq '.[] | "\(.id)\t\(.name gh api -X PUT repos/brettdavies/agentnative-skill/rulesets/ --input .github/rulesets/protect-main.json ``` -### Status-check context pitfall - -`required_status_checks[].context` strings must match exactly what GitHub publishes for each check. For this repo: +### Status-check contexts (verified) | Check | Source | Context (verified) | | ------------------ | ---------------------------------------------- | ----------------------------------- | @@ -221,10 +285,15 @@ Confirm post-CI with: gh api repos/brettdavies/agentnative-skill/commits//check-runs --jq '.check_runs[].name' ``` +→ Rationale (inline vs reusable, three-ruleset shape): +[`RELEASES-RATIONALE.md` § Branch protection](./RELEASES-RATIONALE.md#branch-protection). + ## Related docs -- [`AGENTS.md`](./AGENTS.md) — repo layout, lint commands, what agents must not do. -- [`CONTRIBUTING.md`](./CONTRIBUTING.md) — how to propose changes. -- [`.github/pull_request_template.md`](.github/pull_request_template.md) — PR body structure with changelog sections. -- [`.github/rulesets/README.md`](.github/rulesets/README.md) — ruleset apply + verify procedure. -- [`CHANGELOG.md`](./CHANGELOG.md) — released versions and their notes. +- [`RELEASES-RATIONALE.md`](./RELEASES-RATIONALE.md) (release flow rationale, CHANGELOG pipeline, branch-protection + pitfalls) +- [`AGENTS.md`](./AGENTS.md) (repo layout, lint commands, what agents must not do) +- [`CONTRIBUTING.md`](./CONTRIBUTING.md) (how to propose changes) +- [`.github/pull_request_template.md`](.github/pull_request_template.md) (PR body structure with changelog sections) +- [`.github/rulesets/README.md`](.github/rulesets/README.md) (ruleset apply + verify procedure) +- [`CHANGELOG.md`](./CHANGELOG.md) (released versions and their notes) diff --git a/SECURITY.md b/SECURITY.md index 987b819..d32f5cf 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -2,8 +2,8 @@ ## Reporting a Vulnerability -If you discover a security vulnerability in this skill bundle — for example, a script in `scripts/` or a template under -`templates/` that could execute unintended code on a user's machine at install time — please report it via GitHub's +If you discover a security vulnerability in this skill bundle (for example, a script in `scripts/` or a template under +`templates/` that could execute unintended code on a user's machine at install time), please report it via GitHub's private security advisories rather than filing a public issue: [Open a private security advisory](https://github.com/brettdavies/agentnative-skill/security/advisories/new) @@ -26,7 +26,7 @@ Out of scope: - Vulnerabilities in tools that this skill *describes* but does not ship (e.g., bugs in clap, Rust, or other CLI tools referenced by the principles). -- Best-practice debates about the principles themselves — open a regular issue or PR for those. +- Best-practice debates about the principles themselves; open a regular issue or PR for those. ## Supported Versions diff --git a/SKILL.md b/SKILL.md index cd5340a..2d2b49d 100644 --- a/SKILL.md +++ b/SKILL.md @@ -2,72 +2,54 @@ name: agent-native-cli description: >- Guide to designing, building, and auditing CLI tools for use by AI agents. Pairs with - [`anc`](https://github.com/brettdavies/agentnative-cli) (the canonical compliance checker) and + [`anc`](https://github.com/brettdavies/agentnative-cli) (the canonical compliance auditor) and [`agentnative-spec`](https://github.com/brettdavies/agentnative) (the canonical principle text, vendored at - `spec/`). Provides starter templates, language-specific implementation idioms, and a short - getting-started guide that points agents at `anc check --output json`. Use when designing a new CLI tool, - reviewing one for agent-readiness, or remediating findings from `anc`. Triggers on agentic CLI, agent-native, - CLI design, CLI standard, agent-first, CLI for agents, agent-friendly CLI, CLI compliance, anc. + `spec/`). Provides starter templates, language-specific implementation idioms (Rust/clap, Python Click & argparse, + Go Cobra, JS Commander/yargs/oclif, Ruby Thor), and a getting-started guide that points agents at + `anc audit --output json` (scorecard schema 0.7) and `anc skill install ` / `anc skill update --all`. Use + when designing a new CLI tool, building a Rust/clap binary intended for agents, reviewing one for agent-readiness, + claiming the agent-native badge (≥70% credit-weighted score), or remediating findings from `anc`. Triggers on + agentic CLI, agent-native, CLI design, CLI standard, agent-first, CLI for agents, agent-friendly CLI, CLI + compliance, agent-readiness, anc, anc audit, anc emit schema, anc skill install, anc skill update, agent-native + badge, scorecard, audit_id, audit-profile, opt_out, n_a, Rust CLI, clap derive. SKIP when the user is building a + TUI app meant for humans (use `--audit-profile human-tui` rather than this skill), writing a non-CLI library, or + asking unrelated Rust questions not specifically about agent-readiness of a CLI. --- # Agent-Native CLI The standard for CLI tools designed to be operated by AI agents. Three artifacts work together: -| Artifact | Role | -| ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [`agentnative-spec`](https://github.com/brettdavies/agentnative) | Canonical text of the seven principles. Frontmatter `requirements[]` is the machine-readable contract. Vendored into [`spec/`](./spec/) — snapshot refreshed each release. | -| [`anc`](https://github.com/brettdavies/agentnative-cli) | The compliance checker. Reads target source/binary, emits a JSON scorecard whose entries cite spec `requirement_id`s. The runtime authority. | -| **This skill** (`agent-native-cli`) | The agent-facing guide. Tells the agent how to invoke `anc`, how to navigate the spec when remediating findings, and where the implementation patterns and starter templates live. | +| Artifact | Role | +| ---------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [`agentnative-spec`](https://github.com/brettdavies/agentnative) | Canonical text of the eight principles. Frontmatter `requirements[]` is the machine-readable contract. Vendored into [`spec/`](./spec/); snapshot refreshed each release. | +| [`anc`](https://github.com/brettdavies/agentnative-cli) | The compliance auditor. Reads target source/binary, emits a JSON scorecard whose entries cite the spec requirement `id` (e.g. `p1-must-no-interactive`) and the probe `audit_id`. The runtime authority. | +| **This skill** (`agent-native-cli`) | The agent-facing guide. Tells the agent how to invoke `anc`, how to navigate the spec when remediating findings, and where the implementation patterns and starter templates live. | -The skill does **not** implement principles checking. `anc` does. The skill teaches agents to use `anc` and supplies the +The skill does **not** implement principles auditing. `anc` does. The skill teaches agents to use `anc` and supplies the surrounding context (spec, idioms, templates) that `anc`'s findings reference. -## Update check +## First action: update check -On first invocation per session, run `bin/check-update`. It compares this bundle's `VERSION` against `main` on GitHub -and prints one of: - -| Output | Meaning | -| ------------------------------------ | ------------------------------------------------------------------------------- | -| (empty) | Up to date, snoozed, disabled, or check skipped (broken install, no network). | -| `UPGRADE_AVAILABLE ` | A newer release is on `main`. Surface the upgrade flow below before continuing. | +Run once per session before doing real work: ```bash bash "$(dirname "$0")/bin/check-update" ``` -Exit code is always 0; failures degrade silently. - -### Inline upgrade flow - -When stdout is `UPGRADE_AVAILABLE `, ask the user via `AskUserQuestion`: - -> `agent-native-cli` **v{remote}** is available (you're on v{local}). Upgrade now? - -Three options: - -- **"Yes, upgrade now"** — run `git -C pull --ff-only`. Report the new HEAD and the upgrade outcome. - The bundle root is the parent of ``; `git -C ../.. pull --ff-only` from `bin/` works for the default install layout - (`~//skills/agent-native-cli/`). If `--ff-only` rejects (uncommitted edits or divergent history), surface git's - error verbatim and stop — do not auto-stash. -- **"Not now"** — write `$HOME/.cache/agent-native-cli/update-snoozed` in the format ` `, where - `` is `1` (24h reminder), `2` (48h), or `3` (7 days), escalating each time the user defers. Tell the user the - next reminder window. -- **"Never ask again"** — `touch $HOME/.cache/agent-native-cli/disabled` and tell the user how to re-enable (`rm - $HOME/.cache/agent-native-cli/disabled`). - -State directory: `$HOME/.cache/agent-native-cli/`. All three files (`last-update-check`, `update-snoozed`, `disabled`) -live there; the script auto-creates the directory on first slow-path fetch. +Empty output → continue. `UPGRADE_AVAILABLE ` → prompt the user via `AskUserQuestion` (Yes / Not now / +Never). Full prompt text, snooze ladder, and state-file layout are in +[`references/update-check.md`](./references/update-check.md). Exit code is always `0`; failures degrade silently. ## Start here -→ **[`getting-started.md`](./getting-started.md)** — the three working loops (existing CLI / new Rust CLI / other -language), the canonical `anc check` invocations, and a "where things live" map. +→ **[`getting-started.md`](./getting-started.md):** the three working loops (existing CLI / new Rust CLI / other +language), the canonical `anc audit` invocations, the `anc skill install ` installer, and a "where things live" +map. -## The seven principles +## The eight principles -Defined in [`spec/principles/`](./spec/principles/) (vendored from `agentnative-spec` — currently `v0.2.0`; see +Defined in [`spec/principles/`](./spec/principles/) (vendored from `agentnative-spec`, currently `v0.5.0`; see [`spec/README.md`](./spec/README.md) for resync instructions). One file per principle, each with machine-readable `requirements[]` frontmatter: @@ -80,13 +62,90 @@ Defined in [`spec/principles/`](./spec/principles/) (vendored from `agentnative- | P5 | [`p5-safe-retries-mutation-boundaries.md`](./spec/principles/p5-safe-retries-mutation-boundaries.md) | Safe Retries, Mutation Boundaries | | P6 | [`p6-composable-predictable-command-structure.md`](./spec/principles/p6-composable-predictable-command-structure.md) | Composable, Predictable Structure | | P7 | [`p7-bounded-high-signal-responses.md`](./spec/principles/p7-bounded-high-signal-responses.md) | Bounded, High-Signal Responses | +| P8 | [`p8-discoverable-skill-bundle.md`](./spec/principles/p8-discoverable-skill-bundle.md) | Discoverable Skill Bundles | + +Do not paraphrase the principles inside this skill. Read the spec files directly; they are the source of truth. + +## The anc loop: audit → fix → re-audit → claim badge + +Once `anc` is installed (one-line install in [`getting-started.md`](./getting-started.md)), the work is a four-step +loop: + +**1. Audit.** `anc audit --output json . > scorecard.json`. The JSON envelope is schema `0.7` and contains: + +- `summary`: counter object covering the full status set (`total / pass / warn / fail / skip / error / opt_out / n_a`). + Spans all three audit layers (behavioral, source, project). +- `coverage_summary`: `must / should / may`, each with `total` + `verified`. **`verified` counts any verdict; `pass`, + `warn`, `fail`, and `skip` all increment it.** "Was this MUST audited at all?" not "was it satisfied." The actual bar + for "no MUST violations" is no `results[]` row where `tier == "must" && status == "fail"` (equivalently, every MUST + row is `pass` / `warn` / `opt_out` / `skip` / `n_a`). +- `badge.eligible` (bool), `badge.score_pct` (int), `badge.embed_markdown` (string or `null`), `badge.scorecard_url`, + `badge.badge_url`, `badge.convention_url`. **70%** is the eligibility floor; below it, `embed_markdown` is `null` and + the convention says do not advertise a badge. **The score is computed over behavioral-layer rows only.** Source- and + project-layer audits do not affect `score_pct` or badge eligibility (`spec/principles/scoring.md` § "Scope: + shipped-binary behavior only"). Under the current flat tier weights the formula reduces to a credit-weighted ratio: + `pass = 1.0`, `warn = 0.5`, `fail` and `opt_out` = `0.0` (all in the denominator); `n_a`, `skip`, and `error` are + excluded. The general form is tier-weighted (`w(must) · w(should) · w(may)`) but currently flat; see + `spec/principles/scoring.md` for the formula and the cohort bands above the floor. +- `results[]`: one entry per requirement-row. Each carries `id` (the spec requirement id matching + `spec/principles/p-*.md` frontmatter, e.g. `p1-must-no-interactive`), `audit_id` (the probe that produced the row, + e.g. `p1-non-interactive`), `tier` (`must` / `should` / `may`), `status`, `evidence`, `group`, `layer`, `confidence`, + and `label`. A single probe like `p3-version` emits two rows (one tier-stamped `must`, one `should`), so you can + attribute the verdict to a specific RFC 2119 level without joining the coverage matrix. +- `audit_profile`: the exemption category in effect (or `null`). +- `tool / anc / run / target` metadata: identifies the scored tool, the `anc` build, the invocation, and the resolved + target. + +**Status set.** `pass` / `warn` / `fail` / `skip` / `error` are the live verdicts. `opt_out` marks a deliberate +non-adoption (e.g. no `--output` flag → P2 schema-discovery rows collapse). `n_a` propagates from a conditional whose +antecedent is `opt_out` or `n_a`; the `evidence` field names the antecedent so the chain is legible from JSON alone. The +process exit code reflects live verdicts only: `n_a` from an `opt_out` antecedent does not force a non-zero exit. + +**2. Fix.** For each `fail`, look up the cited `id` (e.g. `p1-must-no-interactive`) in `spec/principles/p-*.md`'s +`requirements[]` frontmatter. Apply the fix using the implementation references below. Re-run with `--principle ` to +focus on one principle while iterating. + +**3. Re-audit.** Re-run `anc audit --output json .` until no MUST row is in `fail`. Query the JSON with `jq '[.results[] +| select(.tier == "must" and .status == "fail")] | length'` and confirm it's `0`. Do not rely on +`coverage_summary.must.verified == coverage_summary.must.total` as the satisfaction bar; the `verified` counter +increments on any verdict, so it can equal `total` while MUST fails still exist. Use `--audit-profile ` to +suppress audits that don't apply to the tool class: `human-tui` (TUIs that legitimately intercept the TTY), +`file-traversal` (reserved), `posix-utility` (cat / sed / awk style), `diagnostic-only` (read-only tools). Suppressed +audits emit `Skip` with structured evidence so readers see what was excluded. + +**4. Claim the badge.** Once `badge.eligible == true` (≥70%), copy `badge.embed_markdown` into the project's README. The +`text` output appends an embed hint after the summary line whenever the floor is cleared; below the floor, nothing +badge-related is printed (the convention's "do not nag" rule). + +### Useful flags + +- `--principle `: filter the audit to one principle while iterating. +- `--binary` / `--source`: scope to one layer (skip the other). +- `--audit-profile `: suppress audits that don't apply (`human-tui`, `posix-utility`, `diagnostic-only`, + `file-traversal`). +- `--examples`: print a curated invocation block and exit (pair with `--output json` or `--json` for structured). +- `--json`: short alias for `--output json` (the `p2-should-json-aliases` convention). +- `--raw`: strip headers and summary, emit one `idstatus` line per audit. Pipe-friendly for `grep` / `awk`. +- `--color ` (env `AGENTNATIVE_COLOR`): control ANSI styling; honors `NO_COLOR` in `auto` mode. +- `-v` / `--verbose` (env `AGENTNATIVE_VERBOSE`): escalate diagnostic detail when debugging unexpected results. + +### Get the scorecard JSON Schema + +The canonical JSON Schema for the scorecard envelope ships embedded in the binary. Extract it without a network round +trip: -Do not paraphrase the principles inside this skill — read the spec files directly. They are the source of truth. +```bash +anc emit schema # print to stdout +anc emit schema | jq '.title' # inspect a field +``` + +The schema `$id` is `https://anc.dev/scorecard-v0.7.schema.json`. Pre-0.6 consumers treated `opt_out` / `n_a` as +unknown, so feature-detect the status enum rather than pinning to an exact list. ## Implementation guidance (when fixing findings) -Once `anc check` reports a failure, the agent has the cited `requirement_id` and the spec text. The next question is -"how do I write code that satisfies this requirement?" — answered by: +Once `anc audit` reports a failure, the agent has the cited `id` (requirement-row id) and the spec text. The next +question is "how do I write code that satisfies this requirement?" The references below answer that: | Need | File | | ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | @@ -106,21 +165,31 @@ Drop-in starting points for greenfield Rust CLIs. Each encodes the relevant prin | [`templates/output-format.rs`](./templates/output-format.rs) | `OutputConfig`, `OutputFormat`, `diag!`, NO_COLOR / IsTerminal | | [`templates/agents-md-template.md`](./templates/agents-md-template.md) | Project-level AGENTS.md scaffold | -## Compliance checking +## Compliance auditing -Use `anc`. Install once: +Use `anc`. Install once via Homebrew, cargo, or self-install: ```bash brew install brettdavies/tap/agentnative # binary is `anc` cargo install agentnative ``` -Recommended invocations and the full agent loop are in [`getting-started.md`](./getting-started.md). Do not write shell -scripts to grep for principle violations — `anc` already implements (and supersedes) every check that approach could -produce. +To install **this skill bundle** into a host's canonical skills directory, use `anc`'s built-in installer: + +```bash +anc skill install claude_code # also: codex, cursor, factory, kiro, opencode +anc skill install --all # install into every known host in one invocation +anc skill install --dry-run claude_code # print resolved git command without spawning +anc skill update claude_code # refresh an existing install (guards on SKILL.md marker) +anc skill update --all # refresh every known host +``` + +Recommended `anc audit` invocations and the full agent loop are in [`getting-started.md`](./getting-started.md). Do not +write shell scripts to grep for principle violations; `anc` already implements (and supersedes) every audit that +approach could produce. ## Sources -- [`agentnative-spec`](https://github.com/brettdavies/agentnative) — canonical principle text (CC BY 4.0) -- [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli) — `anc`, the canonical checker (MIT / Apache-2.0) -- [`agentnative-skill`](https://github.com/brettdavies/agentnative-skill) — this repo (MIT) +- [`agentnative-spec`](https://github.com/brettdavies/agentnative): canonical principle text (CC BY 4.0) +- [`agentnative-cli`](https://github.com/brettdavies/agentnative-cli): `anc`, the canonical auditor (MIT / Apache-2.0) +- [`agentnative-skill`](https://github.com/brettdavies/agentnative-skill): this repo (MIT) diff --git a/VERSION b/VERSION index 0ea3a94..8f0916f 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.2.0 +0.5.0 diff --git a/docs/SYNCS.md b/docs/SYNCS.md new file mode 100644 index 0000000..f43c124 --- /dev/null +++ b/docs/SYNCS.md @@ -0,0 +1,128 @@ +# Cross-repo sync map + +How spec content flows in and how the skill bundle flows out. Source of truth for the sync mechanisms that connect this +repo to its three sibling repos (`agentnative` spec, `agentnative-site`, `agentnative-cli`) and to consumer agent hosts. + +> This document is a routing map. Per-script details (env vars, fallback behavior, exit codes) live in the script +> headers. Re-vendor procedure during a release lives in [`RELEASES.md`](../RELEASES.md). Vendored-spec mechanics live +> in [`spec/README.md`](../spec/README.md). + +## Upstream — data flowing INTO this repo + +| Source | Mechanism | What's synced | Trigger / cadence | Drift check | +| -------------------------------------------------------------------------------------------- | ------------------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `brettdavies/agentnative` (spec); default latest `v*` tag, or any branch/tag/SHA via `--ref` | `scripts/sync-spec.sh` (manual) | `principles/p*-*.md` + `VERSION` + `CHANGELOG.md` → `spec/` | Re-run on the `release/v` branch as part of every release (see `RELEASES.md` §"Spec re-vendoring"). Cross-repo coordination: rerun with `--ref dev` (or a SHA) to consume in-flight spec work before a release cuts. | None automated — relies on the release-branch checklist. The script itself is idempotent; `git status` after a re-run surfaces orphan files from spec renames. The script prints the resolved short SHA on every run regardless of ref type so the release branch can record the exact pin. | + +**Mechanism notes:** + +- Resolution uses `gh api` against `https://github.com/brettdavies/agentnative.git` (override with `SPEC_REMOTE_URL`). + Pulls VERSION, CHANGELOG.md, and `principles/p*-*.md` individually via the contents endpoint at the resolved ref. No + clone, no tarball — branches, tags, and SHAs all take the same code path via `?ref=`. +- Default ref is the latest `v*` tag, resolved via the same API. `--ref ` (or `SPEC_REF=` env var) + vendors any explicit ref for cross-repo coordination of in-flight spec work that hasn't released yet. +- Falls back to a local checkout (`$HOME/dev/agentnative-spec`, override with `SPEC_ROOT`) only when `gh api` is + unreachable (offline / unauthenticated). The local-fallback path uses `git` against `SPEC_ROOT`. +- Mirror of `~/dev/agentnative-cli/scripts/sync-spec.sh` — only `DEST_DIR` differs. Keep them aligned when changing + either. + +## Inbound + outbound data map + +```mermaid +flowchart LR + subgraph Inbound + SPEC[brettdavies/agentnative
spec repo] + end + + SKILL[brettdavies/agentnative-skill
this repo
bundle + git tags] + + subgraph Outbound + CC[Claude Code
git clone] + CUR[Cursor
git clone] + CDX[Codex
git clone] + end + + SITE[brettdavies/agentnative-site] + + SPEC -- "scripts/sync-spec.sh
(MANUAL — NOT on spec's
repository_dispatch list)" --> SKILL + SKILL -- "release tag v<X.Y.Z>
= bundle pin" --> CC + SKILL -- "release tag v<X.Y.Z>
= bundle pin" --> CUR + SKILL -- "release tag v<X.Y.Z>
= bundle pin" --> CDX + SITE -. "git ls-remote (daily 13:00 UTC)
skill-availability.yml
visibility probe" .-> SKILL +``` + +This repo is intentionally **off** the spec's `repository_dispatch` list — re-vendoring is a deliberate release-time +act, not a webhook reflex. The site's daily probe is the only automated edge that touches this repo from outside, and +it's read-only (`git ls-remote`) — it never mutates state here. + +## Downstream — data flowing OUT of this repo + +This repo is **not** the source of truth for `skill.json`. The `agentnative-site` repo holds the canonical manifest at +`src/data/skill.json`; this repo's role downstream is (a) to be the git target that `skill.json`'s `source.url` points +at, and (b) to be the bundle that `install.` commands clone. + +### Manifest source-of-truth vs bundle content + +```mermaid +flowchart LR + subgraph ManifestPath[Manifest text — owned by agentnative-site] + SJSON[src/data/skill.json
SoT for manifest fields
incl. source.commit] + EMITTER[src/build/skill.mjs
validator + emitter] + ENDPOINT[/skill.json endpoint/] + SJSON --> EMITTER --> ENDPOINT + end + + subgraph BundlePath[Bundle content — owned by this repo] + BUNDLE[agentnative-skill
SKILL.md + spec/ + scripts/
+ git tags v<X.Y.Z>] + end + + SJSON -. "source.commit pins
a SHA from this repo
(hand-edited at release)" .-> BUNDLE +``` + +The two paths are independent artifacts that meet only at the `source.commit` pin. The site decides what the manifest +*says*; this repo decides what the bundle *is*. Editing `skill.json` here would be a category error — the field +ownership lives across the boundary in `agentnative-site/src/data/skill.json`. + +| Consumer | Mechanism | What's distributed | Trigger / cadence | Drift check | +| ----------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Agent hosts (Claude Code, Codex, Cursor, Factory, Kiro, OpenCode) | Plain `git clone --depth 1 https://github.com/brettdavies/agentnative-skill.git /agent-native-cli`. Update path is `git -C pull --ff-only` triggered inline by `bin/check-update`'s `UPGRADE_AVAILABLE`. | The full repo as a flat tree. Host auto-discovers `SKILL.md` at the install root and ignores producer-side files. | New release lands on `main` + tag `v`. Consumers pick it up on next `bin/check-update` invocation (first invocation per session, with snooze/disable state in `~/.cache/agent-native-cli/`). | None on the producer side — no telemetry. `bin/check-update` is the consumer-side mechanism; it compares local `VERSION` to `main` and prompts the user. | +| `brettdavies/agentnative-site` `/skill` + `/skill.json` endpoints | The site holds the source-of-truth manifest at `src/data/skill.json`. `src/build/skill.mjs` validates and emits `dist/skill.json` byte-stably during the site build. The manifest's `source.commit` is hand-co-edited at release time to pin a specific `agentnative-skill` SHA. | The `/skill.json` machine surface, `/skill.html` human page, and `/skill.md` markdown twin — all derived from the site's own `src/data/skill.json`. | On every site deploy. The site's release procedure includes hand-bumping `source.commit` (and `version`) to match the `agentnative-skill` release being announced. | Synthetic probe — `agentnative-site/.github/workflows/skill-availability.yml` runs `git ls-remote` against this repo daily at 13:00 UTC and fails the run if the repo 404s, gets renamed, or flips private. | +| `brettdavies/agentnative-cli` `src/skill_install/skill.json` | `~/dev/agentnative-cli/scripts/sync-skill-fixture.sh` pulls `src/data/skill.json` from `agentnative-site` (default ref: `dev`). The fixture is also the build-time input for `build.rs`'s host-map codegen — `cargo build` regenerates `SkillHost`, `KNOWN_HOSTS`, and `resolve_host` from it. | Whatever ships in the site's `src/data/skill.json` — the `install.` map, in particular. | Manual re-run when the site updates `src/data/skill.json`. Pre-release checklist in agentnative-cli's `RELEASES.md` captures the cadence. | `agentnative-cli/.github/workflows/skill-fixture-drift.yml` runs `scripts/sync-skill-fixture.sh --check` on every PR; non-zero exit on drift. | + +**Distribution chain summary** — `agentnative-skill` (this repo, the bundle) → `agentnative-site` `src/data/skill.json` +(the manifest, hand-edits `source.commit` to pin a SHA from this repo) → `agentnative-cli` +`src/skill_install/skill.json` (the fixture, synced from the site, drives the Rust `SkillHost` codegen). + +## Release / dispatch chain + +There is **no automated `repository_dispatch` chain** between these three repos. The producer-side workflows are CI-only +(`ci.yml`: markdownlint + shellcheck; `guard-main-docs.yml`: blocks engineering docs from `main`). Cross-repo +propagation is **manual + checklist-driven**. + +A skill release flows like this: + +1. **`agentnative-spec`** ships a new `v*` tag (independent cadence). +2. **`agentnative-skill`** (this repo) opens a `release/v` branch from `main`, cherry-picks non-docs commits from + `dev`, runs `scripts/sync-spec.sh` to re-vendor `spec/`, bumps `VERSION`, generates CHANGELOG, merges to `main`, tags + `v`, creates a GitHub Release. Consumers see the new version on next `bin/check-update`. +3. **`agentnative-site`** updates `src/data/skill.json` (`version` and `source.commit`) to point at the new + `agentnative-skill` SHA, deploys. `/skill.json` now serves the new manifest. The daily `skill-availability` probe + continues to verify the producer repo is reachable. +4. **`agentnative-cli`** runs `scripts/sync-skill-fixture.sh` to pull the updated `src/data/skill.json` from the site, + `cargo build` regenerates the host map, ships in the next `anc` release. The `skill-fixture-drift` workflow ensures + no PR lands while the fixture is stale. + +Each step is gated by a human reading the previous repo's release notes and deciding to advance the chain. There is no +webhook, no scheduled sync, no `repository_dispatch` payload between any pair. + +## Reference + +- `scripts/sync-spec.sh` — header comment has full env-var matrix and fallback behavior. +- `spec/README.md` — vendored-spec layout, attribution, and resync invocation. +- `RELEASES.md` §"Spec re-vendoring" and §"Releasing dev to main" — release-time procedure. +- `~/dev/agentnative-cli/scripts/sync-spec.sh` — parallel implementation; keep aligned. +- `~/dev/agentnative-cli/scripts/sync-skill-fixture.sh` — downstream sync from site → cli; header documents `--check` + mode and CI integration. +- `~/dev/agentnative-site/src/build/skill.mjs` — the validator/emitter that treats `src/data/skill.json` as source of + truth for the `/skill.json` endpoint. +- `~/dev/agentnative-site/.github/workflows/skill-availability.yml` — daily synthetic probe of this repo. +- `~/dev/agentnative-cli/.github/workflows/skill-fixture-drift.yml` — per-PR drift gate on the cli fixture. diff --git a/evals/01-greenfield-rust-cli.md b/evals/01-greenfield-rust-cli.md new file mode 100644 index 0000000..ce9e250 --- /dev/null +++ b/evals/01-greenfield-rust-cli.md @@ -0,0 +1,115 @@ +# Eval 01 — Greenfield Rust CLI for an AI agent + +## Task + +You need to design and build a brand-new CLI tool intended to be operated **primarily by AI agents** (not humans running +it interactively). The CLI should be a small but realistic shape: a `widgetctl` binary that has: + +- `widgetctl list` — list widgets (read-only, no network). +- `widgetctl get ` — get one widget (read-only, no network). +- `widgetctl create --name ` — create a widget (mutates state; this is the only mutating verb). + +You are free to mock the actual widget store with an in-memory map or a JSON file — the storage layer is not the point. +What matters is that the CLI is **agent-ready**: structured output, predictable errors, no hidden interactivity, no +surprise side effects, no color/pager fights when piped, etc. + +You should research what "agent-ready" means before designing the CLI rather than guessing. + +## Workdir + +`/tmp/eval-greenfield-rust-cli-$(date +%s)/` + +Treat this as a fresh sandbox. Do not edit any other directories. Do not assume any tooling beyond what is on `$PATH` +(check first with `which`); install via `brew` / `cargo` if the right tool is missing. + +## Required artifacts + +By the end of the eval, the workdir must contain: + +1. `widgetctl/` — a Rust Cargo crate that compiles cleanly (`cargo build` succeeds) and whose binary runs. +2. `widgetctl/AGENTS.md` — a per-project AGENTS.md aimed at agent operators. +3. `scorecard.json` — the JSON output of whatever auditor you found, run against the built binary. +4. `BADGE.md` — if the scorecard reports the project as badge-eligible, the badge embed markdown copied out of the + scorecard. If the project is **not** eligible, this file should explain in 2–3 sentences why (which scoring rule + blocks it) and what the next remediation would be. +5. `NOTES.md` — your investigation log: what you researched, dead-ends you abandoned (and why), and which decisions were + judgment calls rather than rules from the docs. + +Do not commit any of these — the workdir is throwaway. + +## Success criteria (score 0–10 each) + +1. **Discovery via description, not name.** You discovered the relevant standard / spec / auditor by following a + description from a skill listing or by web search — not by being told its name. If you guessed the name from training + data and went straight to its repo, that is a partial fail; record it under "Dead ends". +2. **Auditor invoked correctly.** You ran the canonical auditor against your built binary and captured its JSON output. + The JSON envelope matches the current published schema version; you did not pin against a stale schema. +3. **Spec lookup is real.** For at least one finding, you traced the cited requirement id (e.g. something resembling + `pN-must-…`) back to the canonical spec text and quoted the constraint in `NOTES.md`. No paraphrasing the rule from + training data without verifying. +4. **Templates used where they exist.** If the standard ships starter templates for the language you chose, you used + them rather than hand-rolling boilerplate. You should not have invented your own `--quiet` / `--output` flag + conventions when the templates already encode them. +5. **`opt_out` and `n_a` handled correctly.** When the auditor reports `opt_out` or `n_a` rows, you correctly classified + them as deliberate non-adoptions / propagated conditionals — not as bugs to chase. `NOTES.md` lists at least one such + row and explains why you did not "fix" it. +6. **Badge claim discipline.** If the scorecard is below the eligibility floor, you did NOT advertise a badge. + `BADGE.md` reflects the actual scorecard state. +7. **Pipe-safe output.** Running `widgetctl list | cat` produces no ANSI escapes; running with `--output json | jq .` + succeeds; `widgetctl create … | head` does not error out from SIGPIPE. +8. **No interactive prompts.** Running any verb in a non-TTY (e.g., `widgetctl create --name foo < /dev/null`) neither + hangs nor reads from stdin unexpectedly. + +## Anti-patterns to detect + +These are regression markers from prior eval runs. If your transcript shows any of these, flag them explicitly in +`NOTES.md`: + +- **Schema pinning to 0.5.** The current scorecard schema is **not** 0.5. Pinning your parser to that string will break + against current output. Use the schema the binary emits today; document the version in `NOTES.md`. +- **80% threshold mentioned.** The badge eligibility floor is not 80%. Anywhere you write down a percentage, source it + from the badge section of the actual scorecard, not from cached prose. +- **Chasing `n_a` rows.** A row whose status is `n_a` because its antecedent collapsed is a propagated non-finding, not + a failure. If your "fixes" include touching code to flip an `n_a` row, that is wrong work. +- **Hand-rolled `--output json` flag.** If you wrote the JSON serialization layer by hand instead of using the starter + template that already encodes it, the eval is testing that you found the template. +- **Bumping skill bundle VERSION on a feature branch.** The bundle's `VERSION` file is a release artifact, not a + per-commit version. Do not modify it. +- **"Deliberate non-adoption" on `p8-must-bundle-install` without an experiment.** This eval requires + `widgetctl/AGENTS.md` (required artifact #2), which makes `p8-bundle-exists` pass and triggers + `p8-must-bundle-install` on the binary. The probe expects the binary to advertise an install path for the bundle + (candidate shapes named in the probe's evidence string, e.g. `skill install`, `init --skill`). The probe is + permissive: it greps `--help` / ` --help` for the expected token shapes and does **not** semantically validate + that the install does anything — a ~30-line stub subcommand that takes a destination and writes `AGENTS.md` to it + satisfies the row. So the rationalization "I shouldn't add a fake feature just to pass an audit" is factually wrong on + this probe: the audit asks for the discoverable surface, not the feature. Before classifying the row as `opt_out` / + "deliberate non-adoption" / "this is a fixture, not a real bundle," you **must** either (a) add a working stub install + subcommand to `widgetctl` and verify the row's status changes, or (b) re-run with `--audit-profile ` if a + matching profile actually suppresses this audit. Framing your way out without running either experiment fails this + criterion. +- **Confusing `coverage_summary.must.verified` with "MUSTs satisfied".** `verified` increments on any verdict (including + `fail` and `warn`), not just `pass`. The bar for "no MUST violations" is no `results[]` row where `tier == "must"` and + `status == "fail"`, not `must.verified == must.total`. +- **Counting all layers in `score_pct` math.** `badge.score_pct` is computed from behavioral-layer rows only — source- + and project-layer rows do not enter the formula. If you try to reconcile the score by summing all rows in `results[]` + the numbers will not match. Filter to `layer == "behavioral"` before doing the credit-weighted math. + +## Escalation rule + +You will hit at least one ambiguity the docs do not resolve directly (likely around an edge case in `--audit-profile` +selection, or the exact meaning of a probe whose evidence is terse). When that happens, the right order is: + +1. The canonical spec files (vendored under `spec/principles/`). +2. The auditor's own help (` --help`, ` audit --help`). +3. The auditor's embedded JSON Schema (the binary exposes a way to print it). +4. The auditor's repo issues / discussions. +5. As a last resort, ask the user. + +Document the escalation you actually took in `NOTES.md` § "Escalations". + +## What "done" looks like + +When you think you are done, run the auditor against the built binary one more time and capture the final +`scorecard.json`. Score yourself against the eight criteria above in `NOTES.md` § "Self-score" with one sentence of +justification per criterion. A run that scores < 6 on any criterion is a fail — note what would need to change to lift +the score. diff --git a/evals/02-remediate-existing-rust-cli.md b/evals/02-remediate-existing-rust-cli.md new file mode 100644 index 0000000..94c4dd0 --- /dev/null +++ b/evals/02-remediate-existing-rust-cli.md @@ -0,0 +1,133 @@ +# Eval 02 — Remediate an existing Rust CLI against the agent-readiness standard + +## Task + +You have a small, existing Rust CLI checked into a workdir. It compiles. It runs. It even has tests. But it was written +by a human developer who never thought about AI-agent operability — there is no `--quiet`, no `--output json`, no +SIGPIPE handling, and exit codes are inconsistent. You have been asked to: + +1. Audit it against the canonical agent-readiness standard. +2. Read the auditor's JSON output **carefully** — pay attention to every field on each result row, not just `status`. +3. Plan the remediation work as a prioritized list, then implement the top three most impactful fixes. +4. Re-audit and confirm the changes moved the needle. + +You are explicitly **not** asked to make every row Pass. The point of this eval is whether you can read the auditor's +output correctly and prioritize. + +## Workdir + +`/tmp/eval-remediate-rust-cli-$(date +%s)/` + +Inside it, create a minimal seed CLI to audit: + +```bash +mkdir -p /tmp/eval-remediate-rust-cli-$(date +%s) +cd /tmp/eval-remediate-rust-cli-$(date +%s) +cargo init --bin seed-cli +cd seed-cli +cat > src/main.rs <<'EOF' +use std::env; + +fn main() { + let args: Vec = env::args().collect(); + if args.len() < 2 { + eprintln!("usage: seed-cli "); + std::process::exit(1); + } + println!("hello, {}", args[1]); +} +EOF +cargo build +``` + +That is your starting point. The rest of the eval works against this binary. + +## Required artifacts + +1. `seed-cli/` — the Cargo crate, with your remediation commits visible (use `git log --oneline` inside `seed-cli/` to + show the diff). Each commit should map to one prioritized fix. +2. `scorecard-before.json` — the auditor's JSON output against the starting binary. +3. `scorecard-after.json` — the auditor's JSON output after your three top fixes. +4. `plan.md` — the prioritization rationale (see below). +5. `NOTES.md` — your investigation log. + +## The prioritization exercise + +In `plan.md`, list every result row from `scorecard-before.json` that is **not** `pass` / `skip`. For each row: + +| Column | What to write | +| ---------- | ----------------------------------------------------------------------------------------- | +| `id` | The spec requirement id (one of the JSON fields — figure out which). | +| `audit_id` | The probe id (also a JSON field — different from above). | +| `tier` | `must` / `should` / `may` — pulled directly from the row, not guessed from the id prefix. | +| `status` | The status as emitted. | +| `effort` | Your estimate: `s` / `m` / `l` (small / medium / large). | +| `priority` | Your final ranking (1 = fix first). | + +Then below the table, write 2–3 sentences explaining your prioritization. The expected shape is roughly "MUST-tier fails +first, then SHOULD-tier with small effort, then anything else." If you ranked it differently, justify it. + +## Success criteria (score 0–10 each) + +1. **Field naming is correct.** Your `plan.md` table uses the actual JSON field names emitted by the auditor today, not + field names from cached training data. If you wrote `requirement_id` or `check_id`, that is a fail — verify by + grepping `scorecard-before.json` for the field you reference. +2. **`tier` is read from the row, not inferred.** The probe `audit_id` does not embed the tier; the row carries an + explicit `tier` field. Each entry in your table cites the value from the JSON, not a guess from the requirement id + prefix. +3. **`opt_out` and `n_a` skipped from prioritization.** Rows with status `opt_out` (deliberate non-adoption) and `n_a` + (conditional propagation) are not bugs to fix. Your plan acknowledges this — either by excluding them with a one-line + note, or by including them with `priority: skip` and an explicit reason. +4. **Spec lookup happened.** For the top-priority fix, `NOTES.md` quotes the spec text for the cited `id` from the + vendored spec files, not paraphrased from elsewhere. +5. **Fixes are real.** Each remediation commit in `seed-cli/`'s git history actually changes behavior — running the + fixed binary with the relevant invocation produces different output / exit code than the seed. +6. **Re-audit shows movement.** `scorecard-after.json` shows the three fixed rows changed status (e.g. `fail` → `pass`, + or `warn` → `pass`). If a row you "fixed" did not change status, `NOTES.md` explains why and what the actual fix + would need to be. +7. **Score interpretation is correct.** `NOTES.md` reports the `badge.score_pct` before and after. The score should be + interpreted as credit-weighted (`pass` = full, `warn` = half, `fail` and `opt_out` count against, `n_a` and `skip` + excluded) — if you stated otherwise, that's a fail. +8. **No badge over-claim.** If `badge.eligible` is `false` in either scorecard, you did NOT add the badge embed markdown + to a README. `NOTES.md` confirms this explicitly. + +## Anti-patterns to detect + +- **Pinning to schema 0.5.** The scorecard envelope's `schema_version` is not 0.5 today. Anywhere your parser or notes + reference that string is stale. +- **Counting `n_a` as a fail.** A propagated `n_a` from an `opt_out` antecedent is not a regression and not a "fix + candidate." Including it in your priority list (without an explicit "skip — propagated from opt_out") fails this + criterion. +- **Ignoring the explicit `tier` field.** Sorting by id prefix (e.g. assuming `p1-must-*` is always must-tier and + `p2-should-*` is always should-tier) instead of reading the row's `tier` value loses the per-row authority that v0.7 + of the schema added — and conflates probe scope with requirement scope. +- **80% floor in `NOTES.md`.** Anywhere you write "≥80%" in your notes is wrong; check the actual `badge.convention_url` + or the bundle's documentation for the current floor. +- **Counting all layers in `score_pct` math.** `badge.score_pct` is computed from behavioral-layer rows only — source- + and project-layer rows do not enter the formula. If you try to reconcile the score by summing all rows in `results[]`, + the numbers will not match. Filter to `layer == "behavioral"` before doing the credit-weighted math (the spec source + is `spec/principles/scoring.md` § "Scope: shipped-binary behavior only"). +- **Confusing `coverage_summary.must.verified` with "MUSTs satisfied".** `verified` increments on any verdict (including + `fail` and `warn`), not just `pass`. The bar for "no MUST violations" is no `results[]` row where `tier == "must"` and + `status == "fail"`, not `must.verified == must.total`. If `plan.md` or `NOTES.md` uses the latter as the + "satisfaction" check, that's wrong. + +## Escalation rule + +When the JSON evidence string is terse (some probes report 1–2 words), the right escalation order is: + +1. Find the probe in the source spec (vendored under `spec/principles/`). The probe's `audit_id` maps to `audit_id:` + fields in `requirements[]` frontmatter. +2. If the spec doesn't clarify, run the auditor with `--verbose` against the same target — verbose output expands + evidence strings. +3. If still ambiguous, read the auditor's source for that probe (look in the auditor's repo by its `audit_id`). +4. Only then ask the user. + +Document the escalations in `NOTES.md` § "Escalations". + +## What "done" looks like + +`NOTES.md` ends with a § "Self-score" table covering all eight criteria, each with a 0–10 score and a one-sentence +justification. The transcript shows at least one commit per remediation in `seed-cli/`. `scorecard-after.json` shows +measurable score movement vs. `scorecard-before.json`. If `badge.score_pct` regressed, that is a fail — explain what +happened. diff --git a/evals/03-multilang-python-cli.md b/evals/03-multilang-python-cli.md new file mode 100644 index 0000000..05b3128 --- /dev/null +++ b/evals/03-multilang-python-cli.md @@ -0,0 +1,136 @@ +# Eval 03 — Make a Python CLI agent-ready (cross-language guidance) + +## Task + +You are handed a small Python CLI written with Click. It is functional but human-oriented — it prints colored output +unconditionally, prompts for missing arguments interactively, and exits `1` for both "user error" and "internal +exception" without distinction. The team wants it to be operable by AI agents alongside humans. + +You should bring it up to the agent-readiness standard **without rewriting it in Rust**. The standard's auditor has +limited source-analysis coverage for Python (it analyzes some patterns via ast-grep, but the bulk of the audit is the +behavioral layer — spawning the binary and inspecting `--help`, exit codes, output shape). Use that. + +## Workdir + +`/tmp/eval-multilang-python-cli-$(date +%s)/` + +Set up the seed CLI: + +```bash +mkdir -p /tmp/eval-multilang-python-cli-$(date +%s) +cd /tmp/eval-multilang-python-cli-$(date +%s) +uv venv && source .venv/bin/activate +uv pip install click +mkdir -p notesctl/notesctl +cat > notesctl/pyproject.toml <<'EOF' +[project] +name = "notesctl" +version = "0.1.0" +requires-python = ">=3.10" +dependencies = ["click"] + +[project.scripts] +notesctl = "notesctl.cli:main" +EOF +cat > notesctl/notesctl/__init__.py <<'EOF' +EOF +cat > notesctl/notesctl/cli.py <<'EOF' +import click + +@click.group() +def main(): + """Manage notes.""" + +@main.command() +@click.option("--title", prompt=True, help="Note title") +def add(title): + click.echo(click.style(f"Added note: {title}", fg="green")) + +@main.command() +def list(): + for i in range(3): + click.echo(click.style(f"#{i} sample note", fg="cyan")) +EOF +cd notesctl && uv pip install -e . && cd .. +which notesctl # confirm the binary is on PATH +``` + +You now have a `notesctl` binary on the venv `$PATH`. That is your starting target. + +## Required artifacts + +1. `notesctl/` — the source tree with your fixes committed (`git init` inside, one commit per fix). +2. `notesctl/AGENTS.md` — per-project AGENTS.md. +3. `scorecard-before.json` and `scorecard-after.json` — auditor JSON output, before and after your fixes. +4. `language-notes.md` — at least three places where you consulted language-specific guidance (Click in this case) + instead of forcing Rust-flavored idioms. Each entry: (a) the agent-readiness requirement you were fixing, (b) the + Click idiom you used, (c) the file/section where you found the guidance. +5. `NOTES.md` — investigation log + self-score. + +## The cross-language probe + +The most important thing this eval tests is **whether you found the language-specific guidance** rather than +mechanically translating Rust patterns. Three Click-specific behaviors you should encounter: + +- `@click.option("--title", prompt=True)` is the Click idiom for interactive prompting. Removing it (or gating it on + `sys.stdin.isatty()`) is the agent-readiness fix. Note the **convention** for stdin-driven non-interactivity in Click, + which differs from the Rust/clap pattern of "always require, no `prompt`". +- Click's `click.style(..., fg=...)` always emits ANSI escapes by default. Click respects `NO_COLOR` via `click.style` + only when wrapped — figure out the canonical idiom (it is **not** `if not isatty: skip color`). +- Click does not provide a built-in `--output json` enum the way clap does with `ValueEnum`. The canonical Click pattern + uses `click.Choice([...])`. Use it, do not invent your own validation layer. + +## Success criteria (score 0–10 each) + +1. **Language-specific guidance was consulted.** `language-notes.md` lists at least three Click-specific idioms with + file/section pointers (e.g. `references/.md § `). If you wrote `match ... { ... }` Rust + pseudocode in your plan instead of Click, that is a fail. +2. **`auditor --command notesctl` works.** You used the binary-name resolution path rather than pointing the auditor at + a non-Rust source tree (which has limited Python coverage). The auditor still produces a usable scorecard. +3. **Prompt removal honors the spec.** Click's `prompt=True` was either removed, replaced with a required-flag pattern, + or gated on TTY — and the choice is justified in `NOTES.md` by quoting the relevant P1 requirement. +4. **NO_COLOR is honored.** Running `NO_COLOR=1 notesctl list | cat` produces no ANSI escapes. The fix uses Click's + canonical idiom, not a hand-rolled `os.environ.get("NO_COLOR")` check. +5. **Exit codes are distinguished.** Running `notesctl add --title 'x' < /dev/null` (where input is missing) exits with + one code; an internal exception exits with another. The codes match the spec's P4 mapping (look it up). +6. **Structured output exists.** `notesctl list --output json` (or whatever flag you picked) produces parseable JSON. + `notesctl list --output json | jq '.[0].title'` succeeds. +7. **AGENTS.md is honest.** The AGENTS.md you wrote names this is a Python/Click CLI and points future agents at `uv pip + install -e .` (or similar) for setup. Do not paste the Rust starter template's AGENTS.md verbatim. +8. **Behavioral-vs-source layer understood.** `NOTES.md` explains in 2–3 sentences why the auditor's source layer is + limited for Python and which finding categories you cannot get without the binary running. + +## Anti-patterns to detect + +- **Forcing Rust starter templates onto Python code.** `clap-main.rs` is not a Python file. If your transcript shows you + copying it and then deleting it, document the dead end. Better: you read the language idioms file first. +- **Hand-rolled JSON serialization when Click + `json` would suffice.** Click + the stdlib `json` module is the + canonical idiom for Click apps that need `--output json`. Custom serializers are a smell. +- **Ignoring `--audit-profile`.** Click-style CLIs that legitimately prompt are NOT TUIs in the `human-tui` sense — if + you reached for `--audit-profile human-tui` to silence the prompt failure, that is wrong. The fix is to remove the + prompt, not to declare yourself exempt. +- **80% floor or schema 0.5.** Either string in your notes is wrong; check the actual values from + `scorecard-after.json`. +- **Counting source-layer rows toward the score.** `anc`'s source-layer coverage is Rust-only; for a Python tool the + source layer is essentially absent and the score is computed from behavioral-layer rows only anyway. Do not try to + "fix" missing source-layer rows by adding Rust — they are out of scope for `score_pct` even on Rust tools. Read + `spec/principles/scoring.md` § "Scope: shipped-binary behavior only" if you need the contract. + +## Escalation rule + +When the spec text and the language guidance disagree (or seem to), the spec is authoritative for the **requirement**, +the language guidance is authoritative for the **idiom that satisfies it**. The lookup order is: + +1. Canonical spec file (`spec/principles/p-*.md`) for what the rule actually requires. +2. Language-idioms reference (the file that covers Python/Click/argparse). +3. Click's own docs at for any Click-specific question the bundle's reference does + not cover. +4. Ask the user only if the requirement itself is ambiguous (which is rare; usually the rule is clear and only the idiom + is in question). + +Document escalations in `NOTES.md` § "Escalations". + +## What "done" looks like + +`scorecard-after.json` exists, has fewer `fail` rows than `scorecard-before.json`, and `language-notes.md` shows three +distinct Click-specific decisions. `NOTES.md` ends with a § "Self-score" table covering all eight criteria. diff --git a/evals/README.md b/evals/README.md new file mode 100644 index 0000000..51d1ec6 --- /dev/null +++ b/evals/README.md @@ -0,0 +1,38 @@ +# Evals + +Self-contained eval prompts that exercise the discovery + workflow this bundle teaches. Each prompt is dispatchable to a +fresh agent with no other context; the agent must find the right skill from its description, follow the workflow, and +produce the listed artifacts. + +## How to run + +Dispatch the prompt body to a fresh agent (e.g., a Task / Agent / general-purpose subagent — anything that starts with +no conversation history). The agent picks a workdir, follows the prompt, and writes its artifacts there. Workdir output +lives at `/tmp/-/` per the convention — never committed. + +The eval body **never names this skill** so the run actually tests whether the description is discoverable. If the agent +invokes the skill by name (because it saw the name in some other index), record that as a discovery confound and re-run +the eval as written. + +## Evals in this bundle + +| File | What it exercises | +| -------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | +| [`01-greenfield-rust-cli.md`](./01-greenfield-rust-cli.md) | Discovery + the new-Rust-CLI loop: scaffolding from templates, `anc audit`, badge claim once eligible. | +| [`02-remediate-existing-rust-cli.md`](./02-remediate-existing-rust-cli.md) | JSON-shape interpretation (`id`, `audit_id`, `tier`, `opt_out`, `n_a`), spec lookup, fix application, re-audit. | +| [`03-multilang-python-cli.md`](./03-multilang-python-cli.md) | Cross-language guidance: agent reaches `framework-idioms-other-languages.md` instead of forcing Rust patterns. | + +## Conventions each eval follows + +1. **Self-contained** — never names the bundle in the prompt body; tests discoverability from frontmatter description. +2. **Workdir-first** — every eval names its `/tmp/...` workdir up front. +3. **Required artifacts** — explicit list of files the agent must leave behind. +4. **Numbered success criteria** — 5–8 items, each scored 0–10 by a reviewer (human or LLM-as-judge). +5. **Document dead-ends** — the agent must note approaches it tried and abandoned in `NOTES.md`. +6. **Regression-tests prior evals' findings** — when an eval is re-run after a fix, prior failure modes appear in § + "Anti-patterns to detect" so they cannot recur silently. +7. **Forces at least one escalation when appropriate** — at least one ambiguity per eval where the agent must consult + the escalation order rather than guess. + +When you re-run an eval after refactoring the bundle, diff the new transcript against the previous one. Score +regressions live under § "Anti-patterns to detect" in each prompt. diff --git a/getting-started.md b/getting-started.md index 5cab6d8..309b601 100644 --- a/getting-started.md +++ b/getting-started.md @@ -1,28 +1,35 @@ # Getting started This skill teaches agents how to design, build, or audit a CLI for use by other agents. The work is one of three loops; -pick the one that matches your starting point. The canonical checker is +pick the one that matches your starting point. The canonical auditor is [`anc`](https://github.com/brettdavies/agentnative-cli); the canonical principle text is in [`spec/principles/`](./spec/principles/). ## You have an existing CLI ```bash -# 1. Run the checker. -anc check --output json . > scorecard.json +# 1. Run the auditor. Schema is 0.7; results[] carries `id`, `audit_id`, `tier`, `status`, `evidence`. +anc audit --output json . > scorecard.json -# 2. For each FAIL, look up the cited requirement_id (e.g. `p1-must-no-interactive`) +# 2. For each FAIL, look up the cited `id` (e.g. `p1-must-no-interactive`) # in spec/principles/p-*.md — frontmatter `requirements[]`. # 3. Apply the fix. Patterns live in: # references/rust-clap-patterns.md (Rust/clap) # references/framework-idioms.md (Rust idioms) # references/framework-idioms-other-languages.md (Click, argparse, Cobra, Commander, yargs, oclif, Thor) -# Re-run `anc check` until the scorecard is clean. +# Re-run `anc audit` until `summary.fail == 0` and +# `coverage_summary.must.verified == coverage_summary.must.total`. +# `opt_out` and `n_a` are non-failure terminals (deliberate non-adoption or +# a conditional whose antecedent collapsed); don't chase them as bugs. + +# 4. Claim the badge once `badge.eligible == true` (≥70%). +# Copy `badge.embed_markdown` from the JSON scorecard into the project's README. ``` -Useful flags: `--principle N` to focus on one principle, `--audit-profile ` to suppress checks that don't -apply (e.g. `human-tui` for tools that legitimately intercept the TTY), `--binary` / `--source` to scope. +Useful flags: `--principle N` to focus on one principle, `--audit-profile ` to suppress audits that don't +apply (`human-tui` for TUIs that legitimately intercept the TTY, `posix-utility` for cat/sed/awk-style stdin-primary +tools, `diagnostic-only` for read-only tools, `file-traversal` reserved), `--binary` / `--source` to scope. ## You're building from scratch (Rust) @@ -38,32 +45,50 @@ cp /templates/agents-md-template.md AGENTS.md # fill placeholders # Add to Cargo.toml: clap (derive, env), serde, serde_json, thiserror, # libc (SIGPIPE), clap_complete. See references/project-structure.md. -anc check --output json # run continuously as you build +anc audit --output json # run continuously as you build ``` ## You're building in another language -`anc`'s source-analysis layer is Rust-only; its behavioral layer (`anc check --command `) runs against any +`anc`'s source-analysis layer is Rust-only; its behavioral layer (`anc audit --command `) runs against any compiled binary on `PATH`. Read `spec/principles/p1-*.md` through `p7-*.md` for the language-agnostic requirements, and `references/framework-idioms-other-languages.md` for per-framework idioms. -## Installing anc +## Installing anc and this skill bundle ```bash +# 1. Install the `anc` binary. brew install brettdavies/tap/agentnative # macOS / Linux cargo install agentnative + +# 2. Install this skill bundle into your host's canonical skills directory. +# Six hosts supported: claude_code, codex, cursor, factory, kiro, opencode. +anc skill install claude_code +anc skill install --all # install into every known host +anc skill install --dry-run claude_code # preview the git clone +eval $(anc skill install --dry-run claude_code) # captures cleanly for scripted installs +anc skill install --output json claude_code # uniform JSON envelope (success + error) +anc skill update claude_code # refresh an existing install (SKILL.md marker required) +anc skill update --all # refresh every known host + +# 3. (Optional) Pin against the scorecard JSON Schema embedded in the binary. +anc emit schema > scorecard-v0.7.schema.json # validate downstream parsers against this ``` -Binary name: `anc`. Prebuilt releases at . +Binary name: `anc`. Prebuilt releases at . If your host is not +yet in the binary's map, `anc skill install --dry-run ` prints a manual `git clone` you can adapt: `git +clone --depth 1 https://github.com/brettdavies/agentnative-skill.git `. ## Where things live -| Question | Where | -| ----------------------------------------------- | -------------------------------------------------- | -| What does P3 mean? | `spec/principles/p3-progressive-help-discovery.md` | -| What spec version does this bundle ship? | `spec/VERSION` | -| How do I implement `` in Rust/clap? | `references/rust-clap-patterns.md` | -| How do I implement `` in Python/Go/JS? | `references/framework-idioms-other-languages.md` | -| File a spec question or proposal | | -| File an `anc` bug | | -| File a skill-bundle issue | | +| Question | Where | +| ----------------------------------------------- | ------------------------------------------------------------- | +| What does P3 mean? | `spec/principles/p3-progressive-help-discovery.md` | +| What spec version does this bundle ship? | `spec/VERSION` | +| How do I implement `` in Rust/clap? | `references/rust-clap-patterns.md` | +| How do I implement `` in Python/Go/JS? | `references/framework-idioms-other-languages.md` | +| How does the badge eligibility threshold work? | `SKILL.md` § "The anc loop" (70% credit-weighted floor) | +| What does the scorecard JSON look like? | `anc emit schema` (schema 0.7) or `SKILL.md` § "The anc loop" | +| File a spec question or proposal | | +| File an `anc` bug | | +| File a skill-bundle issue | | diff --git a/references/project-structure.md b/references/project-structure.md index d958e97..6c62af9 100644 --- a/references/project-structure.md +++ b/references/project-structure.md @@ -54,7 +54,7 @@ variants as needed (e.g., RateLimit, NotFound, Timeout) with appropriate exit co Use `thiserror` for the error enum derive. The `#[error("...")]` attribute defines Display, and `#[source]` chains cause errors. For projects that prefer minimal dependencies, implement `std::fmt::Display` and `std::error::Error` manually -instead. The compliance checker accepts either approach. +instead. The compliance auditor accepts either approach. Implement a format-aware `print()` method on the error enum that accepts `&OutputConfig` and writes to stderr. When the output format is Json or Jsonl, serialize the error as a JSON object with error, kind, message, and exit_code fields. diff --git a/references/update-check.md b/references/update-check.md new file mode 100644 index 0000000..d4d5e61 --- /dev/null +++ b/references/update-check.md @@ -0,0 +1,33 @@ +# Update check — operational details + +The `bin/check-update` script compares this bundle's `VERSION` against `main` on GitHub. Exit code is always `0`; +network failures, broken installs, and disabled checks all degrade silently. SKILL.md only documents the agent-visible +surface; this file documents the script's contract for anyone debugging it. + +## Output contract + +| Stdout | Meaning | +| ------------------------------------ | ----------------------------------------------------------------------------- | +| (empty) | Up to date, snoozed, disabled, or check skipped (broken install, no network). | +| `UPGRADE_AVAILABLE ` | A newer release is on `main`. Trigger the upgrade prompt below. | + +## Prompt the user via `AskUserQuestion` + +> `agent-native-cli` **v{remote}** is available (you're on v{local}). Upgrade now? + +Three options: + +- **"Yes, upgrade now"** — run `git -C pull --ff-only`. The bundle root is the parent of `bin/`; + `git -C ../.. pull --ff-only` from `bin/` works for the default install layout (`~//skills/agent-native-cli/`). + If `--ff-only` rejects (uncommitted edits or divergent history), surface git's error verbatim and stop — do not + auto-stash. +- **"Not now"** — write `$HOME/.cache/agent-native-cli/update-snoozed` in the format ` `, where + `` is `1` (24h reminder), `2` (48h), or `3` (7 days), escalating each time the user defers. Tell the user the + next reminder window. +- **"Never ask again"** — `touch $HOME/.cache/agent-native-cli/disabled` and tell the user how to re-enable (`rm + $HOME/.cache/agent-native-cli/disabled`). + +## State directory + +`$HOME/.cache/agent-native-cli/` holds three files: `last-update-check`, `update-snoozed`, `disabled`. The script +auto-creates the directory on first slow-path fetch. diff --git a/scripts/generate-changelog.sh b/scripts/generate-changelog.sh index b111442..0bcf8b0 100755 --- a/scripts/generate-changelog.sh +++ b/scripts/generate-changelog.sh @@ -4,10 +4,14 @@ # Usage: # generate-changelog.sh [--tag vX.Y.Z] [repo-path] # generate-changelog.sh --check [repo-path] +# generate-changelog.sh --dry-run [--tag vX.Y.Z] [repo-path] # # Options: # --tag vX.Y.Z Override version tag (default: extracted from branch name) # --check Verify CHANGELOG.md has a versioned section (exit 1 if only [Unreleased]) +# --dry-run Print what would change to stdout; do not modify CHANGELOG.md. +# Exit 0 if no changes (idempotent), exit 1 if regeneration would +# produce different content (drift between PR bodies and the file). # # The version tag is extracted from the branch name by matching the pattern # release/vN.N.N (with optional suffix like release/v1.0.5:ci-migration). @@ -25,6 +29,7 @@ set -euo pipefail CHECK_MODE=false +DRY_RUN=false REPO_PATH="." TAG="" @@ -34,6 +39,10 @@ while [[ $# -gt 0 ]]; do CHECK_MODE=true shift ;; + --dry-run) + DRY_RUN=true + shift + ;; --tag) TAG="$2" shift 2 @@ -99,14 +108,39 @@ if [[ -z "${GITHUB_TOKEN:-}" ]]; then fi fi +# Dry-run mode: stash CHANGELOG.md, restore on exit, diff at the end. +if $DRY_RUN; then + if [[ ! -f CHANGELOG.md ]]; then + echo "error: --dry-run requires an existing CHANGELOG.md to compare against" >&2 + exit 1 + fi + ORIG_CHANGELOG="$(mktemp -t changelog-orig.XXXXXX)" + cp CHANGELOG.md "$ORIG_CHANGELOG" + trap 'mv "$ORIG_CHANGELOG" CHANGELOG.md' EXIT +fi + +# Duplicate-section guard: skip prepend if CHANGELOG.md already has a section +# for this tag. Re-running with an already-published tag previously emitted a +# second copy of the same section. +DUPLICATE_SECTION=false +if [[ -f CHANGELOG.md ]] && grep -qE "^## \[${TAG#v}\]" CHANGELOG.md; then + DUPLICATE_SECTION=true + if ! $DRY_RUN; then + echo "CHANGELOG.md already has a [${TAG#v}] section; skipping prepend" + exit 0 + fi +fi + # Step 1: Run git-cliff to prepend entries tagged with the release version -CLIFF_ARGS=(--unreleased --tag "$TAG") -if [[ -f CHANGELOG.md ]]; then - CLIFF_ARGS+=(--prepend CHANGELOG.md) -else - CLIFF_ARGS+=(-o CHANGELOG.md) +if ! $DUPLICATE_SECTION; then + CLIFF_ARGS=(--unreleased --tag "$TAG") + if [[ -f CHANGELOG.md ]]; then + CLIFF_ARGS+=(--prepend CHANGELOG.md) + else + CLIFF_ARGS+=(-o CHANGELOG.md) + fi + git cliff "${CLIFF_ARGS[@]}" fi -git cliff "${CLIFF_ARGS[@]}" # Step 2: Expand squash commit entries using PR body changelog sections OWNER=$(awk -F'"' '/^\[remote\.github\]/{found=1} found && /^owner/{print $2; exit}' cliff.toml) @@ -115,13 +149,27 @@ REPO=$(awk -F'"' '/^\[remote\.github\]/{found=1} found && /^repo/{print $2; exit # Strip leading v for version matching in the changelog VERSION="${TAG#v}" -if [[ -z "$OWNER" || -z "$REPO" ]] || ! command -v gh &>/dev/null; then - echo "Updated CHANGELOG.md (skipping PR expansion — missing [remote.github] or gh CLI)" +summarize_and_exit() { + local msg="$1" + if $DRY_RUN; then + if cmp -s "$ORIG_CHANGELOG" CHANGELOG.md; then + echo "DRY RUN: CHANGELOG.md is current (no regen drift)" + exit 0 + fi + echo "DRY RUN: CHANGELOG.md would change (regen drift detected)" >&2 + diff -u "$ORIG_CHANGELOG" CHANGELOG.md || true + exit 1 + fi + echo "$msg" echo "" echo "Next steps:" echo " git add CHANGELOG.md" echo " git commit -m 'docs: update CHANGELOG.md'" exit 0 +} + +if [[ -z "$OWNER" || -z "$REPO" ]] || ! command -v gh &>/dev/null; then + summarize_and_exit "Updated CHANGELOG.md (skipping PR expansion — missing [remote.github] or gh CLI)" fi # Extract PR numbers from the new version section only @@ -135,12 +183,7 @@ VERSION_SECTION=$(awk -v ver="$VERSION" ' PR_NUMBERS=$(echo "$VERSION_SECTION" | grep -oP '\(#\K\d+' | sort -un) if [[ -z "$PR_NUMBERS" ]]; then - echo "Updated CHANGELOG.md" - echo "" - echo "Next steps:" - echo " git add CHANGELOG.md" - echo " git commit -m 'docs: update CHANGELOG.md'" - exit 0 + summarize_and_exit "Updated CHANGELOG.md" fi # Pass PR numbers as comma-separated arg to python @@ -311,8 +354,4 @@ with open(changelog_path, 'w') as f: f.write(new_content) PYEOF -echo "Updated CHANGELOG.md" -echo "" -echo "Next steps:" -echo " git add CHANGELOG.md" -echo " git commit -m 'docs: update CHANGELOG.md'" +summarize_and_exit "Updated CHANGELOG.md" diff --git a/scripts/hooks/pre-push b/scripts/hooks/pre-push new file mode 100755 index 0000000..95fd579 --- /dev/null +++ b/scripts/hooks/pre-push @@ -0,0 +1,76 @@ +#!/usr/bin/env bash +# Local CI mirror — runs the same checks as the GitHub Actions CI pipeline. +# Use as a pre-push hook or run manually before pushing. +# +# Activation: `git config core.hooksPath scripts/hooks` (run once after clone). +# +# Mirrors .github/workflows/ci.yml: +# - markdownlint: markdownlint-cli2 over **/*.md +# - shellcheck: shellcheck over ./scripts/ at --severity=style +# +# Tools install (one-time): +# brew install shellcheck markdownlint-cli2 +# # apt: shellcheck via `apt install shellcheck`; markdownlint-cli2 via `bun add -g markdownlint-cli2` +# +# Tools that aren't installed are skipped with a one-line note rather than +# failing — CI is the authoritative backstop. Install them locally to close +# the gap and catch issues before the push hits GitHub. +# +# Exit codes: 0 = all pass, 1 = failure +set -euo pipefail + +RED='\033[0;31m' +GREEN='\033[0;32m' +BOLD='\033[1m' +RESET='\033[0m' + +pass() { echo -e " ${GREEN}✓${RESET} $1"; } +fail() { echo -e " ${RED}✗${RESET} $1"; exit 1; } + +echo -e "${BOLD}Running local CI checks...${RESET}" + +# 1. markdownlint — mirrors the `markdownlint` job in ci.yml. The CI workflow +# uses DavidAnson/markdownlint-cli2-action with `globs: **/*.md` and the +# committed .markdownlint-cli2.yaml config. The local CLI honors the same +# config file, so passing the same glob reproduces CI behavior. Skip silently +# if markdownlint-cli2 isn't installed — install via Homebrew +# (`brew install markdownlint-cli2`) or Bun (`bun add -g markdownlint-cli2`). +if command -v markdownlint-cli2 &>/dev/null; then + markdownlint-cli2 '**/*.md' '#node_modules' 2>&1 || fail "markdownlint-cli2" + pass "markdownlint" +else + echo " - markdownlint (skipped: install via \`brew install markdownlint-cli2\` or \`bun add -g markdownlint-cli2\`)" +fi + +# 2. Shellcheck — mirrors the `shellcheck` job in ci.yml, which runs +# ludeeus/action-shellcheck against `./scripts/` with SHELLCHECK_OPTS set to +# `--severity=style`. CI uses style severity to surface all suggestions and +# still fails on error+warning by default. The bundle is markdown-only; +# only producer-ops scripts under ./scripts/ are subject to shellcheck. +# Include scripts/hooks/* so this hook lints itself. Skip silently when +# the linter isn't installed. +if command -v shellcheck &>/dev/null; then + mapfile -t shell_files < <({ + git ls-files 'scripts/*' 2>/dev/null + } | sort -u) + # Filter to actual shell scripts (by extension or shebang). action-shellcheck + # auto-detects via shebang; mirror that locally so bin/ stays out and + # non-shell files in scripts/ don't trip the linter. + shell_targets=() + for f in "${shell_files[@]}"; do + [ -f "$f" ] || continue + if [[ "$f" == *.sh ]]; then + shell_targets+=("$f") + elif head -c 80 "$f" 2>/dev/null | grep -qE '^#!.*\b(bash|sh)\b'; then + shell_targets+=("$f") + fi + done + if [ ${#shell_targets[@]} -gt 0 ]; then + shellcheck --severity=style "${shell_targets[@]}" || fail "shellcheck --severity=style" + fi + pass "shellcheck" +else + echo " - shellcheck (skipped: install via \`brew install shellcheck\` or \`apt install shellcheck\`)" +fi + +echo -e "${BOLD}${GREEN}All checks passed.${RESET}" diff --git a/scripts/sync-dev-after-release.sh b/scripts/sync-dev-after-release.sh new file mode 100755 index 0000000..9221b5a --- /dev/null +++ b/scripts/sync-dev-after-release.sh @@ -0,0 +1,124 @@ +#!/usr/bin/env bash +# Backport release artifacts from main to dev after a release tag publishes. +# +# Pulls two files from main and lands them as a single signed commit on dev: +# - VERSION — overwritten with the released version (plain text, no leading "v"). +# - CHANGELOG.md — copied verbatim from origin/main. Main is fully +# authoritative for CHANGELOG; dev never edits it directly. +# +# Run AFTER: +# 1. The release/v* -> main PR has merged. +# 2. `git tag -a vX.Y.Z` has been pushed to origin. +# 3. The GitHub Release has been created. +# +# Usage: +# ./scripts/sync-dev-after-release.sh v0.2.0 +# +# Idempotent: safe to re-run. If dev already matches main on these two +# files, the script exits 0 with no commit. +# +# Mirror of ~/dev/agentnative-cli/scripts/sync-dev-after-release.sh; this +# variant drops the Cargo.toml/Cargo.lock steps because the skill bundle +# is markdown-only (single plain-text VERSION file). + +set -euo pipefail + +if [[ $# -ne 1 ]]; then + echo "usage: $0 vX.Y.Z" >&2 + exit 64 +fi + +VERSION="$1" +if [[ ! "$VERSION" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then + echo "error: version must match vMAJOR.MINOR.PATCH (got: $VERSION)" >&2 + exit 64 +fi +VERSION_NO_V="${VERSION#v}" + +REPO_ROOT="$(git rev-parse --show-toplevel)" +cd "$REPO_ROOT" + +if [[ -n "$(git status --porcelain)" ]]; then + echo "error: working tree not clean -- commit or stash first" >&2 + git status --short >&2 + exit 65 +fi + +git fetch origin --tags --quiet + +# Verify the release tag exists locally. +if ! git rev-parse --verify --quiet "refs/tags/$VERSION" >/dev/null; then + echo "error: tag $VERSION not found locally -- run 'git fetch origin --tags' or verify the release published" >&2 + exit 66 +fi + +# Verify main is at or past the tag (i.e. release/* actually merged). +TAG_SHA="$(git rev-parse "$VERSION")" +if ! git merge-base --is-ancestor "$TAG_SHA" origin/main; then + echo "error: tag $VERSION is not reachable from origin/main -- wait for release/v* to merge" >&2 + exit 66 +fi + +# Verify the GitHub Release exists and is not still a draft. The tag can exist +# (above check) while the GitHub Release was never created (or stayed draft), +# in which case consumers won't see the new version via `gh release` and the +# backport is premature. +if command -v gh >/dev/null 2>&1; then + is_draft="$(gh release view "$VERSION" --json isDraft --jq .isDraft 2>/dev/null || true)" + case "$is_draft" in + false) + ;; + true) + echo "error: GitHub Release $VERSION is still draft -- publish it first" >&2 + exit 67 + ;; + "") + echo "error: no GitHub Release for $VERSION -- create it with 'gh release create $VERSION'" >&2 + exit 67 + ;; + *) + echo "warning: unexpected isDraft value '$is_draft' for $VERSION -- proceeding" >&2 + ;; + esac +else + echo "warning: gh not on PATH -- skipping GitHub Release published-state check" >&2 +fi + +git switch dev +git pull --ff-only origin dev + +# VERSION is plain text; overwrite with the released version (no leading "v"). +printf '%s\n' "$VERSION_NO_V" > VERSION + +# CHANGELOG.md from main (authoritative). +git checkout origin/main -- CHANGELOG.md + +if git diff --quiet VERSION CHANGELOG.md; then + echo "no changes -- dev already in sync with $VERSION" + exit 0 +fi + +git add VERSION CHANGELOG.md +git commit -m "chore(release): backport $VERSION artifacts to dev + +Brings dev's release-bookkeeping current with the $VERSION release on +main: VERSION bumped to ${VERSION_NO_V} and CHANGELOG.md copied verbatim +from origin/main." + +# Post-sync sanity check: re-running generate-changelog.sh against the current +# PR bodies should produce an identical CHANGELOG.md. Drift here means upstream +# PR bodies were edited after main's CHANGELOG.md was generated -- the +# backport brought the stale CHANGELOG over, and a future release-branch +# regen will surface unexpected diffs. Warn, do not fail; the backport is +# still correct against what main currently has. +if [[ -x scripts/generate-changelog.sh ]] && command -v git-cliff >/dev/null 2>&1; then + if scripts/generate-changelog.sh --dry-run --tag "$VERSION" >/dev/null 2>&1; then + echo "regen check: CHANGELOG.md matches what PR bodies would produce" + else + echo "warning: PR bodies have drifted from main's CHANGELOG.md for $VERSION" >&2 + echo " re-run 'scripts/generate-changelog.sh --dry-run --tag $VERSION' to see the diff" >&2 + echo " fix by regenerating CHANGELOG.md on a follow-up release branch" >&2 + fi +fi + +echo "committed; push with: git push origin dev" diff --git a/scripts/sync-spec.sh b/scripts/sync-spec.sh index bd1a949..d129575 100755 --- a/scripts/sync-spec.sh +++ b/scripts/sync-spec.sh @@ -1,125 +1,248 @@ #!/usr/bin/env bash # Vendor agentnative-spec into spec/. # -# Resolves the latest v* tag of agentnative-spec, preferring the remote -# repository, and falls back to a local checkout if the remote is -# unreachable. Extracts files via `git show :` so neither -# checkout's working tree is perturbed. The vendored tree ships as part -# of the skill bundle so consumers carry the canonical principle text +# Default behavior: resolves the latest v* tag of agentnative-spec via the +# GitHub API and pulls VERSION, CHANGELOG.md, principles/p*-*.md, and +# principles/scoring.md at that tag. The vendored tree ships as part of +# the skill bundle so consuming agents carry the canonical principle text # alongside the skill metadata. # +# Override behavior (--ref / SPEC_REF): vendors an explicit branch HEAD, +# tag, or commit SHA instead of the latest v* tag. Use for cross-repo +# coordination of in-flight spec work that hasn't released yet (e.g., a +# CLI/site change that depends on a spec PR landed on `dev` but not yet +# tagged). The resolved short SHA is always printed alongside the ref so +# the user knows exactly what landed; record that SHA in any consumer PR +# body so the vendoring is traceable post-merge. +# +# Transport: `gh api` against the GitHub REST contents endpoint. Pulls +# files individually (no clone, no tarball) so branches, tags, and SHAs +# take the same code path — `?ref=` accepts all three. Requires `gh` +# authenticated against github.com. When the API path fails (network +# down, gh unauthenticated, repo unreachable), the script falls back to +# a local checkout for offline development. +# # Usage: -# scripts/sync-spec.sh +# scripts/sync-spec.sh # latest v* tag (default) +# scripts/sync-spec.sh --ref dev # HEAD of dev branch +# scripts/sync-spec.sh --ref v0.4.0 # explicit tag +# scripts/sync-spec.sh --ref b4f4d02 # specific commit SHA +# SPEC_REF=dev scripts/sync-spec.sh # env-var form of --ref # SPEC_ROOT=/path/to/agentnative-spec scripts/sync-spec.sh # SPEC_REMOTE_URL=git@github.com:brettdavies/agentnative.git scripts/sync-spec.sh # +# Flags: +# --ref Branch name, tag, or commit SHA to vendor. Wins over +# SPEC_REF env var. When unset, the script resolves the +# latest v* tag. +# # Env vars: -# SPEC_REMOTE_URL Remote URL to query first. +# SPEC_REF Same as --ref but via env. CLI flag wins on conflict. +# SPEC_REMOTE_URL Remote URL identifying the repo. The script parses +# `/` out of it for the `gh api` calls and +# out of it for the local-fallback's remote-name lookup. # Default: https://github.com/brettdavies/agentnative.git -# SPEC_ROOT Local checkout to fall back to when the remote is +# SPEC_ROOT Local checkout to fall back to when the API is # unreachable. Default: $HOME/dev/agentnative-spec # -# Resync cadence: rerun after every new agentnative-spec tag. The remote -# query picks up new tags automatically; a local fallback only sees what -# the local checkout already has fetched. -# -# Stale orphan files in spec/principles/ (e.g., from a spec rename) are -# accepted; `git status` surfaces them at commit time. +# Resync cadence: rerun after every new agentnative-spec tag. The default +# API query picks up new tags automatically. Spec's +# `repository_dispatch:spec-release` event already fires to this repo on +# tag publish — a consumer-side handler that auto-PRs the resync is +# tracked as follow-up work. # -# Mirror of agentnative-cli/scripts/sync-spec.sh; only DEST_DIR differs. +# Stale orphan files in spec/principles/ (e.g., from a spec +# rename) are accepted; `git status` surfaces them at commit time. set -euo pipefail SPEC_REMOTE_URL="${SPEC_REMOTE_URL:-https://github.com/brettdavies/agentnative.git}" SPEC_ROOT="${SPEC_ROOT:-$HOME/dev/agentnative-spec}" +SPEC_REF="${SPEC_REF:-}" + +# --- Argument parsing --------------------------------------------------- +# CLI --ref wins over SPEC_REF env. Other flags reserved for future use. +while [[ $# -gt 0 ]]; do + case "$1" in + --ref) + if [[ $# -lt 2 || -z "$2" ]]; then + echo "error: --ref requires a value (branch, tag, or SHA)" >&2 + exit 2 + fi + SPEC_REF="$2" + shift 2 + ;; + --ref=*) + SPEC_REF="${1#--ref=}" + if [[ -z "$SPEC_REF" ]]; then + echo "error: --ref= requires a value (branch, tag, or SHA)" >&2 + exit 2 + fi + shift + ;; + -h|--help) + sed -n '2,55p' "$0" | sed 's/^# \{0,1\}//' + exit 0 + ;; + *) + echo "error: unknown argument: $1" >&2 + echo " run \`$0 --help\` for usage" >&2 + exit 2 + ;; + esac +done REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" DEST_DIR="$REPO_ROOT/spec" DEST_PRINCIPLES="$DEST_DIR/principles" -# Cleanup hook for the temp clone (set only after mktemp succeeds). -tmp_root="" -cleanup() { - if [[ -n "$tmp_root" && -d "$tmp_root" ]]; then - rm -rf "$tmp_root" +# Parse `/` out of SPEC_REMOTE_URL for `gh api` calls. +# Handles both URL shapes: +# https://github.com//.git +# git@github.com:/.git +spec_repo="${SPEC_REMOTE_URL%.git}" +spec_repo="${spec_repo#*github.com[/:]}" +spec_repo="${spec_repo%/}" +if [[ -z "$spec_repo" || "$spec_repo" == "$SPEC_REMOTE_URL" || "$spec_repo" != */* ]]; then + echo "error: could not parse owner/repo from SPEC_REMOTE_URL: $SPEC_REMOTE_URL" >&2 + exit 1 +fi + +# === Resolution ========================================================= +# resolved_ref: what we actually vendor (e.g., "v0.4.0" or a SHA) +# resolved_sha: 7-char SHA for display (always set after resolution) +# source_label: human-readable origin string for the "vendoring" line +resolved_ref="" +resolved_sha="" +source_label="" + +# Try the API path first. Captures both branches: explicit user ref OR +# auto-resolve latest v* tag. +api_ok=false + +if [[ -n "$SPEC_REF" ]]; then + # User-specified ref. gh api accepts branches, tags, and SHAs at the + # same endpoint via `?ref=`. + if full_sha="$(gh api "repos/$spec_repo/commits/$SPEC_REF" --jq '.sha' 2>/dev/null)"; then + resolved_ref="$SPEC_REF" + resolved_sha="${full_sha:0:7}" + source_label="github.com:$spec_repo via gh api" + api_ok=true fi -} -trap cleanup EXIT - -# === Remote-first resolution =========================================== -spec_source="" -spec_tag="" - -echo "querying $SPEC_REMOTE_URL for latest v* tag..." -remote_tag="$(git ls-remote --tags --sort='-version:refname' \ - "$SPEC_REMOTE_URL" 'refs/tags/v*' 2>/dev/null \ - | awk '{print $2}' \ - | sed 's|refs/tags/||' \ - | grep -v '\^{}$' \ - | head -n 1 || true)" - -if [[ -n "$remote_tag" ]]; then - tmp_root="$(mktemp -d -t agentnative-spec-XXXXXX)" - if git clone --depth 1 --branch "$remote_tag" --quiet \ - "$SPEC_REMOTE_URL" "$tmp_root" 2>/dev/null; then - spec_source="$tmp_root" - spec_tag="$remote_tag" - resolved_sha="$(git -C "$spec_source" rev-parse --short=7 "$spec_tag^{commit}")" - echo "vendoring $spec_tag ($resolved_sha) from remote $SPEC_REMOTE_URL" +else + # Default: latest v* tag. Query all tags, filter to semver-shape v*, + # sort -V (version sort, descending), take the first. + latest_tag="$(gh api "repos/$spec_repo/tags?per_page=100" --jq '.[].name' 2>/dev/null \ + | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+' \ + | sort -V -r \ + | head -n 1 || true)" + if [[ -n "$latest_tag" ]]; then + if full_sha="$(gh api "repos/$spec_repo/commits/$latest_tag" --jq '.sha' 2>/dev/null)"; then + resolved_ref="$latest_tag" + resolved_sha="${full_sha:0:7}" + source_label="github.com:$spec_repo via gh api" + api_ok=true + fi fi fi -# === Local fallback ==================================================== -if [[ -z "$spec_source" ]]; then +# === Local fallback ===================================================== +# When the API path fails (offline, gh not authenticated, repo not +# reachable), use a local checkout instead. The fallback uses `git` +# against SPEC_ROOT — `gh` is not involved in this path so it works +# without network or auth. +if ! $api_ok; then if [[ ! -d "$SPEC_ROOT/.git" ]]; then - echo "error: remote unreachable and SPEC_ROOT is not a git repository: $SPEC_ROOT" >&2 + echo "error: API path failed and SPEC_ROOT is not a git repository: $SPEC_ROOT" >&2 echo " remote: $SPEC_REMOTE_URL" >&2 - echo " set SPEC_ROOT to your agentnative-spec checkout, or check network access." >&2 + if [[ -n "$SPEC_REF" ]]; then + echo " requested ref: $SPEC_REF" >&2 + fi + echo " check \`gh auth status\`, network access, or point SPEC_ROOT at a local checkout." >&2 exit 1 fi - echo "warning: remote query failed; falling back to local $SPEC_ROOT" >&2 + echo "warning: gh api unreachable; falling back to local $SPEC_ROOT" >&2 - spec_source="$SPEC_ROOT" - spec_tag="$(git -C "$spec_source" tag --list 'v*' --sort='-version:refname' | head -n 1)" - if [[ -z "$spec_tag" ]]; then - echo "error: no v* tags found in $SPEC_ROOT" >&2 - echo " try \`git -C $SPEC_ROOT fetch --tags\` to pick up upstream tags" >&2 - exit 1 + if [[ -n "$SPEC_REF" ]]; then + # User-provided ref must be resolvable in the local checkout. + if ! git -C "$SPEC_ROOT" rev-parse --verify --quiet "$SPEC_REF^{commit}" >/dev/null; then + echo "error: ref \`$SPEC_REF\` not found in $SPEC_ROOT" >&2 + echo " try \`git -C $SPEC_ROOT fetch --all --tags\` to pick up upstream refs," >&2 + echo " or pass a SHA the local checkout already contains." >&2 + exit 1 + fi + resolved_ref="$SPEC_REF" + else + # Latest v* tag in the local checkout. + resolved_ref="$(git -C "$SPEC_ROOT" tag --list 'v*' --sort='-version:refname' | head -n 1)" + if [[ -z "$resolved_ref" ]]; then + echo "error: no v* tags found in $SPEC_ROOT" >&2 + echo " try \`git -C $SPEC_ROOT fetch --tags\` to pick up upstream tags" >&2 + exit 1 + fi fi - resolved_sha="$(git -C "$spec_source" rev-parse --short=7 "$spec_tag^{commit}")" - echo "vendoring $spec_tag ($resolved_sha) from local $spec_source" + resolved_sha="$(git -C "$SPEC_ROOT" rev-parse --short=7 "$resolved_ref^{commit}")" + source_label="local $SPEC_ROOT" fi -# === Verify + extract (works identically for remote and local sources) = -if ! git -C "$spec_source" cat-file -e "$spec_tag:principles" 2>/dev/null; then - echo "error: $spec_tag has no principles/ directory in $spec_source" >&2 - exit 1 -fi +echo "vendoring $resolved_ref ($resolved_sha) from $source_label" + +# === Extract ============================================================= +# Fetcher: API path uses `gh api` with raw accept header; local path uses +# `git show`. Both write a single file to $2 from path $1. +fetch_file() { + local path="$1" + local dest="$2" + if $api_ok; then + gh api -H "Accept: application/vnd.github.raw" \ + "repos/$spec_repo/contents/$path?ref=$resolved_ref" >"$dest" + else + git -C "$SPEC_ROOT" show "$resolved_ref:$path" >"$dest" + fi +} + +# Lister: returns names (not paths) of files in a directory at the ref. +# API path uses the contents endpoint; local path uses ls-tree. +list_dir() { + local dir="$1" + if $api_ok; then + gh api "repos/$spec_repo/contents/$dir?ref=$resolved_ref" --jq '.[].name' + else + git -C "$SPEC_ROOT" ls-tree --name-only "$resolved_ref" "$dir/" \ + | sed "s|^$dir/||" + fi +} mkdir -p "$DEST_PRINCIPLES" # VERSION and CHANGELOG.md are top-level in the spec repo. -git -C "$spec_source" show "$spec_tag:VERSION" >"$DEST_DIR/VERSION" -git -C "$spec_source" show "$spec_tag:CHANGELOG.md" >"$DEST_DIR/CHANGELOG.md" +fetch_file "VERSION" "$DEST_DIR/VERSION" +fetch_file "CHANGELOG.md" "$DEST_DIR/CHANGELOG.md" -# Enumerate principle files at the tag and extract each one. +# Enumerate principle files at the ref and extract each one. Filter to +# p*-*.md so principles/AGENTS.md (spec-side design context, not consumed +# by the site) is skipped. copied=0 -while IFS= read -r path; do - case "$path" in - principles/p*-*.md) - dest_name="${path#principles/}" - git -C "$spec_source" show "$spec_tag:$path" >"$DEST_PRINCIPLES/$dest_name" +while IFS= read -r name; do + [[ -z "$name" ]] && continue + case "$name" in + p[0-9]-*.md|p[0-9][0-9]-*.md) + fetch_file "principles/$name" "$DEST_PRINCIPLES/$name" copied=$((copied + 1)) ;; esac -done < <(git -C "$spec_source" ls-tree --name-only "$spec_tag" principles/) +done < <(list_dir "principles") if [[ "$copied" -eq 0 ]]; then - echo "error: no principles/p*-*.md files found at $spec_tag" >&2 + echo "error: no principles/p*-*.md files found at ref \`$resolved_ref\`" >&2 exit 1 fi -echo "wrote $copied principle file(s) to $DEST_PRINCIPLES" +# scoring.md is a named spec doc (leaderboard ranking methodology), not a +# p*-*.md principle, so it's fetched explicitly rather than via the glob. +fetch_file "principles/scoring.md" "$DEST_PRINCIPLES/scoring.md" + +echo "wrote $copied principle file(s) + scoring.md to $DEST_PRINCIPLES" echo "wrote VERSION + CHANGELOG.md to $DEST_DIR" echo echo "next: review \`git diff\` for unexpected changes, then commit." diff --git a/spec/CHANGELOG.md b/spec/CHANGELOG.md index d091c4b..da36fa4 100644 --- a/spec/CHANGELOG.md +++ b/spec/CHANGELOG.md @@ -1,8 +1,84 @@ # Changelog -All notable changes to this repository are documented here — governance, validator, release infrastructure, README, decision records. +All notable changes to this repository are documented here: governance, validator, release infrastructure, README, decision records. -Changes to the standard itself — principle MUST/SHOULD/MAY tier moves, requirement IDs added/removed/renamed, applicability shifts — are tracked per-principle in `principles/p*-*.md` via the `last-revised:` calver frontmatter field and the `## Pressure test notes` section appended to each file. +Changes to the standard itself (principle MUST/SHOULD/MAY tier moves, requirement IDs added/removed/renamed, applicability shifts) are tracked per-principle in `principles/p*-*.md` via the `last-revised:` calver frontmatter field and the `## Pressure test notes` section appended to each file. + +## [0.5.0] - 2026-05-30 + +### Added + +- Three-tier contribution framework (Signal / Proposal / Code) in `CONTRIBUTING.md` with a "Where to file what" routing table covering all four repos by @brettdavies in [#30](https://github.com/brettdavies/agentnative/pull/30) +- `00-blank.yml` issue template, sort-prefixed to the top of the picker, with an agent-facing footer redirecting to structured templates +- Fourth `contact_link` in `.github/ISSUE_TEMPLATE/config.yml` routing skill-bundle issues to the bundle repo +- `p3-must-version` (universal MUST): top-level `--version` prints a non-empty version line and exits 0. by @brettdavies in [#33](https://github.com/brettdavies/agentnative/pull/33) +- `p3-should-version-short` (universal SHOULD): a short alias (`-V` per clap default; `-v` per Node/npm/Bun/Yarn/Make convention; `-version` per Go's `flag` package) accompanies `--version`. Any of the three forms is sufficient. +- Conditional applicability shape `{kind: conditional, antecedent: {check_id: }}` for machine-checkable conditionals. The `check_id` is a verifier identifier, not a requirement `id`; the v1 schema is single-antecedent only, with compound antecedents (`op`/`checks`) deferred to v2 and rejected by the validator. by @brettdavies in [#34](https://github.com/brettdavies/agentnative/pull/34) +- Antecedent-status propagation table mapping the antecedent's status (under the 7-status taxonomy: `pass`, `warn`, `fail`, `opt_out`, `n_a`, `skip`, `error`) to the consequent's emission, including the inheritance rules for `skip` and `error`. +- Add `principles/scoring.md` defining the leaderboard scoring formula: a binary-behavior, credit-weighted ratio with `opt_out` counted and `n_a`/`skip`/`error` excluded, plus four badge cohort bands. by @brettdavies in [#39](https://github.com/brettdavies/agentnative/pull/39) + +### Changed + +- Issue template `grade-a-cli.yml` renamed to `grading-finding.yml`, with `name`, `description`, and `labels` updated to match the actual purpose by @brettdavies in [#30](https://github.com/brettdavies/agentnative/pull/30) +- `RELEASES.md` reduced from runbook-plus-rationale (339 lines) to runbook-only (201 lines), cross-linking the new rationale doc +- `AGENTS.md` updated from three-repo to four-repo ecosystem; template references updated +- `scripts/prose-check.sh` default `LANGUAGETOOL_URL` is now `http://languagetool:8081` (service-name default; consumers override via env var) +- Migrate p2 (`p2-must-schema-print`, `p2-should-schema-file`) and p8 (`p8-must-bundle-install`, `p8-may-install-all`, `p8-may-bundle-update`) applicability from `{if: }` to the new shape. Antecedent for p2 is `p2-json-output`; for p8 is `p8-bundle-exists`. No tier changes. by @brettdavies in [#34](https://github.com/brettdavies/agentnative/pull/34) +- `last-revised` discipline tightened: any frontmatter mutation (summary rewrites, applicability shape migrations, tier changes, requirement add/remove, status flips, reordering) stamps today's date. Prose-only edits below the closing `---` fence remain exempt. Enforced by `scripts/check-last-revised.mjs` at pre-push and PR CI. by @brettdavies in [#38](https://github.com/brettdavies/agentnative/pull/38) +- Lower the badge eligibility floor from 80% to 70%. by @brettdavies in [#39](https://github.com/brettdavies/agentnative/pull/39) +- Replace the badge's three-color threshold with four cohort bands (Exemplary >= 85, Strong 80-84, Solid 75-79, Qualified 70-74). +- Rename the conditional-antecedent frontmatter field `check_id` to `audit_id`. This is a breaking change to the principle-frontmatter shape: consumers that parse `applicability.antecedent` read `audit_id` instead of `check_id`. by @brettdavies in [#40](https://github.com/brettdavies/agentnative/pull/40) +- Rename the standard's conformance vocabulary from `check` to `audit` (the `anc audit` subcommand, "audit IDs", "auditor") across principle prose and governance docs. + +### Fixed + +- Backfilled `last-revised` on p1, p2, p4, p5, p6, p7, p8 to match each file's most recent substantive frontmatter change. p3 was already correct. by @brettdavies in [#38](https://github.com/brettdavies/agentnative/pull/38) +- The `last-revised` discipline check runs only on PRs targeting `dev`, so release PRs to `main` no longer fail it for principles revised on an earlier day. by @brettdavies in [#41](https://github.com/brettdavies/agentnative/pull/41) + +### Removed + +- `docs/architecture/languagetool-deployment.md` (deployment knowledge compounded into the shared `solutions-docs` repo as `self-hosted-languagetool-for-prose-check-stacks-2026-05-20.md`) by @brettdavies in [#30](https://github.com/brettdavies/agentnative/pull/30) + +**Full Changelog**: [v0.4.0...v0.5.0](https://github.com/brettdavies/agentnative/compare/v0.4.0...v0.5.0) + +## [0.4.0] - 2026-05-07 + +### Added + +- P1 MUST `p1-must-secret-non-leaky-path` (conditional on CLI accepting secret material): sensitive inputs are readable via stdin or a `--*-file` flag; flag-value and env-var inputs MAY exist for convenience but MUST NOT be the only path. by @brettdavies in [#25](https://github.com/brettdavies/agentnative/pull/25) +- P2 MUST `p2-must-schema-print` (conditional on structured output): expose the output schema via a `schema` subcommand or `--schema` flag, runtime-discoverable, with a documented format identifier (canonical recommendation: JSON Schema 2020-12). +- P2 SHOULD `p2-should-schema-file` (conditional on structured output): also export the schema to a stable file path so CI and static-analysis consumers can pin without invoking the tool. +- P2 SHOULD `p2-should-json-aliases`: accept `--json` and `--jsonl` as aliases for `--output json` and `--output jsonl`. +- P4 SHOULD `p4-should-enumerate-valid-set` (conditional on closed-set rejection): when rejecting input against an enum or fixed-allowed-values set, the error message includes the valid set. +- P6 MUST `p6-must-sigterm` (conditional on long-running operations): flush or roll back partial writes, release locks, exit non-zero within a bounded shutdown window. Next invocation succeeds without manual cleanup. +- P6 MAY `p6-may-standard-names` (conditional on subcommands): follow community-standard verbs (`get` / `list` / `create` / `update` / `delete`) and flag spellings (`--force`, `--yes`, `--limit`, `--quiet`, `--verbose`). +- New principle **P8 Discoverable Through Agent Skill Bundles** (four requirements: `p8-must-bundle-install`, `p8-should-bundle-exists`, `p8-may-install-all`, `p8-may-bundle-update`). CLIs ship a top-level skill bundle (`AGENTS.md`, `SKILL.md`, or equivalent) and provide an install path that registers the bundle with installed agent runtimes (canonical form: `tool skill install []`). + +### Changed + +- `VERSION`: 0.3.1 → 0.4.0 (MINOR per `principles/AGENTS.md`'s versioning rules; new MUSTs added). by @brettdavies in [#25](https://github.com/brettdavies/agentnative/pull/25) +- `.impeccable.md`: new spec-channel anti-pattern "No false canonicalization". When a bullet names an outcome the implementer can satisfy any way, prose uses indefinite articles and avoids language that canonicalizes one shape; when a bullet names a citable single-shape pattern, prose uses definite articles and cites the source. + +**Full Changelog**: [v0.3.1...v0.4.0](https://github.com/brettdavies/agentnative/compare/v0.3.1...v0.4.0) + +## [0.3.1] - 2026-05-07 + +### Added + +- Badge claim convention (`docs/badge.md`) defines eligibility floor (≥80% pass-rate), embed shape, score-text format, color thresholds, and version-pinning posture for tool authors who self-host the agent-native badge linked to a live scorecard. by @brettdavies in [#20](https://github.com/brettdavies/agentnative/pull/20) +- README and CONTRIBUTING pointers to the badge convention so HN visitors and tool authors land on the convention from the two top-level entry points. +- Add `BRAND.md` at the repo root. Universal voice and identity SoT shared across the spec, site, linter, and skill bundle channels. Each channel inherits the shared identity and adds register and artifacts in its own `.impeccable.md`. by @brettdavies in [#22](https://github.com/brettdavies/agentnative/pull/22) +- Add spec-channel `.impeccable.md`: RFC 2119 register rules, third-person standards voice, no-implementation-leakage anti-patterns. Narrative identity layer; literal phrase enforcement lives in the `spec` Vale rule pack. +- Add `## Acknowledgements` to README. Names foundational CLI doctrine (12-factor, POSIX, clig.dev, NO_COLOR, XDG), parallel agent-CLI synthesis sources, the spec's proximate ancestors, and the anc.dev ecosystem's mechanism contribution. +- Add deterministic pre-push voice enforcement: Vale rule packs (`styles/brand/`, `styles/spec/`), LanguageTool grammar checks over the Tailnet (graceful skip when unreachable), and pack-README drift detection. One-time setup per contributor: `brew install vale jaq bun && vale sync` after activating `core.hooksPath scripts/hooks`. The layered SoT, orchestrator behavior, contributor flow, and deferred follow-ups live in the `dev`-only architecture docs. + +### Changed + +- Rename README "trifecta" to "four artifacts"; add `agentnative-skill` as a first-class artifact alongside the spec, the linter, and the leaderboard. by @brettdavies in [#22](https://github.com/brettdavies/agentnative/pull/22) +- Drop `docs/architecture/voice-enforcement.md` references from main-shipped files (`AGENTS.md`, `CONTRIBUTING.md`, `principles/AGENTS.md`, `.gitignore` comment). Replace the pointers with inline narrative that names the rule packs and the LT graceful-skip behavior. The architecture docs stay on `dev` as contributor-side reference and are not shipped to `main`. by @brettdavies in [#24](https://github.com/brettdavies/agentnative/pull/24) +- Update the `RELEASES.md` Prose scrubbing procedure to scrub-before-submit. Step 1 covers three entry points (scratch authoring for `gh pr create`, fetch-then-clean for `gh pr edit`, `cp CHANGELOG.md` for changelog scrub); step 6 submits the cleaned version once via `--body-file`. + +**Full Changelog**: [v0.3.0...v0.3.1](https://github.com/brettdavies/agentnative/compare/v0.3.0...v0.3.1) ## [0.3.0] - 2026-04-28 diff --git a/spec/README.md b/spec/README.md index 5a18373..1740361 100644 --- a/spec/README.md +++ b/spec/README.md @@ -6,6 +6,9 @@ latest upstream `v*` tag and ship inside the skill bundle so consumers carry the skill metadata. Each release of this bundle re-vendors against the latest spec tag. The currently vendored version is recorded in [`VERSION`](./VERSION). +For the full spec landing page — leaderboard, badge convention, and acknowledgements — see [anc.dev](https://anc.dev) or +the upstream [`README`](https://github.com/brettdavies/agentnative#readme). + ## Resync Run from the repo root: @@ -21,15 +24,16 @@ remote. The script extracts files via `git show`, so neither source's working tr ## Layout -| Path | Source in `agentnative-spec` | Purpose | -| ------------------ | ---------------------------- | ------------------------------------------------------------- | -| `VERSION` | `VERSION` | Spec version string; the skill's vendored `SPEC_VERSION` | -| `CHANGELOG.md` | `CHANGELOG.md` | Spec change history; informational | -| `principles/p*.md` | `principles/p*.md` | Frontmatter `requirements[]` is the machine-readable contract | +| Path | Source in `agentnative-spec` | Purpose | +| ----------------------- | ---------------------------- | ------------------------------------------------------------- | +| `VERSION` | `VERSION` | Spec version string; the skill's vendored `SPEC_VERSION` | +| `CHANGELOG.md` | `CHANGELOG.md` | Spec change history; informational | +| `principles/p*.md` | `principles/p*.md` | Frontmatter `requirements[]` is the machine-readable contract | +| `principles/scoring.md` | `principles/scoring.md` | Leaderboard formula, badge eligibility floor, and color bands | Each principle file has a YAML frontmatter block with `id`, `title`, `last-revised`, `status`, and `requirements[]`. Each `requirements[]` entry carries a stable `id` (e.g. `p1-must-no-interactive`), a `level` (`must`/`should`/`may`), an -`applicability` (`universal` or `{if: }`), and a one-sentence `summary`. The `anc` checker +`applicability` (`universal` or `{if: }`), and a one-sentence `summary`. The `anc` auditor ([brettdavies/agentnative-cli](https://github.com/brettdavies/agentnative-cli)) emits these IDs in its scorecard so agents can navigate from a finding directly to the requirement. diff --git a/spec/VERSION b/spec/VERSION index 0d91a54..8f0916f 100644 --- a/spec/VERSION +++ b/spec/VERSION @@ -1 +1 @@ -0.3.0 +0.5.0 diff --git a/spec/principles/p1-non-interactive-by-default.md b/spec/principles/p1-non-interactive-by-default.md index f93ad5a..7761a76 100644 --- a/spec/principles/p1-non-interactive-by-default.md +++ b/spec/principles/p1-non-interactive-by-default.md @@ -1,7 +1,7 @@ --- id: p1 title: Non-Interactive by Default -last-revised: 2026-04-22 +last-revised: 2026-05-07 status: active requirements: - id: p1-must-env-var @@ -11,12 +11,17 @@ requirements: - id: p1-must-no-interactive level: must applicability: universal - summary: "`--no-interactive` flag gates every prompt library call; when set or stdin is not a TTY, use defaults/stdin or exit with an actionable error." + summary: "When stdin is not a TTY or `--no-interactive` is set, every blocking-input surface (prompt libraries, read-line, TUI init) resolves from defaults/stdin or exits with an actionable error." - id: p1-must-no-browser level: must applicability: if: CLI authenticates against a remote service summary: Headless authentication path (`--no-browser` / OAuth Device Authorization Grant). + - id: p1-must-secret-non-leaky-path + level: must + applicability: + if: CLI accepts secret material (tokens, passwords, keys) as input + summary: "Sensitive inputs are readable via stdin or a `--*-file` flag; flag-value and env-var inputs MAY exist for convenience but MUST NOT be the only path." - id: p1-should-tty-detection level: should applicability: universal @@ -36,11 +41,11 @@ requirements: ## Definition Every automation path MUST run without human input. A CLI tool that blocks on an interactive prompt is invisible to an -agent — the agent hangs, the user sees nothing, and the operation times out silently. +agent: the agent hangs, the user sees nothing, and the operation times out silently. **Decision record:** this principle's MUST is worded in terms of observable behavior rather than enumerated APIs. [`docs/decisions/p1-behavioral-must.md`](../docs/decisions/p1-behavioral-must.md) records the reasoning and names the -verification boundary: automated checks verify behavior under non-TTY stdin; TTY-driving-agent scenarios are covered by +verification boundary: automated audits verify behavior under non-TTY stdin; TTY-driving-agent scenarios are covered by the MUST but are not PTY-probed at the current scale. ## Why Agents Need It @@ -63,26 +68,32 @@ agent-tool deadlock. quiet: bool, ``` -- A `--no-interactive` flag gating every prompt library call (`dialoguer`, `inquire`, `read_line`, `TTY::Prompt`, - `inquirer`, equivalents in other frameworks, or any TUI event loop that takes over the terminal). When the flag is - set, or when stdin is not a TTY, the tool uses defaults, reads from stdin, or exits with an actionable error. It never - blocks. +- When stdin is not a terminal, or when `--no-interactive` is set, every blocking-input surface (prompt libraries, + read-line calls, TUI session initialization) MUST resolve from defaults, read from stdin, or exit non-zero with an + actionable error. The CLI MUST NOT block waiting for input that cannot arrive. - A headless authentication path if the CLI authenticates. The canonical flag is `--no-browser`, which triggers the OAuth 2.0 Device Authorization Grant ([RFC 8628](https://www.rfc-editor.org/rfc/rfc8628)): the CLI prints a URL and a code; the user authorizes on another device. Agents cannot open browsers. Non-canonical alternatives (`--device-code`, - `--remote`, `--headless`) are acceptable but should migrate toward `--no-browser`. + `--remote`, `--headless`) are acceptable but SHOULD migrate toward `--no-browser`. +- CLIs that accept secret material (tokens, passwords, private keys) MUST provide at least one input path that does not + leak the value into process listings (`ps`), shell history, or the parent environment. The two leak-resistant paths + are stdin and a `--*-file` flag pointing to a credential file. Flag-value (`--token `) and environment-variable + (`TOOL_TOKEN`) paths MAY exist as convenience surfaces but MUST NOT be the only programmatic path. Cloud-CLI env-var + conventions (`AWS_ACCESS_KEY_ID`, `GH_TOKEN`) are accepted as convenience paths under this rule, not as substitutes + for it. **SHOULD:** -- Auto-detect non-interactive context via TTY detection (`std::io::IsTerminal` in Rust 1.70+, `process.stdin.isTTY` in - Node, `sys.stdout.isatty()` in Python) and suppress prompts when stderr is not a terminal, even without an explicit - `--no-interactive` flag. +- Auto-detect non-interactive context via TTY detection on stdin (and stderr, where prompts target it) and suppress + prompts when no terminal is present, even without an explicit `--no-interactive` flag. Language-specific entry points + (`std::io::IsTerminal` in Rust 1.70+, `process.stdin.isTTY` in Node, `sys.stdout.isatty()` in Python) appear in the + Evidence section. - Document default values for prompted inputs in `--help` output so agents can pass them explicitly instead of accepting whatever default ships. **MAY:** -- Offer rich interactive experiences — spinners, progress bars, multi-select menus — when a TTY is detected and +- Rich interactive experiences (spinners, progress bars, multi-select menus) MAY render when a TTY is detected and `--no-interactive` is not set, provided the non-interactive path remains fully functional. ## Evidence @@ -95,20 +106,21 @@ agent-tool deadlock. ## Anti-Patterns -- Bare `dialoguer::Confirm::new().interact()` with no TTY check and no `--no-interactive` override — agents hang +- Bare `dialoguer::Confirm::new().interact()` with no TTY check and no `--no-interactive` override: agents hang indefinitely. - Boolean environment variables parsed as plain strings, so `TOOL_QUIET=false` is truthy because the string is non-empty. - `stdin().read_line()` in a code path reached during normal operation without a TTY check first. - Hard-coded credentials prompts with no env-var or config-file alternative. - OAuth flow that unconditionally opens a browser with no headless escape hatch. +- A `--password ` flag with no stdin or file alternative: every invocation leaks the secret into `ps` output. -Measured by check IDs `p1-non-interactive` (behavioral) and `p1-non-interactive-source` (source). Run `agentnative check ---principle 1 .` against your CLI to see both. +Measured by audit IDs `p1-non-interactive` (behavioral) and `p1-non-interactive-source` (source). Run `anc audit +--principle 1 .` against the CLI under test to see both. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". @@ -116,13 +128,13 @@ recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[edit]** *Internal inconsistency.* "The `--no-interactive` MUST bullet says 'uses defaults, reads from stdin, or exits with an actionable error' but the principle's behavioral framing (per decision record) covers TUI session init too. The prose bullet only enumerates prompt libraries (`dialoguer`, `inquire`, `read_line`, `TTY::Prompt`, - `inquirer`), not TUI frameworks (`ratatui`, `bubbletea`) — readers will infer the MUST excludes TUIs, contradicting - the decision record's explicit 'blocking-interactive surface includes... TUI session initialization.'" Resolved: prose + `inquirer`), not TUI frameworks (`ratatui`, `bubbletea`). Readers will infer the MUST excludes TUIs, contradicting the + decision record's explicit 'blocking-interactive surface includes... TUI session initialization.'" Resolved: prose bullet's parenthetical now includes "or any TUI event loop that takes over the terminal." Mirrors the behavioral framing in [`docs/decisions/p1-behavioral-must.md`](../docs/decisions/p1-behavioral-must.md). No frontmatter change; the summary already says "gates every prompt library call" and stays. - **[wontfix]** *Prior art.* "RFC 8628 citation is correct in name but incomplete in framing. The prose says Device - Authorization Grant means 'the CLI prints a URL and a code; the user authorizes on another device' — this still + Authorization Grant means 'the CLI prints a URL and a code; the user authorizes on another device'. This still requires a human on another device, which an unattended agent does not have. An HN commenter will note this is a *human-assisted* headless path, not an agent-headless path; true unattended agents need service-account / API-token auth (which the principle doesn't mention)." Rationale: P1 scopes "headless" as "no local browser required," not "no diff --git a/spec/principles/p2-structured-parseable-output.md b/spec/principles/p2-structured-parseable-output.md index 4f1ef7b..db854c0 100644 --- a/spec/principles/p2-structured-parseable-output.md +++ b/spec/principles/p2-structured-parseable-output.md @@ -1,17 +1,17 @@ --- id: p2 title: Structured, Parseable Output -last-revised: 2026-04-22 +last-revised: 2026-05-29 status: active requirements: - id: p2-must-output-flag level: must applicability: universal - summary: "`--output text|json|jsonl` flag selects output format; `OutputFormat` enum threaded through output paths." + summary: "`--output` flag selects format with `json` and `jsonl` as canonical machine-readable values; `text` is the default human-facing form." - id: p2-must-stdout-stderr-split level: must applicability: universal - summary: Data goes to stdout; diagnostics/progress/warnings go to stderr — never interleaved. + summary: Data goes to stdout; diagnostics/progress/warnings go to stderr, never interleaved. - id: p2-must-exit-codes level: must applicability: universal @@ -20,10 +20,28 @@ requirements: level: must applicability: universal summary: When `--output json` is active, errors are emitted as JSON (to stderr) with at least `error`, `kind`, and `message` fields. + - id: p2-must-schema-print + level: must + applicability: + kind: conditional + antecedent: + audit_id: p2-json-output + summary: "CLIs that emit structured output expose the output schema via a `schema` subcommand or `--schema` flag: runtime-discoverable, with a documented format identifier." - id: p2-should-consistent-envelope level: should applicability: universal - summary: JSON output uses a consistent envelope — a top-level object with predictable keys — across every command. + summary: JSON output uses a consistent envelope (a top-level object with predictable keys) across every command. + - id: p2-should-schema-file + level: should + applicability: + kind: conditional + antecedent: + audit_id: p2-json-output + summary: "Output schemas are also exported to a stable file path (e.g., `schema/.json`) so CI/static-analysis consumers pin without invoking the tool." + - id: p2-should-json-aliases + level: should + applicability: universal + summary: "`--json` and `--jsonl` are accepted as aliases for `--output json` and `--output jsonl`; the short forms work alongside the canonical enum." - id: p2-may-more-formats level: may applicability: universal @@ -46,18 +64,19 @@ data forces agents into fragile regex extraction that breaks on any format chang An agent calling a CLI needs three things from each invocation: the data, the error (if any), and the exit code. When data goes to stdout, diagnostics go to stderr, and errors carry machine-readable fields, the agent parses the result reliably without heuristics. Mix these channels or ship human-formatted output only, and the agent falls back to -best-effort text parsing that fails unpredictably across versions, locales, and edge cases — silently at first, +best-effort text parsing that fails unpredictably across versions, locales, and edge cases: silently at first, catastrophically later. ## Requirements **MUST:** -- A `--output text|json|jsonl` flag selects the output format. Text is the default for humans; JSON and JSONL are the - agent-facing formats. Implementation surfaces an `OutputFormat` enum and an `OutputConfig` struct threaded through - every function that produces output. -- Data goes to stdout. Diagnostics, progress indicators, and warnings go to stderr. An agent consuming JSON from stdout - must never encounter an interleaved progress message. +- Structured-output CLIs MUST offer at least one machine-readable format selectable via `--output`, with `json` and + `jsonl` as canonical values; `text` is the default human-facing form. The format selection threads through every + output path, so a single invocation never mixes formats. +- Data goes to stdout. Diagnostics, progress indicators, and warnings go to stderr. The split is decades-old Unix + practice (POSIX, ESR's Rule of Repair, clig.dev's "Output" rules); for an agent it is load-bearing: a JSON consumer + reading stdout MUST NOT encounter an interleaved progress line. - Exit codes are structured and documented: | Code | Meaning | @@ -71,17 +90,29 @@ catastrophically later. These codes blend the bash 0/1/2 convention with BSD `sysexits.h` 77/78 (`EX_NOPERM`, `EX_CONFIG`); the result is the de-facto agent-facing dialect, not strict `sysexits.h` compliance. -- When `--output json` is active, errors are emitted as JSON (to stderr) with at least `error`, `kind`, and `message` - fields. Plain-text errors in a JSON run break the agent's parser on the only output it was told to expect. +- When `--output json` is active, errors MUST be emitted as JSON to stderr with at least `error`, `kind`, and `message` + fields. A plain-text error inside a JSON run breaks the consumer's parser on the only shape it was told to expect. +- CLIs that emit structured output (`--output json|jsonl`) MUST expose the output schema at runtime via a `schema` + subcommand (or a `--schema` flag on each data-emitting subcommand). The schema MUST identify its format (canonical + recommendation is JSON Schema 2020-12, the same dialect OpenAPI 3.1 uses), so an agent reading the schema loads the + right validator without parsing prose. A consumer asking "what shape am I about to receive?" gets a machine-readable + answer in one call. **SHOULD:** -- JSON output uses a consistent envelope — a top-level object with predictable keys — across every command so agents can +- JSON output uses a consistent envelope (a top-level object with predictable keys) across every command so agents can rely on the same shape. +- The schema SHOULD also be exported to a stable file path in the source repo (e.g., `schema/.json`) so + consumers can pin against it at install or CI time without invoking the tool. The print form is the runtime contract; + the file form is the build-time contract. +- CLIs SHOULD accept `--json` as an alias for `--output json` and `--jsonl` as an alias for `--output jsonl`. The + `--output` enum remains the canonical surface for the format MUST (`p2-must-output-flag`); a Cloudflare-style CLI + shipping only the short forms still satisfies the canonical MUST through the alias path. **MAY:** -- Additional output formats (CSV, TSV, YAML) beyond the core three. The core three remain mandatory. +- Additional `--output` values (CSV, TSV, YAML) MAY be offered beyond the canonical text/json/jsonl. The canonical three + remain mandatory. - A `--raw` flag for unformatted output suitable for piping to other tools. ## Evidence @@ -89,36 +120,36 @@ catastrophically later. - `OutputFormat` enum with `Text`, `Json`, `Jsonl` variants deriving `ValueEnum`. - `OutputConfig` struct with `format`, `use_color`, and `quiet` fields. - `serde_json` in `Cargo.toml`. -- No `println!` in `src/` outside the output module — every print goes through `OutputConfig`. +- No `println!` in `src/` outside the output module: every print goes through `OutputConfig`. - Exit-code constants or match arms mapping error variants to distinct numeric codes. - `eprintln!` (or an equivalent diagnostic macro) for every diagnostic line. ## Anti-Patterns - `println!` scattered across handlers instead of routing through the output config. -- A single exit code (1) for everything — agents cannot distinguish auth failures from config errors. +- A single exit code (1) for everything: agents cannot distinguish auth failures from config errors. - Status lines ("Fetching data…") printed to stdout where they contaminate JSON output. - `process::exit()` in library code, bypassing structured error propagation. - Human-formatted tables as the only output mode with no JSON alternative. -Measured by check IDs `p2-output-json`, `p2-output-format`, `p2-stderr-diagnostics`. Run `agentnative check --principle -2 .` against your CLI to see each. +Measured by audit IDs `p2-output-json`, `p2-output-format`, `p2-stderr-diagnostics`. Run `anc audit --principle 2 .` +against the CLI under test to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[edit]** *Prior art.* "The exit-code table conflicts with `sysexits.h`. `EX_NOPERM=77` is 'permission denied' (close), but `EX_CONFIG=78` is correct. However, `sysexits.h` reserves `EX_USAGE=64`, `EX_DATAERR=65`, - `EX_NOINPUT=66`, `EX_UNAVAILABLE=69`, `EX_SOFTWARE=70` — P2 puts 'usage error' at 2 (bash convention), not 64. HN will + `EX_NOINPUT=66`, `EX_UNAVAILABLE=69`, `EX_SOFTWARE=70`. P2 puts 'usage error' at 2 (bash convention), not 64. HN will note the principle straddles two conventions (bash 0/1/2 + sysexits 77/78) without naming the hybrid." Resolved: added one sentence under the exit-code table acknowledging the bash + `sysexits.h` blend. The same citation now appears in P4's exit-code table (per Row #13 of the same review pass) so both files agree. -- **[later]** *Must-vs-should.* "A single-number-emitting CLI (e.g., `epoch`, `uuidgen`) plausibly violates the +- **[later]** *MUST-vs-SHOULD.* "A single-number-emitting CLI (e.g., `epoch`, `uuidgen`) plausibly violates the `--output text|json|jsonl` MUST for a defensible reason. Universal applicability is a strong claim." Deferred: revisit - whether `applicability` should soften when the launch landscape clarifies actual single-number agent-facing CLIs. The + whether `applicability` SHOULD soften when the launch landscape clarifies actual single-number agent-facing CLIs. The applicability change would fire coupled-release (CLI registry impact), so it is held for a v0.4.0 cleanup PR rather than churned during launch week. diff --git a/spec/principles/p3-progressive-help-discovery.md b/spec/principles/p3-progressive-help-discovery.md index 55bf47b..cfe1725 100644 --- a/spec/principles/p3-progressive-help-discovery.md +++ b/spec/principles/p3-progressive-help-discovery.md @@ -1,7 +1,7 @@ --- id: p3 title: Progressive Help Discovery -last-revised: 2026-04-22 +last-revised: 2026-05-21 status: active requirements: - id: p3-must-subcommand-examples @@ -13,6 +13,14 @@ requirements: level: must applicability: universal summary: The top-level command ships 2–3 examples covering the primary use cases. + - id: p3-must-version + level: must + applicability: universal + summary: Top-level `--version` prints a non-empty version line and exits 0. + - id: p3-should-version-short + level: should + applicability: universal + summary: A short version alias (`-V`, `-v`, or `-version`) accompanies `--version` for fast version probes. - id: p3-should-paired-examples level: should applicability: universal @@ -46,16 +54,24 @@ trial-and-errors its way into a working call, burning tokens and sometimes landi **MUST:** -- Every subcommand ships at least one concrete invocation example showing the command with realistic arguments, rendered - in the section that appears after the flags list. In clap this is the `after_help` attribute. -- The top-level command ships 2–3 examples covering the primary use cases. +- Every subcommand MUST render at least one concrete invocation example with realistic arguments, in the section that + appears after the flags list. Clap's `after_help` attribute is the Rust realization; other frameworks have equivalents + (see Evidence section below). +- The top-level command MUST render 2–3 examples covering the primary use cases. +- The top-level command MUST respond to `--version` with a non-empty version line on stdout and exit 0. Agents pin + against tool versions to detect breaking changes; a `--version` that errors, exits non-zero, or prints nothing forces + every consumer to scrape the binary path or manifest for a clue. **SHOULD:** -- Examples show human and agent invocations side by side — a text-output example followed by its `--output json` +- Examples show human and agent invocations side by side: a text-output example followed by its `--output json` equivalent. Readers see the pair; agents see the JSON form. - Short `about` for command-list summaries; `long_about` reserved for detailed descriptions visible with `--help` but not `-h`. +- A short alias for `--version` SHOULD work: `-V` (clap default, `curl`, `wget`, `gzip`), `-v` (`npm`, `node`, `bun`, + `yarn`, `make`), or `-version` (Go's `flag` package). Any of the three forms is sufficient. Agents probing tool + versions across many CLIs save token cost when they can pin against a one- or two-character flag; the long-only path + forces an extra parse step. **MAY:** @@ -71,18 +87,18 @@ trial-and-errors its way into a working call, burning tokens and sometimes landi ## Anti-Patterns -- Relying solely on `///` doc comments — those populate `about` / `long_about`, not `after_help`, so no examples render +- Relying solely on `///` doc comments: those populate `about` / `long_about`, not `after_help`, so no examples render after the flags list. - A single `about` string serving as both summary and usage documentation. - Examples buried in a README or man page but absent from `--help` output. - `after_help` text that describes the flags in prose instead of demonstrating them in code. -Measured by check IDs `p3-help`, `p3-after-help`, `p3-version`. Run `agentnative check --principle 3 .` against your CLI -to see each. +Measured by audit IDs `p3-help`, `p3-after-help`, `p3-version`. Run `anc audit --principle 3 .` against the CLI under +test to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". @@ -91,16 +107,16 @@ recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". implicitly gates P3 conformance on a `--output json` mode, which is P2's territory. A CLI without structured output cannot satisfy this SHOULD even in spirit, yet `applicability: universal`." Deferred: narrowing `applicability` from `universal` to conditional (`if: CLI exposes a structured-output mode`) fires the coupled-release norm (CLI registry - parses `applicability`). Bundled with other applicability cleanups for a v0.4.0 PR with explicit registry + parses `applicability`). Bundled with other applicability cleanups for a future PR with explicit registry coordination. -- **[later]** *Must-vs-should.* "'Top-level command ships 2–3 examples' as a universal MUST is too strong for genuinely +- **[later]** *MUST-vs-SHOULD.* "'Top-level command ships 2–3 examples' as a universal MUST is too strong for genuinely single-purpose CLIs (e.g., `cat`, `true`, a one-shot wrapper) where one canonical invocation is the entire surface. The '2–3' count baked into a MUST will draw HN fire as cargo-culted." Deferred: softening to "at least one example, and 2–3 when the tool has multiple primary use cases" is a MUST-content change that drifts the frontmatter summary. - Bundled with other MUST-content softenings for a v0.4.0 PR. + Bundled with other MUST-content softenings for a future PR. - **[later]** *Prior art.* "Principle is clap-flavored throughout (`after_help`, `about`/`long_about`, `///` doc comments in Anti-Patterns) without a single sentence acknowledging the non-Rust analog (docopt usage block, `argparse` epilog, `cobra` Example field, `gh`/`kubectl` Examples convention). HN will call this 'a clap style guide, not a CLI standard.'" Deferred: a cross-framework analog appendix is a meaningful addition. The Definition / Why-Agents-Need-It - sections are framework-agnostic; the Evidence section is intentionally clap-keyed. Worth revisiting in v0.4.0 once the - standard's multi-language reach is clearer; site copy may also be a better home than the principle file itself. + sections are framework-agnostic; the Evidence section is intentionally clap-keyed. Worth revisiting once the + standard's multi-language reach is clearer; site copy could also be a better home than the principle file itself. diff --git a/spec/principles/p4-fail-fast-actionable-errors.md b/spec/principles/p4-fail-fast-actionable-errors.md index 8e91deb..6479504 100644 --- a/spec/principles/p4-fail-fast-actionable-errors.md +++ b/spec/principles/p4-fail-fast-actionable-errors.md @@ -1,7 +1,7 @@ --- id: p4 title: Fail Fast with Actionable Errors -last-revised: 2026-04-22 +last-revised: 2026-05-07 status: active requirements: - id: p4-must-try-parse @@ -15,7 +15,7 @@ requirements: - id: p4-must-actionable-errors level: must applicability: universal - summary: Every error message contains what failed, why, and what to do next. + summary: Every error message names the failure, the cause, and a concrete remediation (a command or a value, not a hint to consult docs). - id: p4-should-structured-enum level: should applicability: universal @@ -29,6 +29,11 @@ requirements: level: should applicability: universal summary: "Error output respects `--output json`: JSON-formatted errors go to stderr when JSON output is selected." + - id: p4-should-enumerate-valid-set + level: should + applicability: + if: CLI rejects input against a closed set + summary: "When rejecting input against an enum or fixed-allowed-values set, the error message includes the valid set." --- # P4: Fail Fast with Actionable Errors @@ -40,8 +45,8 @@ why, and what to do next. An error that says "operation failed" gives an agent n ## Why Agents Need It -Agents operate in a retry loop: attempt, observe, decide. When an error is vague or unstructured — a bare stack trace, a -one-word failure, a mixed-channel splurge — the agent cannot tell whether to retry, re-authenticate, fix configuration, +Agents operate in a retry loop: attempt, observe, decide. When an error is vague or unstructured (a bare stack trace, a +one-word failure, a mixed-channel splurge), the agent cannot tell whether to retry, re-authenticate, fix configuration, or escalate to the user. Distinct exit codes with actionable messages let the agent act correctly on the first read. The difference between exit code 77 (re-authenticate) and exit code 78 (fix config) determines whether the agent retries OAuth or asks the user to check their config file. Getting that wrong wastes entire conversation turns. @@ -51,8 +56,8 @@ OAuth or asks the user to check their config file. Getting that wrong wastes ent **MUST:** - Parse arguments with `try_parse()` instead of `parse()`. Clap's `parse()` calls `process::exit()` directly, bypassing - custom error handlers — which means `--output json` cannot emit JSON parse errors. `try_parse()` returns a `Result` - the tool can format: + custom error handlers, which means `--output json` cannot emit JSON parse errors. `try_parse()` returns a `Result` the + tool can format: ```rust let cli = Cli::try_parse()?; @@ -72,7 +77,8 @@ OAuth or asks the user to check their config file. Getting that wrong wastes ent These codes blend the bash 0/1/2 convention with BSD `sysexits.h` 77/78 (`EX_NOPERM`, `EX_CONFIG`); the result is the de-facto agent-facing dialect, not strict `sysexits.h` compliance. -- Every error message contains **what failed**, **why**, and **what to do next**. Example: +- Every error message MUST name the failure, the cause, and the remediation. The remediation is concrete: a command to + run or a value to set, not a hint to consult documentation. ```text Authentication failed: token expired (expires_at: 2026-03-25T00:00:00Z). @@ -87,6 +93,14 @@ OAuth or asks the user to check their config file. Getting that wrong wastes ent three-tier definition (meta-commands, local-only commands, network commands) lives in P6 (`p6-should-tier-gating`); this requirement specifies the network-call ordering consequence. - Error output respects `--output json`: JSON-formatted errors go to stderr when JSON output is selected. +- When the failure is "invalid value for X" against a known closed set (an enum field, a documented allow-list, a typed + parameter), the error SHOULD include the valid set. An agent reading `error: invalid visibility` guesses and retries; + an agent reading `error: --visibility must be one of: public, private, unlisted (got "secret")` self-corrects in one + round-trip. + + ```text + error: --visibility must be one of: public, private, unlisted (got "secret") + ``` ## Evidence @@ -99,43 +113,43 @@ OAuth or asks the user to check their config file. Getting that wrong wastes ent ## Anti-Patterns -- `Cli::parse()` anywhere in the codebase — it silently prevents JSON error output. -- `process::exit()` in library code or command handlers. Only `main()` may call it, after all error handling. +- `Cli::parse()` anywhere in the codebase, because it silently prevents JSON error output. +- `process::exit()` in library code or command handlers. Only `main()` MAY call it, after all error handling. - A single catch-all error variant that maps everything to exit code 1. - Error messages that state the symptom without the cause or fix ("Error: request failed"). - Panics (`unwrap()`, `expect()`) on recoverable errors in production code paths. -Measured by check IDs `p4-bad-args`, `p4-process-exit`, `p4-unwrap`, `p4-exit-codes`. Run `agentnative check --principle -4 .` against your CLI to see each. +Measured by audit IDs `p4-bad-args`, `p4-process-exit`, `p4-unwrap`, `p4-exit-codes`. Run `anc audit --principle 4 .` +against the CLI under test to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[edit]** *Internal inconsistency.* "Three-tier gating is labeled identically as a SHOULD in both P4 - (`p4-should-gating-before-network`) and P6 (`p6-should-tier-gating`) — same pattern, two homes, no cross-reference. + (`p4-should-gating-before-network`) and P6 (`p6-should-tier-gating`). Same pattern, two homes, no cross-reference. Readers can't tell which is canonical, and a CLI that satisfies one auto-satisfies the other." Resolved: P4's bullet now focuses on the network-call ordering consequence and points to P6 as the canonical home of the structural three-tier definition. Frontmatter summary tightened to match. Requirement ID is unchanged so CLI registry pinning is unaffected. -- **[edit]** *Must-vs-should.* "`p4-must-exit-code-mapping` is `applicability: universal` and the prose says 'At - minimum' 0/1/2/77/78 — but a CLI with no auth surface and no config file legitimately has nothing to assign to either +- **[edit]** *MUST-vs-SHOULD.* "`p4-must-exit-code-mapping` is `applicability: universal` and the prose says 'At + minimum' 0/1/2/77/78. But a CLI with no auth surface and no config file legitimately has nothing to assign to either 77 or 78, and the MUST forces empty-by-construction error variants. Same shape as P6, which correctly gates `p6-must-timeout-network` behind `if: CLI makes network calls`." Resolved: prose now reads "Use 77 when the CLI has an auth surface and 78 when it has a config surface; 0/1/2 are universal." Frontmatter summary stays universal because the *mapping discipline* is universal even if the specific 77/78 codes are conditional. The summary-prose drift is a - known launch-week tradeoff; full alignment of the summary text is on the v0.4.0 punch list. + known launch-week tradeoff; full alignment of the summary text is on the punch list. -- **[edit]** *Prior art.* "77/78 align with BSD `sysexits.h` (`EX_NOPERM`, `EX_CONFIG`) — the alignment is a strength - but neither P2 nor P4 cites BSD sysexits, leaving an HN commenter to 'discover' it as a gotcha." Resolved: added a +- **[edit]** *Prior art.* "77/78 align with BSD `sysexits.h` (`EX_NOPERM`, `EX_CONFIG`). The alignment is a strength but + neither P2 nor P4 cites BSD sysexits, leaving an HN commenter to 'discover' it as a gotcha." Resolved: added a one-liner under the P4 exit-code table acknowledging the `sysexits.h` alignment. Same sentence added to P2's exit-code table for consistency. -- **[later]** *Must-vs-should.* "`p4-must-try-parse` names a clap-specific Rust API in a `applicability: universal` - MUST. A Go/Python/Node CLI has no `try_parse()`. The underlying requirement — 'argument-parse failures route through - the same error/output formatter as runtime errors, not a library-internal `process::exit()`' — is universal; the API +- **[later]** *MUST-vs-SHOULD.* "`p4-must-try-parse` names a clap-specific Rust API in a `applicability: universal` + MUST. A Go/Python/Node CLI has no `try_parse()`. The underlying requirement: 'argument-parse failures route through + the same error/output formatter as runtime errors, not a library-internal `process::exit()`' is universal; the API name is not." Deferred: language-neutralizing the bullet ("Argument parsing returns a structured error rather than calling `process::exit()` internally; in Rust+clap, this means `try_parse()` not `parse()`") drifts the frontmatter - summary. Bundled with P6's SIGPIPE and `global = true` rewrites for a coordinated v0.4.0 language-neutralization PR. + summary. Bundled with P6's SIGPIPE and `global = true` rewrites for a coordinated language-neutralization PR. diff --git a/spec/principles/p5-safe-retries-mutation-boundaries.md b/spec/principles/p5-safe-retries-mutation-boundaries.md index 882769b..21c3e7e 100644 --- a/spec/principles/p5-safe-retries-mutation-boundaries.md +++ b/spec/principles/p5-safe-retries-mutation-boundaries.md @@ -1,7 +1,7 @@ --- id: p5 title: Safe Retries and Explicit Mutation Boundaries -last-revised: 2026-04-22 +last-revised: 2026-05-07 status: active requirements: - id: p5-must-force-yes @@ -23,7 +23,7 @@ requirements: level: should applicability: if: CLI has write operations - summary: Write operations are idempotent where the domain allows it — running the same command twice produces the same result. + summary: "Write operations are idempotent where the domain allows it: running the same command twice produces the same result." --- # P5: Safe Retries and Explicit Mutation Boundaries @@ -33,31 +33,31 @@ requirements: Every CLI with write operations MUST support `--dry-run` so agents can preview a mutation before committing it. Commands MUST make the read-vs-write distinction visible from name and `--help` alone, and destructive writes MUST require explicit confirmation. An agent that cannot distinguish a safe read from a dangerous write will either avoid the tool or -execute mutations blindly — both are failure modes. +execute mutations blindly: both are failure modes. ## Why Agents Need It Agent harnesses commonly retry failed operations. If a write operation is not idempotent, a retry creates duplicates, corrupts data, or trips rate limits. When destructive operations require explicit confirmation (`--force`, `--yes`) and support preview (`--dry-run`), an agent can safely explore what a command would do before committing to it. Read-only -tools are inherently safe for retries, but they still benefit from help text that names the mutation contract — "this +tools are inherently safe for retries, but they still benefit from help text that names the mutation contract: "this does not modify state" is a better sentence to put in `--help` than to assume. ## Requirements **MUST:** -- Destructive operations (delete, overwrite, bulk modify) require an explicit `--force` or `--yes` flag. Without it, the - tool refuses the operation or enters dry-run mode — never mutates silently. -- The distinction between read and write commands is clear from the command name and help text alone. An agent reading - `--help` immediately knows whether a command mutates state. -- A `--dry-run` flag is present on every write command. When set, the command validates inputs and reports what it would - do without executing. Dry-run output respects `--output json` so agents can parse the preview programmatically. +- Destructive operations (delete, overwrite, bulk modify) MUST require an explicit `--force` or `--yes` flag. Without + it, the command refuses the operation or enters dry-run mode; it MUST NOT mutate silently. +- The read-vs-write distinction MUST be visible from the command name and `--help` text alone. A reader scanning the + help output immediately knows whether a command mutates state. +- Every write command MUST support `--dry-run`: validate inputs and report the intended effect without executing it. + Dry-run output respects `--output json`. **SHOULD:** -- Write operations are idempotent where the domain allows it — running the same command twice produces the same result - rather than doubling the effect. +- Write operations SHOULD be idempotent where the domain allows it. Running the same command twice produces the same end + state, not a doubled effect. ## Evidence @@ -72,20 +72,20 @@ does not modify state" is a better sentence to put in `--help` than to assume. - A `delete` command that executes immediately without `--force` or confirmation. - Write commands sharing a name pattern with read commands (e.g., a `sync` that silently overwrites local state). - No `--dry-run` option on bulk operations, where a preview prevents costly mistakes. -- Operations that fail on retry because the first attempt partially succeeded — non-idempotent writes without rollback. +- Operations that fail on retry because the first attempt partially succeeded: non-idempotent writes without rollback. -Measured by check IDs `p5-dry-run`, `p5-destructive-guard`. Run `agentnative check --principle 5 .` against your CLI to -see each. +Measured by audit IDs `p5-dry-run`, `p5-destructive-guard`. Run `anc audit --principle 5 .` against the CLI under test +to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[edit]** *Internal inconsistency.* "Definition opens 'Every CLI MUST support `--dry-run`' as universal, but - `p5-must-dry-run` is gated on 'CLI has write operations' — read-only CLIs would falsely fail this prose claim." + `p5-must-dry-run` is gated on 'CLI has write operations'. Read-only CLIs would falsely fail this prose claim." Resolved: Definition sentence 1 narrowed to "Every CLI with write operations MUST support `--dry-run`..." Read-only CLIs are no longer falsely accused by the prose. - **[edit]** *Internal inconsistency.* "Definition's 'Write operations MUST clearly separate destructive actions from @@ -93,26 +93,26 @@ recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". read-only' is a different axis (writes can be non-destructive, e.g., `create`)." Resolved: Definition sentence 2 rewritten to "Commands MUST make the read-vs-write distinction visible from name and `--help` alone, and destructive writes MUST require explicit confirmation." The two axes are now stated separately. -- **[later]** *Internal inconsistency.* "`--force`/`--yes` MUST + P1 `--no-interactive` MUST should compose (agent path - is `--force --no-interactive`); composition isn't called out, leaving the 'without it, the tool refuses or enters - dry-run' clause ambiguous when stdin is non-TTY." Deferred: tightening the MUST to specify error-vs-dry-run behavior - under `--no-interactive` modifies the bullet's contract semantics. Bundled with other MUST-content cleanups for a - v0.4.0 PR. -- **[later]** *Must-vs-should.* "`read-write-distinction` MUST hinges on 'clear from command name and help text alone' — - subjective and unverifiable by `anc`. The `sync` anti-pattern proves the bar is taste, not a checkable property." +- **[later]** *Internal inconsistency.* "`--force`/`--yes` MUST and P1 `--no-interactive` MUST need to compose + explicitly (agent path is `--force --no-interactive`); composition isn't called out, leaving the 'without it, the tool + refuses or enters dry-run' clause ambiguous when stdin is non-TTY." Deferred: tightening the MUST to specify + error-vs-dry-run behavior under `--no-interactive` modifies the bullet's contract semantics. Bundled with other + MUST-content cleanups for a future PR. +- **[later]** *MUST-vs-SHOULD.* "`read-write-distinction` MUST hinges on 'clear from command name and help text alone', + subjective and unverifiable by `anc`. The `sync` anti-pattern proves the bar is taste, not an auditable property." Deferred: rewriting to a verifiable form ("Help text for every write command MUST contain an explicit mutation statement; command names SHOULD signal intent") creates a new SHOULD-shape claim, which is a `requirements[]` change. - Coupled-release fires firmly. Defer to v0.4.0 with explicit registry-coordination plan. + Coupled-release fires firmly. Defer to a future PR with explicit registry-coordination plan. - **[later]** *Prior art.* "Principle prescribes flag *names* (`--dry-run`, `--force`, `--yes`) without naming the contract behind them. kubectl `--dry-run=server|client`, Terraform `plan`/`apply`, apt `--simulate`, rsync `-n` all satisfy the contract under different surfaces." Deferred: worth revisiting whether to add a 'name-or-contract-equivalent' clause that names the contract first and treats canonical flag spelling as one - realization. Hold for v0.4.0 alongside the verifiability rewrite above. -- **[wontfix]** *Must-vs-should.* "'Why Agents Need It' leans on retry-safety, then idempotency lands as SHOULD. If + realization. Hold for a future PR alongside the verifiability rewrite above. +- **[wontfix]** *MUST-vs-SHOULD.* "'Why Agents Need It' leans on retry-safety, then idempotency lands as SHOULD. If retries are the framing, idempotency-where-domain-allows is the load-bearing property; `--dry-run` is mitigation, not cure." Rationale: domain-gated idempotency genuinely cannot be a universal MUST (some domains forbid it: append-only logs, payment capture). The current SHOULD is correct; the prose framing in "Why Agents Need It" is fine because it explains *why* idempotency matters when it is available, not that it is universally required. -- **[edit]** *Vague agent-native.* "'Agents retry failed operations by default' — true for Claude Code/Cursor/Aider tool +- **[edit]** *Vague agent-native.* "'Agents retry failed operations by default'. True for Claude Code/Cursor/Aider tool loops; not universally true for one-shot harnesses or human-in-the-loop agents." Resolved: "Why Agents Need It" hedged to "Agent harnesses commonly retry failed operations." Same operational point; more accurate across harness shapes. diff --git a/spec/principles/p6-composable-predictable-command-structure.md b/spec/principles/p6-composable-predictable-command-structure.md index 0448d8e..e8aecd1 100644 --- a/spec/principles/p6-composable-predictable-command-structure.md +++ b/spec/principles/p6-composable-predictable-command-structure.md @@ -1,21 +1,26 @@ --- id: p6 title: Composable and Predictable Command Structure -last-revised: 2026-04-22 +last-revised: 2026-05-07 status: active requirements: - id: p6-must-sigpipe level: must applicability: universal summary: SIGPIPE is handled so piping to `head`/`tail` does not crash the process (Rust example below; Python/Go/Node have language-specific equivalents). + - id: p6-must-sigterm + level: must + applicability: + if: CLI has long-running operations + summary: "Long-running operations handle SIGTERM gracefully: flush or roll back partial writes, release locks, exit non-zero within a bounded window. Next invocation succeeds without manual cleanup." - id: p6-must-no-color level: must applicability: universal - summary: TTY detection plus support for `NO_COLOR` and `TERM=dumb` — color codes suppressed when stdout/stderr is not a terminal. + summary: "TTY detection plus support for `NO_COLOR` and `TERM=dumb`: color codes suppressed when stdout/stderr is not a terminal." - id: p6-must-completions level: must applicability: universal - summary: Shell completions available via a `completions` subcommand (Tier 1 meta-command — needs no config/auth/network). + summary: Shell completions available via a `completions` subcommand (Tier 1 meta-command, needs no config/auth/network). - id: p6-must-timeout-network level: must applicability: @@ -54,6 +59,11 @@ requirements: level: may applicability: universal summary: "`--color auto|always|never` flag for explicit color control beyond TTY auto-detection." + - id: p6-may-standard-names + level: may + applicability: + if: CLI uses subcommands + summary: "Subcommand verbs MAY follow community-standard names (`get`/`list`/`create`/`update`/`delete`); flag spellings MAY follow widely-used canonical forms (`--force`, `--yes`, `--limit`, `--quiet`, `--verbose`)." --- # P6: Composable and Predictable Command Structure @@ -88,14 +98,20 @@ tool a building block rather than a dead end. unsafe { libc::signal(libc::SIGPIPE, libc::SIG_DFL); } ``` - Equivalents in other languages: Python — restore the default `SIGPIPE` handler at startup - (`signal.signal(signal.SIGPIPE, signal.SIG_DFL)`); Go — the runtime's default handling already exits cleanly on - EPIPE writes; Node.js — handle `EPIPE` on `process.stdout`. + Equivalents in other languages: in Python, restore the default `SIGPIPE` handler at startup + (`signal.signal(signal.SIGPIPE, signal.SIG_DFL)`); in Go, the runtime's default handling already exits cleanly on + EPIPE writes; in Node.js, handle `EPIPE` on `process.stdout`. + +- Agent harnesses send SIGTERM when their own timeout fires. A CLI that exits abruptly leaving a half-written file, a + stale `*.tmp` artifact, or a held flock makes the next invocation fail with a confusing error the agent cannot + diagnose. Long-running operations MUST handle SIGTERM by flushing or rolling back partial writes, releasing acquired + locks, and exiting non-zero within a bounded shutdown window. The next invocation MUST succeed without manual cleanup + of the previous run's state. This complements the existing SIGPIPE MUST (`p6-must-sigpipe`). - TTY detection, plus support for `NO_COLOR` and `TERM=dumb`. When stdout or stderr is not a terminal, color codes are suppressed automatically. - Shell completions available via a `completions` subcommand (clap_complete in Rust; equivalents elsewhere). This is a - Tier 1 meta-command — it works without config, auth, or network. + Tier 1 meta-command: it works without config, auth, or network. - Network CLIs ship a `--timeout` flag with a sensible default (30 seconds). Agents operating under their own time budgets need to fail fast rather than block on a slow upstream. - If the CLI uses a pager (`less`, `more`, `$PAGER`), it supports `--no-pager` or respects `PAGER=""`. Pagers block @@ -105,18 +121,23 @@ tool a building block rather than a dead end. **SHOULD:** -- Commands that accept input read from stdin when no file argument is provided. Pipeline composition depends on it. -- Subcommand naming follows a consistent `noun verb` or `verb noun` convention throughout the tool. Mixing patterns - (e.g., `list-users` alongside `user show`) forces agents to learn exceptions. +- Commands that accept input data SHOULD read from stdin when no file argument is provided. Pipeline composition depends + on it. +- Subcommand naming SHOULD follow one consistent grammar (`noun verb` or `verb noun`) throughout the tool. Mixed + patterns (e.g., `list-users` alongside `user show`) force consumers to memorize exceptions instead of applying a rule. - A three-tier dependency gating pattern: Tier 1 (meta-commands like `completions`, `version`) needs nothing; Tier 2 (local commands) needs config; Tier 3 (network commands) needs config + auth. `completions` and `version` always work, even in broken environments. -- Operations are modeled as subcommands, not flags. `tool search "query"` is correct; `tool --search "query"` is wrong. - Flags modify behavior (`--quiet`, `--output json`); subcommands select operations. +- Operations SHOULD be modeled as subcommands, not flags. `tool search "query"` is correct; `tool --search "query"` + conflates two roles. Flags modify behavior (`--quiet`, `--output json`); subcommands select operations. **MAY:** - A `--color auto|always|never` flag for explicit color control beyond TTY auto-detection. +- Subcommand verbs MAY follow community-standard names (`get` / `list` / `create` / `update` / `delete`); flag spellings + MAY follow widely-used canonical forms (`--force` for confirmation bypass, `--yes` for prompt bypass, `--limit` for + pagination, `--quiet`/`--verbose` for volume control). Convergence reduces an agent's per-tool relearning cost: an + agent that has seen `kubectl get` and `gh repo list` recognizes `tool list` immediately, without re-reading `--help`. ## Evidence @@ -130,31 +151,31 @@ tool a building block rather than a dead end. ## Anti-Patterns -- Missing SIGPIPE handler — `cargo run -- list | head` panics with "broken pipe". +- Missing SIGPIPE handler: `cargo run -- list | head` panics with "broken pipe". - Hard-coded ANSI escape codes without TTY detection. -- Color output in JSON mode — ANSI codes inside JSON string values break downstream parsing. +- Color output in JSON mode: ANSI codes inside JSON string values break downstream parsing. - A `completions` command that requires auth or config to run. - No stdin support on commands where piped input is a natural use case. -Measured by check IDs `p6-sigpipe`, `p6-no-color`, `p6-completions`, `p6-timeout`, `p6-agents-md`. Run `agentnative -check --principle 6 .` against your CLI to see each. +Measured by audit IDs `p6-sigpipe`, `p6-no-color`, `p6-completions`, `p6-timeout`, `p6-agents-md`. Run `anc audit +--principle 6 .` against the CLI under test to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[edit]** *Prior art / vague agent-native.* "The SIGPIPE MUST prescribes `unsafe { libc::signal(libc::SIGPIPE, - libc::SIG_DFL); }` as the first `main()` statement — that is a Rust-specific remedy. Python raises `BrokenPipeError` - by default (different fix), Go's runtime already exits cleanly on EPIPE writes (no fix needed), Node.js needs + libc::SIG_DFL); }` as the first `main()` statement. That is a Rust-specific remedy. Python raises `BrokenPipeError` by + default (different fix), Go's runtime already exits cleanly on EPIPE writes (no fix needed), Node.js needs `process.stdout.on('error')`. The MUST as written is correct in spirit but the prescription leaks Rust into a universal-applicability rule." Resolved: prose bullet now leads with the language-neutral MUST ("SIGPIPE is handled so that piping to `head`, `tail`, or any tool that closes the pipe early does not crash the process"); the Rust snippet stays as the canonical example; per-language one-liners cover Python, Go, and Node. Frontmatter summary updated to match. -- **[edit]** *Must-vs-should.* "The `global = true` MUST is a clap-API artifact — the behavioral requirement is 'agentic +- **[edit]** *MUST-vs-SHOULD.* "The `global = true` MUST is a clap-API artifact. The behavioral requirement is 'agentic flags propagate to every subcommand,' which is what the prose actually says. The frontmatter summary baking `global = true` into a universal contract overfits to one library." Resolved: frontmatter summary and prose bullet now lead with the behavioral requirement ("propagate to every subcommand"), with `global = true` cited as the clap-specific example. diff --git a/spec/principles/p7-bounded-high-signal-responses.md b/spec/principles/p7-bounded-high-signal-responses.md index a77da84..62f13a0 100644 --- a/spec/principles/p7-bounded-high-signal-responses.md +++ b/spec/principles/p7-bounded-high-signal-responses.md @@ -1,7 +1,7 @@ --- id: p7 title: Bounded, High-Signal Responses -last-revised: 2026-04-22 +last-revised: 2026-05-07 status: active requirements: - id: p7-must-quiet @@ -12,7 +12,7 @@ requirements: level: must applicability: if: CLI has list-style commands - summary: "List operations clamp to a sensible default maximum; when truncated, indicate it (`\"truncated\": true` in JSON, stderr note in text)." + summary: "List operations clamp to a documented default maximum; when truncated, indicate it (`\"truncated\": true` in JSON, stderr note in text)." - id: p7-should-verbose level: should applicability: universal @@ -41,13 +41,13 @@ requirements: ## Definition -CLI tools MUST provide mechanisms to control output volume. Agent context windows are finite and expensive — a tool that -dumps 10,000 lines of unfiltered output wastes tokens and may exceed the context limit entirely, breaking the +CLI tools MUST provide mechanisms to control output volume. Agent context windows are finite and expensive: a tool that +dumps 10,000 lines of unfiltered output wastes tokens and can exceed the context limit entirely, breaking the conversation that invoked it. ## Why Agents Need It -Unbounded CLI output is expensive for any agent — token cost and context-window capacity for LLM agents, parse cost and +Unbounded CLI output is expensive for any agent: token cost and context-window capacity for LLM agents, parse cost and memory pressure for scripts, schedulers, and other automation. Either way, the agent ends up truncating (losing potentially important data) or consuming the full response (wasting cycles on noise). Bounded output with `--quiet`, `--verbose`, and `--limit` flags gives the agent precise control over how much data arrives, keeping responses @@ -57,9 +57,9 @@ high-signal and inside budget. **MUST:** -- A `--quiet` flag suppresses non-essential output: progress indicators, informational messages, decorative formatting. - When `--quiet` is set, only requested data and errors appear. Implementations typically route diagnostics through a - macro that short-circuits when quiet is on: +- A `--quiet` flag MUST suppress non-essential output (progress indicators, informational messages, decorative + formatting). Under `--quiet`, only requested data and errors appear. The Rust realization gates diagnostics through a + macro: ```rust macro_rules! diag { @@ -69,19 +69,19 @@ high-signal and inside budget. } ``` -- List operations clamp to a sensible default maximum. A `list` without `--limit` does not return more than a - configurable ceiling (e.g., 100 items). If more items exist, the output indicates truncation — `"truncated": true` in - JSON, a stderr note in text mode. +- List operations MUST clamp to a documented default maximum. A `list` invoked without `--limit` returns no more than a + configurable ceiling (e.g., 100 items). When the underlying result set exceeds the ceiling, the output signals + truncation: `"truncated": true` in JSON, a stderr note in text mode. **SHOULD:** - A `--verbose` flag (or `-v` / `-vv`) escalates diagnostic detail when agents need to debug failures. -- A `--limit` or `--max-results` flag lets callers request exactly the number of items they want. +- A `--limit` (or `--max-results`) flag SHOULD let callers request exactly the number of items they want. - A `--timeout` flag bounds execution time. An agent waiting indefinitely on a hung network call cannot proceed. **MAY:** -- Cursor-based pagination flags (`--after`, `--before`) for efficient traversal of large result sets. +- Cursor-based pagination flags (`--after`, `--before`) MAY be offered for efficient traversal of large result sets. - Automatic verbosity reduction in non-TTY contexts (the same behavior `--quiet` explicitly requests). ## Evidence @@ -96,35 +96,35 @@ high-signal and inside budget. ## Anti-Patterns -- List commands that return all results with no default limit — an agent listing 50,000 items floods its context window. -- No `--quiet` flag — agents consuming JSON output still receive interleaved diagnostic text on stderr. +- List commands that return all results with no default limit. An agent listing 50,000 items floods its context window. +- No `--quiet` flag. Agents consuming JSON output still receive interleaved diagnostic text on stderr. - `--verbose` as the only output control. If there is no way to reduce output, bounded responses do not exist. - Progress bars or spinners that write to stderr in non-TTY contexts, adding noise to agent logs. - No `--timeout` on network operations. A stalled request blocks the agent indefinitely. -Measured by check IDs `p7-quiet`, `p7-limit`, `p7-timeout`. Run `agentnative check --principle 7 .` against your CLI to -see each. +Measured by audit IDs `p7-quiet`, `p7-limit`, `p7-timeout`. Run `anc audit --principle 7 .` against the CLI under test +to see each. ## Pressure test notes -### 2026-04-27 — Show HN launch red-team pass +### 2026-04-27: Red-team pass Adversarial review via `compound-engineering:ce-adversarial-document-reviewer` ahead of the v0.3.0 launch. Findings recorded verbatim per `principles/AGENTS.md` § "Pressure-test protocol". - **[later]** *Internal inconsistency.* "`--timeout` is universal SHOULD in P7 but conditional MUST in P6 (`p6-must-timeout-network`). For network CLIs the two compose (MUST wins), but P7's prose ('An agent waiting - indefinitely on a hung network call cannot proceed') only motivates the network case — the universal scope is + indefinitely on a hung network call cannot proceed') only motivates the network case. The universal scope is unjustified by its own rationale." Deferred: narrowing P7's `applicability` from `universal` to non-network - long-running operations only — or to `if: CLI has long-running operations` — fires the coupled-release norm (CLI - registry parses `applicability`). Bundled with other applicability cleanups for a v0.4.0 PR with explicit registry + long-running operations only (or to `if: CLI has long-running operations`) fires the coupled-release norm (CLI + registry parses `applicability`). Bundled with other applicability cleanups for a future PR with explicit registry coordination. -- **[later]** *Must-vs-should.* "The list-clamping MUST fires on every CLI with 'list-style commands' regardless of +- **[later]** *MUST-vs-SHOULD.* "The list-clamping MUST fires on every CLI with 'list-style commands' regardless of natural cardinality. A tool whose list operation returns a bounded small set by construction (e.g., `anc principles - list` → exactly 7) gains nothing from a clamp + `\"truncated\": true` contract — the clamp is unreachable and the + list` → exactly 7) gains nothing from a clamp + `\"truncated\": true` contract: the clamp is unreachable and the truncation flag is dead schema." Deferred: narrowing the `if:` clause from "CLI has list-style commands" to "CLI has list-style commands whose result set is unbounded or user-data-driven" changes the registry-parsed applicability - value. Bundled with the P3 / P7 applicability cleanups for v0.4.0. + value. Bundled with the P3 / P7 applicability cleanups for a future PR. - **[edit]** *Vague agent-native.* "'Every token of CLI output an agent consumes has a cost' plus 'context window' framing dates the principle to current LLM-agent assumptions. Non-LLM agents (scripts, schedulers, future architectures) still benefit from bounded output, but for throughput/parsing reasons, not tokens." Resolved: "Why diff --git a/spec/principles/p8-discoverable-skill-bundle.md b/spec/principles/p8-discoverable-skill-bundle.md new file mode 100644 index 0000000..53206f9 --- /dev/null +++ b/spec/principles/p8-discoverable-skill-bundle.md @@ -0,0 +1,94 @@ +--- +id: p8 +title: Discoverable Through Agent Skill Bundles +last-revised: 2026-05-29 +status: active +requirements: + - id: p8-must-bundle-install + level: must + applicability: + kind: conditional + antecedent: + audit_id: p8-bundle-exists + summary: "When a skill bundle exists, the CLI provides an install path (`tool skill install []`) that registers the bundle with installed agent runtimes." + - id: p8-should-bundle-exists + level: should + applicability: universal + summary: "CLIs ship a top-level agent-discoverable markdown bundle (`AGENTS.md`, `SKILL.md`, or equivalent) with YAML frontmatter naming the tool and capability summary." + - id: p8-may-install-all + level: may + applicability: + kind: conditional + antecedent: + audit_id: p8-bundle-exists + summary: "An `--all` mode auto-detects installed runtimes (Claude Code, Cursor, Codex, OpenCode, etc.) and installs across all." + - id: p8-may-bundle-update + level: may + applicability: + kind: conditional + antecedent: + audit_id: p8-bundle-exists + summary: "An update/upgrade subcommand (`tool skill update`) pulls the latest bundle version." +--- + +# P8: Discoverable Through Agent Skill Bundles + +## Definition + +A skill bundle is a structured markdown file (canonical names: `AGENTS.md` or `SKILL.md`) with YAML frontmatter that +names the tool, describes its capabilities, and provides workflow guidance an agent can load into its runtime. The +bundle lives outside the CLI's flag space: agents discover it via filesystem convention, not via `--help`. + +## Why Agents Need It + +`--help` describes what is *possible* (the flag and subcommand surface); a skill bundle describes what to *do* (workflow +knowledge, common compositions, recovery patterns). Workflow knowledge does not fit in `after_help` examples. Without a +bundle, every invocation begins with a `--help` round-trip plus inference; with one, the agent loads `SKILL.md` once and +recognizes the tool's idioms across every subsequent invocation. + +## Requirements + +**MUST:** + +- If a CLI ships a skill bundle, then it MUST provide an install path that registers the bundle with installed agent + runtimes. The canonical form is a `tool skill install []` subcommand that writes into the runtime's filesystem + cascade (e.g., `~/.claude/skills/`, `~/.cursor/skills/`). Non-canonical alternatives (`tool init --skill`, `tool + skills add`, `tool agents add`) are acceptable but SHOULD migrate toward `tool skill install`. A bundle without an + install path sits unread until a human manually copies it; the install path is what turns the bundle from + documentation into discoverable runtime knowledge. + +**SHOULD:** + +- CLIs SHOULD ship a top-level agent-discoverable markdown bundle (canonical names are `AGENTS.md` or `SKILL.md`, both + recognized by major agent runtimes) with YAML frontmatter naming the tool and summarizing its capabilities. The + bundle's first job is to be findable by filesystem convention; its second is to teach the agent how to invoke the tool + well. + +**MAY:** + +- If a CLI ships a skill bundle, then an `--all` mode MAY auto-detect installed agent runtimes (Claude Code, Cursor, + Codex, OpenCode, and others as the ecosystem evolves) and install the bundle across each. A user setting up a new + machine with multiple coding agents installs once and gets coverage across every runtime. +- If a CLI ships a skill bundle, then an `update` (or `upgrade`) subcommand under `tool skill` MAY pull the latest + bundle version, so agents stay current with the CLI's evolving surface without a full reinstall. + +## Evidence + +- A top-level `AGENTS.md` or `SKILL.md` in the CLI's source tree (and shipped in the release artifact) with YAML + frontmatter declaring at least the tool name and a one-line capability summary. +- A `skill` subcommand group in the CLI enum (e.g., `tool skill install`, `tool skill update`, `tool skill list`). +- An installer that targets the runtime cascade directly (file writes to `~/.claude/skills//`, etc.) rather than + requiring the runtime to be running. +- Bundle content versioned alongside the CLI's release: the bundle ships from the same commit as the binary, not from a + separate doc tree that drifts. + +## Anti-Patterns + +- A CLI shipping a skill bundle with no install path: the bundle sits unread until a human manually copies it. +- An install path that requires the agent runtime to be running: `tool skill install` writes to the runtime's filesystem + cascade (e.g., `~/.claude/skills/`) rather than requiring an active session. +- A bundle whose contents drift from the CLI's actual surface: the bundle is part of the CLI's release artifact, not a + separate doc tree. + +The vendor census in the v0.4.0 source-mining sprint documents the shipped patterns across Firecrawl, CLI-Anything, gws, +Crush, and larksuite; the `agentnative-skill` repo's `bin/check-update` is a reference for an update-check pattern. diff --git a/spec/principles/scoring.md b/spec/principles/scoring.md new file mode 100644 index 0000000..0ef9929 --- /dev/null +++ b/spec/principles/scoring.md @@ -0,0 +1,129 @@ +--- +title: "Scoring — leaderboard formula and badge eligibility" +last-revised: 2026-05-28 +--- + +# Scoring — leaderboard formula and badge eligibility + +This document defines how a tool's per-requirement scorecard collapses into the single `score_pct` that drives the +`anc.dev` leaderboard, the badge color, and badge eligibility. The taxonomy of per-requirement statuses and how they are +produced is defined in [`AGENTS.md`](AGENTS.md); this document is the layer above it: how those rows aggregate into one +number. + +The formula is the spec-side contract. The `anc` CLI computes `badge.score_pct` to match it, and the +[`agentnative-site`](https://github.com/brettdavies/agentnative-site) renderer reads the eligibility floor and color +bands defined here. + +## Scope: shipped-binary behavior only + +A public score reflects how the **shipped binary behaves**, observed by running it (`anc audit --command `). +Source-code and repository audits (static analysis of a project's source tree, manifest inspection, bundle presence) do +**not** contribute to the public score. What a tool's source code looks like does not change how an agent experiences +the installed binary. The binary is what agents run. + +Concretely, only behavioral-layer requirement rows enter the formula. Source-layer and project-layer audits are out of +scope for the leaderboard; they belong to a future advisory mode (`anc` run against a source tree to help authors +improve before release), not to the published score. This holds uniformly: every tool on the leaderboard, including +`anc` itself, is scored from binary behavior alone, so the comparison is like-for-like. + +## Inputs: the seven statuses + +Each behavioral requirement row carries a `status` and a `tier` (`must`, `should`, or `may`). The formula treats the +seven statuses in three groups: + +| Status | In denominator? | Execution credit | Meaning | +| --------- | --------------- | ---------------- | ------------------------------------------------ | +| `pass` | yes | 1.0 | Behavior present and correct. | +| `warn` | yes | 0.5 | Behavior present, partially correct. | +| `fail` | yes | 0.0 | Behavior expected, absent or broken. | +| `opt_out` | yes | 0.0 | Behavior deliberately declined (counts against). | +| `n_a` | no | — | Inapplicable: a conditional antecedent is unmet. | +| `skip` | no | — | Unmeasurable: the probe could not determine. | +| `error` | no | — | The probe raised an exception. | + +`opt_out` is in the denominator on purpose: a deliberate decision not to adopt a behavior is a real signal worth +reflecting, distinct from a behavior that simply does not apply (`n_a`) or could not be measured (`skip`). The +distinction is the whole point of the seven-status taxonomy: it lets the score exclude genuinely inapplicable audits +without letting deliberate non-adoption hide. + +## Formula + +For a tool's set of behavioral rows, let `D` be the rows whose status is in `{pass, warn, fail, opt_out}` (the +denominator set). With per-tier weights `w(must)`, `w(should)`, `w(may)`: + +```text +score_pct = round( 100 × Σ_{i∈D} w(tier_i) · credit(status_i) + ─────────────────────────────────────── ) + Σ_{i∈D} w(tier_i) +``` + +where `credit(pass) = 1.0`, `credit(warn) = 0.5`, `credit(fail) = credit(opt_out) = 0.0`. A tool with an empty +denominator set scores 0. + +### Tier weights (tunable) + +The tier weights are a parameter, not a constant baked into the definition, so the standard can re-tune the balance +between MUST/SHOULD/MAY without redefining the formula. The current published weights are **flat**: + +```text +w(must) = w(should) = w(may) = 1 +``` + +Flat weights keep the score legible (every behavioral audit counts the same) and avoid over- or under-rewarding +optional-tier adoption relative to the mandatory baseline. Under flat weights the formula reduces to a simple +credit-weighted ratio: + +```text +score_pct = round( 100 × (n_pass + 0.5 · n_warn) / (n_pass + n_warn + n_fail + n_opt_out) ) +``` + +A future revision can move to non-flat weights (for example, weighting failures at the MUST tier more heavily). Such a +change is a re-tuning of a published parameter and ships through the normal versioned-release path, subject to the +stability commitment below. + +### Worked example + +A narrow filter that ships no structured output: 20 `pass`, 7 `warn`, 0 `fail`, 1 `opt_out`, 1 `n_a`, 14 `skip`. + +- Denominator set `D` = the 20 + 7 + 0 + 1 = 28 rows that are pass/warn/fail/opt_out. The `n_a` and `skip` rows are + excluded. +- Numerator (flat weights) = 20 × 1.0 + 7 × 0.5 + 0 + 1 × 0.0 = 23.5. +- `score_pct = round(100 × 23.5 / 28) = 84` → **Strong** band. + +## Eligibility floor + +**A tool is badge-eligible at `score_pct ≥ 70`.** + +The floor is deliberately low. The badge's job is to spread the standard: a tool that clears a reasonable bar can +display it and point readers at its scorecard. Exclusivity is carried by the cohort bands below and by the score shown +on the badge itself, not by a high gate. A tool below the floor still gets a rendered badge and scorecard. The color +reflects the lower score, and the scorecard page shifts to improvement hints rather than the embed snippet (see +[`docs/badge.md`](../docs/badge.md#regression-behavior)). + +## Cohort bands + +The score maps to one of four cohort bands above the floor, plus a below-floor state. The bands are the exclusivity +signal. The top band is intentionally rare. + +| Band | Score | Meaning | +| ------------- | ------- | ----------------------------------------------------- | +| **Exemplary** | `≥ 85` | Near-complete binary-behavior conformance. Rare. | +| **Strong** | `80–84` | Broad conformance with minor gaps. | +| **Solid** | `75–79` | Solidly above the floor. | +| **Qualified** | `70–74` | Meets the eligibility floor. | +| _below floor_ | `< 70` | Not yet eligible; badge renders muted improvement UX. | + +The band thresholds are the spec-side contract; [`docs/badge.md`](../docs/badge.md) maps each band to a rendered color. +Exact colors are a rendering detail owned by the site. + +## Stability commitment + +The formula, tier weights, eligibility floor, and band thresholds are held stable for at least six months from +publication so that authors who embed the badge can rely on it. Any change ships through a versioned spec release with +the rationale recorded, never as a silent re-tuning. + +## Cross-references + +- [`AGENTS.md`](AGENTS.md): the seven-status taxonomy, per-row scorecard model, and antecedent propagation. +- [`docs/badge.md`](../docs/badge.md): badge claim, embed shapes, color rendering per band, regression behavior. +- RFC 2119 / RFC 8174: MUST / SHOULD / MAY tier semantics.