Skip to content

wp-abilities-verify: add new skill for verifying Abilities API registrations including adversarial readonly-but-writes detection#47

Draft
rtio wants to merge 40 commits intoWordPress:trunkfrom
rtio:feat/abilities-verify-skill
Draft

wp-abilities-verify: add new skill for verifying Abilities API registrations including adversarial readonly-but-writes detection#47
rtio wants to merge 40 commits intoWordPress:trunkfrom
rtio:feat/abilities-verify-skill

Conversation

@rtio
Copy link
Copy Markdown

@rtio rtio commented Apr 30, 2026

Adds a new skill wp-abilities-verify that checks a plugin's Abilities API registrations against the structural and behavioral promises they declare. The centerpiece is adversarial annotation correctness: a readonly: true ability whose execute callback actually writes (via $wpdb->update, update_option, wp_insert_*, a non-GET delegate, etc.) is the bug class that unit tests don't catch — the mock looks just like the real writer. Verify reads the callback body and compares what it does against what the annotation claims, with a small set of common write patterns as a starting point and the agent operationalizing for the plugin's own vocabulary.

Depends on #44, #45, #46. The branch stacks on wp-abilities-audit (#46) which stacks on wp-abilities-api mechanics (#45) which stacks on wp-abilities-api strategy refs (#44). For a focused view of just this PR's commits, see feat/abilities-audit-skill...feat/abilities-verify-skill on the fork.

What lands here

File Lines Purpose
skills/wp-abilities-verify/SKILL.md 212 Skill entrypoint. Two modes (static, runtime). Seven-step procedure: validate audit doc → enumerate statically → enumerate at runtime (runtime mode) → annotation correctness → permission roundtrip → schema lints → error-code vocabulary. Notes the static-mode blind spots so a static PASS isn't read as "verified write-free."
references/static-enumeration.md 126 Find wp_register_ability( calls (multi-line via rg --multiline --pcre2), extract names and annotation blocks, follow execute_callback to its body. Records limitations for variable-indirected names / annotations / callbacks so reviewers know where runtime mode is required.
references/annotation-correctness.md 145 The adversarial core. Three-claims table grounded in core's docblock (class-wp-ability.php lines 47-48 for idempotent — "no additional effect on the environment") and the run controller's HTTP method routing (class-wp-rest-abilities-v1-run-controller.php lines 110-116). Lists common write patterns as a starting set, not a checklist. Documents known blind spots (indirected service writes, hooks-as-writes, variable-built HTTP methods, tautological caps), the // verify-ignore suppression mechanism, and the runtime twin-invocation heuristic.
references/schema-lints.md 111 Six small principles applied to each ability's input_schema: object schemas declare additionalProperties; required fields have descriptions; enums non-empty; no $ref; defaults are statically constant (including (object) array() per input-schema-gotchas.md §4); reference abilities have no required inputs.
references/permission-roundtrip.md 164 Background on what permission_callback actually receives (validated input value, never a WP_REST_Request — the antipattern ships in real plugins copied from REST controllers without translation). Six callback shapes (A-F) classified into OK / WARN / FAIL. Runnable wp-cli block exercising the gate against anon / subscriber / admin contexts. Audit cross-check when an audit doc is provided.
references/runtime-harness.md 360 Six wp-cli checks against a booted WordPress: ability-name match, annotation read-back, read execution, write input validation, permission gate, idempotency twin-invocation heuristic. PHP snippets target the declared 7.2.24+ floor. Check 6 frames twin-invocation as a heuristic for environmental change rather than a verdict on return-value equality.
references/audit-schema-validation.md 131 Validates audit docs against the canonical schema owned by wp-abilities-audit. Three whole-audit invariants (reference_ability uniqueness, backing: null paired with a surfaced_gaps entry, optional arrays may be empty).
eval/scenarios/wp-abilities-verify.json End-to-end scenario covering a plugin with 7 abilities where one read accidentally calls $wpdb->update after a refactor; success criteria include FAIL top-line status and one FAIL row in annotation correctness with file + line evidence.

Verify-skill content total: 1,249 lines across 7 markdown files + 1 eval scenario.

Iteration history

This PR went through three rounds against @nerrad's review:

Verification

  • WP core symbols cited (WP_Ability::check_permissions, WP_Ability::invoke_callback, WP_Ability::validate_input, WP_Ability::execute, WP_Ability::get_meta, WP_Ability::get_category, wp_register_ability, wp_get_ability, wp_get_abilities, wp_set_current_user, wp_create_user, get_user_by, is_wp_error, current_user_can, is_user_logged_in, __return_true, WP_User::set_role, WP_Error::get_error_code) verified against WordPress/wordpress-develop trunk. Idempotent definition matches class-wp-ability.php lines 47-48; HTTP method routing matches class-wp-rest-abilities-v1-run-controller.php lines 110-116.
  • PHP code blocks php -l'd on PHP 7.2 — no syntax errors.
  • node eval/harness/run.mjs passes.
  • Diff hygiene: no internal paths, no PII, no plugin-specific capability strings outside scoped examples.

Reviewer notes

rtio added 30 commits April 30, 2026 14:06
Agents that conflate ability registration with consumer exposure end up
re-registering abilities every time the projection question reopens
(token budgets, MCP shape, Command Palette conventions). Treating
those as one decision couples a stable layer (capabilities) to a
volatile one (consumer constraints).

This reference captures a three-layer model: domain capability,
optional workflow, and projection. Adds the use-case-contract test
("would a human ever do this in admin?"), the "same code path as the
UI" rule, and a decision order (domain first, projection second,
workflow third). Worked example walks a generic Notifications plugin
through three projections off the same three domain registrations.
Abilities that re-implement business logic instead of consuming the
same code path as the UI / REST surface drift apart over time:
permissions diverge, validation rules drop, side-effects fire on
UI-driven writes but not ability-driven ones. None of those breaks
immediately; the divergence becomes discoverable only when an agent
invokes the ability in production.

This reference captures three execute-callback shapes (re-implement /
delegate / shared service), the metric-trap argument for preferring
the third shape on telemetry-emitting endpoints, the MCP exposure
rule (don't expose REST + ability for the same op to the same client),
and the AGENTS.md sync rule for keeping registrations in lockstep when
the underlying code path changes. Worked example walks a generic
plugin through extracting a service class that ability + REST + UI
all consume.
Plugin authors landing abilities on top of an existing REST surface
have to choose domain-layer granularity. The three observed shapes
(action-bundle behind a single ability, REST-atomization at one
ability per HTTP method, semantic-intent at one ability per
user-meaningful question) each solve a different problem, and the
naive default — atomize because the REST surface already does — leaks
HTTP plumbing into the agent's tool list.

This reference recommends semantic-intent at the domain layer:
abilities map to user-meaningful actions, filter parameters live in
input_schema, writes stay narrow at one state transition per ability.
Five rules cover read grouping, write narrowness, filter-vs-name
choice, zero-arg overview abilities, and the one-sentence
explainability test. Two worked examples (Jetpack Forms and a generic
Tickets plugin) show the heuristic in action without committing the
reader to a specific projection-layer pattern.
The procedure section previously jumped from "Confirm version
constraints" to "Find existing usage" without surfacing the design-time
decisions that should precede registration: which layer to register at
(domain vs projection) and how to choose granularity. Agents following
the procedure verbatim reach for syntax before reasoning about shape.

Adds a layer-model paragraph at the top of the procedure pointing at
domain-vs-projection, plus two cross-links in step 4 — to
grouping-heuristic for granularity decisions and shared-core-service
for the drift-prevention pattern (including the AGENTS.md sync rule).
Five points where references/php-registration.md drifted from
wordpress-develop/src/wp-includes/abilities-api/class-wp-ability.php:

1. permission_callback was marked optional. Per prepare_properties() at
   class-wp-ability.php:287-292, it throws InvalidArgumentException when
   missing — required on every registration with no implicit default.

2. The skill placed `readonly` at meta.readonly. Core puts the three
   semantic annotations at meta.annotations.* (lines 38-51 + validation
   at 313-317). Updated the table to show meta.annotations.readonly,
   .destructive, .idempotent at the canonical path.

3. Two of the three annotations (destructive, idempotent) were missing
   entirely. All three are first-class core annotations seeded via
   $default_annotations at lines 326-336.

4. Required-vs-optional was not flagged on label, description, category,
   execute_callback, or permission_callback — all five throw on missing
   per lines 262-292. Added a "Required?" column.

5. The recommended-patterns line "Always include a permission_callback
   for abilities that modify data" framed it as a write-only practice.
   Core requires it for all registrations. Reworded.

Adjacent fix while in the file: the namespacing example said
`my-plugin:feature.edit` (colon-and-dot) but every other example in
this skill uses slash-separated `my-plugin/get-info`. Aligned.

Updated the example registration block to register a permission_callback
and all three meta.annotations keys, so the example matches the contract
the table now describes.

Refs WordPress#44 review-id 4207902432
The inclusion test in references/domain-vs-projection.md framed the
ability question as "would a human ever do this in admin?". That lens
encodes a Woo/Jetpack-implementer's bias and excludes a real and
common case: abilities scoped to public authed/unauthed users on the
site front end (storefront `get-my-orders`, events
`list-available-tickets`, profile `update-public-profile`), and
abilities invoked as part of a workflow chain by another plugin or an
agent rather than by direct human input at all.

Reframed the test as "would a human intentionally do this through a
supported UI or workflow?". That criterion:

- Keeps admin-side abilities qualifying (the original case).
- Adds public-facing UIs and end-user self-service flows.
- Covers workflow / chain-of-actions invocation paths where the human
  intent is upstream of the ability call.
- Preserves the exclusion of internal-only plumbing (cache
  invalidation, scheduler ticks, debug snapshots, lifecycle
  bookkeeping).

Also dropped the asymmetric-test sentence that read "anything that
isn't admin-meaningful almost certainly isn't ability-meaningful" —
the same point survives in the rewritten version without the
admin-only framing.

Refs WordPress#44 review-id 4207902432
Same admin-only blind spot as the inclusion test in
domain-vs-projection.md (fixed in the previous commit), surfacing in
the opening sentence of references/shared-core-service.md.

The lockstep rule applies wherever the ability mirrors a supported
human workflow — admin screen, public-facing UI, customer self-service
flow, or a plugin/agent-driven workflow — not only admin actions.

Refs WordPress#44 review-id 4207902432
…eric responses scenario

Worked example A in references/grouping-heuristic.md was anchored to a
specific Automattic plugin's source layout, including a hardcoded link
to a file path under Automattic/jetpack/projects/packages/forms/. Two
problems:

1. WordPress/agent-skills is a generic, plugin-agnostic skill repo per
   CONTRIBUTING.md. Worked examples should not lean on one canonical
   plugin family.
2. File-path links into a fast-moving codebase rot — a rename or
   restructure makes the reference broken or worse, misleading.

Worked example B was already generic (Tickets plugin); aligning A to
the same approach.

The semantic-intent table row now uses a generic `feedback/get-responses`
example. The worked example A reframes around a generic feedback or
form-response plugin with the same structural lessons (filters in
`input_schema`, one write for "modify a response", a zero-arg
dashboard-summary ability). The hardcoded canonical-file link is
removed entirely.

Refs WordPress#44 review-id 4207902432
…ern, not a real callable

The shared-core-service reference repeatedly wrote
`delegate_to_rest_controller(...)` in code-shaped backticks (table row
example, body prose at four locations, a rule-of-thumb bullet). Read in
isolation, that string looks like a callable a plugin author should be
able to find or import. It is not — at least not in PR-44's scope. The
canonical helper of that name lives in another reference that is the
subject of a separate forthcoming PR; PR-44 cannot point at it without
phantom-dependency.

Two changes:

1. Replaced every `delegate_to_rest_controller(...)` occurrence with
   prose ("the delegation pattern", "the delegate-to-REST-controller
   route") so the text describes a pattern rather than naming a
   non-existent function.

2. Added a "What the delegation pattern looks like" subsection between
   the three-shapes table and the metric-trap discussion, with a small
   inline example showing the actual mechanism: construct a
   WP_REST_Request, set params, dispatch via rest_do_request(), handle
   the error case, return data. Three core symbols cited
   (WP_REST_Request, rest_do_request, WP_Error) verified against
   wordpress-develop trunk; method calls (set_query_params, is_error,
   as_error, get_data) similarly verified.

The example is intentionally minimal and self-contained. It does not
forward-reference any helper that lives outside this PR. A `php -l`
smoke-test on the extracted block on PHP 7.2 returns no syntax errors.

Refs WordPress#44 review-id 4207902432
…fects

The shared-core-service reference centred its argument for service
extraction on a "metric trap" — telemetry double-counting under agent
traffic. A reviewer pointed out that telemetry is one of several side
effects a REST handler may emit, and that the doc was elevating it
disproportionately. Plugin authors may track usage at any layer, and
the structural lesson — "do not let an ability trigger UI-scoped side
effects on agent invocations" — applies to audit logs, custom-event
hooks, notifications, cache invalidation, and lock acquisition just as
much as to telemetry.

Restructured the section accordingly:

- Renamed the H2 from "The metric trap" to "Side effects on the
  existing REST path".
- Replaced the telemetry-only intro paragraph with a five-bullet list
  of side-effect categories: usage telemetry, audit logs, custom-event
  hooks, email/notification dispatch, cache/schedule/lock side
  effects. Telemetry remains as the lead bullet because metric
  double-counting is the only failure mode in this set that is
  invisible by default — the system "works" while dashboards drift —
  which is worth one sentence even after demotion.
- Generalised the "fix" bullets ("emit telemetry" → "emit its side
  effects").
- Replaced the `my_plugin_track_event(...)` calls in both worked-example
  PHP blocks with `do_action( 'my_plugin/things_listed', $rows )`, a
  generic WP-idiomatic side-effect that does not bake telemetry into
  the example. Comments updated to "Side effects (audit log, hooks,
  notifications) live on the REST adapter."
- Generalised the rule-of-thumb bullets and the closing
  override-paragraph from telemetry-specific to side-effect-general
  framing.
- Updated the two stale pointers ("the metric trap that makes that
  distinction matter", 'read "the metric trap" below first') to point
  at the renamed section.

Net effect: telemetry is mentioned three times in the file (one bullet
in the side-effects list with the silent-failure callout, one
parenthetical in the worked-example closing, one parenthetical in the
rule-of-thumb), down from six. The original concern Darren raised —
telemetry as the centerpiece — is addressed; the genuinely-specific
risk it represents is preserved as one bullet of context.

php -l on both worked-example blocks returns no syntax errors on
PHP 7.2.

Refs WordPress#44 review-id 4207902432
The skill's php-registration reference covers `meta.show_in_rest` and the
three `meta.annotations` keys, but does not cover `meta.mcp.public` or
`meta.mcp.type`. The bundled WordPress MCP adapter (the package WP plugins
pull in for MCP exposure) reads those two keys to decide which abilities
to surface and how to project them. An ability registered with
`show_in_rest => true` but no `meta.mcp` block is technically valid yet
invisible to MCP clients connecting through that adapter — which is the
default discovery path for agents.

Surfaced as review feedback on a WooPayments pilot PR: a registration
that followed the skill exactly was discoverable on the abilities REST
namespace but missing from the MCP tool list. The reviewer's note was
"we should default all our registered abilities to use this unless we
know for sure it shouldn't be exposed."

Add both keys to the registration arguments table with their defaults
and constraints (mcp.public is bool default false; mcp.type is one of
'tool'/'resource'/'prompt' default 'tool', with silent coercion to
'tool' for any value outside the enum). Add a section that distinguishes
the abilities-REST namespace contract (`show_in_rest` → `wp-abilities/v1`)
from the MCP adapter projection (`mcp.public` → adapter's default MCP
server) so plugin authors don't conflate them. Update the canonical
example so the recommended starting shape opts the ability into MCP
discovery.

Verified against the MCP adapter package source: `mcp.public` read at
McpAbilityHelperTrait.php lines 37 and 61; `mcp.type` read at line 75
with the enum guard at line 78. PHP example block re-validated with
`php -l`.

Refs WordPress#44
Refs Automattic/woocommerce-payments#11679
…nded

The reference's table previously listed all three `meta.annotations`
keys (`readonly`, `destructive`, `idempotent`) as "Optional (default
null)". The post-table paragraph said plugins were "expected to populate
them honestly" but did not explain why a `null` annotation is materially
worse than a populated one.

Observed downstream consequence: an agent following the skill registered
an ability and shipped it without any annotations. The reviewer flagged
the omission. The skill teaches the keys exist and that they are hints
for tooling, but the cell-level "Optional" framing licensed the agent's
shortcut. The structural fix is to escalate the tone.

Reword the three table cells from "Optional" to "Strongly recommended"
while keeping the technically accurate "(default null)" footnote so
readers don't misread this as a core-enforced requirement. Expand the
post-table paragraph to spell out what `readonly: null` actually
broadcasts — "behavior unknown" — and why that is a worse signal than
either `true` or `false`. Strengthen the recommended-patterns bullet
along the same axis: explicit cost (three lines) vs. cost of omission
(opaque safety surface).

Verified core behavior at class-wp-ability.php lines 38-51 (the
`$default_annotations` array seeds all three keys to `null`) — the
table's "default null" claim still holds against trunk.

Refs WordPress#44
Refs Automattic/woocommerce-payments#11679
…effect

The "Side effects on the existing REST path" section enumerated five
semantic side effects of routing an ability through `rest_do_request()`
(telemetry, audit logs, custom-event hooks, notifications, cache/lock
side effects) but missed a category that is mechanically different but
just as important: performance.

`rest_do_request()` calls `rest_get_server()`, which is a lazy singleton
that fires `rest_api_init` the first time it's instantiated in a request
lifecycle. Inside an HTTP REST request that cost is paid before the
abilities layer ever runs. Outside one — CLI invocations, cron jobs,
agent runs hitting an ability through a non-REST MCP transport — the
*first* delegating ability call pays it. Every plugin's
`register_rest_route()` callback wires up at that point, which on a
cold path is measurably slow.

Surfaced as review feedback on a WooPayments pilot PR: the reviewer
flagged the delegation pattern with the question "what about using
List_Transactions::from_rest_request directly?" — the underlying
concern being that the ability is invoked from CLI, agents, and MCP
contexts where the REST bootstrap cost lands on the user.

Add a sixth bullet to the side-effects list, framed as "performance,
not semantics" so readers don't blur it together with the existing
five. Cite the lazy-instantiation guard explicitly so the cost-once-
per-request behavior is clear and readers don't over-correct toward
"never delegate." Add a corresponding rule-of-thumb row for "read
predominantly invoked outside REST" and update the override paragraph
so the non-REST-context rule joins side-effect and write as a reason
to extract a service even when delegation would otherwise be cheapest.

Verified against `wp-includes/rest-api.php` lines 619-651: the
`empty( $wp_rest_server )` guard at line 623, the instantiation at
line 635, and the `do_action( 'rest_api_init', $wp_rest_server )` at
line 647.

Refs WordPress#44
Refs Automattic/woocommerce-payments#11679
When adopting the Abilities API inside an existing plugin, the plugin's
architecture dictates how execute callbacks call into existing business
logic. Two patterns cover most real-world plugins, and picking the
wrong shape costs in tests, error-code semantics, and bootstrap
correctness.

Adds a reference covering Pattern A (shared API client, common in
plugins talking to a remote service) and Pattern B (zero-arg
controllers, common in plugins delegating primarily to WordPress core
mechanisms). Includes detection heuristics for each, minimal
skeletons for both, a hybrid-case note for plugins that legitimately
use both, an anti-pattern warning against inventing a fake API client
to fit Pattern A's helper shape, and a quick-pick table.
List-style read abilities backed by an existing REST controller repeat
the same four steps in every execute callback: validate the plugin is
initialized, build a WP_REST_Request from input, instantiate the
controller, and unwrap the response (which can be either a raw array
or a WP_REST_Response). At three or more abilities the boilerplate
dominates; copy-paste between callbacks is also where drift starts.

Adds a reference documenting the canonical delegate_to_rest_controller
helper: signature, the three guards every implementation needs
(class_exists on the controller, null-check on the shared dependency,
is_wp_error short-circuit before unwrap), the dual response-shape
unwrap, and explicit guidance on when NOT to use the helper (zero-arg
backings, non-REST services). Builds on plugin-family-patterns.md to
cover both shape variants.
Agents orchestrating multiple abilities need to know whether a failure
was caused by their input, by the plugin not being initialized, or by
a transient downstream error — the answer drives retry strategy. When
every plugin invents its own WP_Error codes, agents can't reason
uniformly across the surface they're given.

Adds a reference covering a small standardized vocabulary that handles
the 95% case: <plugin>_not_initialized for bootstrap failures,
<plugin>_missing_<field> and <plugin>_invalid_<field> for input
problems caught in the execute callback, <plugin>_<resource>_data_unavailable
for transient backend failures, and ability_invalid_input for the
schema-validator path that fires earlier than the callback. Documents
why upstream third-party codes (e.g. Stripe's resource_missing) should
bubble through verbatim rather than being re-wrapped, plus naming
rules and i18n discipline for codes vs messages.
Registering an ability with an input_schema doesn't guarantee that the
execute callback sees the schema applied at runtime. Three surprises
have shown up in real plugin work after the schema looked correct and
unit tests passed.

Adds a reference covering the three: (1) the Abilities API does NOT
inject schema 'default' values into callback input — agents that omit
optional fields receive callback input without those keys, so callbacks
must normalize defaults explicitly; (2) pagination parameter-name drift
when an ability exposes per_page but the backing controller delegates
to an internal class that reads pagesize, silently breaking pagination;
(3) PHP empty() false-rejecting "0" and 0 as IDs, which matters for
order/post/row IDs that can legitimately be zero.

Each gotcha includes detection patterns, the corrective code, and a
combined worked example showing all three guards in one callback.
Step 4 (Register abilities) gated on grouping and shared-core but did
not yet point at the family-pattern decision (shared-API-client vs
zero-arg controllers), the delegate-helper, or the standardized
WP_Error vocabulary. Failure modes also did not point at
input-schema-gotchas, so debugging "execute returns surprising errors"
required re-deriving the three known traps.

Adds two cross-links in step 4 (plugin-family-patterns + delegate-helper,
plus error-code-vocabulary) and one in Failure modes
(input-schema-gotchas).
The reference covered three input-schema gotchas — schema property
defaults not being injected, pagination key drift, and `empty()`
false-rejecting "0" — all of which assume the callback has already
been invoked with some input. It missed a structural concern one
layer up: HOW the callback gets invoked, and what runs first.

`wp_get_ability(...)->execute()` runs `normalize_input()` then
`validate_input()` (then `check_permissions()`) before invoking the
callback. Direct invocation of the static method skips all of that.
The two paths diverge in three places:

1. Validation rejects null input on the indirect path when the schema
   declares an object type — the callback never runs, the PHP-level
   `$input = null` default in the signature is dead code.
2. The schema's top-level `default` IS substituted by `normalize_input()`
   on null inputs (different layer from the property-level `default`
   that Gotcha 1 covers).
3. WordPress only forwards `$input` to the callback when the input
   schema is non-empty; without one, indirect invocation produces
   `ArgumentCountError` unless the signature has a default.

Surfaced on a pilot ability registration when the reviewer noted that
indirect callers validate input first, so a callback signature like
`function execute( $input = null )` does nothing for them — null gets
rejected during validation, never reaching the callback. The reviewer's
suggested fix (`default => []` at the schema root) works because
`normalize_input()` honors it; documenting that mechanism is what the
new gotcha section adds.

Add Gotcha 4 covering the direct/indirect split, the three concrete
divergences, the fix (top-level schema `default`, ideally
`(object) array()` for object-typed schemas to avoid PHP's array/object
ambiguity at the validator boundary), and how this relates to Gotcha 1.
Rename "Putting the three together" to "Putting gotchas 1-3 together"
so the existing example doesn't imply Gotcha 4 is included — Gotcha 4
is structurally about callback-invocation, not callback-robustness, and
doesn't fit the same worked example.

Verified against `wordpress-develop` `trunk`: `WP_Ability::execute()`
order at class-wp-ability.php lines 612-669; `normalize_input()` at
lines 444-455; `validate_input()` at 465-496 (delegates to
`rest_validate_value_from_schema()` which validates only and does not
inject property-level defaults); `invoke_callback()` at 507-526 with
the input-schema-gated arg-forwarding at line 509. PHP example block
lints clean on PHP 7.2.

Refs WordPress#45
Refs Automattic/woocommerce-payments#11679
… bypass

The "When NOT to use the helper" section listed two bypass categories
(zero-arg backings and non-REST services), but missed the case in the
middle: backing methods that DO take `WP_REST_Request` but route
straight to a request class via something like
`Things_Request::from_rest_request($request)->execute()` — where the
controller adds nothing beyond the construction step.

Surfaced as feedback on a pilot ability registration where the
reviewer suggested calling the request class directly instead of going
through the controller. The shape is structurally distinct from both
existing bypass categories: it still uses `WP_REST_Request` (so the
zero-arg subsection doesn't fit), but bypasses the controller layer
(so the non-REST-service subsection doesn't fit either).

Add a third "Backing request class can be invoked without the
controller" subsection between the introduction and "Zero-arg backing
methods", with a worked example that constructs a `WP_REST_Request`
purely to satisfy `from_rest_request()`'s signature, then calls the
request class directly. Cross-link to `shared-core-service.md` for the
adjacent first-call REST bootstrap-cost discussion. Note the explicit
tradeoff: bypassing the controller bypasses anything the controller
does (validation, side effects), so this shape is for cases where the
controller is genuinely a thin wrapper, not when it's doing work
worth keeping.

Update the rule of thumb so the three shapes (helper, request-class
direct, fully direct-path) compose into one decision tree instead of
the previous binary helper-or-direct-path framing.

PHP example block lints clean on PHP 7.2. Diff hygiene checks pass:
no filesystem paths, no private identifiers, no PHP 7.4+ syntax, no
internal URLs, no plugin-specific cap strings.

Refs WordPress#45
Refs Automattic/woocommerce-payments#11679
A standardized audit document is only useful if the structure is stable
across plugins and consumable without parsing surprises. Without a written
schema, every audit invents its own field names and downstream tools (or
human reviewers) have to map between variants.

Adds the canonical schema reference: required top-level fields (plugin,
repo, branch_audited, audited_at, auditor, baseline_abilities,
capability_gate, plus optional plugin_family), proposed_abilities array
shape (with backing, permission, return_type, effort, annotations, notes,
risks, optional reference_ability), excluded_from_mvp and surfaced_gaps
arrays, plus required prose sections (Controller Inventory, Notes and
Surprises). Documents capability_gate's two legal shapes (single string
vs structured {read, write} object), backing: null semantics for known
gaps, and the full minimal valid example with both a read and a write
ability.
The audit's capability_gate field has to reflect what the controller
actually gates on, not what its docblock says. Two mechanisms cover most
plugins, and naming them explicitly stops auditors from hard-coding one
plugin family's assumptions.

Adds a reference covering Mechanism A (direct: a base controller calls
current_user_can('<some_cap>') once and every route inherits the gate)
and Mechanism B (post-type-backed: controllers extend
WP_REST_Posts_Controller or a subclass; permission resolution is
dynamic at request time, with read and write often resolving to
distinct caps via core's map_meta_cap). Each mechanism gets identifying
signs, a tracing recipe, and a YAML representation example. Includes
guidance for the legacy compound-string capability_gate form
(accepted-but-not-preferred) and the __return_true REST-layer pitfall
(record verbatim, but never copy into the ability's own
permission_callback).
The first step of every audit is producing an exhaustive list of the
plugin's REST controllers. Real plugins use at least two layouts (the
standard includes/admin/class-*-rest-*-controller.php glob, and arbitrary
non-standard locations like includes/api/, src/Rest/, or monorepo
package directories), and missing controllers means missing abilities.

Adds a reference documenting the two enumeration paths — glob for
standard layouts, grep as the universal fallback — with explicit
guidance on when each works, how to combine them for monorepos, and
how to handle inherited routes (record on the child class with
backing.inherited_from + null line numbers). Closes with an
exhaustiveness rule (the Controller Inventory table must list every
controller found, not just MVP-backers) so reviewers can answer "why
isn't X in the MVP?" by pointing at the inventory.
A skill that produces a structured audit doc needs an eval scenario
demonstrating both routing (which skill to invoke for "I want to plan
abilities for plugin X") and the produced-doc shape (YAML block, prose
sections, structured capability_gate, surfaced_gaps, etc).

Adds eval/scenarios/wp-abilities-audit.json. Scenario covers a plugin
with a non-standard REST layout (includes/api/), a post-type-backed
base controller with distinct read/write capabilities, and a missing
backing endpoint that should surface as a gap. Success criteria
include routing to wp-abilities-audit, structured {read, write}
capability_gate, at least 3 proposed_abilities with at least one read
and one write, and a non-empty surfaced_gaps array.
When planning Abilities API adoption on a non-trivial plugin, the gap
between "we have a REST surface" and "we have a coherent set of
abilities to register" is real survey work — enumerate controllers,
trace capability gates, decide granularity, identify gaps. Without a
standardized output, every survey produces a differently-shaped
artifact and downstream readers (or tooling) have to re-derive the
shape each time.

Adds the wp-abilities-audit skill: a 7-step procedure that consumes a
plugin checkout, runs controller enumeration (references
controller-enumeration.md), traces capability gates (references
capability-gate-tracing.md), proposes abilities via semantic-intent
grouping (references wp-abilities-api/grouping-heuristic.md), and
writes a doc conforming to the canonical schema in audit-schema.md.
The doc is plugin-agnostic — any WP plugin that exposes a REST
surface is in scope.
Static-mode verification starts with a complete inventory of registered
abilities and the byte ranges of their callbacks. Without this, every
downstream check (annotation correctness, schema lints, error-code
vocabulary) is operating on a partial set or guessing where the
callback body lives.

Adds a reference covering grep-based enumeration of wp_register_ability
calls, ability-name extraction (literal strings + variable cases),
annotation-block extraction across the three observed shapes (one-liner,
multi-line, and helper-method indirection), execute_callback resolution
through array-form / string-form / closure variants, and explicit
handling of fluent builders, dynamic registration loops, and heredoc
SQL bodies. Closes with a clearly-marked Limits section so reviewers
know which cases require runtime mode for authoritative enumeration.
The adversarial core of the verify skill — what makes it a distinct
skill rather than a wrapper around "run the tests". Agents plan
actions on the basis of annotations they introspect; a readonly: true
ability that actually writes is a security and UX disaster. Unit
tests don't catch this class of bug because the mock looks just like
the real writer.

Adds a reference covering grep patterns that read the execute callback
body and compare what it does against what the annotation promises.
Three claims (readonly: true, destructive: false, idempotent: true),
each with its own pattern set. Severity table assigns FAIL or WARN
weights per pattern (sharp on the things that matter — deletion,
money-moving verbs, non-GET delegates; soft on ambiguous patterns
like transients and time-in-response). Honest blind-spot table
covering indirected service writes, hooks-as-writes,
variable-built-method delegate calls, WP-CLI-from-ability,
tautological capability gates. Inline `// verify-ignore` suppression
mechanism and a curated false-positive allow-list.
Orchestrating agents read each ability's input_schema to figure out
how to call it. A schema that's hard to parse, ambiguous, or
misleading wastes agent turns even when the ability itself works
perfectly — schema hygiene is about agent legibility.

Adds a reference covering six static lints over input_schema:
additionalProperties scoped to object-typed schemas (typo protection
without overreach into primitive-typed schemas), required-field
descriptions (failure mode: opaque required keys), non-empty enums,
no $ref (ability schemas are introspected without a definitions
document), primitive defaults only (computed defaults break
idempotent: true), and a reference_ability invariant cross-checked
against the audit. The non-object-root example shows why Lint 1
shouldn't apply to primitive-typed schemas — the Abilities API
accepts schemas of any JSON Schema type at the root.
Verify that the registered permission_callback on every ability
actually gates on a real capability — statically by source inspection,
and at runtime by exercising the gate against unauthenticated,
subscriber, and admin contexts.

Adds a reference covering six permission-callback shapes (direct
current_user_can, input-shape branch with an explicit recommendation
to split into two abilities instead, '__return_true', helper
delegation, literal true, is_user_logged_in only). Includes a
prominent Background section explaining what the callback actually
receives — never a WP_REST_Request, despite copy-paste from REST
controllers being a recurring antipattern; the input-value contract
is documented against the WP_Ability::check_permissions() and
invoke_callback() invocation chain. Runtime check uses
WP_Ability::check_permissions() (the public method that wraps the
registered callback) and treats both bool and WP_Error returns
correctly. Audit cross-check confirms the registered cap matches the
audit's declared capability_gate.
Static mode catches structural problems (annotation lies, schema lint
failures, audit mismatches). Some bug classes only surface against a
booted WordPress: missing constructor arguments, bootstrap-ordering
issues, schema-validator-path errors, capability-roundtrip failures,
idempotency violations at the response-byte level.

Adds the runtime-mode procedure: env-up via the plugin's AGENTS.md
(observed patterns include npm-run-wp-env, plain wp-env, monorepo
docker, and composer-based bootstraps), six canonical wp-cli checks
(name enumeration, annotation read-back, read execution,
write-input validation, permission gate roundtrip, idempotency
twin-invocation diff). Each check produces a row in the structured
report, with hard-fail-on-fatal and explicit interpretation of
WP_Error returns from check_permissions() (bool|WP_Error). PHP
snippets target the 7.2.24+ floor declared in the skill's
compatibility frontmatter — closures instead of arrow functions,
strpos instead of str_starts_with.
rtio added 3 commits May 6, 2026 17:35
When verify is invoked with an audit doc as input, validating the doc
itself comes first — every other check then assumes a well-formed
audit. Co-locating the validator with the verify skill keeps
"validate audit" in the same procedure as "validate registered
abilities" so a single run produces one consolidated report.

Adds a reference covering YAML extraction (one fenced block per
audit doc), per-field validation against the canonical schema owned
by wp-abilities-audit (no field-table duplication; the canonical is
the single source), and three whole-audit invariants
(reference_ability uniqueness, backing: null entries paired with
surfaced_gaps, optional excluded_from_mvp / surfaced_gaps arrays
allowed but missing entirely is a WARN). FAIL on any required-field
miss; WARN on the legacy compound-string capability_gate form.
The verify skill's most distinctive output is catching a readonly: true
ability that quietly writes — a class of bug that unit tests don't
catch because the mock looks just like the real writer. The scenario
exercises that case end-to-end.

Adds eval/scenarios/wp-abilities-verify.json. Scenario covers a
plugin with 7 registered abilities (6 reads + 1 write) where one of
the read abilities accidentally calls $wpdb->update inside its
execute callback after a refactor. Plus an audit doc that declared
the expected annotations before the refactor. Success criteria
include FAIL top-line status, exactly one FAIL row in the
annotation-correctness table with file + line evidence of the
$wpdb->update call, and OK on the other 5 readonly abilities.
A registered abilities surface that passes its unit tests can still
ship lies: an ability annotated readonly: true whose callback writes
to $wpdb, an ability whose registered permission_callback is shape-
correct but resolves to a cap subscribers hold, a schema that
declares per_page but whose backing reads pagesize. None of these
trip a typical test suite. They trip an agent in production.

Adds the wp-abilities-verify skill: a 7-step procedure for verifying
a plugin's Abilities API registrations against a shipped audit doc
or against the canonical schema independently. Two modes (static
and runtime) both produce the same structured report. The skill is
honest about its blind spots — static mode is grep-driven and won't
catch indirected service writes, hooks-as-writes, or
variable-built HTTP methods on delegate helpers; the SKILL.md
"Static-mode coverage caveats" section names the gaps explicitly so
a static-mode PASS isn't read as "verified write-free."
@rtio rtio force-pushed the feat/abilities-verify-skill branch from 5168e24 to bb1de00 Compare May 6, 2026 20:36
Copy link
Copy Markdown

@nerrad nerrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I reviewed the current state of the wp-abilities-verify skill, I'm a bit wary that it might be too exhaustive/wordy and risks agent indirection or getting stale with the source code. Especially when smarter models come along (this has already happened with some of the attempts I've made with using this). So comments will reflect that.

Another minor thing to note is that the the repo's README's "Available Skills" table doesn't list wp-abilities-audit or wp-abilities-verify.

Comment thread skills/wp-abilities-verify/references/annotation-correctness.md Outdated
Comment thread skills/wp-abilities-verify/references/annotation-correctness.md Outdated
Comment on lines +149 to +169
### Lint 5 — default values are primitive or `null`

```php
// OK — primitive defaults.
'per_page' => [ 'type' => 'integer', 'default' => 25 ],
'submit' => [ 'type' => 'boolean', 'default' => false ],
'status' => [ 'type' => 'string', 'default' => 'pending' ],
'metadata' => [ 'type' => 'object', 'default' => null ],

// WRONG — complex expression as default.
'created_at' => [ 'type' => 'string', 'default' => gmdate( 'c' ) ],
```

**Why:** a default that evaluates per-call is both non-deterministic
(twin invocations can differ, violating `idempotent: true`) and
surprising to agents that expect defaults to be static values.

**Check:** each `'default'` value must be a scalar literal (`true`,
`false`, integer, float, quoted string, `null`) or an empty array
literal. A function call, variable reference, or computed expression →
FAIL.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says defaults must be primitive, null, or an empty array. But input-schema-gotchas.md correctly says zero-arg object ab ilities need a top-level schema default like (object) array() because normalize_input() applies top-level defaults. See https://github.com/rtio/agent-skills/blob/bb1de00133d64698b78114c9712749ac814c027c/skills/wp-abilities-api/references/input-schema-gotchas.md#L205-L222 and WP_Ability::normalize_input(). As written, it looks like verify could end up flagging the recommended fix here as invalid.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in c5f962a.

Lint 5 now accepts (object) array(), (object) [], new stdClass(), and all-literal arrays as valid defaults. Cross-linked input-schema-gotchas.md §4 explicitly in the lint copy so the agreement between the two skills is visible to anyone reading either one. Function-call defaults like gmdate('c') or wp_generate_uuid4() still FAIL because the value varies per call — the lint is now about "evaluates to the same shape every call," which the cast and new stdClass() do.

Comment thread skills/wp-abilities-verify/references/static-enumeration.md
Comment thread skills/wp-abilities-verify/references/annotation-correctness.md Outdated
rtio added 5 commits May 8, 2026 15:58
…default

The Lint 5 rule previously required default values to be scalar
literals, `null`, or empty array literals — anything else, including a
type-cast like `(object) array()`, was a FAIL.

This contradicted the recommended fix in `wp-abilities-api`'s
`input-schema-gotchas.md` §4, which specifically calls for
`'default' => (object) array()` at the top level of an `object`-typed
schema so `WP_Ability::normalize_input()` can substitute a sane default
when the indirect path is invoked with `null`. Without that top-level
default, zero-arg-allowed abilities fail at the validator boundary.

Verify would have flagged the recommended fix as a lint violation,
sending implementers in circles.

Broaden Lint 5 to accept type-cast empty object literals
(`(object) array()`, `(object) []`), `new stdClass()`, and all-literal
arrays — anything that evaluates to the same shape on every call.
Function-call defaults like `gmdate('c')` or `wp_generate_uuid4()`
remain a FAIL because they vary per call. Cross-link the gotchas
reference so the agreement between the two skills is explicit.

Refs WordPress#47 (comment)
The skill described `idempotent: true` as "same input always produces
same effect" with a hard FAIL rule that twin invocations must be
byte-identical. Two problems with that framing:

1. Core's docblock at `class-wp-ability.php` lines 47-48 defines
   idempotent as "calling the ability repeatedly with the same arguments
   will have no additional effect on its environment." That's about
   *environmental writes*, not return-value determinism. A read that
   embeds `time()` in its response is idempotent under core's reading;
   the skill would have FAILed it.
2. The Abilities REST run controller operationalizes annotations into
   HTTP method routing (`readonly: true` -> GET, `destructive &&
   idempotent` -> DELETE, otherwise POST — see lines 110-116 of
   `class-wp-rest-abilities-v1-run-controller.php`). That's the
   load-bearing semantic, not response-shape equality.

Reframe the three-claims table, the static idempotent-but-non-idempotent
check, the runtime twin-invocation check, and the SKILL.md summary
around environmental side effects. Twin-invocation byte-equality stays
as a heuristic (cheap pass when hashes match; signal worth inspecting
when they don't), not as the definition.

Drop the static "non-determinism" greps that flagged `rand()`,
`wp_create_nonce()`, and `time()` because they don't violate core's
idempotent — they only vary the return value while leaving the
environment unchanged. The new static checks target patterns that
almost always reflect real environmental accumulation: counter writes,
per-call cron schedules, append-only inserts, and sequence increments.

Refs WordPress#47 (comment)
…->delete)

The mutator-method regexes in Step 2 (readonly-but-writes) and Step 3
(destructive-but-says-false) required a trailing `_` after the verb,
e.g. `->save_record(`. Direct mutator calls without a suffix —
`->save()`, `->delete()`, `->add_order_note()`, `->set_status()` — fell
through silently because the regex demanded `verb_` and they end in
`verb(`.

Fix: change the trailing token from `_` to `(_|\()` so both verb-prefixed
methods and direct calls match. Add `add` and `set` to the verb list
(common additive writes the original list omitted) along with `trash`.

Also note explicitly that the verb list is a representative starting
point, not exhaustive — plugin vocabularies vary (camelCase verbs,
domain-specific writes like `->markAsPaid()` or `->commit()`) and the
agent should extend it for the plugin under verification.

Refs WordPress#47 (comment)
…ls in static enumeration

Step 2's name-extraction regex assumed `wp_register_ability(` and the
first quoted argument appear on the same line. The canonical
formatting shown at the top of the same file is multi-line, so the
regex would miss the file's own example — and any plugin that follows
the multi-line convention.

Recommend ripgrep's `--multiline --pcre2` mode as the primary path,
keep the single-line grep as a fast inline-form check, and document
`pcregrep -M` / `perl -0777` as fallbacks when only POSIX grep is
available. When the regex still misses a registration (heredoc names,
unusual formatting), fall back to inspecting the call-site window by
hand.

Refs WordPress#47 (comment)
The Available Skills table didn't list the audit or verify skills, so
readers landing on the README couldn't discover them even though they
ship in this repo. Add concise rows next to wp-abilities-api so the
abilities cluster appears together.

Refs WordPress#47 (review)
@rtio
Copy link
Copy Markdown
Author

rtio commented May 8, 2026

Hey Darren — pushed five focused fixes addressing the specific issues; replied inline on each thread with detail.

Commit Thread What changed
35f60d8 idempotent definition Match core's "no additional effect on its environment" wording (class-wp-ability.php:47-48); cross-ref the run controller's HTTP routing as the load-bearing semantic; static check no longer flags rand() / wp_create_nonce() / time(); runtime twin-invocation reframed as heuristic, not verdict
16eb6a1 mutator regex Pattern is now `verb(_
ee85b28 multi-line registration rg --multiline --pcre2 as primary, with pcregrep -M / perl -0777 fallbacks; the canonical multi-line example now matches the regex used to enumerate it
c5f962a Lint 5 vs (object) array() Lint accepts (object) array(), (object) [], new stdClass(), all-literal arrays; cross-linked input-schema-gotchas.md §4 so the two skills no longer quietly disagree
d8ea032 README Added wp-abilities-audit and wp-abilities-verify to the Available Skills table

On the broader thread that runs through the whole review — that the references are over-prescriptive and risk locking behavior to current model capability — I think you're right. The right move is to reshape from "exhaustive grep playbook" to "principles + a few examples + agent-derived patterns." That's a substantive change and I'd rather scope it as a separate pass once these bug fixes are in. Happy to open a follow-up issue if you'd like to track it explicitly.

rtio added 2 commits May 8, 2026 18:17
Per Darren's 2026-05-08 follow-up: the references were packing too much
into the initial iteration and risk model indirection (he has signal
from Woo Core usage of similar prescriptions). Lighter v0; iterate as
real-world usage tells us where to add specificity.

Three references go from prescriptive grep playbooks to principle +
illustrative examples + "agent operationalizes for the plugin's
vocabulary." The runnable harnesses (runtime-harness.md,
permission-roundtrip.md, audit-schema-validation.md) stay closer to
as-is in this commit because they're concrete runbooks, not pattern
catalogues.

annotation-correctness.md: 333 -> 145 lines. Drops the exhaustive grep
playbook (~50 lines of write-pattern recipes), the severity table, the
hardcoded false-positive allow-list, and the "why grep not parser"
section. Keeps the "Why this matters" framing, the three-claims table
with core's idempotent wording, the HTTP method routing reference, the
known blind spots, the //verify-ignore suppression mechanism, the
runtime heuristic complement, and the report format. Common write
patterns are listed as a starting set, not a checklist.

schema-lints.md: 250 -> 100 lines. Each of the six lints is now a
short principle (5-15 lines) instead of a grep recipe + result table.
Lint 5 keeps the corrected `(object) array()` acceptance from c5f962a
and the cross-link to input-schema-gotchas.md.

static-enumeration.md: 280 -> 90 lines. Steps 1-4 stay; the 3.x
annotation patterns, fluent-builder details, helper-method
drilldowns, and heredoc-callback subsection drop. The multi-line
`wp_register_ability` extraction from ee85b28 is preserved as a
single ripgrep example with one-line fallback notes.

The eval scenario (eval/scenarios/wp-abilities-verify.json) survives
the slim — its success criteria don't depend on specific patterns
being present in the references, just on the agent finding
\$wpdb->update() in a readonly callback. The new annotation-correctness
explicitly lists \$wpdb->update as a candidate write pattern; the agent
will catch it.

Refs WordPress#47 (review)
… too

Continuing the lighter-v0 reshape from dad0196 across the remaining
files that carried prescription weight:

permission-roundtrip.md: 301 -> 164 lines. Drops the static-grep
patterns block (the most prescriptive part — exactly what Darren
flagged for Woo Core indirection). Compresses the Shape A-F catalogue
from a per-shape code-block-plus-narrative into a single classification
table. Keeps the background section (the WP_REST_Request antipattern
warning is load-bearing — it ships in real plugin copies from REST
controllers), the runnable wp eval block, the audit cross-check
narrative, and the static-only mode caveat.

SKILL.md: 220 -> 212 lines, but the structure is leaner. Drops the
"Static-mode coverage caveats" section that duplicated content now
living more concisely in the slimmed annotation-correctness.md. Step
4's prose summary tightens to mirror the slimmed reference. The
description in frontmatter clarifies that the centerpiece is "callback
behavior matches each annotation's claim," dropping language that
implied an exhaustive grep playbook.

Total skill footprint: 1,835 (round-1) -> ~1,900 (round-2 with
clarifications) -> 1,249 (this push). Net cut roughly one-third.

The runnable harnesses (runtime-harness.md, audit-schema-validation.md)
are preserved at their current size — they're concrete runbooks with
executable wp eval blocks, not pattern catalogues, and they pull their
weight at the line count they sit at.

Eval scenario still passes (`node eval/harness/run.mjs`).

Refs WordPress#47 (review)
@rtio
Copy link
Copy Markdown
Author

rtio commented May 8, 2026

Hey @nerrad — agreed and pivoted. The round-2 bug fixes landed first (idempotent definition, Lint 5 vs (object) array(), README); for the substantive part of your review I cut the prescriptive grep playbooks across the references so the skill reads as principles + a few illustrative examples + "agent operationalizes for the plugin's vocabulary."

Round-3 commits (dad0196..f0070f6):

Commit Effect
dad0196 Slim annotation-correctness.md (333 → 145), schema-lints.md (250 → 111), static-enumeration.md (280 → 126). Drops the exhaustive write-pattern playbook, the severity table, the hardcoded false-positive allow-list, and the embedded grep recipes for each schema lint. Keeps the Why framing, the three-claims table grounded in core's docblock, the HTTP method routing reference, the known blind spots, the // verify-ignore suppression, the runtime heuristic, and the report formats.
f0070f6 Slim permission-roundtrip.md (301 → 164) and tighten SKILL.md. The static grep block in permission-roundtrip — the most prescriptive part — is gone; the Shape A-F catalogue compresses into a single classification table. Runnable wp eval block stays as-is.

Skill footprint went from ~1,900 lines (after round-2) to 1,249 in this push — roughly a one-third cut. The runnable harnesses (runtime-harness.md, the wp eval block in permission-roundtrip.md, audit-schema-validation.md) are preserved because they're concrete runbooks rather than pattern catalogues.

Eval scenario still passes (the agent finds $wpdb->update in a readonly: true callback either way; the slim doesn't break that).

PR body rewritten to reflect the lighter framing and the iteration history. Open to further trimming if you see specific sections that still pack too much; my read is the runnable harnesses pull their weight at their current size, but happy to revisit.

@rtio rtio requested a review from nerrad May 8, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants