wp-abilities-verify: add new skill for verifying Abilities API registrations including adversarial readonly-but-writes detection#47
Conversation
Agents that conflate ability registration with consumer exposure end up
re-registering abilities every time the projection question reopens
(token budgets, MCP shape, Command Palette conventions). Treating
those as one decision couples a stable layer (capabilities) to a
volatile one (consumer constraints).
This reference captures a three-layer model: domain capability,
optional workflow, and projection. Adds the use-case-contract test
("would a human ever do this in admin?"), the "same code path as the
UI" rule, and a decision order (domain first, projection second,
workflow third). Worked example walks a generic Notifications plugin
through three projections off the same three domain registrations.
Abilities that re-implement business logic instead of consuming the same code path as the UI / REST surface drift apart over time: permissions diverge, validation rules drop, side-effects fire on UI-driven writes but not ability-driven ones. None of those breaks immediately; the divergence becomes discoverable only when an agent invokes the ability in production. This reference captures three execute-callback shapes (re-implement / delegate / shared service), the metric-trap argument for preferring the third shape on telemetry-emitting endpoints, the MCP exposure rule (don't expose REST + ability for the same op to the same client), and the AGENTS.md sync rule for keeping registrations in lockstep when the underlying code path changes. Worked example walks a generic plugin through extracting a service class that ability + REST + UI all consume.
Plugin authors landing abilities on top of an existing REST surface have to choose domain-layer granularity. The three observed shapes (action-bundle behind a single ability, REST-atomization at one ability per HTTP method, semantic-intent at one ability per user-meaningful question) each solve a different problem, and the naive default — atomize because the REST surface already does — leaks HTTP plumbing into the agent's tool list. This reference recommends semantic-intent at the domain layer: abilities map to user-meaningful actions, filter parameters live in input_schema, writes stay narrow at one state transition per ability. Five rules cover read grouping, write narrowness, filter-vs-name choice, zero-arg overview abilities, and the one-sentence explainability test. Two worked examples (Jetpack Forms and a generic Tickets plugin) show the heuristic in action without committing the reader to a specific projection-layer pattern.
The procedure section previously jumped from "Confirm version constraints" to "Find existing usage" without surfacing the design-time decisions that should precede registration: which layer to register at (domain vs projection) and how to choose granularity. Agents following the procedure verbatim reach for syntax before reasoning about shape. Adds a layer-model paragraph at the top of the procedure pointing at domain-vs-projection, plus two cross-links in step 4 — to grouping-heuristic for granularity decisions and shared-core-service for the drift-prevention pattern (including the AGENTS.md sync rule).
Five points where references/php-registration.md drifted from wordpress-develop/src/wp-includes/abilities-api/class-wp-ability.php: 1. permission_callback was marked optional. Per prepare_properties() at class-wp-ability.php:287-292, it throws InvalidArgumentException when missing — required on every registration with no implicit default. 2. The skill placed `readonly` at meta.readonly. Core puts the three semantic annotations at meta.annotations.* (lines 38-51 + validation at 313-317). Updated the table to show meta.annotations.readonly, .destructive, .idempotent at the canonical path. 3. Two of the three annotations (destructive, idempotent) were missing entirely. All three are first-class core annotations seeded via $default_annotations at lines 326-336. 4. Required-vs-optional was not flagged on label, description, category, execute_callback, or permission_callback — all five throw on missing per lines 262-292. Added a "Required?" column. 5. The recommended-patterns line "Always include a permission_callback for abilities that modify data" framed it as a write-only practice. Core requires it for all registrations. Reworded. Adjacent fix while in the file: the namespacing example said `my-plugin:feature.edit` (colon-and-dot) but every other example in this skill uses slash-separated `my-plugin/get-info`. Aligned. Updated the example registration block to register a permission_callback and all three meta.annotations keys, so the example matches the contract the table now describes. Refs WordPress#44 review-id 4207902432
The inclusion test in references/domain-vs-projection.md framed the ability question as "would a human ever do this in admin?". That lens encodes a Woo/Jetpack-implementer's bias and excludes a real and common case: abilities scoped to public authed/unauthed users on the site front end (storefront `get-my-orders`, events `list-available-tickets`, profile `update-public-profile`), and abilities invoked as part of a workflow chain by another plugin or an agent rather than by direct human input at all. Reframed the test as "would a human intentionally do this through a supported UI or workflow?". That criterion: - Keeps admin-side abilities qualifying (the original case). - Adds public-facing UIs and end-user self-service flows. - Covers workflow / chain-of-actions invocation paths where the human intent is upstream of the ability call. - Preserves the exclusion of internal-only plumbing (cache invalidation, scheduler ticks, debug snapshots, lifecycle bookkeeping). Also dropped the asymmetric-test sentence that read "anything that isn't admin-meaningful almost certainly isn't ability-meaningful" — the same point survives in the rewritten version without the admin-only framing. Refs WordPress#44 review-id 4207902432
Same admin-only blind spot as the inclusion test in domain-vs-projection.md (fixed in the previous commit), surfacing in the opening sentence of references/shared-core-service.md. The lockstep rule applies wherever the ability mirrors a supported human workflow — admin screen, public-facing UI, customer self-service flow, or a plugin/agent-driven workflow — not only admin actions. Refs WordPress#44 review-id 4207902432
…eric responses scenario Worked example A in references/grouping-heuristic.md was anchored to a specific Automattic plugin's source layout, including a hardcoded link to a file path under Automattic/jetpack/projects/packages/forms/. Two problems: 1. WordPress/agent-skills is a generic, plugin-agnostic skill repo per CONTRIBUTING.md. Worked examples should not lean on one canonical plugin family. 2. File-path links into a fast-moving codebase rot — a rename or restructure makes the reference broken or worse, misleading. Worked example B was already generic (Tickets plugin); aligning A to the same approach. The semantic-intent table row now uses a generic `feedback/get-responses` example. The worked example A reframes around a generic feedback or form-response plugin with the same structural lessons (filters in `input_schema`, one write for "modify a response", a zero-arg dashboard-summary ability). The hardcoded canonical-file link is removed entirely. Refs WordPress#44 review-id 4207902432
…ern, not a real callable
The shared-core-service reference repeatedly wrote
`delegate_to_rest_controller(...)` in code-shaped backticks (table row
example, body prose at four locations, a rule-of-thumb bullet). Read in
isolation, that string looks like a callable a plugin author should be
able to find or import. It is not — at least not in PR-44's scope. The
canonical helper of that name lives in another reference that is the
subject of a separate forthcoming PR; PR-44 cannot point at it without
phantom-dependency.
Two changes:
1. Replaced every `delegate_to_rest_controller(...)` occurrence with
prose ("the delegation pattern", "the delegate-to-REST-controller
route") so the text describes a pattern rather than naming a
non-existent function.
2. Added a "What the delegation pattern looks like" subsection between
the three-shapes table and the metric-trap discussion, with a small
inline example showing the actual mechanism: construct a
WP_REST_Request, set params, dispatch via rest_do_request(), handle
the error case, return data. Three core symbols cited
(WP_REST_Request, rest_do_request, WP_Error) verified against
wordpress-develop trunk; method calls (set_query_params, is_error,
as_error, get_data) similarly verified.
The example is intentionally minimal and self-contained. It does not
forward-reference any helper that lives outside this PR. A `php -l`
smoke-test on the extracted block on PHP 7.2 returns no syntax errors.
Refs WordPress#44 review-id 4207902432
…fects
The shared-core-service reference centred its argument for service
extraction on a "metric trap" — telemetry double-counting under agent
traffic. A reviewer pointed out that telemetry is one of several side
effects a REST handler may emit, and that the doc was elevating it
disproportionately. Plugin authors may track usage at any layer, and
the structural lesson — "do not let an ability trigger UI-scoped side
effects on agent invocations" — applies to audit logs, custom-event
hooks, notifications, cache invalidation, and lock acquisition just as
much as to telemetry.
Restructured the section accordingly:
- Renamed the H2 from "The metric trap" to "Side effects on the
existing REST path".
- Replaced the telemetry-only intro paragraph with a five-bullet list
of side-effect categories: usage telemetry, audit logs, custom-event
hooks, email/notification dispatch, cache/schedule/lock side
effects. Telemetry remains as the lead bullet because metric
double-counting is the only failure mode in this set that is
invisible by default — the system "works" while dashboards drift —
which is worth one sentence even after demotion.
- Generalised the "fix" bullets ("emit telemetry" → "emit its side
effects").
- Replaced the `my_plugin_track_event(...)` calls in both worked-example
PHP blocks with `do_action( 'my_plugin/things_listed', $rows )`, a
generic WP-idiomatic side-effect that does not bake telemetry into
the example. Comments updated to "Side effects (audit log, hooks,
notifications) live on the REST adapter."
- Generalised the rule-of-thumb bullets and the closing
override-paragraph from telemetry-specific to side-effect-general
framing.
- Updated the two stale pointers ("the metric trap that makes that
distinction matter", 'read "the metric trap" below first') to point
at the renamed section.
Net effect: telemetry is mentioned three times in the file (one bullet
in the side-effects list with the silent-failure callout, one
parenthetical in the worked-example closing, one parenthetical in the
rule-of-thumb), down from six. The original concern Darren raised —
telemetry as the centerpiece — is addressed; the genuinely-specific
risk it represents is preserved as one bullet of context.
php -l on both worked-example blocks returns no syntax errors on
PHP 7.2.
Refs WordPress#44 review-id 4207902432
The skill's php-registration reference covers `meta.show_in_rest` and the three `meta.annotations` keys, but does not cover `meta.mcp.public` or `meta.mcp.type`. The bundled WordPress MCP adapter (the package WP plugins pull in for MCP exposure) reads those two keys to decide which abilities to surface and how to project them. An ability registered with `show_in_rest => true` but no `meta.mcp` block is technically valid yet invisible to MCP clients connecting through that adapter — which is the default discovery path for agents. Surfaced as review feedback on a WooPayments pilot PR: a registration that followed the skill exactly was discoverable on the abilities REST namespace but missing from the MCP tool list. The reviewer's note was "we should default all our registered abilities to use this unless we know for sure it shouldn't be exposed." Add both keys to the registration arguments table with their defaults and constraints (mcp.public is bool default false; mcp.type is one of 'tool'/'resource'/'prompt' default 'tool', with silent coercion to 'tool' for any value outside the enum). Add a section that distinguishes the abilities-REST namespace contract (`show_in_rest` → `wp-abilities/v1`) from the MCP adapter projection (`mcp.public` → adapter's default MCP server) so plugin authors don't conflate them. Update the canonical example so the recommended starting shape opts the ability into MCP discovery. Verified against the MCP adapter package source: `mcp.public` read at McpAbilityHelperTrait.php lines 37 and 61; `mcp.type` read at line 75 with the enum guard at line 78. PHP example block re-validated with `php -l`. Refs WordPress#44 Refs Automattic/woocommerce-payments#11679
…nded The reference's table previously listed all three `meta.annotations` keys (`readonly`, `destructive`, `idempotent`) as "Optional (default null)". The post-table paragraph said plugins were "expected to populate them honestly" but did not explain why a `null` annotation is materially worse than a populated one. Observed downstream consequence: an agent following the skill registered an ability and shipped it without any annotations. The reviewer flagged the omission. The skill teaches the keys exist and that they are hints for tooling, but the cell-level "Optional" framing licensed the agent's shortcut. The structural fix is to escalate the tone. Reword the three table cells from "Optional" to "Strongly recommended" while keeping the technically accurate "(default null)" footnote so readers don't misread this as a core-enforced requirement. Expand the post-table paragraph to spell out what `readonly: null` actually broadcasts — "behavior unknown" — and why that is a worse signal than either `true` or `false`. Strengthen the recommended-patterns bullet along the same axis: explicit cost (three lines) vs. cost of omission (opaque safety surface). Verified core behavior at class-wp-ability.php lines 38-51 (the `$default_annotations` array seeds all three keys to `null`) — the table's "default null" claim still holds against trunk. Refs WordPress#44 Refs Automattic/woocommerce-payments#11679
…effect The "Side effects on the existing REST path" section enumerated five semantic side effects of routing an ability through `rest_do_request()` (telemetry, audit logs, custom-event hooks, notifications, cache/lock side effects) but missed a category that is mechanically different but just as important: performance. `rest_do_request()` calls `rest_get_server()`, which is a lazy singleton that fires `rest_api_init` the first time it's instantiated in a request lifecycle. Inside an HTTP REST request that cost is paid before the abilities layer ever runs. Outside one — CLI invocations, cron jobs, agent runs hitting an ability through a non-REST MCP transport — the *first* delegating ability call pays it. Every plugin's `register_rest_route()` callback wires up at that point, which on a cold path is measurably slow. Surfaced as review feedback on a WooPayments pilot PR: the reviewer flagged the delegation pattern with the question "what about using List_Transactions::from_rest_request directly?" — the underlying concern being that the ability is invoked from CLI, agents, and MCP contexts where the REST bootstrap cost lands on the user. Add a sixth bullet to the side-effects list, framed as "performance, not semantics" so readers don't blur it together with the existing five. Cite the lazy-instantiation guard explicitly so the cost-once- per-request behavior is clear and readers don't over-correct toward "never delegate." Add a corresponding rule-of-thumb row for "read predominantly invoked outside REST" and update the override paragraph so the non-REST-context rule joins side-effect and write as a reason to extract a service even when delegation would otherwise be cheapest. Verified against `wp-includes/rest-api.php` lines 619-651: the `empty( $wp_rest_server )` guard at line 623, the instantiation at line 635, and the `do_action( 'rest_api_init', $wp_rest_server )` at line 647. Refs WordPress#44 Refs Automattic/woocommerce-payments#11679
When adopting the Abilities API inside an existing plugin, the plugin's architecture dictates how execute callbacks call into existing business logic. Two patterns cover most real-world plugins, and picking the wrong shape costs in tests, error-code semantics, and bootstrap correctness. Adds a reference covering Pattern A (shared API client, common in plugins talking to a remote service) and Pattern B (zero-arg controllers, common in plugins delegating primarily to WordPress core mechanisms). Includes detection heuristics for each, minimal skeletons for both, a hybrid-case note for plugins that legitimately use both, an anti-pattern warning against inventing a fake API client to fit Pattern A's helper shape, and a quick-pick table.
List-style read abilities backed by an existing REST controller repeat the same four steps in every execute callback: validate the plugin is initialized, build a WP_REST_Request from input, instantiate the controller, and unwrap the response (which can be either a raw array or a WP_REST_Response). At three or more abilities the boilerplate dominates; copy-paste between callbacks is also where drift starts. Adds a reference documenting the canonical delegate_to_rest_controller helper: signature, the three guards every implementation needs (class_exists on the controller, null-check on the shared dependency, is_wp_error short-circuit before unwrap), the dual response-shape unwrap, and explicit guidance on when NOT to use the helper (zero-arg backings, non-REST services). Builds on plugin-family-patterns.md to cover both shape variants.
Agents orchestrating multiple abilities need to know whether a failure was caused by their input, by the plugin not being initialized, or by a transient downstream error — the answer drives retry strategy. When every plugin invents its own WP_Error codes, agents can't reason uniformly across the surface they're given. Adds a reference covering a small standardized vocabulary that handles the 95% case: <plugin>_not_initialized for bootstrap failures, <plugin>_missing_<field> and <plugin>_invalid_<field> for input problems caught in the execute callback, <plugin>_<resource>_data_unavailable for transient backend failures, and ability_invalid_input for the schema-validator path that fires earlier than the callback. Documents why upstream third-party codes (e.g. Stripe's resource_missing) should bubble through verbatim rather than being re-wrapped, plus naming rules and i18n discipline for codes vs messages.
Registering an ability with an input_schema doesn't guarantee that the execute callback sees the schema applied at runtime. Three surprises have shown up in real plugin work after the schema looked correct and unit tests passed. Adds a reference covering the three: (1) the Abilities API does NOT inject schema 'default' values into callback input — agents that omit optional fields receive callback input without those keys, so callbacks must normalize defaults explicitly; (2) pagination parameter-name drift when an ability exposes per_page but the backing controller delegates to an internal class that reads pagesize, silently breaking pagination; (3) PHP empty() false-rejecting "0" and 0 as IDs, which matters for order/post/row IDs that can legitimately be zero. Each gotcha includes detection patterns, the corrective code, and a combined worked example showing all three guards in one callback.
Step 4 (Register abilities) gated on grouping and shared-core but did not yet point at the family-pattern decision (shared-API-client vs zero-arg controllers), the delegate-helper, or the standardized WP_Error vocabulary. Failure modes also did not point at input-schema-gotchas, so debugging "execute returns surprising errors" required re-deriving the three known traps. Adds two cross-links in step 4 (plugin-family-patterns + delegate-helper, plus error-code-vocabulary) and one in Failure modes (input-schema-gotchas).
The reference covered three input-schema gotchas — schema property defaults not being injected, pagination key drift, and `empty()` false-rejecting "0" — all of which assume the callback has already been invoked with some input. It missed a structural concern one layer up: HOW the callback gets invoked, and what runs first. `wp_get_ability(...)->execute()` runs `normalize_input()` then `validate_input()` (then `check_permissions()`) before invoking the callback. Direct invocation of the static method skips all of that. The two paths diverge in three places: 1. Validation rejects null input on the indirect path when the schema declares an object type — the callback never runs, the PHP-level `$input = null` default in the signature is dead code. 2. The schema's top-level `default` IS substituted by `normalize_input()` on null inputs (different layer from the property-level `default` that Gotcha 1 covers). 3. WordPress only forwards `$input` to the callback when the input schema is non-empty; without one, indirect invocation produces `ArgumentCountError` unless the signature has a default. Surfaced on a pilot ability registration when the reviewer noted that indirect callers validate input first, so a callback signature like `function execute( $input = null )` does nothing for them — null gets rejected during validation, never reaching the callback. The reviewer's suggested fix (`default => []` at the schema root) works because `normalize_input()` honors it; documenting that mechanism is what the new gotcha section adds. Add Gotcha 4 covering the direct/indirect split, the three concrete divergences, the fix (top-level schema `default`, ideally `(object) array()` for object-typed schemas to avoid PHP's array/object ambiguity at the validator boundary), and how this relates to Gotcha 1. Rename "Putting the three together" to "Putting gotchas 1-3 together" so the existing example doesn't imply Gotcha 4 is included — Gotcha 4 is structurally about callback-invocation, not callback-robustness, and doesn't fit the same worked example. Verified against `wordpress-develop` `trunk`: `WP_Ability::execute()` order at class-wp-ability.php lines 612-669; `normalize_input()` at lines 444-455; `validate_input()` at 465-496 (delegates to `rest_validate_value_from_schema()` which validates only and does not inject property-level defaults); `invoke_callback()` at 507-526 with the input-schema-gated arg-forwarding at line 509. PHP example block lints clean on PHP 7.2. Refs WordPress#45 Refs Automattic/woocommerce-payments#11679
… bypass The "When NOT to use the helper" section listed two bypass categories (zero-arg backings and non-REST services), but missed the case in the middle: backing methods that DO take `WP_REST_Request` but route straight to a request class via something like `Things_Request::from_rest_request($request)->execute()` — where the controller adds nothing beyond the construction step. Surfaced as feedback on a pilot ability registration where the reviewer suggested calling the request class directly instead of going through the controller. The shape is structurally distinct from both existing bypass categories: it still uses `WP_REST_Request` (so the zero-arg subsection doesn't fit), but bypasses the controller layer (so the non-REST-service subsection doesn't fit either). Add a third "Backing request class can be invoked without the controller" subsection between the introduction and "Zero-arg backing methods", with a worked example that constructs a `WP_REST_Request` purely to satisfy `from_rest_request()`'s signature, then calls the request class directly. Cross-link to `shared-core-service.md` for the adjacent first-call REST bootstrap-cost discussion. Note the explicit tradeoff: bypassing the controller bypasses anything the controller does (validation, side effects), so this shape is for cases where the controller is genuinely a thin wrapper, not when it's doing work worth keeping. Update the rule of thumb so the three shapes (helper, request-class direct, fully direct-path) compose into one decision tree instead of the previous binary helper-or-direct-path framing. PHP example block lints clean on PHP 7.2. Diff hygiene checks pass: no filesystem paths, no private identifiers, no PHP 7.4+ syntax, no internal URLs, no plugin-specific cap strings. Refs WordPress#45 Refs Automattic/woocommerce-payments#11679
A standardized audit document is only useful if the structure is stable
across plugins and consumable without parsing surprises. Without a written
schema, every audit invents its own field names and downstream tools (or
human reviewers) have to map between variants.
Adds the canonical schema reference: required top-level fields (plugin,
repo, branch_audited, audited_at, auditor, baseline_abilities,
capability_gate, plus optional plugin_family), proposed_abilities array
shape (with backing, permission, return_type, effort, annotations, notes,
risks, optional reference_ability), excluded_from_mvp and surfaced_gaps
arrays, plus required prose sections (Controller Inventory, Notes and
Surprises). Documents capability_gate's two legal shapes (single string
vs structured {read, write} object), backing: null semantics for known
gaps, and the full minimal valid example with both a read and a write
ability.
The audit's capability_gate field has to reflect what the controller
actually gates on, not what its docblock says. Two mechanisms cover most
plugins, and naming them explicitly stops auditors from hard-coding one
plugin family's assumptions.
Adds a reference covering Mechanism A (direct: a base controller calls
current_user_can('<some_cap>') once and every route inherits the gate)
and Mechanism B (post-type-backed: controllers extend
WP_REST_Posts_Controller or a subclass; permission resolution is
dynamic at request time, with read and write often resolving to
distinct caps via core's map_meta_cap). Each mechanism gets identifying
signs, a tracing recipe, and a YAML representation example. Includes
guidance for the legacy compound-string capability_gate form
(accepted-but-not-preferred) and the __return_true REST-layer pitfall
(record verbatim, but never copy into the ability's own
permission_callback).
The first step of every audit is producing an exhaustive list of the plugin's REST controllers. Real plugins use at least two layouts (the standard includes/admin/class-*-rest-*-controller.php glob, and arbitrary non-standard locations like includes/api/, src/Rest/, or monorepo package directories), and missing controllers means missing abilities. Adds a reference documenting the two enumeration paths — glob for standard layouts, grep as the universal fallback — with explicit guidance on when each works, how to combine them for monorepos, and how to handle inherited routes (record on the child class with backing.inherited_from + null line numbers). Closes with an exhaustiveness rule (the Controller Inventory table must list every controller found, not just MVP-backers) so reviewers can answer "why isn't X in the MVP?" by pointing at the inventory.
A skill that produces a structured audit doc needs an eval scenario
demonstrating both routing (which skill to invoke for "I want to plan
abilities for plugin X") and the produced-doc shape (YAML block, prose
sections, structured capability_gate, surfaced_gaps, etc).
Adds eval/scenarios/wp-abilities-audit.json. Scenario covers a plugin
with a non-standard REST layout (includes/api/), a post-type-backed
base controller with distinct read/write capabilities, and a missing
backing endpoint that should surface as a gap. Success criteria
include routing to wp-abilities-audit, structured {read, write}
capability_gate, at least 3 proposed_abilities with at least one read
and one write, and a non-empty surfaced_gaps array.
When planning Abilities API adoption on a non-trivial plugin, the gap between "we have a REST surface" and "we have a coherent set of abilities to register" is real survey work — enumerate controllers, trace capability gates, decide granularity, identify gaps. Without a standardized output, every survey produces a differently-shaped artifact and downstream readers (or tooling) have to re-derive the shape each time. Adds the wp-abilities-audit skill: a 7-step procedure that consumes a plugin checkout, runs controller enumeration (references controller-enumeration.md), traces capability gates (references capability-gate-tracing.md), proposes abilities via semantic-intent grouping (references wp-abilities-api/grouping-heuristic.md), and writes a doc conforming to the canonical schema in audit-schema.md. The doc is plugin-agnostic — any WP plugin that exposes a REST surface is in scope.
Static-mode verification starts with a complete inventory of registered abilities and the byte ranges of their callbacks. Without this, every downstream check (annotation correctness, schema lints, error-code vocabulary) is operating on a partial set or guessing where the callback body lives. Adds a reference covering grep-based enumeration of wp_register_ability calls, ability-name extraction (literal strings + variable cases), annotation-block extraction across the three observed shapes (one-liner, multi-line, and helper-method indirection), execute_callback resolution through array-form / string-form / closure variants, and explicit handling of fluent builders, dynamic registration loops, and heredoc SQL bodies. Closes with a clearly-marked Limits section so reviewers know which cases require runtime mode for authoritative enumeration.
The adversarial core of the verify skill — what makes it a distinct skill rather than a wrapper around "run the tests". Agents plan actions on the basis of annotations they introspect; a readonly: true ability that actually writes is a security and UX disaster. Unit tests don't catch this class of bug because the mock looks just like the real writer. Adds a reference covering grep patterns that read the execute callback body and compare what it does against what the annotation promises. Three claims (readonly: true, destructive: false, idempotent: true), each with its own pattern set. Severity table assigns FAIL or WARN weights per pattern (sharp on the things that matter — deletion, money-moving verbs, non-GET delegates; soft on ambiguous patterns like transients and time-in-response). Honest blind-spot table covering indirected service writes, hooks-as-writes, variable-built-method delegate calls, WP-CLI-from-ability, tautological capability gates. Inline `// verify-ignore` suppression mechanism and a curated false-positive allow-list.
Orchestrating agents read each ability's input_schema to figure out how to call it. A schema that's hard to parse, ambiguous, or misleading wastes agent turns even when the ability itself works perfectly — schema hygiene is about agent legibility. Adds a reference covering six static lints over input_schema: additionalProperties scoped to object-typed schemas (typo protection without overreach into primitive-typed schemas), required-field descriptions (failure mode: opaque required keys), non-empty enums, no $ref (ability schemas are introspected without a definitions document), primitive defaults only (computed defaults break idempotent: true), and a reference_ability invariant cross-checked against the audit. The non-object-root example shows why Lint 1 shouldn't apply to primitive-typed schemas — the Abilities API accepts schemas of any JSON Schema type at the root.
Verify that the registered permission_callback on every ability actually gates on a real capability — statically by source inspection, and at runtime by exercising the gate against unauthenticated, subscriber, and admin contexts. Adds a reference covering six permission-callback shapes (direct current_user_can, input-shape branch with an explicit recommendation to split into two abilities instead, '__return_true', helper delegation, literal true, is_user_logged_in only). Includes a prominent Background section explaining what the callback actually receives — never a WP_REST_Request, despite copy-paste from REST controllers being a recurring antipattern; the input-value contract is documented against the WP_Ability::check_permissions() and invoke_callback() invocation chain. Runtime check uses WP_Ability::check_permissions() (the public method that wraps the registered callback) and treats both bool and WP_Error returns correctly. Audit cross-check confirms the registered cap matches the audit's declared capability_gate.
Static mode catches structural problems (annotation lies, schema lint failures, audit mismatches). Some bug classes only surface against a booted WordPress: missing constructor arguments, bootstrap-ordering issues, schema-validator-path errors, capability-roundtrip failures, idempotency violations at the response-byte level. Adds the runtime-mode procedure: env-up via the plugin's AGENTS.md (observed patterns include npm-run-wp-env, plain wp-env, monorepo docker, and composer-based bootstraps), six canonical wp-cli checks (name enumeration, annotation read-back, read execution, write-input validation, permission gate roundtrip, idempotency twin-invocation diff). Each check produces a row in the structured report, with hard-fail-on-fatal and explicit interpretation of WP_Error returns from check_permissions() (bool|WP_Error). PHP snippets target the 7.2.24+ floor declared in the skill's compatibility frontmatter — closures instead of arrow functions, strpos instead of str_starts_with.
When verify is invoked with an audit doc as input, validating the doc itself comes first — every other check then assumes a well-formed audit. Co-locating the validator with the verify skill keeps "validate audit" in the same procedure as "validate registered abilities" so a single run produces one consolidated report. Adds a reference covering YAML extraction (one fenced block per audit doc), per-field validation against the canonical schema owned by wp-abilities-audit (no field-table duplication; the canonical is the single source), and three whole-audit invariants (reference_ability uniqueness, backing: null entries paired with surfaced_gaps, optional excluded_from_mvp / surfaced_gaps arrays allowed but missing entirely is a WARN). FAIL on any required-field miss; WARN on the legacy compound-string capability_gate form.
The verify skill's most distinctive output is catching a readonly: true ability that quietly writes — a class of bug that unit tests don't catch because the mock looks just like the real writer. The scenario exercises that case end-to-end. Adds eval/scenarios/wp-abilities-verify.json. Scenario covers a plugin with 7 registered abilities (6 reads + 1 write) where one of the read abilities accidentally calls $wpdb->update inside its execute callback after a refactor. Plus an audit doc that declared the expected annotations before the refactor. Success criteria include FAIL top-line status, exactly one FAIL row in the annotation-correctness table with file + line evidence of the $wpdb->update call, and OK on the other 5 readonly abilities.
A registered abilities surface that passes its unit tests can still ship lies: an ability annotated readonly: true whose callback writes to $wpdb, an ability whose registered permission_callback is shape- correct but resolves to a cap subscribers hold, a schema that declares per_page but whose backing reads pagesize. None of these trip a typical test suite. They trip an agent in production. Adds the wp-abilities-verify skill: a 7-step procedure for verifying a plugin's Abilities API registrations against a shipped audit doc or against the canonical schema independently. Two modes (static and runtime) both produce the same structured report. The skill is honest about its blind spots — static mode is grep-driven and won't catch indirected service writes, hooks-as-writes, or variable-built HTTP methods on delegate helpers; the SKILL.md "Static-mode coverage caveats" section names the gaps explicitly so a static-mode PASS isn't read as "verified write-free."
5168e24 to
bb1de00
Compare
nerrad
left a comment
There was a problem hiding this comment.
As I reviewed the current state of the wp-abilities-verify skill, I'm a bit wary that it might be too exhaustive/wordy and risks agent indirection or getting stale with the source code. Especially when smarter models come along (this has already happened with some of the attempts I've made with using this). So comments will reflect that.
Another minor thing to note is that the the repo's README's "Available Skills" table doesn't list wp-abilities-audit or wp-abilities-verify.
| ### Lint 5 — default values are primitive or `null` | ||
|
|
||
| ```php | ||
| // OK — primitive defaults. | ||
| 'per_page' => [ 'type' => 'integer', 'default' => 25 ], | ||
| 'submit' => [ 'type' => 'boolean', 'default' => false ], | ||
| 'status' => [ 'type' => 'string', 'default' => 'pending' ], | ||
| 'metadata' => [ 'type' => 'object', 'default' => null ], | ||
|
|
||
| // WRONG — complex expression as default. | ||
| 'created_at' => [ 'type' => 'string', 'default' => gmdate( 'c' ) ], | ||
| ``` | ||
|
|
||
| **Why:** a default that evaluates per-call is both non-deterministic | ||
| (twin invocations can differ, violating `idempotent: true`) and | ||
| surprising to agents that expect defaults to be static values. | ||
|
|
||
| **Check:** each `'default'` value must be a scalar literal (`true`, | ||
| `false`, integer, float, quoted string, `null`) or an empty array | ||
| literal. A function call, variable reference, or computed expression → | ||
| FAIL. |
There was a problem hiding this comment.
This says defaults must be primitive, null, or an empty array. But input-schema-gotchas.md correctly says zero-arg object ab ilities need a top-level schema default like (object) array() because normalize_input() applies top-level defaults. See https://github.com/rtio/agent-skills/blob/bb1de00133d64698b78114c9712749ac814c027c/skills/wp-abilities-api/references/input-schema-gotchas.md#L205-L222 and WP_Ability::normalize_input(). As written, it looks like verify could end up flagging the recommended fix here as invalid.
There was a problem hiding this comment.
Good catch — fixed in c5f962a.
Lint 5 now accepts (object) array(), (object) [], new stdClass(), and all-literal arrays as valid defaults. Cross-linked input-schema-gotchas.md §4 explicitly in the lint copy so the agreement between the two skills is visible to anyone reading either one. Function-call defaults like gmdate('c') or wp_generate_uuid4() still FAIL because the value varies per call — the lint is now about "evaluates to the same shape every call," which the cast and new stdClass() do.
…default
The Lint 5 rule previously required default values to be scalar
literals, `null`, or empty array literals — anything else, including a
type-cast like `(object) array()`, was a FAIL.
This contradicted the recommended fix in `wp-abilities-api`'s
`input-schema-gotchas.md` §4, which specifically calls for
`'default' => (object) array()` at the top level of an `object`-typed
schema so `WP_Ability::normalize_input()` can substitute a sane default
when the indirect path is invoked with `null`. Without that top-level
default, zero-arg-allowed abilities fail at the validator boundary.
Verify would have flagged the recommended fix as a lint violation,
sending implementers in circles.
Broaden Lint 5 to accept type-cast empty object literals
(`(object) array()`, `(object) []`), `new stdClass()`, and all-literal
arrays — anything that evaluates to the same shape on every call.
Function-call defaults like `gmdate('c')` or `wp_generate_uuid4()`
remain a FAIL because they vary per call. Cross-link the gotchas
reference so the agreement between the two skills is explicit.
Refs WordPress#47 (comment)
The skill described `idempotent: true` as "same input always produces same effect" with a hard FAIL rule that twin invocations must be byte-identical. Two problems with that framing: 1. Core's docblock at `class-wp-ability.php` lines 47-48 defines idempotent as "calling the ability repeatedly with the same arguments will have no additional effect on its environment." That's about *environmental writes*, not return-value determinism. A read that embeds `time()` in its response is idempotent under core's reading; the skill would have FAILed it. 2. The Abilities REST run controller operationalizes annotations into HTTP method routing (`readonly: true` -> GET, `destructive && idempotent` -> DELETE, otherwise POST — see lines 110-116 of `class-wp-rest-abilities-v1-run-controller.php`). That's the load-bearing semantic, not response-shape equality. Reframe the three-claims table, the static idempotent-but-non-idempotent check, the runtime twin-invocation check, and the SKILL.md summary around environmental side effects. Twin-invocation byte-equality stays as a heuristic (cheap pass when hashes match; signal worth inspecting when they don't), not as the definition. Drop the static "non-determinism" greps that flagged `rand()`, `wp_create_nonce()`, and `time()` because they don't violate core's idempotent — they only vary the return value while leaving the environment unchanged. The new static checks target patterns that almost always reflect real environmental accumulation: counter writes, per-call cron schedules, append-only inserts, and sequence increments. Refs WordPress#47 (comment)
…->delete) The mutator-method regexes in Step 2 (readonly-but-writes) and Step 3 (destructive-but-says-false) required a trailing `_` after the verb, e.g. `->save_record(`. Direct mutator calls without a suffix — `->save()`, `->delete()`, `->add_order_note()`, `->set_status()` — fell through silently because the regex demanded `verb_` and they end in `verb(`. Fix: change the trailing token from `_` to `(_|\()` so both verb-prefixed methods and direct calls match. Add `add` and `set` to the verb list (common additive writes the original list omitted) along with `trash`. Also note explicitly that the verb list is a representative starting point, not exhaustive — plugin vocabularies vary (camelCase verbs, domain-specific writes like `->markAsPaid()` or `->commit()`) and the agent should extend it for the plugin under verification. Refs WordPress#47 (comment)
…ls in static enumeration Step 2's name-extraction regex assumed `wp_register_ability(` and the first quoted argument appear on the same line. The canonical formatting shown at the top of the same file is multi-line, so the regex would miss the file's own example — and any plugin that follows the multi-line convention. Recommend ripgrep's `--multiline --pcre2` mode as the primary path, keep the single-line grep as a fast inline-form check, and document `pcregrep -M` / `perl -0777` as fallbacks when only POSIX grep is available. When the regex still misses a registration (heredoc names, unusual formatting), fall back to inspecting the call-site window by hand. Refs WordPress#47 (comment)
The Available Skills table didn't list the audit or verify skills, so readers landing on the README couldn't discover them even though they ship in this repo. Add concise rows next to wp-abilities-api so the abilities cluster appears together. Refs WordPress#47 (review)
|
Hey Darren — pushed five focused fixes addressing the specific issues; replied inline on each thread with detail.
On the broader thread that runs through the whole review — that the references are over-prescriptive and risk locking behavior to current model capability — I think you're right. The right move is to reshape from "exhaustive grep playbook" to "principles + a few examples + agent-derived patterns." That's a substantive change and I'd rather scope it as a separate pass once these bug fixes are in. Happy to open a follow-up issue if you'd like to track it explicitly. |
Per Darren's 2026-05-08 follow-up: the references were packing too much into the initial iteration and risk model indirection (he has signal from Woo Core usage of similar prescriptions). Lighter v0; iterate as real-world usage tells us where to add specificity. Three references go from prescriptive grep playbooks to principle + illustrative examples + "agent operationalizes for the plugin's vocabulary." The runnable harnesses (runtime-harness.md, permission-roundtrip.md, audit-schema-validation.md) stay closer to as-is in this commit because they're concrete runbooks, not pattern catalogues. annotation-correctness.md: 333 -> 145 lines. Drops the exhaustive grep playbook (~50 lines of write-pattern recipes), the severity table, the hardcoded false-positive allow-list, and the "why grep not parser" section. Keeps the "Why this matters" framing, the three-claims table with core's idempotent wording, the HTTP method routing reference, the known blind spots, the //verify-ignore suppression mechanism, the runtime heuristic complement, and the report format. Common write patterns are listed as a starting set, not a checklist. schema-lints.md: 250 -> 100 lines. Each of the six lints is now a short principle (5-15 lines) instead of a grep recipe + result table. Lint 5 keeps the corrected `(object) array()` acceptance from c5f962a and the cross-link to input-schema-gotchas.md. static-enumeration.md: 280 -> 90 lines. Steps 1-4 stay; the 3.x annotation patterns, fluent-builder details, helper-method drilldowns, and heredoc-callback subsection drop. The multi-line `wp_register_ability` extraction from ee85b28 is preserved as a single ripgrep example with one-line fallback notes. The eval scenario (eval/scenarios/wp-abilities-verify.json) survives the slim — its success criteria don't depend on specific patterns being present in the references, just on the agent finding \$wpdb->update() in a readonly callback. The new annotation-correctness explicitly lists \$wpdb->update as a candidate write pattern; the agent will catch it. Refs WordPress#47 (review)
… too Continuing the lighter-v0 reshape from dad0196 across the remaining files that carried prescription weight: permission-roundtrip.md: 301 -> 164 lines. Drops the static-grep patterns block (the most prescriptive part — exactly what Darren flagged for Woo Core indirection). Compresses the Shape A-F catalogue from a per-shape code-block-plus-narrative into a single classification table. Keeps the background section (the WP_REST_Request antipattern warning is load-bearing — it ships in real plugin copies from REST controllers), the runnable wp eval block, the audit cross-check narrative, and the static-only mode caveat. SKILL.md: 220 -> 212 lines, but the structure is leaner. Drops the "Static-mode coverage caveats" section that duplicated content now living more concisely in the slimmed annotation-correctness.md. Step 4's prose summary tightens to mirror the slimmed reference. The description in frontmatter clarifies that the centerpiece is "callback behavior matches each annotation's claim," dropping language that implied an exhaustive grep playbook. Total skill footprint: 1,835 (round-1) -> ~1,900 (round-2 with clarifications) -> 1,249 (this push). Net cut roughly one-third. The runnable harnesses (runtime-harness.md, audit-schema-validation.md) are preserved at their current size — they're concrete runbooks with executable wp eval blocks, not pattern catalogues, and they pull their weight at the line count they sit at. Eval scenario still passes (`node eval/harness/run.mjs`). Refs WordPress#47 (review)
|
Hey @nerrad — agreed and pivoted. The round-2 bug fixes landed first (idempotent definition, Lint 5 vs Round-3 commits (
Skill footprint went from ~1,900 lines (after round-2) to 1,249 in this push — roughly a one-third cut. The runnable harnesses ( Eval scenario still passes (the agent finds PR body rewritten to reflect the lighter framing and the iteration history. Open to further trimming if you see specific sections that still pack too much; my read is the runnable harnesses pull their weight at their current size, but happy to revisit. |
Adds a new skill
wp-abilities-verifythat checks a plugin's Abilities API registrations against the structural and behavioral promises they declare. The centerpiece is adversarial annotation correctness: areadonly: trueability whose execute callback actually writes (via$wpdb->update,update_option,wp_insert_*, a non-GET delegate, etc.) is the bug class that unit tests don't catch — the mock looks just like the real writer. Verify reads the callback body and compares what it does against what the annotation claims, with a small set of common write patterns as a starting point and the agent operationalizing for the plugin's own vocabulary.Depends on #44, #45, #46. The branch stacks on
wp-abilities-audit(#46) which stacks onwp-abilities-apimechanics (#45) which stacks onwp-abilities-apistrategy refs (#44). For a focused view of just this PR's commits, seefeat/abilities-audit-skill...feat/abilities-verify-skillon the fork.What lands here
skills/wp-abilities-verify/SKILL.mdreferences/static-enumeration.mdwp_register_ability(calls (multi-line viarg --multiline --pcre2), extract names and annotation blocks, followexecute_callbackto its body. Records limitations for variable-indirected names / annotations / callbacks so reviewers know where runtime mode is required.references/annotation-correctness.mdclass-wp-ability.phplines 47-48 for idempotent — "no additional effect on the environment") and the run controller's HTTP method routing (class-wp-rest-abilities-v1-run-controller.phplines 110-116). Lists common write patterns as a starting set, not a checklist. Documents known blind spots (indirected service writes, hooks-as-writes, variable-built HTTP methods, tautological caps), the// verify-ignoresuppression mechanism, and the runtime twin-invocation heuristic.references/schema-lints.mdinput_schema: object schemas declareadditionalProperties; required fields have descriptions; enums non-empty; no$ref; defaults are statically constant (including(object) array()perinput-schema-gotchas.md§4); reference abilities have no required inputs.references/permission-roundtrip.mdpermission_callbackactually receives (validated input value, never aWP_REST_Request— the antipattern ships in real plugins copied from REST controllers without translation). Six callback shapes (A-F) classified into OK / WARN / FAIL. Runnable wp-cli block exercising the gate against anon / subscriber / admin contexts. Audit cross-check when an audit doc is provided.references/runtime-harness.mdreferences/audit-schema-validation.mdwp-abilities-audit. Three whole-audit invariants (reference_abilityuniqueness,backing: nullpaired with asurfaced_gapsentry, optional arrays may be empty).eval/scenarios/wp-abilities-verify.json$wpdb->updateafter a refactor; success criteria include FAIL top-line status and one FAIL row in annotation correctness with file + line evidence.Verify-skill content total: 1,249 lines across 7 markdown files + 1 eval scenario.
Iteration history
This PR went through three rounds against @nerrad's review:
c5f962a..d8ea032) — five focused fixes: alignidempotentwith core's wording, accept(object) array()in Lint 5, broaden the mutator-verb regex, handle multi-linewp_register_abilitycalls, list audit + verify in the README. Replied inline on each thread.dad0196..f0070f6) — slim the skill per @nerrad's follow-up: cut prescriptive grep playbooks (~650 lines net) so the references read as principles + a few examples + agent-derived patterns rather than exhaustive catalogues. Runnable harnesses (runtime-harness, permission-roundtrip's wp eval block, audit-schema-validation) preserved because they're concrete runbooks, not pattern lists. Total skill footprint cut by roughly one-third.Verification
WP_Ability::check_permissions,WP_Ability::invoke_callback,WP_Ability::validate_input,WP_Ability::execute,WP_Ability::get_meta,WP_Ability::get_category,wp_register_ability,wp_get_ability,wp_get_abilities,wp_set_current_user,wp_create_user,get_user_by,is_wp_error,current_user_can,is_user_logged_in,__return_true,WP_User::set_role,WP_Error::get_error_code) verified againstWordPress/wordpress-developtrunk. Idempotent definition matchesclass-wp-ability.phplines 47-48; HTTP method routing matchesclass-wp-rest-abilities-v1-run-controller.phplines 110-116.php -l'd on PHP 7.2 — no syntax errors.node eval/harness/run.mjspasses.Reviewer notes
permission-roundtrip.mdbackground section is the most prominent prose-relative-to-purpose section in the skill — it documents that permission callbacks receive the validated input value, never aWP_REST_Request. Kept prominent because the antipattern (callback signature($request)calling$request->get_method()) ships in real plugins copied from REST controllers; the skill catches this as a static FAIL.// verify-ignore: <annotation>mechanism inannotation-correctness.mdis the suppression path for legitimate write-shaped patterns that are semantically reads (read-through cache viaset_transient,update_user_metafor last-read tracking, etc.). Format is intentionally narrow rather thanverify-ignore: all-broad.ship cautiously; let usage drive shapeprinciple and avoids locking behavior to current model capability.