Operator-facing reference for how a skill moves from "published" to "archived", how usage telemetry is collected, and how dependency closures are resolved at install time. Phase 5 is the slice that turns the catalog from a write-only registry into a living organism.
This doc ties three Phase 5 sub-systems together:
- Decay: finding and archiving skills nobody uses anymore.
- Telemetry: counting downloads and views without storing user content.
- Dependencies: declaring `requires:` in frontmatter, walking the closure at install time.
If you only care about one of these, jump to the relevant section. The Lifecycle summary table at the bottom maps every verb to the endpoint + table that implements it.
A published skill becomes a decay candidate when it stops being
useful. The default heuristic, per the master plan and the
/v1/tenant/skills/decay endpoint:
```
candidate ⇔ last_used_at < now() - {days} days
            AND use_count < {max_uses}
```
The two counters live on the skills row itself (migration 0011):
| column | type | bumped by |
|---|---|---|
| `use_count` | `INTEGER` | every download or view event |
| `last_used_at` | `TIMESTAMPTZ` | replaced on every event |
A NULL last_used_at (never used) is always treated as a candidate
provided use_count < max_uses — that's how a brand-new skill that
sat unused for six months ends up in the graveyard alongside actually
decayed ones.
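
As a rough illustration of the predicate and its NULL handling, here is a minimal Rust sketch. It is not the code in `decay.rs`: the `SkillUsage` struct, the `is_decay_candidate` name, and the use of whole-day integers instead of real timestamps are all assumptions made to keep the example self-contained.

```rust
/// Hypothetical mirror of the two counters the heuristic reads from a
/// `skills` row. Timestamps are modelled as whole days since an arbitrary
/// epoch so the sketch needs no date library (an assumption of this example).
struct SkillUsage {
    use_count: i64,
    /// `None` models a NULL `last_used_at` (never used).
    last_used_day: Option<i64>,
}

/// Returns true when the row qualifies as a decay candidate:
/// stale (or never used) AND below the usage threshold.
fn is_decay_candidate(row: &SkillUsage, today: i64, days: i64, max_uses: i64) -> bool {
    let stale = match row.last_used_day {
        // Never used: always counts as stale.
        None => true,
        // Used before the cutoff day: stale.
        Some(day) => day < today - days,
    };
    stale && row.use_count < max_uses
}
```

With the defaults (`days = 180`, `max_uses = 3`), a never-used row and a row last used long ago with two uses both qualify, while a heavily used but old row does not.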
| knob | default value | source |
|---|---|---|
| `days` | 180 | `DEFAULT_DAYS` in `server/src/routes/decay.rs` |
| `max_uses` | 3 | `DEFAULT_MAX_USES` in `server/src/routes/decay.rs` |
| `limit` | 200 | hard-capped at 1000 |
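
One way the defaults and the cap could combine when query parameters are optional is sketched below. `DEFAULT_DAYS` and `DEFAULT_MAX_USES` are named in the table; `DEFAULT_LIMIT`, `MAX_LIMIT`, `DecayParams`, and `resolve_decay_params` are hypothetical names, the integer types are guesses, and reading "hard-capped" as clamping (rather than rejecting the request) is an assumption.

```rust
const DEFAULT_DAYS: u32 = 180;
const DEFAULT_MAX_USES: u32 = 3;
const DEFAULT_LIMIT: u32 = 200; // hypothetical constant name
const MAX_LIMIT: u32 = 1000; // hypothetical constant name

/// Effective knobs after applying defaults and the hard cap.
struct DecayParams {
    days: u32,
    max_uses: u32,
    limit: u32,
}

/// Fills in defaults for missing query parameters and clamps `limit`.
/// Assumption: an oversized `limit` is clamped, not rejected.
fn resolve_decay_params(
    days: Option<u32>,
    max_uses: Option<u32>,
    limit: Option<u32>,
) -> DecayParams {
    DecayParams {
        days: days.unwrap_or(DEFAULT_DAYS),
        max_uses: max_uses.unwrap_or(DEFAULT_MAX_USES),
        limit: limit.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT),
    }
}
```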
The on-demand endpoint above shows operators which skills would
decay. A background tokio task (spawn_decay_sweep in
server/src/main.rs) runs the same heuristic periodically and flips
qualifying rows to status = 'archive_candidate' so curators see
them flagged proactively — without auto-archiving anything (that
remains an explicit admin verb via POST /v1/skills/{slug}/archive).
| knob | default value | source |
|---|---|---|
| `decay_check_interval_secs` | 86400 (24h) | `SKILL_POOL_DECAY_CHECK_INTERVAL_SECS` |
| stale-days threshold | 180 | `routes::decay::DEFAULT_SWEEP_STALE_DAYS` |
| min-uses threshold | 3 | `routes::decay::DEFAULT_SWEEP_MIN_USES` |
Set the interval to 0 to disable the sweep (the on-demand
/v1/tenant/skills/decay endpoint continues to work). The sweep
shares the queue worker's shutdown channel so SIGTERM drains both at
the same time. Errors log + continue: a transient DB blip never
crashes the server.
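
The "0 disables the sweep" rule can be sketched as a small parsing helper. This is an illustration only: `sweep_interval` and `DEFAULT_INTERVAL_SECS` are hypothetical names, and falling back to the default on an unset or unparseable value is an assumption about behaviour the doc does not specify.

```rust
use std::time::Duration;

const DEFAULT_INTERVAL_SECS: u64 = 86_400; // 24h, per the table above

/// Interprets the raw value of SKILL_POOL_DECAY_CHECK_INTERVAL_SECS.
/// Returns `None` when the sweep is disabled, `Some(period)` otherwise.
fn sweep_interval(raw: Option<&str>) -> Option<Duration> {
    // Assumption: unset or unparseable values fall back to the default.
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_INTERVAL_SECS);
    if secs == 0 {
        None // 0 means "do not spawn the sweep"
    } else {
        Some(Duration::from_secs(secs))
    }
}
```

A caller would typically pass `std::env::var("SKILL_POOL_DECAY_CHECK_INTERVAL_SECS").ok().as_deref()` and only spawn the background task when the result is `Some`.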
The new archive_candidate value lives in skills.status (migration
0027). Catalog list / search endpoints filter status = 'published',
so flagged skills disappear from the catalog automatically. Operators
restore via the existing un-archive SQL recipe in the Reversibility
section.
Operators can override at query time:
```http
GET /v1/tenant/skills/decay?days=90&max_uses=1&limit=50
Authorization: Bearer <admin-token>
```

Listing decay candidates and flipping them to archived is admin
only: the endpoints require the `tenant:admin` scope on the API
token (`require_scope` in `decay.rs`). The web catalog renders the
graveyard view from these endpoints; ad-hoc curl works too.
```http
POST /v1/skills/{slug}/archive?kind=skill
Authorization: Bearer <admin-token>
```

Effect: flips the latest published row's status from `published` to
`archived`. The list/search endpoints filter `status='published'`, so
the skill disappears from the catalog and from `skill-pool ensure`
output the next time someone re-runs it.
Archive is a soft delete. The row stays in the database, all
referenced dependencies remain intact (skill_dependencies rows are
not cascaded), and a future "un-archive" admin endpoint can flip the
status back without losing history. Today there's no UI for that — if
you need to restore, run:
```sql
UPDATE skills SET status = 'published'
WHERE tenant_id = $1 AND slug = $2 AND status = 'archived'
  AND id = (SELECT id FROM skills WHERE tenant_id = $1 AND slug = $2
            ORDER BY created_at DESC LIMIT 1);
```

A proper admin button is planned to replace this recipe (see Future work).
Decay only applies to kind = 'skill' for now. Agents and commands
have different baseline usage patterns (an agent might be invoked
millions of times via tool-call, a slash-command zero times for months
between releases) — the same 180d/3-use heuristic would archive them
incorrectly. Re-tune per-kind decay when agents/commands have enough
traffic to model.
Two event kinds are recorded today, defined by the CHECK constraint in
migration 0013:
| event | trigger | route |
|---|---|---|
| `download` | `GET /v1/skills/{slug}/bundle.tar.gz` | `skills::get_bundle` |
| `view` | `GET /v1/skills/{slug}/skill-md` | `skills::get_skill_md` |
Both events run through record_usage in server/src/routes/skills.rs
on a best-effort basis: a DB error logs a warning but never blocks the
user's request.
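
The best-effort contract can be pictured with a small wrapper. This is a generic sketch of the pattern, not the body of `record_usage`: `record_best_effort` is a hypothetical helper, and `eprintln!` stands in for whatever structured warning log the server actually uses.

```rust
/// Runs `record` and swallows any error so the caller's request path is
/// never affected. Returns whether the event was recorded.
fn record_best_effort<E: std::fmt::Display>(
    record: impl FnOnce() -> Result<(), E>,
) -> bool {
    match record() {
        Ok(()) => true,
        Err(err) => {
            // Stand-in for a structured warning log.
            eprintln!("warn: usage event not recorded: {err}");
            false
        }
    }
}
```

The handler would call this after it has resolved the skill and carry on serving the bundle or body regardless of the return value.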
Every event row in skill_usage_events (migration 0013) has:
| column | source |
|---|---|
| `tenant_id` | resolved tenant from the request |
| `skill_id` | the `skills.id` UUID being fetched |
| `event_kind` | `'download'` or `'view'` |
| `user_id` | resolved user (`NULL` for token-only auth) |
| `token_id` | the API token that authenticated the request |
| `ts` | server `now()` |
By design, the table omits the following:
- No IP address. The `audit_events` table records IPs for tenant admin actions; usage events are too high-volume to keep IPs at row granularity without a privacy review.
- No user-agent. Same rationale.
- No user content / no skill body. Only the row reference. The bundle bytes never end up in this table.
- No referrer / no session context. A `view` from the web UI and a `view` from a Claude session are indistinguishable in this table.
This is intentional: the table is meant to answer "who used what, when" at a level coarse enough to make decay decisions and tenant dashboards without becoming a behavioural-tracking system.
Three endpoints serve the read + write sides (server/src/routes/usage.rs):
| endpoint | shape |
|---|---|
| `GET /v1/tenant/usage/timeline` | per-day `{ day, downloads, views, unique_skills }` (gap-filled with zeros) |
| `GET /v1/tenant/usage/top` | top N skills in the window: `{ slug, downloads, views, total }` |
| `POST /v1/usage` | CLI-driven `view` event; body: `{ skill_id, kind, event, project_hash }` |
skill-pool ensure POSTs one view event per successful skill
install to /v1/usage so the decay model sees session-load activity
alongside actual bundle downloads. Otherwise a popular skill that
gets installed once and read from disk many times looks unused from
the registry's vantage point — and quietly drifts toward
archive_candidate.
Defaults:
- Telemetry is ON by default. The CLI already authenticates against
  the registry with its API token; sending one best-effort `view`
  event per installed skill is symmetrical with that trust posture.
- `--no-telemetry` opts out per invocation. Use on air-gapped deploys
  or when the network policy forbids outbound POSTs from the install
  step.
- The POST is fire-and-forget: a network blip logs at `debug` and
  never blocks the install. The install path's contract is unchanged.
`project_hash` is the SHA-256 of the project root, truncated to 16
hex chars (64 bits). It anonymises which project on which machine
sent the event so the server can dedup repeats without persisting a
reversible identifier. The field is accepted today but not yet
persisted; it is reserved for the dedup pass that lands with the v2
decay heuristic.
The two read endpoints (`timeline` and `top`) are admin-scoped
(`tenant:admin`). The timeline query uses `generate_series` to fill
missing days so the dashboard chart has no gaps; `top` joins back to
`skills.slug` so deleted IDs simply drop out of the result.
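
The gap-filling that `generate_series` provides on the SQL side can be mimicked in memory to show the intended output shape. The sketch below is not the server query: `DayCounts` and `gap_fill` are hypothetical names, and days are plain integers rather than dates.

```rust
use std::collections::BTreeMap;

/// One timeline bucket; the field names follow the endpoint's response shape.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct DayCounts {
    downloads: u64,
    views: u64,
    unique_skills: u64,
}

/// `sparse` holds only the days that had events. The result has exactly one
/// entry per day in `start..=end`, with all-zero counts where nothing happened.
fn gap_fill(sparse: &BTreeMap<i64, DayCounts>, start: i64, end: i64) -> Vec<(i64, DayCounts)> {
    (start..=end)
        .map(|day| (day, sparse.get(&day).copied().unwrap_or_default()))
        .collect()
}
```

A three-day window with events only on the middle day yields three rows, the first and last being all zeros, which is what lets the dashboard chart draw a continuous line.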
No automatic retention today. Each event becomes one row and stays there. At our current write rate (single-digit events per skill per day per tenant) the table can grow for years before needing partitioning. Once monthly volume crosses ~10M rows the month-partition migration becomes worthwhile. See Future work.
For now, operators can trim manually:
```sql
DELETE FROM skill_usage_events
WHERE tenant_id = $1 AND ts < now() - INTERVAL '2 years';
```

The aggregation endpoints recompute on every call, so trimming is safe.
A skill declares dependencies via requires: in its SKILL.md
frontmatter:
```yaml
---
name: axum-tenant-handler
description: ...
requires:
  - sqlx-migrations
  - tenant-ctx@1.2.0
---
```

Each entry becomes one row in `skill_dependencies` (migration 0012)
at publish time. The parsing rule lives in `parse_requires_entry` in
`server/src/routes/skills.rs`:
- `slug`: version range `*` (any).
- `slug@X.Y.Z`: version range is the exact string after `@`.
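
A minimal sketch of that parsing rule is shown below. It is deliberately named `parse_requires_entry_sketch` to make clear it is not the real `parse_requires_entry`, whose signature and error handling are not documented here; edge cases such as an empty range after `@` are left unhandled as an assumption.

```rust
/// Splits one `requires:` entry into `(slug, version_range)`.
/// A bare slug maps to the wildcard range `*`; otherwise the range is the
/// exact string after the first `@`.
fn parse_requires_entry_sketch(entry: &str) -> (String, String) {
    let entry = entry.trim();
    match entry.split_once('@') {
        Some((slug, range)) => (slug.to_string(), range.to_string()),
        None => (entry.to_string(), "*".to_string()),
    }
}
```

Applied to the frontmatter above, `sqlx-migrations` becomes `("sqlx-migrations", "*")` and `tenant-ctx@1.2.0` becomes `("tenant-ctx", "1.2.0")`.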
Migration 0012_skill_dependencies.sql:
```sql
CREATE TABLE skill_dependencies (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id        UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    parent_skill_id  UUID NOT NULL REFERENCES skills(id) ON DELETE CASCADE,
    requires_slug    TEXT NOT NULL,
    version_range    TEXT NOT NULL DEFAULT '*',
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (parent_skill_id, requires_slug)
);
```

Two important properties:
- Forward references are legal. `requires_slug` is plain text, so the target doesn't have to exist yet. Publishing `A` that requires not-yet-existing `B` is fine. When `B` is published later, the closure resolves cleanly the next time someone walks it.
- `ON DELETE CASCADE`. If the parent skill row is hard-deleted (rare; usually archive is enough), the dependency rows go with it. Archived skills keep their dependency rows so the graph stays intact for forensics.
GET /v1/skills/{slug}/deps walks the transitive closure:
```
GET /v1/skills/axum-tenant-handler/deps
[
  { "slug": "sqlx-migrations", "version_range": "*", "depth": 1 },
  { "slug": "tenant-ctx", "version_range": "1.2.0", "depth": 1 },
  { "slug": "logging-init", "version_range": "*", "depth": 2 }
]
```

The implementation is a `WITH RECURSIVE` CTE in `routes::skills::get_deps`
that:
- Seeds the closure from `skill_dependencies` rows whose `parent_skill_id` matches the resolved slug.
- Joins back through `skills.slug → skill_dependencies.parent_skill_id` on each iteration to descend further.
- Is cycle-safe. `UNION` dedups rows, and a hard `depth < 10` cap is belt-and-braces protection against pathological graphs. The depth cap means a cycle of two skills requiring each other surfaces as finite output, not infinite recursion.
- Returns rows ordered `depth ASC, slug ASC`; the caller can install shallow nodes first, or invert the order to install leaves first.
The endpoint is tenant-scoped (closure can only cross edges in the
same tenant) and returns 404 if the parent slug isn't published in
this tenant.
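
To make the closure semantics concrete, the walk can be reproduced over an in-memory graph. This is a sketch of the behaviour described above, not the SQL in `get_deps`: `Graph`, `walk_closure`, and `MAX_DEPTH` are hypothetical names, and whether the real CTE emits rows at depth 10 itself (as this sketch does) is an assumption.

```rust
use std::collections::{BTreeSet, HashMap};

const MAX_DEPTH: u32 = 10; // assumed to be the deepest level emitted

/// Edges: parent slug -> list of (required slug, version range).
type Graph = HashMap<String, Vec<(String, String)>>;

/// Returns `(slug, version_range, depth)` rows ordered by depth, then slug.
/// Rows are deduplicated as whole tuples, like `UNION` in the CTE, so a cycle
/// repeats a slug at increasing depths until the cap stops the walk.
fn walk_closure(graph: &Graph, root: &str) -> Vec<(String, String, u32)> {
    let mut rows: BTreeSet<(u32, String, String)> = BTreeSet::new();
    let mut frontier: BTreeSet<String> = BTreeSet::new();
    frontier.insert(root.to_string());
    let mut depth: u32 = 1;
    while !frontier.is_empty() && depth <= MAX_DEPTH {
        let mut next = BTreeSet::new();
        for parent in &frontier {
            // A slug with no dependency rows (or a forward reference)
            // simply contributes nothing.
            for (slug, range) in graph.get(parent).into_iter().flatten() {
                rows.insert((depth, slug.clone(), range.clone()));
                next.insert(slug.clone());
            }
        }
        frontier = next;
        depth += 1;
    }
    // BTreeSet iteration order is (depth, slug, range) ascending.
    rows.into_iter().map(|(d, s, r)| (s, r, d)).collect()
}
```

Under these assumptions, the example graph yields the three rows in the response shown earlier, and a two-skill cycle produces ten rows instead of looping forever.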
At publish time, every entry in requires: is checked against
existing dependency rows in the same tenant. If another published
skill already requires the same target slug at an incompatible
version range, the publish fails with 409 Conflict and an error
message naming both skills and both ranges:
```
skill `b` requires `lib@2.0.0` but skill `a` already requires `lib@1.0.0`
```
v1 compatibility predicate (check_version_compatibility in
server/src/routes/skills.rs):
| left range | right range | compatible? |
|---|---|---|
| `*` | anything | yes |
| anything | `*` | yes |
| `1.0.0` | `1.0.0` | yes (identical) |
| `1.0.0` | `2.0.0` | no → 409 |
| `^1.2` | `^1.3` | no → 409 (opaque string compare) |
For v1 we deliberately do not parse semver. Anything beyond * and
exact versions is treated as an opaque string; non-equal strings
collide. The trade-off: false-positives push operators to align
ranges explicitly, which is the right outcome until we ship a proper
resolver (see Future work).
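
The v1 predicate is small enough to state in a few lines. The function below is a sketch with a hypothetical name (`ranges_compatible`), written from the rules in the table rather than copied from `check_version_compatibility`.

```rust
/// v1 rule: a wildcard on either side is always compatible; otherwise the
/// two ranges must be byte-for-byte identical. No semver parsing happens.
fn ranges_compatible(left: &str, right: &str) -> bool {
    left == "*" || right == "*" || left == right
}
```

Because the comparison is purely textual, `^1.2` against `^1.3` returns `false` even though the two ranges overlap under semver, which is exactly the false positive described above.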
The CLI's ensure command (cli/src/cmd/ensure.rs) walks each
manifest entry, calls /deps, and builds a deduplicated install plan:
- For each top-level entry in `[[skills]]`, `[[agents]]`, and `[[commands]]`, push it onto the plan.
- For `[[skills]]` only (agents/commands don't have transitive deps today), call `GET /v1/skills/{slug}/deps` and push every entry in the closure.
- Dedup by `(slug, kind)`: the same dep pulled by two different parents installs exactly once.
- Sort deepest-first, then alphabetically. Leaves land on disk before their dependents, which keeps the symlinks coherent if a curious user inspects the project mid-install.
- For each plan entry, resolve `version="*"` against `GET /v1/skills/{slug}?kind=...` then download the bundle. A missing-from-registry slug (forward reference not yet published) logs `warn: skipping …` and the rest of the plan continues; the user can re-run `ensure` after the missing piece is published.
The CLI uses the depth cap implicitly: any closure deeper than ten
levels is malformed and gets capped server-side, so ensure will
never spin forever on a corrupt graph.
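
The dedup-and-sort portion of the plan-building steps can be sketched as follows. `PlanEntry`, `entry`, and `build_plan` are hypothetical names, not the types in `ensure.rs`; treating top-level entries as depth 0 and keeping the deepest occurrence when a `(slug, kind)` pair appears more than once are both assumptions of this example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
struct PlanEntry {
    slug: String,
    kind: String,
    depth: u32, // assumption: 0 for top-level manifest entries
}

/// Convenience constructor for the sketch.
fn entry(slug: &str, kind: &str, depth: u32) -> PlanEntry {
    PlanEntry { slug: slug.to_string(), kind: kind.to_string(), depth }
}

/// Dedups by (slug, kind) and orders the plan deepest-first, then by slug.
fn build_plan(entries: Vec<PlanEntry>) -> Vec<PlanEntry> {
    let mut deepest: HashMap<(String, String), PlanEntry> = HashMap::new();
    for e in entries {
        let key = (e.slug.clone(), e.kind.clone());
        // Assumption: when a dep shows up at several depths, keep the deepest
        // so it still lands on disk before everything that needs it.
        let replace = deepest.get(&key).map_or(true, |seen| e.depth > seen.depth);
        if replace {
            deepest.insert(key, e);
        }
    }
    let mut plan: Vec<PlanEntry> = deepest.into_values().collect();
    plan.sort_by(|a, b| {
        b.depth
            .cmp(&a.depth)
            .then_with(|| a.slug.cmp(&b.slug))
            .then_with(|| a.kind.cmp(&b.kind))
    });
    plan
}
```

For a manifest containing `axum-tenant-handler`, whose closure includes `logging-init` at depth 2, the plan starts with `logging-init` and ends with `axum-tenant-handler`.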
The table below maps every verb a skill goes through to the endpoint that handles it and the table(s) it touches. Use this as a quick reference when wiring up dashboards or writing runbook steps.
| verb | endpoint | tables touched |
|---|---|---|
| publish | `POST /v1/skills` (multipart) | `skills`, `skill_dependencies` |
| embed | (inline, during publish, if `--features fastembed`) | `skills.description_embedding` |
| list / search | `GET /v1/skills?query=&tags=&semantic=&kind=` | `skills` (read) |
| fetch metadata | `GET /v1/skills/{slug}?kind=` | `skills` (read) |
| fetch body | `GET /v1/skills/{slug}/skill-md?kind=` | `skills` (read), `skill_usage_events` |
| download bundle | `GET /v1/skills/{slug}/bundle.tar.gz?kind=` | `skills` (read), `skill_usage_events` |
| install via MCP | `tools/call install_skill { slug, kind }` | `skills` (read) |
| bump use_count | (inline, during fetch body / download) | `skills.use_count`, `skills.last_used_at` |
| walk closure | `GET /v1/skills/{slug}/deps` | `skill_dependencies` (recursive) |
| decay candidate | `GET /v1/tenant/skills/decay?days=&max_uses=` | `skills` (read) |
| decay sweep | background tokio task (every `decay_check_interval_secs`) | `skills.status` ← `archive_candidate` |
| archive | `POST /v1/skills/{slug}/archive` | `skills.status`, `audit_events` |
| usage timeline | `GET /v1/tenant/usage/timeline?days=` | `skill_usage_events` (read) |
| top skills | `GET /v1/tenant/usage/top?days=&limit=` | `skill_usage_events`, `skills` (read) |
| CLI usage event | `POST /v1/usage` (called by `skill-pool ensure`) | `skill_usage_events`, `skills` |
Defaults that operators most commonly tune: decay days=180,
max_uses=3; usage timeline days=30; closure depth cap 10.
Postgres is the source of truth for the catalog. For teams that want a human-readable, audit-grade history on disk, the server can additionally commit every successful publish into a Git repo.
Enable with a single env var:
```
SKILL_POOL_GIT_REPO_PATH=/var/lib/skill-pool/catalog-mirror
```

When set, both publish paths (`POST /v1/skills` and the
`POST /v1/drafts/{id}/publish` promotion) spawn a detached
`git_sync::commit_skill` task after a successful row insert. The
publish response is never blocked on the Git side: if `git` isn't
installed, the repo path doesn't exist, or the commit fails for any
reason, the failure is logged and the publish still returns 2xx.
Postgres remains the source of truth; Git is a mirror.
On-disk layout:

```
<repo>/<tenant_slug>/<kind>/<slug>/<version>/SKILL.md
                                             <other-bundle-files>
```
<kind> is one of skill, agent, command. Promoted drafts always
write under skill/ (drafts have no explicit kind today). Each commit
has subject publish: <tenant>/<kind>/<slug>@<version> and is authored
as skill-pool@local.
If you want signed commits or a custom author, point the path at a
working tree whose `.git/config` already sets those; the spawned
process runs `git -C <repo>`, so it picks up that repo's local config.
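
For runbooks or scripts that need to locate a mirrored file, the layout and commit subject described above can be reconstructed with two small helpers. Both function names are hypothetical and this is not the code in `git_sync`; the tenant slug `acme` used in the note below is an invented example value.

```rust
use std::path::{Path, PathBuf};

/// Builds `<repo>/<tenant_slug>/<kind>/<slug>/<version>/SKILL.md`.
fn mirror_skill_md_path(
    repo: &Path,
    tenant_slug: &str,
    kind: &str,
    slug: &str,
    version: &str,
) -> PathBuf {
    repo.join(tenant_slug)
        .join(kind)
        .join(slug)
        .join(version)
        .join("SKILL.md")
}

/// Builds the commit subject `publish: <tenant>/<kind>/<slug>@<version>`.
fn commit_subject(tenant: &str, kind: &str, slug: &str, version: &str) -> String {
    format!("publish: {tenant}/{kind}/{slug}@{version}")
}
```

With the repo at `/var/lib/skill-pool/catalog-mirror` and a tenant slug of `acme`, version `1.2.0` of the skill `tenant-ctx` would live under `acme/skill/tenant-ctx/1.2.0/SKILL.md` and its commit subject would read `publish: acme/skill/tenant-ctx@1.2.0`.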
Phase 5 ships the bones. The following deferred items live on the roadmap; tracking them here so the next operator who edits this doc has the context.
- Sliding-window decay. Today decay is binary: "below max_uses in the last N days". A sliding window (e.g. "downloads in last 14 days must exceed downloads in same window 6 months ago") would catch declining skills before they hit the hard threshold. Requires back-fill of `skill_usage_events` partitions and a heavier query.
- Retention rules on `skill_usage_events`. No automatic cleanup today. Once the table volume warrants it, partition monthly and drop partitions older than the tenant's configured retention window. Surface as a per-tenant setting (`usage_retention_days`).
- Semver-aware conflict detection. The publish-time check now catches the common case (parent A requires `lib@1.0.0`, parent B requires `lib@2.0.0` → 409). It treats non-`*` ranges as opaque strings, which is intentionally narrow: `^1.2` vs `^1.3` reads as a conflict even though semver would consider them compatible. A follow-up slice parses semver ranges properly and downgrades those false-positives to "OK". Until then, operators align ranges explicitly when the resolver complains.
- Per-kind decay tuning. As noted above, agents and commands need their own thresholds before they participate in the graveyard view.
- Un-archive UI button. Today archive is one-way through the UI; reversal requires a SQL update. Add a `POST /v1/skills/{slug}/unarchive` with admin scope + audit event.
- Closure caching. The recursive CTE is cheap today but caching the closure (with `created_at` invalidation on dep publish) saves round-trips for the hot path of `ensure` against deeply nested graphs.