A TypeScript backend that tracks the agentic commerce protocol ecosystem 24/7 and fires alerts when anything changes.
Four pipelines:
- Snapshot tracker: fetches ~50 protocol sources (UCP, ACP, AP2, x402, WebMCP, MCP, MPP, Visa TAP, Mastercard Agent Pay, Shopify, Firmly, Rye, DaVinci, Wizard) across seven source kinds (html / html_js / markdown / github_repo / github_commits / rss / sitemap), normalizes content, hashes it, and records a diff whenever the hash changes. Flip-flop reverts (A→B→A) are detected and suppressed. Rule severity is upgraded to major on line count or keyword match (breaking/deprecated/version/etc.). Each change is then summarized by an LLM, which assigns llm_severity (breaking/additive/editorial) and llm_category (schema/docs/example/meta/release).
- Discovery scanner: searches GitHub (rolling 90-day window, with rate-limit backoff), Hacker News, and the general web (Brave / Tavily / DuckDuckGo) for new or updated agentic-commerce protocols. Results are trust-filtered (domain allowlist) and recency-filtered (published after tracker start). Claude Haiku classifies findings.
- Alerting — webhook / Slack / Discord subscribers with per-protocol / per-severity / per-minimum-line-count filters. HMAC-signed webhook deliveries. Every detected change fires to matching subscribers automatically, and every delivery is logged with retry-on-failure semantics.
- Daily digest — aggregates changes, findings, matches, and trusted searches into a narrative markdown digest written by Claude Sonnet.
Plus doc↔git correlation — matches git commits in upstream spec repos to downstream doc-site changes, so you can see which commits landed in the official docs and which are still floating.
The whole thing runs as a single long-lived Bun process: snapshot cycle hourly, web search twice a day (UTC), daily digest once a day (UTC), snapshot pruning once a day (UTC). Exposes a REST API for frontends, an RSS feed for readers, and an MCP server for Claude Code.
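The flip-flop suppression mentioned for the snapshot tracker can be pictured as a comparison of content hashes. The sketch below is illustrative only: `HashedSnapshot` and `isFlipFlop` are hypothetical names, and it assumes a revert means "the new hash equals the hash from two snapshots ago", which may be narrower or broader than what the tracker actually checks.

```typescript
// Hypothetical shape: only the content hash matters for this check.
interface HashedSnapshot {
  id: string;
  contentHash: string;
}

/**
 * Returns true when `next` restores the content seen two snapshots ago
 * (A -> B -> A). `history` is ordered oldest to newest and excludes `next`.
 */
function isFlipFlop(history: HashedSnapshot[], next: HashedSnapshot): boolean {
  const previous = history[history.length - 1];
  const beforePrevious = history[history.length - 2];
  // Fewer than two prior snapshots: nothing to revert to.
  if (!previous || !beforePrevious) return false;
  return (
    next.contentHash !== previous.contentHash &&
    next.contentHash === beforePrevious.contentHash
  );
}
```

A change flagged this way would get the flipflop severity and be skipped by alerting.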
- Runtime: Bun
- Framework: Hono
- Database: SQLite (bun:sqlite) with FTS5 virtual table for RAG search
- HTML→markdown: node-html-markdown
- Headless browser: optional playwright (install on demand)
- LLM: @anthropic-ai/sdk, with Haiku for diff summarization + classification and Sonnet for the daily digest
- MCP: @modelcontextprotocol/sdk, stdio transport
cd protocol-tracker
cp .env.example .env
# (optional) add ANTHROPIC_API_KEY, GITHUB_TOKEN, BRAVE_SEARCH_API_KEY, TAVILY_API_KEY
bun install
bun run dev # or: bun run start
On first start it seeds ~35 protocol sources, initializes the tracker start epoch, and launches the scheduler. The API is live on http://localhost:3000.
The html_js source kind uses Playwright. It's an optional dependency; install it if you need it:
bun add playwright
bunx playwright install chromium
For stealthier fetching on Linux, run inside xvfb-run:
HEADLESS=false xvfb-run -a bun run src/index.ts
Or use Camoufox via the BROWSER_CHANNEL env var.
┌─────────────────────────────┐
│ Hono REST API │
│ + MCP stdio server │
└──────────────┬──────────────┘
│
┌──────────────────────────────────▼──────────────────────────────────┐
│ bun:sqlite │
│ sources · snapshots · changes · findings · scans │
│ digests · doc_git_matches · search_results · meta · snapshots_fts │
└──────────▲──────────────────────────────────────────────────────────┘
│
┌──────────┴──────────────────────────────────────────────────────────┐
│ Scheduler (24/7) │
│ │
│ HOURLY snapshot cycle │
│ └─► fetchers: html · html_js · markdown · github_repo │
│ · github_commits · rss · sitemap │
│ └─► smart normalize → hash → diff → change │
│ └─► LLM summarize → severity + category + highlights │
│ └─► doc↔git matcher → correlate commits with doc changes │
│ │
│ 8Z, 20Z web search cycle │
│ └─► providers: brave · tavily · duckduckgo │
│ └─► trust filter (domain allowlist) │
│ └─► recency filter (published > tracker_start_epoch) │
│ │
│ 21Z daily digest │
│ └─► LLM (Sonnet) writes markdown over last 24h │
│ │
│ ADHOC discovery scan (GitHub + HN) │
│ └─► findings inbox → classify → promote or dismiss │
└─────────────────────────────────────────────────────────────────────┘
| Kind | Fetches | Normalization | Best for |
|---|---|---|---|
| html | plain HTTP → markdown via node-html-markdown | strip nav/footer, collapse whitespace | static spec sites, docs |
| html_js | Playwright (headless or headful via xvfb) | same as html | SPAs, JS-rendered docs |
| markdown | plain HTTP, treat body as markdown | normalize whitespace | raw .md files in repos |
| github_repo | GitHub API (repo meta + README + 5 commits/releases) | included | slow-moving repos where README is the artifact |
| github_commits | GitHub API (15 commits + 15 releases + 15 tags + 15 merged PRs) | structured feed | git-first tracking of spec repos; this is the Scout-style channel |
| rss | RSS/Atom parse (dependency-free) | item list as markdown | company blogs, release feeds, Substack |
| sitemap | /sitemap.xml (follows sitemapindex) | sorted URL list | auto-discover new spec pages |
See src/types.ts for the full types. Highlights:
- Source: { id, name, url, kind, protocol, tags[], checkIntervalHours, config, lastCheckedAt, active, createdAt }
- Snapshot: { id, sourceId, fetchedAt, status, contentHash, content, rawBytes, error, metadata }
- Change: { id, sourceId, fromSnapshotId, toSnapshotId, detectedAt, additions, deletions, severity, llmSummary, llmSeverity, llmCategory, llmHighlights }
- Finding: { id, discoveredAt, sourceType, url, title, description, signals, classification, confidence, reasoning, promotedSourceId, status }
- SearchResult: { id, searchedAt, provider, query, url, title, snippet, publishedAt, domain, trusted, recent, promotedFindingId }
- DocGitMatch: { id, matchedAt, gitChangeId, docChangeId, protocol, confidence, reasoning, lagSeconds }
- Digest: { id, generatedAt, periodStart, periodEnd, kind, headline, summary, changeCount, findingCount, searchCount, matchCount, model, error }
Timestamps are epoch milliseconds. All responses are JSON. CORS is open by default.
GET /health
{"ok": true, "time": 1765000000000, "trackerStartEpoch": 1764000000000, "uptime_ms": 1000000, "version": "0.2.0"}
GET /sources?active=true&protocol=UCP&kind=github_commits
POST /sources # body: {name, url, kind, protocol?, tags?, checkIntervalHours?, config?}
GET /sources/:id
PATCH /sources/:id # partial update
DELETE /sources/:id
POST /sources/:id/check # fetch + snapshot + diff immediately
GET /sources/:id/snapshots?limit=50
POST /sources/:id/wayback # body: {limit?, fromYear?}; seeds history from archive.org
GET /snapshots/:id?content=true|false
GET /snapshots/:id/diff?from=<other_snapshot_id>
GET /changes?limit=100&sourceId=<id>&severity=breaking&protocol=UCP&since=<epoch_ms>&until=<epoch_ms>
GET /changes.rss?limit=50&protocol=UCP&severity=major # RSS 2.0 feed
GET /changes/:id
POST /changes/:id/summarize # run LLM summary on one change
POST /changes/summarize-pending # body: {limit?}; batch
POST /changes/:id/redispatch # re-fire alerts for a change
Each change response includes joined source metadata (source.name, source.url, source.protocol, source.kind) so consumers don't need an N+1 lookup. Responses also include a cursor object with latest / earliest / next_since for clean incremental polling: pass ?since={next_since} on the next request.
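The incremental polling described above might look roughly like the sketch below. The `cursor` field names come from this README, but the `changes` array key, the `ChangesPage` type, and the helper names are assumptions made for illustration. The HTTP call is injected so the logic stays independent of any particular client.

```typescript
// Assumed response envelope: the `cursor` fields are documented above, while
// the `changes` array key is a guess made for this sketch.
interface ChangesPage {
  changes: Array<{ id: string }>;
  cursor: { latest: number | null; earliest: number | null; next_since: number | null };
}

type FetchPage = (url: string) => Promise<ChangesPage>;

/** Builds the URL for the next poll, carrying `since` forward when known. */
function changesUrl(baseUrl: string, since: number | null): string {
  const query = since === null ? "limit=100" : `limit=100&since=${since}`;
  return `${baseUrl}/changes?${query}`;
}

/** Fetches one page and returns the change ids plus the cursor to store. */
async function pollOnce(
  baseUrl: string,
  since: number | null,
  fetchPage: FetchPage,
): Promise<{ ids: string[]; since: number | null }> {
  const page = await fetchPage(changesUrl(baseUrl, since));
  return {
    ids: page.changes.map((change) => change.id),
    // Keep the old cursor if the server returned none (e.g. an empty page).
    since: page.cursor.next_since ?? since,
  };
}
```

A caller would persist the returned `since` between runs and pass a `fetchPage` that wraps `fetch(url).then((r) => r.json())`.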
Severity can be minor (rule-level: ≤50 diff lines and no major keywords), major (>50 diff lines OR contains breaking/deprecated/version/removed/renamed/required/changelog/migration keywords), or flipflop (a revert — skipped by alerting). LLM-level severity is independent: breaking / additive / editorial. Filter on either with ?severity=.
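As a rough illustration of the rule-level severity described above, the sketch below applies the 50-line threshold and the keyword list. It assumes that "diff lines" means additions + deletions and that keywords are matched as case-insensitive substrings of the diff text; `ruleSeverity` and its parameters are hypothetical names, not the tracker's actual function.

```typescript
type RuleSeverity = "minor" | "major" | "flipflop";

// Keyword list taken from the severity description above.
const MAJOR_KEYWORDS = [
  "breaking", "deprecated", "version", "removed",
  "renamed", "required", "changelog", "migration",
];
const MAJOR_LINE_THRESHOLD = 50;

function ruleSeverity(
  additions: number,
  deletions: number,
  diffText: string,
  isRevert: boolean,
): RuleSeverity {
  // Reverts are classified first so they never reach alerting.
  if (isRevert) return "flipflop";
  // Assumption: "diff lines" is the sum of added and removed lines.
  if (additions + deletions > MAJOR_LINE_THRESHOLD) return "major";
  const lower = diffText.toLowerCase();
  return MAJOR_KEYWORDS.some((keyword) => lower.includes(keyword)) ? "major" : "minor";
}
```

The LLM-level severity (breaking / additive / editorial) is assigned separately and is not modeled here.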
GET /subscribers?active=true
POST /subscribers
GET /subscribers/:id
PATCH /subscribers/:id
DELETE /subscribers/:id
GET /deliveries?subscriberId=<id>&changeId=<id>&status=success|failed&limit=100
Create a webhook subscriber:
curl -X POST http://localhost:3000/subscribers \
-H "authorization: Bearer $ADMIN_TOKEN" \
-H "content-type: application/json" \
-d '{
"kind": "webhook",
"url": "https://your-service.com/hooks/tracker",
"name": "prod alerting",
"filterProtocol": "UCP",
"filterSeverity": "breaking",
"filterMinChangeLines": 20,
"secret": "your-shared-secret"
}'
Webhook payloads are signed when secret is set. Headers on each delivery:
Content-Type: application/json
User-Agent: protocol-tracker/0.2
X-Tracker-Change-Id: <change id>
X-Tracker-Signature: sha256=<hmac-sha256 of body using secret>
Body:
{
"event": "change.detected",
"change": {
"id": "…",
"source_id": "…",
"detected_at": "2026-04-11T08:24:46.951Z",
"diff_summary": "+203 -0 lines",
"additions": 203,
"deletions": 0,
"severity": "major",
"llm_severity": "additive",
"llm_category": "schema",
"llm_summary": "ACP added new discount extension with …",
"llm_highlights": ["new field x", "new field y"]
},
"source": { "id": "…", "name": "MCP Spec (latest)", "url": "https://…", "protocol": "MCP" }
}
Slack and Discord subscribers format the same data as Slack text messages and Discord embeds respectively (post them to incoming-webhook URLs directly).
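On the receiving side of a webhook, the X-Tracker-Signature header can be checked along the following lines. This is a sketch under assumptions: it presumes the HMAC is hex-encoded and computed over the raw request body bytes, neither of which is spelled out above, and `sign` / `verifySignature` are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Computes the header value a sender would attach (hex encoding assumed). */
function sign(rawBody: string, secret: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

/** Checks a received X-Tracker-Signature header against the raw body. */
function verifySignature(
  rawBody: string,
  header: string | undefined,
  secret: string,
): boolean {
  if (!header) return false;
  const expected = Buffer.from(sign(rawBody, secret), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification should run on the body exactly as received, before any JSON parsing and re-serialization, since a re-serialized body may differ byte for byte.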
Filter fields on a subscriber:
- filterProtocol: exact match against source.protocol
- filterSeverity: matches either rule severity or llm_severity
- filterMinChangeLines: minimum additions + deletions
- Flipflops are always suppressed (they're revert noise).
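The filter fields listed above could be evaluated as in the sketch below. It assumes that filters combine with AND and that an unset filter matches everything; the README states neither explicitly. `SubscriberFilter`, `ChangeForAlert`, and `matchesSubscriber` are hypothetical names.

```typescript
interface SubscriberFilter {
  filterProtocol?: string;
  filterSeverity?: string;
  filterMinChangeLines?: number;
}

interface ChangeForAlert {
  protocol: string | null;
  severity: string; // rule severity: minor | major | flipflop
  llmSeverity: string | null; // breaking | additive | editorial
  additions: number;
  deletions: number;
}

function matchesSubscriber(filter: SubscriberFilter, change: ChangeForAlert): boolean {
  // Flipflops never alert, regardless of filters.
  if (change.severity === "flipflop") return false;
  if (filter.filterProtocol !== undefined && filter.filterProtocol !== change.protocol) {
    return false;
  }
  // A severity filter may match either the rule severity or the LLM severity.
  if (
    filter.filterSeverity !== undefined &&
    filter.filterSeverity !== change.severity &&
    filter.filterSeverity !== change.llmSeverity
  ) {
    return false;
  }
  const changedLines = change.additions + change.deletions;
  if (filter.filterMinChangeLines !== undefined && changedLines < filter.filterMinChangeLines) {
    return false;
  }
  return true;
}
```

Under these assumptions, the example payload above (protocol MCP, severity major, llm_severity additive, 203 changed lines) would not reach the sample subscriber, whose filters require UCP and breaking.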
GET /findings?status=pending&classification=new_protocol&limit=100
GET /findings/:id
POST /findings/:id/classify # LLM classify one finding
POST /findings/:id/promote # body: {name?, kind?, protocol?, tags?, checkIntervalHours?}
POST /findings/:id/dismiss
POST /discovery/scan # {kinds?: ["github","hn"], queries?: [...]}
POST /discovery/websearch # {queries?: [...], providers?: ["brave","tavily","duckduckgo"]}
POST /discovery/classify # {limit?} — classify pending unclassified findings
GET /discovery/scans?limit=50
GET /discovery/websearch/results?trusted=true&recent=true&provider=brave&limit=100
GET /digests?limit=30
GET /digests/latest
GET /digests/:id
POST /digests/generate # {periodHours?}; writes an ad-hoc digest
GET /matches?limit=100&protocol=UCP
POST /matches/run # run the matcher now
GET /rag/search?q=checkout+state+machine&protocol=UCP&limit=20
Returns ranked excerpts of snapshot content with source context. Supports basic FTS5 syntax (AND, OR, "exact phrase", prefix*).
POST /admin/seed # re-run built-in seed (idempotent)
POST /admin/run/snapshot-cycle
POST /admin/run/discovery-cycle
POST /admin/run/websearch-cycle
POST /admin/run/digest-cycle
POST /admin/run/matcher
POST /admin/run/prune # run pruning cycle once
POST /admin/prune # {keepPerSource?: number}
If ADMIN_TOKEN is set in env, every non-GET route requires:
Authorization: Bearer <ADMIN_TOKEN>
GET routes remain open so the frontend and RSS readers can poll without credentials. Leave ADMIN_TOKEN unset for open mode (useful in dev / local / behind a trusted reverse proxy).
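The access rule above reduces to a small decision function. The sketch below is one way to express it, independent of the HTTP framework; `isAuthorized` is a hypothetical name, and the plain string comparison is a simplification rather than a statement about how the tracker compares tokens.

```typescript
/**
 * Decides whether a request may proceed under the ADMIN_TOKEN rule:
 * open mode when no token is configured, GET always allowed, and every
 * other method requiring "Authorization: Bearer <ADMIN_TOKEN>".
 */
function isAuthorized(
  method: string,
  authorizationHeader: string | undefined,
  adminToken: string | undefined,
): boolean {
  // Open mode: no token configured, so everything is allowed.
  if (!adminToken) return true;
  if (method.toUpperCase() === "GET") return true;
  return authorizationHeader === `Bearer ${adminToken}`;
}
```

In a Hono app this check would typically sit in a middleware that returns 401 when it fails.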
# trigger a snapshot of every due source, with LLM summaries + matcher
curl -X POST http://localhost:3000/admin/run/snapshot-cycle
# see the feed of changes with LLM severity
curl "http://localhost:3000/changes?limit=20&severity=breaking" | jq
# search snapshot content
curl "http://localhost:3000/rag/search?q=checkout+mandate" | jq
# get the latest digest
curl http://localhost:3000/digests/latest | jq
A separate stdio MCP server exposes every piece of tracker state to Claude Code so you can reason over it interactively.
Start it directly:
bun run mcp
Wire it to Claude Code:
claude mcp add protocol-tracker -- bun run /absolute/path/to/protocol-tracker/src/mcp-server.ts
Or add to ~/.claude.json:
{
"mcpServers": {
"protocol-tracker": {
"command": "bun",
"args": ["run", "/absolute/path/to/protocol-tracker/src/mcp-server.ts"],
"env": {
"DATABASE_PATH": "/absolute/path/to/protocol-tracker/data/tracker.db",
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
| Tool | Purpose |
|---|---|
| list_sources | List tracked sources; filter by protocol, active |
| get_source | Fetch one source |
| list_changes | Newest changes; filter by sourceId, severity, protocol |
| get_change | One change with LLM summary |
| list_findings | Discovery inbox |
| list_digests | Historical digests |
| get_digest | One digest, or latest if id omitted |
| list_matches | Doc↔git correlations |
| search_content | FTS5 RAG search across all snapshot content |
| trigger_snapshot_cycle | Run snapshot cycle now |
| trigger_discovery_cycle | Run GitHub + HN discovery now |
| trigger_web_search | Run Brave/Tavily/DuckDuckGo web search now |
| trigger_matcher | Run doc↔git matcher now |
| trigger_digest | Generate a digest now |
Now in Claude Code you can ask "what broke in UCP this week?" and it will pull list_changes, search_content, and get_digest on its own.
51 sources across 13 protocols covering the full agentic commerce ecosystem:
| Protocol | Sources |
|---|---|
| UCP | homepage + overview + checkout spec (html), github_repo, github_commits (6h cadence), releases (rss), sitemap |
| ACP | homepage + changelog (6h cadence) + delegate-payment spec, github_repo, github_commits, releases (rss), Stripe ACP docs, sitemap |
| AP2 | spec (html), github_repo, github_commits |
| MPP | Stripe MPP docs, Stripe Agentic Commerce Suite blog |
| x402 | homepage, github_repo, github_commits, releases (rss), a2a-x402 git feed |
| WebMCP | W3C spec (html), github_commits |
| MCP | spec (html), github_commits, sitemap, MCP blog RSS |
| Visa TAP | developer docs, visa/mcp git feed, Visa Intelligent Commerce overview + dev portal, Cloudflare blog RSS |
| Mastercard Agent Pay | Agentic Commerce Framework, Verifiable Intent |
| Shopify | shopify.dev/docs/agents + /catalog + /checkout/mcp + /checkout/ecp, Shopify Engineering RSS |
| Firmly | homepage |
| Rye | homepage, blog |
| DaVinci / Wizard | homepages |
| Blogs | Stripe, Shopify Engineering, Cloudflare, nekuda Substack, MCP blog (all RSS) |
Remove any with DELETE /sources/:id or add more via POST /sources.
| Job | Default cadence | Controlled by |
|---|---|---|
| Snapshot cycle | hourly (skips sources not yet due) | CHECK_INTERVAL_MINUTES |
| Per-source cadence | 24h (6h for changelog-style sources) | checkIntervalHours on each source |
| LLM summarize changes | immediately after each cycle that produced changes | automatic |
| Doc↔git matcher | immediately after summarize | automatic |
| Alert dispatch | immediately after summarize — fires to every matching subscriber | automatic |
| Web search | 2× daily at 08:00 and 20:00 UTC | SEARCH_CRON_HOURS_UTC=8,20 |
| Daily digest | 1× daily at 21:00 UTC | DIGEST_CRON_HOUR_UTC=21 |
| Snapshot pruning | 1× daily at 03:00 UTC | PRUNE_CRON_HOUR_UTC=3 |
| GitHub + HN discovery | adhoc (POST /admin/run/discovery-cycle) | n/a |
Cron slots fire at most once per UTC day per slot (tracked in the meta table). Pruning keeps the N most-recent snapshots per source (SNAPSHOT_KEEP_PER_SOURCE=50) and never deletes rows referenced by change from_snapshot_id / to_snapshot_id, so the diff chain remains intact forever.
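The pruning rule can be sketched as a pure selection over one source's snapshots. This is an illustration under assumptions: `SnapshotRow` and `selectPrunable` are hypothetical names, and the set of referenced ids is presumed to have been collected from the changes table beforehand.

```typescript
interface SnapshotRow {
  id: string;
  fetchedAt: number; // epoch milliseconds
}

/**
 * Returns the ids that are safe to delete for a single source: everything
 * beyond the `keep` most recent snapshots, except rows still referenced by
 * a change's from_snapshot_id / to_snapshot_id.
 */
function selectPrunable(
  rows: SnapshotRow[],
  referencedIds: Set<string>,
  keep = 50,
): string[] {
  // Sort a copy so the caller's array is left untouched.
  const newestFirst = [...rows].sort((a, b) => b.fetchedAt - a.fetchedAt);
  return newestFirst
    .slice(keep)
    .filter((row) => !referencedIds.has(row.id))
    .map((row) => row.id);
}
```

Because referenced rows are filtered out after the cut, a source can end up retaining more than `keep` snapshots, which is what keeps the diff chain intact.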
We persist a tracker_start_epoch on first boot. The web search pipeline flags a result as:
- trusted = true if its domain matches the allowlist in src/discovery/trust.ts (UCP/ACP/AP2/MCP authorities, primary vendors, research bodies).
- recent = true if publishedAt >= tracker_start_epoch: by definition, new material published after the tracker was turned on.
The daily digest prioritizes results where trusted OR recent.
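A minimal sketch of the two flags follows. It assumes that subdomains of an allowlisted domain count as trusted and that a result with no publication date is treated as not recent; neither detail is confirmed by the description above, and `flagResult` is a hypothetical name rather than the function in src/discovery/trust.ts.

```typescript
interface ResultFlags {
  trusted: boolean;
  recent: boolean;
}

function flagResult(
  domain: string,
  publishedAt: number | null, // epoch milliseconds, if the provider supplied one
  trackerStartEpoch: number,
  allowlist: string[],
): ResultFlags {
  const host = domain.toLowerCase().replace(/^www\./, "");
  // Assumption: "docs.example.com" is trusted when "example.com" is listed.
  const trusted = allowlist.some(
    (allowed) => host === allowed || host.endsWith(`.${allowed}`),
  );
  // Assumption: an unknown publication date is not counted as recent.
  const recent = publishedAt !== null && publishedAt >= trackerStartEpoch;
  return { trusted, recent };
}

/** The digest prioritizes results where either flag is set. */
function isPrioritized(flags: ResultFlags): boolean {
  return flags.trusted || flags.recent;
}
```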
Before hashing/diffing, content is normalized by src/normalize.ts:
- LF line endings
- trailing whitespace stripped
- 3+ blank lines collapsed to 2
- common boilerplate (on-this-page, copyright, "was this helpful?") stripped
- cache-busting query strings (?v=, ?cb=, ?t=) removed from URLs
- ISO-8601 timestamps collapsed to <ts>
- stableJsonHash / stableJsonPretty helpers for OpenAPI/OpenRPC schemas
This eliminates ~90% of false-positive churn.
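To make the normalization steps concrete, here is a partial sketch covering line endings, trailing whitespace, blank-line collapsing, timestamp masking, and a few boilerplate lines. It is not the code in src/normalize.ts: the regular expressions and the boilerplate patterns are assumptions, and cache-busting query strings and the JSON helpers are left out.

```typescript
// Illustrative boilerplate patterns; the real list is presumably longer.
const BOILERPLATE_LINES = [
  /^on this page$/i,
  /^was this helpful\??$/i,
  /^(?:©|copyright\b).*$/i,
];

// Matches common ISO-8601 date-time forms such as 2026-04-11T08:24:46.951Z.
const ISO_TIMESTAMP =
  /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?/g;

function normalizeContent(input: string): string {
  const lines = input
    .replace(/\r\n?/g, "\n") // LF line endings
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // strip trailing whitespace
    .filter((line) => !BOILERPLATE_LINES.some((re) => re.test(line.trim())));
  return lines
    .join("\n")
    .replace(ISO_TIMESTAMP, "<ts>") // timestamps would otherwise change every fetch
    .replace(/\n{4,}/g, "\n\n\n"); // 3+ blank lines collapse to 2
}
```

Hashing the output of a function like this, rather than the raw fetch, is what lets cosmetic differences between fetches produce identical hashes.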
When you add a new HTML source with no history, seed a baseline from archive.org:
curl -X POST http://localhost:3000/sources/<id>/wayback \
-H "content-type: application/json" \
-d '{"limit": 5, "fromYear": 2024}'
Fetches N unique-digest captures between fromYear and now, normalizes them, and inserts one snapshot per capture. Instant retroactive history.
| Env | Default | Purpose |
|---|---|---|
| PORT | 3000 | HTTP port |
| DATABASE_PATH | ./data/tracker.db | SQLite path |
| USER_AGENT | protocol-tracker/0.2 | outbound UA |
| CHECK_INTERVAL_MINUTES | 60 | scheduler wake-up interval |
| DISABLE_SCHEDULER | 0 | 1 = HTTP only, no background jobs |
| SEARCH_CRON_HOURS_UTC | 8,20 | web search fire times |
| DIGEST_CRON_HOUR_UTC | 21 | digest fire time |
| PRUNE_CRON_HOUR_UTC | 3 | snapshot pruning fire time |
| SNAPSHOT_KEEP_PER_SOURCE | 50 | how many recent snapshots to keep per source during pruning |
| ADMIN_TOKEN | (unset) | bearer token required on non-GET routes when set |
| ANTHROPIC_API_KEY | (unset) | enables LLM summarization, classification, digest |
| GITHUB_TOKEN | (unset) | raises GitHub API limit 10→30 req/min |
| GITHUB_QUERY_WINDOW_DAYS | 90 | rolling window for GitHub discovery queries |
| BRAVE_SEARCH_API_KEY | (unset) | enables Brave web search |
| TAVILY_API_KEY | (unset) | enables Tavily web search |
| HEADLESS | true | Playwright headless mode |
| BROWSER_CHANNEL | (unset) | Playwright channel (chrome, msedge) |
- The database is a single SQLite file at $DATABASE_PATH. Back it up by copying it.
- Full normalized content is stored in every snapshot. At default cadence (35 sources × once a day) expect ~2–10 MB/day. Prune via SQL if it grows too large.
- Without an ANTHROPIC_API_KEY, the LLM pipelines degrade gracefully: changes are still detected, findings are still recorded as unclassified, and the daily digest falls back to a simple summary.
- Without web search keys, only DuckDuckGo runs. It's best-effort and may miss results.
- The scheduler is idempotent per UTC day per slot — restarting the process mid-day won't re-fire a slot that already ran.
- Cold-start: on first boot, the snapshot cycle runs immediately so you get populated data without waiting an hour.
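The per-day, per-slot idempotence noted in the list above can be modeled with a key recorded once a slot has fired. In this sketch the key format, the in-memory `Set` standing in for the meta table, and the catch-up behavior (a slot is due any time after its hour on the same UTC day) are all assumptions; `slotKey` and `dueSlots` are hypothetical names.

```typescript
/** Builds the bookkeeping key for one slot, e.g. "websearch:2026-04-11:8". */
function slotKey(job: string, hourUtc: number, nowMs: number): string {
  const day = new Date(nowMs).toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
  return `${job}:${day}:${hourUtc}`;
}

/**
 * Returns the slot keys that should fire now: slots whose hour has been
 * reached today (UTC) and that are not yet recorded in `fired`.
 */
function dueSlots(
  job: string,
  hoursUtc: number[],
  nowMs: number,
  fired: Set<string>,
): string[] {
  const hourNow = new Date(nowMs).getUTCHours();
  return hoursUtc
    .filter((hour) => hourNow >= hour)
    .map((hour) => slotKey(job, hour, nowMs))
    .filter((key) => !fired.has(key));
}
```

After running a job, the scheduler would add the returned keys to the persisted set, so a restart later the same day finds nothing due for those slots.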
See docs/competitors.md in the parent repo for the competitor research this was built from — nekuda Protocol Scout (closest direct competitor, likely git-first), changedetection.io (general-purpose reference), Visualping (LLM summarization reference), Fluxguard (DOM-diff reference).