protocol-tracker

A TypeScript backend that tracks the agentic commerce protocol ecosystem 24/7 and fires alerts when anything changes.

Four pipelines:

Snapshot tracker — fetches ~50 protocol sources (UCP, ACP, AP2, x402, WebMCP, MCP, MPP, Visa TAP, Mastercard Agent Pay, Shopify, Firmly, Rye, DaVinci, Wizard) across seven source kinds (html / html_js / markdown / github_repo / github_commits / rss / sitemap), normalizes content, hashes, and records diffs when anything changes. Flip-flop reverts (A→B→A) are detected and suppressed. Severity is upgraded to major on line-count or keyword match (breaking/deprecated/version/etc). LLM-summarized with llm_severity (breaking/additive/editorial) and llm_category (schema/docs/example/meta/release).
Discovery scanner — searches GitHub (rolling 90-day window, with rate-limit backoff), Hacker News, and the general web (Brave / Tavily / DuckDuckGo) for new or updated agentic-commerce protocols. Trust-filtered (domain allowlist) and recency-filtered (published after tracker start). Claude Haiku classifies findings.
Alerting — webhook / Slack / Discord subscribers with per-protocol / per-severity / per-minimum-line-count filters. HMAC-signed webhook deliveries. Every detected change fires to matching subscribers automatically, and every delivery is logged with retry-on-failure semantics.
Daily digest — aggregates changes, findings, matches, and trusted searches into a narrative markdown digest written by Claude Sonnet.

Plus doc↔git correlation — matches git commits in upstream spec repos to downstream doc-site changes, so you can see which commits landed in the official docs and which are still floating.

The whole thing runs as a single long-lived Bun process: snapshot cycle hourly, web search twice a day (UTC), daily digest once a day (UTC), snapshot pruning once a day (UTC). Exposes a REST API for frontends, an RSS feed for readers, and an MCP server for Claude Code.

Stack

Runtime: Bun
Framework: Hono
Database: SQLite (bun:sqlite) with FTS5 virtual table for RAG search
HTML→markdown: node-html-markdown
Headless browser: optional playwright (install on demand)
LLM: @anthropic-ai/sdk — Haiku for diff summarization + classification, Sonnet for daily digest
MCP: @modelcontextprotocol/sdk — stdio transport

Quickstart

cd protocol-tracker
cp .env.example .env
# (optional) add ANTHROPIC_API_KEY, GITHUB_TOKEN, BRAVE_SEARCH_API_KEY, TAVILY_API_KEY
bun install
bun run dev     # or: bun run start

On first start it seeds ~35 protocol sources, initializes the tracker start epoch, and launches the scheduler. API is live on http://localhost:3000.

Optional: Playwright (for JS-rendered sources)

The html_js source kind uses Playwright. It's an optional dep — install if you need it:

bun add playwright
bunx playwright install chromium

For stealthier fetching on Linux, run inside xvfb-run:

HEADLESS=false xvfb-run -a bun run src/index.ts

Or use Camoufox via the BROWSER_CHANNEL env var.

Architecture

                    ┌─────────────────────────────┐
                    │       Hono REST API         │
                    │  + MCP stdio server         │
                    └──────────────┬──────────────┘
                                   │
┌──────────────────────────────────▼──────────────────────────────────┐
│                         bun:sqlite                                  │
│  sources · snapshots · changes · findings · scans                   │
│  digests · doc_git_matches · search_results · meta · snapshots_fts  │
└──────────▲──────────────────────────────────────────────────────────┘
           │
┌──────────┴──────────────────────────────────────────────────────────┐
│                         Scheduler (24/7)                            │
│                                                                     │
│  HOURLY  snapshot cycle                                             │
│     └─► fetchers: html · html_js · markdown · github_repo           │
│             · github_commits · rss · sitemap                        │
│     └─► smart normalize → hash → diff → change                      │
│     └─► LLM summarize   → severity + category + highlights          │
│     └─► doc↔git matcher → correlate commits with doc changes        │
│                                                                     │
│  8Z, 20Z  web search cycle                                          │
│     └─► providers: brave · tavily · duckduckgo                      │
│     └─► trust filter (domain allowlist)                             │
│     └─► recency filter (published > tracker_start_epoch)            │
│                                                                     │
│  21Z  daily digest                                                  │
│     └─► LLM (Sonnet) writes markdown over last 24h                  │
│                                                                     │
│  ADHOC  discovery scan (GitHub + HN)                                │
│     └─► findings inbox → classify → promote or dismiss              │
└─────────────────────────────────────────────────────────────────────┘

Source kinds

Kind	Fetches	Normalization	Best for
`html`	plain HTTP → markdown via node-html-markdown	strip nav/footer, collapse whitespace	static spec sites, docs
`html_js`	Playwright (headless or headful via xvfb)	same as `html`	SPAs, JS-rendered docs
`markdown`	plain HTTP, treat body as markdown	normalize whitespace	raw `.md` files in repos
`github_repo`	GitHub API (repo meta + README + 5 commits/releases)	included	slow-moving repos where README is the artifact
`github_commits`	GitHub API (15 commits + 15 releases + 15 tags + 15 merged PRs)	structured feed	git-first tracking of spec repos — this is the Scout-style channel
`rss`	RSS/Atom parse (dependency-free)	item list as markdown	company blogs, release feeds, Substack
`sitemap`	`/sitemap.xml` (follows sitemapindex)	sorted URL list	auto-discover new spec pages

Data model

See src/types.ts for the full types. Highlights:

Source: { id, name, url, kind, protocol, tags[], checkIntervalHours, config, lastCheckedAt, active, createdAt }
Snapshot: { id, sourceId, fetchedAt, status, contentHash, content, rawBytes, error, metadata }
Change: { id, sourceId, fromSnapshotId, toSnapshotId, detectedAt, additions, deletions, severity, llmSummary, llmSeverity, llmCategory, llmHighlights }
Finding: { id, discoveredAt, sourceType, url, title, description, signals, classification, confidence, reasoning, promotedSourceId, status }
SearchResult: { id, searchedAt, provider, query, url, title, snippet, publishedAt, domain, trusted, recent, promotedFindingId }
DocGitMatch: { id, matchedAt, gitChangeId, docChangeId, protocol, confidence, reasoning, lagSeconds }
Digest: { id, generatedAt, periodStart, periodEnd, kind, headline, summary, changeCount, findingCount, searchCount, matchCount, model, error }

Timestamps are epoch milliseconds. All responses are JSON. CORS is open by default.

REST API

Health + meta

GET /health

{"ok": true, "time": 1765000000000, "trackerStartEpoch": 1764000000000, "uptime_ms": 1000000, "version": "0.2.0"}

Sources

GET    /sources?active=true&protocol=UCP&kind=github_commits
POST   /sources                  # body: {name, url, kind, protocol?, tags?, checkIntervalHours?, config?}
GET    /sources/:id
PATCH  /sources/:id               # partial update
DELETE /sources/:id
POST   /sources/:id/check         # fetch + snapshot + diff immediately
GET    /sources/:id/snapshots?limit=50
POST   /sources/:id/wayback       # body: {limit?, fromYear?} — seed history from archive.org

Snapshots

GET /snapshots/:id?content=true|false
GET /snapshots/:id/diff?from=<other_snapshot_id>

Changes

GET  /changes?limit=100&sourceId=<id>&severity=breaking&protocol=UCP&since=<epoch_ms>&until=<epoch_ms>
GET  /changes.rss?limit=50&protocol=UCP&severity=major    # RSS 2.0 feed
GET  /changes/:id
POST /changes/:id/summarize              # run LLM summary on one change
POST /changes/summarize-pending          # body: {limit?}  — batch
POST /changes/:id/redispatch             # re-fire alerts for a change

Each change response includes joined source metadata (source.name, source.url, source.protocol, source.kind) so consumers don't need an N+1 lookup. Responses also include a cursor object with latest / earliest / next_since for clean incremental polling — just use ?since={next_since} on the next request.

Severity can be minor (rule-level: ≤50 diff lines and no major keywords), major (>50 diff lines OR contains breaking/deprecated/version/removed/renamed/required/changelog/migration keywords), or flipflop (a revert — skipped by alerting). LLM-level severity is independent: breaking / additive / editorial. Filter on either with ?severity=.

Subscribers (alerting)

GET    /subscribers?active=true
POST   /subscribers
GET    /subscribers/:id
PATCH  /subscribers/:id
DELETE /subscribers/:id
GET    /deliveries?subscriberId=<id>&changeId=<id>&status=success|failed&limit=100

Create a webhook subscriber:

curl -X POST http://localhost:3000/subscribers \
  -H "authorization: Bearer $ADMIN_TOKEN" \
  -H "content-type: application/json" \
  -d '{
    "kind": "webhook",
    "url": "https://your-service.com/hooks/tracker",
    "name": "prod alerting",
    "filterProtocol": "UCP",
    "filterSeverity": "breaking",
    "filterMinChangeLines": 20,
    "secret": "your-shared-secret"
  }'

Webhook payloads are signed when secret is set. Headers on each delivery:

Content-Type: application/json
User-Agent: protocol-tracker/0.2
X-Tracker-Change-Id: <change id>
X-Tracker-Signature: sha256=<hmac-sha256 of body using secret>

Body:

{
  "event": "change.detected",
  "change": {
    "id": "…",
    "source_id": "…",
    "detected_at": "2026-04-11T08:24:46.951Z",
    "diff_summary": "+203 -0 lines",
    "additions": 203,
    "deletions": 0,
    "severity": "major",
    "llm_severity": "additive",
    "llm_category": "schema",
    "llm_summary": "ACP added new discount extension with …",
    "llm_highlights": ["new field x", "new field y"]
  },
  "source": { "id": "…", "name": "MCP Spec (latest)", "url": "https://…", "protocol": "MCP" }
}

Slack and Discord subscribers format the same data as Slack text messages and Discord embeds respectively (post them to incoming-webhook URLs directly).

Filter fields on a subscriber:

filterProtocol — exact match against source.protocol
filterSeverity — matches either rule severity or llm_severity
filterMinChangeLines — minimum additions + deletions
Flipflops are always suppressed (they're revert noise).

Findings (discovery inbox)

GET  /findings?status=pending&classification=new_protocol&limit=100
GET  /findings/:id
POST /findings/:id/classify              # LLM classify one finding
POST /findings/:id/promote               # body: {name?, kind?, protocol?, tags?, checkIntervalHours?}
POST /findings/:id/dismiss

Discovery scans

POST /discovery/scan                     # {kinds?: ["github","hn"], queries?: [...]}
POST /discovery/websearch                # {queries?: [...], providers?: ["brave","tavily","duckduckgo"]}
POST /discovery/classify                 # {limit?} — classify pending unclassified findings
GET  /discovery/scans?limit=50
GET  /discovery/websearch/results?trusted=true&recent=true&provider=brave&limit=100

Digests

GET  /digests?limit=30
GET  /digests/latest
GET  /digests/:id
POST /digests/generate                   # {periodHours?} — write an ad-hoc digest

Doc↔git matches

GET  /matches?limit=100&protocol=UCP
POST /matches/run                        # run the matcher now

RAG search (FTS5)

GET /rag/search?q=checkout+state+machine&protocol=UCP&limit=20

Returns ranked excerpts of snapshot content with source context. Supports basic FTS5 syntax (AND, OR, "exact phrase", prefix*).

Admin

POST /admin/seed                         # re-run built-in seed (idempotent)
POST /admin/run/snapshot-cycle
POST /admin/run/discovery-cycle
POST /admin/run/websearch-cycle
POST /admin/run/digest-cycle
POST /admin/run/matcher
POST /admin/run/prune                    # run pruning cycle once
POST /admin/prune                        # {keepPerSource?: number}

Auth

If ADMIN_TOKEN is set in env, every non-GET route requires:

Authorization: Bearer <ADMIN_TOKEN>

GET routes remain open so the frontend and RSS readers can poll without credentials. Leave ADMIN_TOKEN unset for open mode (useful in dev / local / behind a trusted reverse proxy).

Example: end-to-end

# trigger a snapshot of every due source, with LLM summaries + matcher
curl -X POST http://localhost:3000/admin/run/snapshot-cycle

# see the feed of changes with LLM severity
curl "http://localhost:3000/changes?limit=20&severity=breaking" | jq

# search snapshot content
curl "http://localhost:3000/rag/search?q=checkout+mandate" | jq

# get the latest digest
curl http://localhost:3000/digests/latest | jq

MCP server (for Claude Code)

A separate stdio MCP server exposes every piece of tracker state to Claude Code so you can reason over it interactively.

Start it directly:

bun run mcp

Wire it to Claude Code:

claude mcp add protocol-tracker -- bun run /absolute/path/to/protocol-tracker/src/mcp-server.ts

Or add to ~/.claude.json:

{
  "mcpServers": {
    "protocol-tracker": {
      "command": "bun",
      "args": ["run", "/absolute/path/to/protocol-tracker/src/mcp-server.ts"],
      "env": {
        "DATABASE_PATH": "/absolute/path/to/protocol-tracker/data/tracker.db",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

MCP tools exposed

Tool	Purpose
`list_sources`	List tracked sources; filter by protocol, active
`get_source`	Fetch one source
`list_changes`	Newest changes; filter by sourceId, severity, protocol
`get_change`	One change with LLM summary
`list_findings`	Discovery inbox
`list_digests`	Historical digests
`get_digest`	One digest, or latest if id omitted
`list_matches`	Doc↔git correlations
`search_content`	FTS5 RAG search across all snapshot content
`trigger_snapshot_cycle`	Run snapshot cycle now
`trigger_discovery_cycle`	Run GitHub + HN discovery now
`trigger_web_search`	Run Brave/Tavily/DuckDuckGo web search now
`trigger_matcher`	Run doc↔git matcher now
`trigger_digest`	Generate a digest now

Now in Claude Code you can ask "what broke in UCP this week?" and it will pull list_changes, search_content, and get_digest on its own.

Seeded sources (on first boot)

51 sources across 13 protocols covering the full agentic commerce ecosystem:

Protocol	Sources
UCP	homepage + overview + checkout spec (html), github_repo, github_commits (6h cadence), releases (rss), sitemap
ACP	homepage + changelog (6h cadence) + delegate-payment spec, github_repo, github_commits, releases (rss), Stripe ACP docs, sitemap
AP2	spec (html), github_repo, github_commits
MPP	Stripe MPP docs, Stripe Agentic Commerce Suite blog
x402	homepage, github_repo, github_commits, releases (rss), a2a-x402 git feed
WebMCP	W3C spec (html), github_commits
MCP	spec (html), github_commits, sitemap, MCP blog RSS
Visa TAP	developer docs, visa/mcp git feed, Visa Intelligent Commerce overview + dev portal, Cloudflare blog RSS
Mastercard Agent Pay	Agentic Commerce Framework, Verifiable Intent
Shopify	shopify.dev/docs/agents + /catalog + /checkout/mcp + /checkout/ecp, Shopify Engineering RSS
Firmly	homepage
Rye	homepage, blog
DaVinci / Wizard	homepages
Blogs	Stripe, Shopify Engineering, Cloudflare, nekuda Substack, MCP blog (all RSS)

Remove any with DELETE /sources/:id or add more via POST /sources.

Scheduler cadence

Job	Default cadence	Controlled by
Snapshot cycle	hourly (skips sources not yet due)	`CHECK_INTERVAL_MINUTES`
Per-source cadence	24h (6h for changelog-style sources)	`checkIntervalHours` on each source
LLM summarize changes	immediately after each cycle that produced changes	automatic
Doc↔git matcher	immediately after summarize	automatic
Alert dispatch	immediately after summarize — fires to every matching subscriber	automatic
Web search	2× daily at 08:00 and 20:00 UTC	`SEARCH_CRON_HOURS_UTC=8,20`
Daily digest	1× daily at 21:00 UTC	`DIGEST_CRON_HOUR_UTC=21`
Snapshot pruning	1× daily at 03:00 UTC	`PRUNE_CRON_HOUR_UTC=3`
GitHub + HN discovery	adhoc (`POST /admin/run/discovery-cycle`)	n/a

Cron slots fire at most once per UTC day per slot (tracked in the meta table). Pruning keeps the N most-recent snapshots per source (SNAPSHOT_KEEP_PER_SOURCE=50) and never deletes rows referenced by change from_snapshot_id / to_snapshot_id, so the diff chain remains intact forever.

Recency / trust filtering

We persist a tracker_start_epoch on first boot. The web search pipeline flags a result as:

trusted = true if its domain matches the allowlist in src/discovery/trust.ts (UCP/ACP/AP2/MCP authorities, primary vendors, research bodies).
recent = true if publishedAt >= tracker_start_epoch — by definition, new stuff published after the tracker was turned on.

The daily digest prioritizes results where trusted OR recent.

Smart normalization

Before hashing/diffing, content is normalized by src/normalize.ts:

LF line endings
trailing whitespace stripped
3+ blank lines collapsed to 2
common boilerplate (on-this-page, copyright, "was this helpful?") stripped
cache-busting query strings (?v=, ?cb=, ?t=) removed from URLs
ISO-8601 timestamps collapsed to <ts>
stableJsonHash / stableJsonPretty helpers for OpenAPI/OpenRPC schemas

This eliminates ~90% of false-positive churn.

Wayback bootstrap

When you add a new HTML source with no history, seed a baseline from archive.org:

curl -X POST http://localhost:3000/sources/<id>/wayback \
  -H "content-type: application/json" \
  -d '{"limit": 5, "fromYear": 2024}'

Fetches N unique-digest captures between fromYear and now, normalizes them, and inserts one snapshot per capture. Instant retroactive history.

Configuration

Env	Default	Purpose
`PORT`	`3000`	HTTP port
`DATABASE_PATH`	`./data/tracker.db`	SQLite path
`USER_AGENT`	`protocol-tracker/0.2`	outbound UA
`CHECK_INTERVAL_MINUTES`	`60`	scheduler wake-up interval
`DISABLE_SCHEDULER`	`0`	`1` = HTTP only, no background jobs
`SEARCH_CRON_HOURS_UTC`	`8,20`	web search fire times
`DIGEST_CRON_HOUR_UTC`	`21`	digest fire time
`PRUNE_CRON_HOUR_UTC`	`3`	snapshot pruning fire time
`SNAPSHOT_KEEP_PER_SOURCE`	`50`	how many recent snapshots to keep per source during pruning
`ADMIN_TOKEN`	(unset)	bearer token required on non-GET routes when set
`ANTHROPIC_API_KEY`	(unset)	enables LLM summarization, classification, digest
`GITHUB_TOKEN`	(unset)	raises GitHub API limit 10→30 req/min
`GITHUB_QUERY_WINDOW_DAYS`	`90`	rolling window for GitHub discovery queries
`BRAVE_SEARCH_API_KEY`	(unset)	enables Brave web search
`TAVILY_API_KEY`	(unset)	enables Tavily web search
`HEADLESS`	`true`	Playwright headless mode
`BROWSER_CHANNEL`	(unset)	Playwright channel (`chrome`, `msedge`)

Operational notes

The database is a single SQLite file at $DATABASE_PATH. Back it up by copying it.
Full normalized content is stored in every snapshot. At default cadence (35 sources × once a day) expect ~2–10 MB/day. Prune via SQL if it grows too large.
Without an ANTHROPIC_API_KEY, the LLM pipelines degrade gracefully: changes still detect, findings still classify as unclassified, the daily digest writes a simple fallback summary.
Without web search keys, only DuckDuckGo runs. It's best-effort and may miss results.
The scheduler is idempotent per UTC day per slot — restarting the process mid-day won't re-fire a slot that already ran.
Cold-start: on first boot, the snapshot cycle runs immediately so you get populated data without waiting an hour.

Design references

See docs/competitors.md in the parent repo for the competitor research this was built from — nekuda Protocol Scout (closest direct competitor, likely git-first), changedetection.io (general-purpose reference), Visualping (LLM summarization reference), Fluxguard (DOM-diff reference).

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
src		src
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package.json		package.json
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

protocol-tracker

Stack

Quickstart

Optional: Playwright (for JS-rendered sources)

Architecture

Source kinds

Data model

REST API

Health + meta

Sources

Snapshots

Changes

Subscribers (alerting)

Findings (discovery inbox)

Discovery scans

Digests

Doc↔git matches

RAG search (FTS5)

Admin

Auth

Example: end-to-end

MCP server (for Claude Code)

MCP tools exposed

Seeded sources (on first boot)

Scheduler cadence

Recency / trust filtering

Smart normalization

Wayback bootstrap

Configuration

Operational notes

Design references

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages