tryrankly/protocol-tracker


protocol-tracker

A TypeScript backend that tracks the agentic commerce protocol ecosystem 24/7 and fires alerts when anything changes.

Four pipelines:

  1. Snapshot tracker — fetches ~50 protocol sources (UCP, ACP, AP2, x402, WebMCP, MCP, MPP, Visa TAP, Mastercard Agent Pay, Shopify, Firmly, Rye, DaVinci, Wizard) across seven source kinds (html / html_js / markdown / github_repo / github_commits / rss / sitemap), normalizes the content, hashes it, and records a diff whenever it changes. Flip-flop reverts (A→B→A) are detected and suppressed. Rule-level severity is upgraded to major when a diff exceeds 50 lines or matches a keyword (breaking, deprecated, version, etc.). Each change is then summarized by an LLM, which assigns an llm_severity (breaking / additive / editorial) and an llm_category (schema / docs / example / meta / release).
  2. Discovery scanner — searches GitHub (rolling 90-day window, with rate-limit backoff), Hacker News, and the general web (Brave / Tavily / DuckDuckGo) for new or updated agentic-commerce protocols. Trust-filtered (domain allowlist) and recency-filtered (published after tracker start). Claude Haiku classifies findings.
  3. Alerting — webhook / Slack / Discord subscribers with per-protocol, per-severity, and minimum-line-count filters. Webhook deliveries are HMAC-signed. Every detected change (except flip-flop reverts) is dispatched automatically to matching subscribers, and every delivery is logged and retried on failure.
  4. Daily digest — aggregates changes, findings, matches, and trusted searches into a narrative markdown digest written by Claude Sonnet.
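The flip-flop rule in the snapshot tracker can be illustrated with a small classifier over one source's hash history. This is a hypothetical sketch, not the tracker's code: it assumes the previous content hashes are available oldest first, and that a revert means "the new hash equals the hash from two snapshots ago".

```typescript
// Hypothetical sketch: classify a freshly fetched content hash against the
// previous hashes for one source (oldest first). Not the tracker's actual code.
type HashOutcome = "baseline" | "unchanged" | "change" | "flipflop";

function classifyNewHash(history: string[], next: string): HashOutcome {
  const last = history[history.length - 1];
  if (last === undefined) return "baseline"; // first snapshot for this source
  if (last === next) return "unchanged";     // identical after normalization: no diff
  // A → B → A: the new content equals the snapshot before last, so the B→A diff
  // is treated as a revert (severity "flipflop") rather than a real change.
  if (history[history.length - 2] === next) return "flipflop";
  return "change";
}

console.log(classifyNewHash(["a", "b"], "a")); // flipflop
```

Comparing only against the snapshot before last keeps the check O(1); a longer look-back would also catch A→B→C→A cycles at the cost of more false "reverts".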

Plus doc↔git correlation — matches git commits in upstream spec repos to downstream doc-site changes, so you can see which commits landed in the official docs and which are still floating.
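One way to picture the candidate-pairing step of doc↔git correlation is a time-window join on protocol. This is a hypothetical sketch: the field names mirror the Change and DocGitMatch types listed under Data model, but the 14-day window, the "nearest earlier commit" rule, and the helper names are assumptions. The real matcher also records a confidence and reasoning, which this sketch does not attempt.

```typescript
// Hypothetical candidate pairing for doc↔git correlation (not the tracker's matcher).
interface ChangeLite {
  id: string;
  protocol: string;
  detectedAt: number; // epoch ms
}

interface CandidateMatch {
  gitChangeId: string;
  docChangeId: string;
  protocol: string;
  lagSeconds: number;
}

function pairDocsWithCommits(
  gitChanges: ChangeLite[],
  docChanges: ChangeLite[],
  windowMs = 14 * 24 * 60 * 60 * 1000, // assumed look-back window
): CandidateMatch[] {
  const matches: CandidateMatch[] = [];
  for (const doc of docChanges) {
    // Same protocol, detected before the doc change, within the window.
    const earlier = gitChanges.filter(
      (g) =>
        g.protocol === doc.protocol &&
        g.detectedAt <= doc.detectedAt &&
        doc.detectedAt - g.detectedAt <= windowMs,
    );
    if (earlier.length === 0) continue;
    // Assumed rule: the most recent qualifying commit is the best candidate.
    const best = earlier.reduce((a, b) => (b.detectedAt > a.detectedAt ? b : a));
    matches.push({
      gitChangeId: best.id,
      docChangeId: doc.id,
      protocol: doc.protocol,
      lagSeconds: Math.round((doc.detectedAt - best.detectedAt) / 1000),
    });
  }
  return matches;
}

// Commits that no doc change was matched to are the ones "still floating".
function floatingCommits(gitChanges: ChangeLite[], matches: CandidateMatch[]): ChangeLite[] {
  const matched = new Set(matches.map((m) => m.gitChangeId));
  return gitChanges.filter((g) => !matched.has(g.id));
}
```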

The whole thing runs as a single long-lived Bun process: snapshot cycle hourly, web search twice a day (UTC), daily digest once a day (UTC), snapshot pruning once a day (UTC). It exposes a REST API for frontends and an RSS feed for readers; a separate stdio MCP server (bun run mcp) gives Claude Code access to the same data.


Stack

  • Runtime: Bun
  • Framework: Hono
  • Database: SQLite (bun:sqlite) with an FTS5 virtual table for RAG search
  • HTML→markdown: node-html-markdown
  • Headless browser: Playwright (optional, installed on demand)
  • LLM: @anthropic-ai/sdk — Haiku for diff summarization + classification, Sonnet for daily digest
  • MCP: @modelcontextprotocol/sdk — stdio transport

Quickstart

cd protocol-tracker
cp .env.example .env
# (optional) add ANTHROPIC_API_KEY, GITHUB_TOKEN, BRAVE_SEARCH_API_KEY, TAVILY_API_KEY
bun install
bun run dev     # or: bun run start

On first start it seeds ~50 protocol sources, initializes the tracker start epoch, and launches the scheduler. The API is live on http://localhost:3000.

Optional: Playwright (for JS-rendered sources)

The html_js source kind uses Playwright. It's an optional dependency; install it only if you need JS-rendered sources:

bun add playwright
bunx playwright install chromium

For stealthier fetching on Linux, run the browser headful inside a virtual display with xvfb-run:

HEADLESS=false xvfb-run -a bun run src/index.ts

Or use Camoufox via the BROWSER_CHANNEL env var.


Architecture

                    ┌─────────────────────────────┐
                    │       Hono REST API         │
                    │  + MCP stdio server         │
                    └──────────────┬──────────────┘
                                   │
┌──────────────────────────────────▼──────────────────────────────────┐
│                         bun:sqlite                                  │
│  sources · snapshots · changes · findings · scans                   │
│  digests · doc_git_matches · search_results · meta · snapshots_fts  │
└──────────▲──────────────────────────────────────────────────────────┘
           │
┌──────────┴──────────────────────────────────────────────────────────┐
│                         Scheduler (24/7)                            │
│                                                                     │
│  HOURLY  snapshot cycle                                             │
│     └─► fetchers: html · html_js · markdown · github_repo           │
│             · github_commits · rss · sitemap                        │
│     └─► smart normalize → hash → diff → change                      │
│     └─► LLM summarize   → severity + category + highlights          │
│     └─► doc↔git matcher → correlate commits with doc changes        │
│     └─► alert dispatch  → matching subscribers                      │
│                                                                     │
│  8Z, 20Z  web search cycle                                          │
│     └─► providers: brave · tavily · duckduckgo                      │
│     └─► trust filter (domain allowlist)                             │
│     └─► recency filter (published > tracker_start_epoch)            │
│                                                                     │
│  21Z  daily digest                                                  │
│     └─► LLM (Sonnet) writes markdown over last 24h                  │
│                                                                     │
│  03Z  snapshot pruning                                              │
│     └─► keep newest N per source (diff chain preserved)             │
│                                                                     │
│  ADHOC  discovery scan (GitHub + HN)                                │
│     └─► findings inbox → classify → promote or dismiss              │
└─────────────────────────────────────────────────────────────────────┘

Source kinds

| Kind | Fetches | Normalization | Best for |
|---|---|---|---|
| html | plain HTTP → markdown via node-html-markdown | strip nav/footer, collapse whitespace | static spec sites, docs |
| html_js | Playwright (headless, or headful via xvfb) | same as html | SPAs, JS-rendered docs |
| markdown | plain HTTP, body treated as markdown | normalize whitespace | raw .md files in repos |
| github_repo | GitHub API (repo meta + README + 5 commits/releases) | included | slow-moving repos where the README is the artifact |
| github_commits | GitHub API (15 commits + 15 releases + 15 tags + 15 merged PRs) | structured feed | git-first tracking of spec repos (the Scout-style channel) |
| rss | RSS/Atom parse (dependency-free) | item list as markdown | company blogs, release feeds, Substack |
| sitemap | /sitemap.xml (follows sitemapindex) | sorted URL list | auto-discovering new spec pages |

Data model

See src/types.ts for the full types. Highlights:

  • Source: { id, name, url, kind, protocol, tags[], checkIntervalHours, config, lastCheckedAt, active, createdAt }
  • Snapshot: { id, sourceId, fetchedAt, status, contentHash, content, rawBytes, error, metadata }
  • Change: { id, sourceId, fromSnapshotId, toSnapshotId, detectedAt, additions, deletions, severity, llmSummary, llmSeverity, llmCategory, llmHighlights }
  • Finding: { id, discoveredAt, sourceType, url, title, description, signals, classification, confidence, reasoning, promotedSourceId, status }
  • SearchResult: { id, searchedAt, provider, query, url, title, snippet, publishedAt, domain, trusted, recent, promotedFindingId }
  • DocGitMatch: { id, matchedAt, gitChangeId, docChangeId, protocol, confidence, reasoning, lagSeconds }
  • Digest: { id, generatedAt, periodStart, periodEnd, kind, headline, summary, changeCount, findingCount, searchCount, matchCount, model, error }

Timestamps in REST responses are epoch milliseconds (webhook payloads use ISO-8601 strings instead). All responses are JSON except the RSS feed. CORS is open by default.


REST API

Health + meta

GET /health
{"ok": true, "time": 1765000000000, "trackerStartEpoch": 1764000000000, "uptime_ms": 1000000, "version": "0.2.0"}

Sources

GET    /sources?active=true&protocol=UCP&kind=github_commits
POST   /sources                  # body: {name, url, kind, protocol?, tags?, checkIntervalHours?, config?}
GET    /sources/:id
PATCH  /sources/:id               # partial update
DELETE /sources/:id
POST   /sources/:id/check         # fetch + snapshot + diff immediately
GET    /sources/:id/snapshots?limit=50
POST   /sources/:id/wayback       # body: {limit?, fromYear?} — seed history from archive.org

Snapshots

GET /snapshots/:id?content=true|false
GET /snapshots/:id/diff?from=<other_snapshot_id>

Changes

GET  /changes?limit=100&sourceId=<id>&severity=breaking&protocol=UCP&since=<epoch_ms>&until=<epoch_ms>
GET  /changes.rss?limit=50&protocol=UCP&severity=major    # RSS 2.0 feed
GET  /changes/:id
POST /changes/:id/summarize              # run LLM summary on one change
POST /changes/summarize-pending          # body: {limit?}  — batch
POST /changes/:id/redispatch             # re-fire alerts for a change

Each change response includes joined source metadata (source.name, source.url, source.protocol, source.kind) so consumers don't need an N+1 lookup. Responses also include a cursor object with latest / earliest / next_since for clean incremental polling — just use ?since={next_since} on the next request.
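A minimal polling client built on the cursor might look like the sketch below. It assumes the JSON body carries a `cursor` object whose `next_since` is epoch milliseconds (possibly null when there is nothing new); the helper names are hypothetical and the rest of the body is treated as opaque.

```typescript
// Sketch of an incremental /changes poller (helper names are hypothetical).
interface ChangesCursor {
  latest: number | null;
  earliest: number | null;
  next_since: number | null;
}

function buildChangesUrl(base: string, since: number | null, params: Record<string, string> = {}): string {
  const url = new URL("/changes", base);
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);
  if (since !== null) url.searchParams.set("since", String(since));
  return url.toString();
}

// Keep the previous position if the response carries no usable cursor.
function advanceSince(previous: number | null, cursor: ChangesCursor | undefined): number | null {
  return cursor?.next_since ?? previous;
}

async function pollOnce(base: string, since: number | null): Promise<{ body: unknown; since: number | null }> {
  const res = await fetch(buildChangesUrl(base, since, { limit: "100" }));
  if (!res.ok) throw new Error(`GET /changes failed with HTTP ${res.status}`);
  const body = (await res.json()) as { cursor?: ChangesCursor };
  return { body, since: advanceSince(since, body.cursor) };
}

// Usage: persist `since` between runs, e.g.
//   let since: number | null = null;
//   ({ since } = await pollOnce("http://localhost:3000", since));
```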

Severity can be minor (rule-level: ≤50 diff lines and no major keywords), major (>50 diff lines OR contains breaking/deprecated/version/removed/renamed/required/changelog/migration keywords), or flipflop (a revert — skipped by alerting). LLM-level severity is independent: breaking / additive / editorial. Filter on either with ?severity=.
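The rule-level severity can be sketched as a pure function. Assumptions: "diff lines" means additions + deletions, keywords are matched case-insensitively as substrings of the changed lines only, and the revert flag comes from a separate flip-flop check; `ruleSeverity` is a hypothetical name.

```typescript
// Sketch of rule-level severity (not the tracker's code).
type RuleSeverity = "minor" | "major" | "flipflop";

const MAJOR_KEYWORDS = [
  "breaking", "deprecated", "version", "removed",
  "renamed", "required", "changelog", "migration",
];
const MAJOR_LINE_THRESHOLD = 50;

function ruleSeverity(changedLines: string[], additions: number, deletions: number, isRevert: boolean): RuleSeverity {
  if (isRevert) return "flipflop"; // recorded, but skipped by alerting
  if (additions + deletions > MAJOR_LINE_THRESHOLD) return "major";
  const text = changedLines.join("\n").toLowerCase();
  return MAJOR_KEYWORDS.some((k) => text.includes(k)) ? "major" : "minor";
}
```

Plain substring matching also fires on words like "conversion" (contains "version"); a word-boundary regex would be stricter at the cost of missing forms like "v2-migration".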

Subscribers (alerting)

GET    /subscribers?active=true
POST   /subscribers
GET    /subscribers/:id
PATCH  /subscribers/:id
DELETE /subscribers/:id
GET    /deliveries?subscriberId=<id>&changeId=<id>&status=success|failed&limit=100

Create a webhook subscriber:

curl -X POST http://localhost:3000/subscribers \
  -H "authorization: Bearer $ADMIN_TOKEN" \
  -H "content-type: application/json" \
  -d '{
    "kind": "webhook",
    "url": "https://your-service.com/hooks/tracker",
    "name": "prod alerting",
    "filterProtocol": "UCP",
    "filterSeverity": "breaking",
    "filterMinChangeLines": 20,
    "secret": "your-shared-secret"
  }'

Webhook payloads are signed when the subscriber has a secret set. Headers on each delivery (X-Tracker-Signature is present only on signed deliveries):

Content-Type: application/json
User-Agent: protocol-tracker/0.2
X-Tracker-Change-Id: <change id>
X-Tracker-Signature: sha256=<hmac-sha256 of body using secret>

Body:

{
  "event": "change.detected",
  "change": {
    "id": "",
    "source_id": "",
    "detected_at": "2026-04-11T08:24:46.951Z",
    "diff_summary": "+203 -0 lines",
    "additions": 203,
    "deletions": 0,
    "severity": "major",
    "llm_severity": "additive",
    "llm_category": "schema",
    "llm_summary": "ACP added new discount extension with …",
    "llm_highlights": ["new field x", "new field y"]
  },
  "source": { "id": "", "name": "MCP Spec (latest)", "url": "https://…", "protocol": "MCP" }
}
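On the receiving side, the signature can be checked with Node's crypto module (also available in Bun). This sketch assumes the digest after `sha256=` is hex-encoded and computed over the raw request body exactly as sent; the helper names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Receiver-side check for X-Tracker-Signature (sketch; hex encoding is an assumption).
function expectedSignature(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(rawBody: string, header: string | null, secret: string): boolean {
  if (!header) return false;
  const expected = Buffer.from(expectedSignature(rawBody, secret), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws when lengths differ, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// In a Fetch-API handler, read the body as text before parsing it:
//   const raw = await request.text();
//   if (!verifySignature(raw, request.headers.get("x-tracker-signature"), SECRET)) { /* reject with 401 */ }
//   const payload = JSON.parse(raw);
```

Verify against the raw text, not a re-serialized object: `JSON.stringify(JSON.parse(raw))` is not guaranteed to reproduce the signed bytes.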

Slack and Discord subscribers receive the same data, formatted as a Slack text message or a Discord embed respectively; set the subscriber's url to the channel's incoming-webhook URL and the tracker posts to it directly.

Filter fields on a subscriber:

  • filterProtocol — exact match against source.protocol
  • filterSeverity — matches either rule severity or llm_severity
  • filterMinChangeLines — minimum additions + deletions
  • Flipflops are always suppressed (they're revert noise).
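The filter rules above can be read as a single predicate. In this sketch the field names follow the subscriber body and webhook payload; treating null or missing filters as "match anything" is an assumption, and `shouldNotify` is a hypothetical name.

```typescript
// Sketch of subscriber filter matching (not the tracker's code).
interface SubscriberFilters {
  filterProtocol?: string | null;
  filterSeverity?: string | null;
  filterMinChangeLines?: number | null;
}

interface ChangeForAlert {
  protocol: string | null;    // source.protocol
  severity: string;           // rule severity: minor | major | flipflop
  llmSeverity: string | null; // breaking | additive | editorial
  additions: number;
  deletions: number;
}

function shouldNotify(sub: SubscriberFilters, change: ChangeForAlert): boolean {
  if (change.severity === "flipflop") return false; // reverts never alert
  if (sub.filterProtocol && change.protocol !== sub.filterProtocol) return false;
  // Severity filter matches either the rule severity or the LLM severity.
  if (sub.filterSeverity && sub.filterSeverity !== change.severity && sub.filterSeverity !== change.llmSeverity) {
    return false;
  }
  const lines = change.additions + change.deletions;
  if (sub.filterMinChangeLines != null && lines < sub.filterMinChangeLines) return false;
  return true;
}
```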

Findings (discovery inbox)

GET  /findings?status=pending&classification=new_protocol&limit=100
GET  /findings/:id
POST /findings/:id/classify              # LLM classify one finding
POST /findings/:id/promote               # body: {name?, kind?, protocol?, tags?, checkIntervalHours?}
POST /findings/:id/dismiss

Discovery scans

POST /discovery/scan                     # {kinds?: ["github","hn"], queries?: [...]}
POST /discovery/websearch                # {queries?: [...], providers?: ["brave","tavily","duckduckgo"]}
POST /discovery/classify                 # {limit?} — classify pending unclassified findings
GET  /discovery/scans?limit=50
GET  /discovery/websearch/results?trusted=true&recent=true&provider=brave&limit=100

Digests

GET  /digests?limit=30
GET  /digests/latest
GET  /digests/:id
POST /digests/generate                   # {periodHours?} — write an ad-hoc digest

Doc↔git matches

GET  /matches?limit=100&protocol=UCP
POST /matches/run                        # run the matcher now

RAG search (FTS5)

GET /rag/search?q=checkout+state+machine&protocol=UCP&limit=20

Returns ranked excerpts of snapshot content with source context. Supports basic FTS5 syntax (AND, OR, "exact phrase", prefix*).

Admin

POST /admin/seed                         # re-run built-in seed (idempotent)
POST /admin/run/snapshot-cycle
POST /admin/run/discovery-cycle
POST /admin/run/websearch-cycle
POST /admin/run/digest-cycle
POST /admin/run/matcher
POST /admin/run/prune                    # run pruning cycle once
POST /admin/prune                        # {keepPerSource?: number}

Auth

If ADMIN_TOKEN is set in env, every non-GET route requires:

Authorization: Bearer <ADMIN_TOKEN>

GET routes remain open so the frontend and RSS readers can poll without credentials. Leave ADMIN_TOKEN unset for open mode (useful in dev / local / behind a trusted reverse proxy).
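The auth rule can be expressed as a small predicate. This is a hypothetical helper, not the tracker's middleware; it assumes only GET is exempt (HEAD/OPTIONS handling isn't described) and compares the header exactly, so a lowercase "bearer" scheme would be rejected.

```typescript
import { timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical helper mirroring the ADMIN_TOKEN rule described above.
function isAuthorized(method: string, authorization: string | null, adminToken: string | undefined): boolean {
  if (!adminToken) return true;                    // ADMIN_TOKEN unset: open mode
  if (method.toUpperCase() === "GET") return true; // reads stay open for frontends and RSS readers
  const expected = Buffer.from(`Bearer ${adminToken}`, "utf8");
  const received = Buffer.from(authorization ?? "", "utf8");
  // Constant-time compare; timingSafeEqual requires equal lengths.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```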

Example: end-to-end

# trigger a snapshot of every due source, with LLM summaries + matcher
curl -X POST http://localhost:3000/admin/run/snapshot-cycle

# see the feed of changes with LLM severity
curl "http://localhost:3000/changes?limit=20&severity=breaking" | jq

# search snapshot content
curl "http://localhost:3000/rag/search?q=checkout+mandate" | jq

# get the latest digest
curl http://localhost:3000/digests/latest | jq

MCP server (for Claude Code)

A separate stdio MCP server exposes every piece of tracker state to Claude Code so you can reason over it interactively.

Start it directly:

bun run mcp

Wire it to Claude Code:

claude mcp add protocol-tracker -- bun run /absolute/path/to/protocol-tracker/src/mcp-server.ts

Or add to ~/.claude.json:

{
  "mcpServers": {
    "protocol-tracker": {
      "command": "bun",
      "args": ["run", "/absolute/path/to/protocol-tracker/src/mcp-server.ts"],
      "env": {
        "DATABASE_PATH": "/absolute/path/to/protocol-tracker/data/tracker.db",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

MCP tools exposed

| Tool | Purpose |
|---|---|
| list_sources | List tracked sources; filter by protocol, active |
| get_source | Fetch one source |
| list_changes | Newest changes; filter by sourceId, severity, protocol |
| get_change | One change with LLM summary |
| list_findings | Discovery inbox |
| list_digests | Historical digests |
| get_digest | One digest, or the latest if id is omitted |
| list_matches | Doc↔git correlations |
| search_content | FTS5 RAG search across all snapshot content |
| trigger_snapshot_cycle | Run the snapshot cycle now |
| trigger_discovery_cycle | Run GitHub + HN discovery now |
| trigger_web_search | Run Brave/Tavily/DuckDuckGo web search now |
| trigger_matcher | Run the doc↔git matcher now |
| trigger_digest | Generate a digest now |

Now in Claude Code you can ask "what broke in UCP this week?" and it will pull list_changes, search_content, and get_digest on its own.


Seeded sources (on first boot)

51 sources across 13 protocols covering the full agentic commerce ecosystem:

| Protocol | Sources |
|---|---|
| UCP | homepage + overview + checkout spec (html), github_repo, github_commits (6h cadence), releases (rss), sitemap |
| ACP | homepage + changelog (6h cadence) + delegate-payment spec, github_repo, github_commits, releases (rss), Stripe ACP docs, sitemap |
| AP2 | spec (html), github_repo, github_commits |
| MPP | Stripe MPP docs, Stripe Agentic Commerce Suite blog |
| x402 | homepage, github_repo, github_commits, releases (rss), a2a-x402 git feed |
| WebMCP | W3C spec (html), github_commits |
| MCP | spec (html), github_commits, sitemap, MCP blog RSS |
| Visa TAP | developer docs, visa/mcp git feed, Visa Intelligent Commerce overview + dev portal, Cloudflare blog RSS |
| Mastercard Agent Pay | Agentic Commerce Framework, Verifiable Intent |
| Shopify | shopify.dev/docs/agents + /catalog + /checkout/mcp + /checkout/ecp, Shopify Engineering RSS |
| Firmly | homepage |
| Rye | homepage, blog |
| DaVinci / Wizard | homepages |
| Blogs | Stripe, Shopify Engineering, Cloudflare, nekuda Substack, MCP blog (all RSS) |

Remove any with DELETE /sources/:id or add more via POST /sources.


Scheduler cadence

| Job | Default cadence | Controlled by |
|---|---|---|
| Snapshot cycle | hourly (skips sources not yet due) | CHECK_INTERVAL_MINUTES |
| Per-source cadence | 24h (6h for changelog-style sources) | checkIntervalHours on each source |
| LLM summarize changes | immediately after each cycle that produced changes | automatic |
| Doc↔git matcher | immediately after summarize | automatic |
| Alert dispatch | immediately after summarize; fires to every matching subscriber | automatic |
| Web search | 2× daily at 08:00 and 20:00 UTC | SEARCH_CRON_HOURS_UTC=8,20 |
| Daily digest | 1× daily at 21:00 UTC | DIGEST_CRON_HOUR_UTC=21 |
| Snapshot pruning | 1× daily at 03:00 UTC | PRUNE_CRON_HOUR_UTC=3 |
| GitHub + HN discovery | ad hoc (POST /admin/run/discovery-cycle) | n/a |

Cron slots fire at most once per UTC day per slot (tracked in the meta table). Pruning keeps the N most-recent snapshots per source (SNAPSHOT_KEEP_PER_SOURCE=50) and never deletes rows referenced by change from_snapshot_id / to_snapshot_id, so the diff chain remains intact forever.
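The pruning selection can be sketched over in-memory rows. The shapes below are simplified stand-ins for the snapshots and changes tables, and `snapshotsToPrune` is a hypothetical name; the sketch returns ids to delete: everything past the newest `keepPerSource` per source, minus any snapshot a change points at.

```typescript
// Sketch of pruning selection (not the tracker's SQL).
interface SnapshotRow { id: string; sourceId: string; fetchedAt: number }
interface ChangeRow { fromSnapshotId: string | null; toSnapshotId: string }

function snapshotsToPrune(snapshots: SnapshotRow[], changes: ChangeRow[], keepPerSource = 50): string[] {
  // Snapshots referenced by any change are pinned so the diff chain stays intact.
  const pinned = new Set<string>();
  for (const c of changes) {
    if (c.fromSnapshotId) pinned.add(c.fromSnapshotId);
    pinned.add(c.toSnapshotId);
  }
  // Group snapshots by source.
  const bySource = new Map<string, SnapshotRow[]>();
  for (const s of snapshots) {
    const list = bySource.get(s.sourceId) ?? [];
    list.push(s);
    bySource.set(s.sourceId, list);
  }
  const doomed: string[] = [];
  for (const list of bySource.values()) {
    list.sort((a, b) => b.fetchedAt - a.fetchedAt); // newest first
    for (const s of list.slice(keepPerSource)) {
      if (!pinned.has(s.id)) doomed.push(s.id);
    }
  }
  return doomed;
}
```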


Recency / trust filtering

We persist a tracker_start_epoch on first boot. The web search pipeline flags a result as:

  • trusted = true if its domain matches the allowlist in src/discovery/trust.ts (UCP/ACP/AP2/MCP authorities, primary vendors, research bodies).
  • recent = true if publishedAt >= tracker_start_epoch — by definition, new stuff published after the tracker was turned on.

The daily digest prioritizes results where trusted OR recent.
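The two flags and the digest's priority rule might look like this. The allowlist entries are illustrative only (the real list lives in src/discovery/trust.ts), and treating subdomains as trusted and undated results as not recent are assumptions.

```typescript
// Sketch of trusted / recent flags (illustrative allowlist, hypothetical helpers).
const TRUSTED_DOMAINS = ["stripe.com", "shopify.dev"]; // example entries only

function isTrusted(url: string, allowlist: string[] = TRUSTED_DOMAINS): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URL: never trusted
  }
  // Exact domain or any subdomain (docs.stripe.com counts; notstripe.com does not).
  return allowlist.some((d) => host === d || host.endsWith("." + d));
}

function isRecent(publishedAt: number | null, trackerStartEpoch: number): boolean {
  // Results without a publish date cannot be shown to be recent.
  return publishedAt !== null && publishedAt >= trackerStartEpoch;
}

function isPriority(url: string, publishedAt: number | null, trackerStartEpoch: number): boolean {
  return isTrusted(url) || isRecent(publishedAt, trackerStartEpoch);
}
```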

Smart normalization

Before hashing/diffing, content is normalized by src/normalize.ts:

  • LF line endings
  • trailing whitespace stripped
  • 3+ blank lines collapsed to 2
  • common boilerplate (on-this-page, copyright, "was this helpful?") stripped
  • cache-busting query strings (?v=, ?cb=, ?t=) removed from URLs
  • ISO-8601 timestamps collapsed to <ts>
  • stableJsonHash / stableJsonPretty helpers for OpenAPI/OpenRPC schemas

This eliminates ~90% of false-positive churn.
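A subset of these steps, followed by hashing, can be sketched as below. The regexes are assumptions rather than the contents of src/normalize.ts; boilerplate stripping and the stableJson helpers are omitted. Only timestamps with a time part are collapsed, so plain dates still register as changes.

```typescript
import { createHash } from "node:crypto";

// Sketch of part of the normalization pipeline (regexes are assumptions).
const ISO_TIMESTAMP = /\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?/g;

function stripCacheBusters(text: string): string {
  return text
    // Non-first query params: drop "&v=...", "&cb=...", "&t=..." outright.
    .replace(/&(?:v|cb|t)=[^&\s)"'#]*/g, "")
    // First query param: "?v=1&x=2" becomes "?x=2"; a lone "?v=1" disappears.
    .replace(/\?(?:v|cb|t)=[^&\s)"'#]*(&?)/g, (_match: string, amp: string) => (amp ? "?" : ""));
}

function normalize(content: string): string {
  const text = content
    .replace(/\r\n?/g, "\n")         // LF line endings
    .replace(/[ \t]+$/gm, "")        // strip trailing whitespace on every line
    .replace(/\n{4,}/g, "\n\n\n")    // 3+ blank lines collapse to 2
    .replace(ISO_TIMESTAMP, "<ts>"); // volatile timestamps
  return stripCacheBusters(text);
}

function contentHash(content: string): string {
  return createHash("sha256").update(normalize(content), "utf8").digest("hex");
}
```

Trailing whitespace is stripped before blank lines are collapsed so that lines containing only spaces count as blank.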

Wayback bootstrap

When you add a new HTML source with no history, seed a baseline from archive.org:

curl -X POST http://localhost:3000/sources/<id>/wayback \
  -H "content-type: application/json" \
  -d '{"limit": 5, "fromYear": 2024}'

Fetches N unique-digest captures between fromYear and now, normalizes them, and inserts one snapshot per capture. Instant retroactive history.
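Captures like these can be listed through the Wayback Machine's public CDX API. This sketch assumes the tracker does something similar; the parameter choices and helper names are ours. The CDX `collapse=digest` option only merges adjacent duplicate rows, so the parser also deduplicates by digest.

```typescript
// Sketch of listing Wayback captures via the CDX API (hypothetical helpers).
interface Capture {
  timestamp: string; // YYYYMMDDhhmmss
  original: string;
  digest: string;
  rawUrl: string;    // "id_" returns the archived bytes without the Wayback toolbar
}

function cdxQueryUrl(target: string, fromYear: number, limit: number): string {
  const u = new URL("https://web.archive.org/cdx/search/cdx");
  u.searchParams.set("url", target);
  u.searchParams.set("from", String(fromYear));
  u.searchParams.set("output", "json");
  u.searchParams.set("fl", "timestamp,original,digest");
  u.searchParams.set("filter", "statuscode:200");
  u.searchParams.set("collapse", "digest");
  u.searchParams.set("limit", String(limit));
  return u.toString();
}

// output=json returns an array of rows whose first row is the header.
function parseCdx(rows: string[][]): Capture[] {
  const [header, ...data] = rows;
  if (!header) return [];
  const col = (name: string) => header.indexOf(name);
  const seen = new Set<string>(); // catch non-adjacent repeats of the same digest
  const captures: Capture[] = [];
  for (const row of data) {
    const timestamp = row[col("timestamp")];
    const original = row[col("original")];
    const digest = row[col("digest")];
    if (seen.has(digest)) continue;
    seen.add(digest);
    captures.push({ timestamp, original, digest, rawUrl: `https://web.archive.org/web/${timestamp}id_/${original}` });
  }
  return captures;
}
```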


Configuration

| Env | Default | Purpose |
|---|---|---|
| PORT | 3000 | HTTP port |
| DATABASE_PATH | ./data/tracker.db | SQLite path |
| USER_AGENT | protocol-tracker/0.2 | outbound UA |
| CHECK_INTERVAL_MINUTES | 60 | scheduler wake-up interval |
| DISABLE_SCHEDULER | 0 | 1 = HTTP only, no background jobs |
| SEARCH_CRON_HOURS_UTC | 8,20 | web search fire times |
| DIGEST_CRON_HOUR_UTC | 21 | digest fire time |
| PRUNE_CRON_HOUR_UTC | 3 | snapshot pruning fire time |
| SNAPSHOT_KEEP_PER_SOURCE | 50 | how many recent snapshots to keep per source during pruning |
| ADMIN_TOKEN | (unset) | bearer token required on non-GET routes when set |
| ANTHROPIC_API_KEY | (unset) | enables LLM summarization, classification, digest |
| GITHUB_TOKEN | (unset) | raises GitHub API rate limits (search API: 10 → 30 req/min) |
| GITHUB_QUERY_WINDOW_DAYS | 90 | rolling window for GitHub discovery queries |
| BRAVE_SEARCH_API_KEY | (unset) | enables Brave web search |
| TAVILY_API_KEY | (unset) | enables Tavily web search |
| HEADLESS | true | Playwright headless mode |
| BROWSER_CHANNEL | (unset) | Playwright channel (chrome, msedge) |

Operational notes

  • The database is a single SQLite file at $DATABASE_PATH. Back it up by copying it.
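  • Full normalized content is stored in every snapshot. Expect roughly 2–10 MB/day (estimated for 35 sources checked once a day; the full ~50-source seed list will use somewhat more). The daily pruning job caps history at SNAPSHOT_KEEP_PER_SOURCE snapshots per source (plus any referenced by changes); lower it or call POST /admin/prune if the file grows too large.
  • Without an ANTHROPIC_API_KEY, the LLM pipelines degrade gracefully: changes are still detected, findings are stored but left unclassified, and the daily digest falls back to a simple summary.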
  • Without web search keys, only DuckDuckGo runs. It's best-effort and may miss results.
  • The scheduler is idempotent per UTC day per slot — restarting the process mid-day won't re-fire a slot that already ran.
  • Cold-start: on first boot, the snapshot cycle runs immediately so you get populated data without waiting an hour.
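The once-per-UTC-day slot logic can be sketched as follows. Assumptions: each slot key such as "search:8" maps to the UTC date it last fired (the real tracker keeps this in the meta table; a Map stands in here), and a slot fires on the first scheduler wake-up at or after its hour. All helper names are hypothetical.

```typescript
// Sketch of idempotent cron slots (not the tracker's scheduler).
function parseHours(env: string | undefined, fallback: number[]): number[] {
  if (!env) return fallback;
  const hours = env
    .split(",")
    .map((h) => h.trim())
    .filter((h) => h !== "")
    .map(Number)
    .filter((h) => Number.isInteger(h) && h >= 0 && h <= 23);
  return hours.length > 0 ? hours : fallback;
}

function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
}

// Slots whose hour has passed today and that have not fired yet today.
function dueSlots(job: string, hours: number[], now: Date, lastFired: Map<string, string>): string[] {
  const today = utcDay(now);
  return hours
    .filter((h) => now.getUTCHours() >= h)
    .map((h) => `${job}:${h}`)
    .filter((key) => lastFired.get(key) !== today);
}

function markFired(key: string, now: Date, lastFired: Map<string, string>): void {
  lastFired.set(key, utcDay(now));
}
```

Because the "fired today" marker is persisted, a restart mid-day finds the slot already recorded and skips it; the "at or after" rule means a process that was down at 08:00 still runs the 08:00 slot on its next wake-up that day.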

Design references

See docs/competitors.md in the parent repo for the competitor research this was built from — nekuda Protocol Scout (closest direct competitor, likely git-first), changedetection.io (general-purpose reference), Visualping (LLM summarization reference), Fluxguard (DOM-diff reference).

About

This project tracks updates to agentic commerce protocols across different platforms.
