Edge-native, accountless, pay-per-use AI inference using Cloudflare Durable Objects + the x402 payment protocol.
Flow: Client → POST /infer → 402 + PAYMENT-REQUIRED header → client pays on-chain (USDC on Base) → retries with PAYMENT-SIGNATURE header → streamed AI output.
No signup. No API key. No custody of funds.
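
The flow above can be sketched from the client side as follows. This is a minimal illustration, not the project's client: the `prompt` body field, the `inferWithPayment` name, and the `pay` callback (which stands in for the on-chain USDC payment and proof encoding) are assumptions, and the response is read whole rather than streamed to keep the example short.

```typescript
// Minimal structural types so the sketch runs without DOM typings.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}
type FetchLike = (url: string, init: RequestInitLike) => Promise<ResponseLike>;

// Hypothetical helper: POST /infer, pay on a 402, then retry once with the proof.
async function inferWithPayment(
  baseUrl: string,
  prompt: string,
  pay: (paymentRequiredHeader: string) => Promise<string>, // assumed: returns the PAYMENT-SIGNATURE value
  fetchFn: FetchLike,
): Promise<string> {
  const request: RequestInitLike = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }), // assumed request shape
  };

  const first = await fetchFn(`${baseUrl}/infer`, request);
  if (first.status !== 402) return first.text();

  const required = first.headers.get("PAYMENT-REQUIRED");
  if (!required) throw new Error("402 response without PAYMENT-REQUIRED header");

  // The caller settles the payment and returns the encoded proof.
  const signature = await pay(required);

  const second = await fetchFn(`${baseUrl}/infer`, {
    ...request,
    headers: { ...request.headers, "PAYMENT-SIGNATURE": signature },
  });
  if (second.status !== 200) throw new Error(`Inference failed with status ${second.status}`);
  return second.text();
}
```

Passing the fetch implementation as a parameter keeps the sketch testable without a running server; in a browser or Worker the global `fetch` should fit the `FetchLike` shape.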
# Authenticate with Cloudflare (required for Workers AI, even in local dev)
npx wrangler login

Workers AI calls are remote even during local development, so you need a Cloudflare account with Workers AI access.
npm install
npx wrangler dev # starts on http://localhost:8787

Ensure `.dev.vars` contains:
PAYMENT_ADDRESS=0x...
BASE_RPC_URL=https://mainnet.base.org
MOCK_PAYMENTS=true
SESSION_SECRET=<any-hex-string>
ADMIN_SECRET=dev-admin-secret-for-local-testing
Run unit tests (no server needed):
npm test

Run the 9-phase end-to-end test (requires a running dev server):
npm run test:e2e

The E2E test generates a random wallet, signs SIWE messages programmatically using `@noble/curves`, authenticates, deposits with a mock proof, runs inference, and verifies balance/history.
# Set production secrets (use wrangler secret, not vars)
npx wrangler secret put PAYMENT_ADDRESS # your USDC-receiving wallet
npx wrangler secret put BASE_RPC_URL # e.g. https://mainnet.base.org
npx wrangler secret put SESSION_SECRET # generate with: openssl rand -hex 32
npx wrangler secret put ADMIN_SECRET # generate with: openssl rand -hex 32
npx wrangler deploy
# Verify
curl https://<worker>.workers.dev/health

Authenticated endpoints accept either an `Authorization: Bearer <token>` header or an `ig_session` HttpOnly cookie (both obtained via SIWE login).
| Endpoint | Description |
|---|---|
| `GET /health` | Liveness probe |
| `GET /payment-info` | Payment address + network details |
| Endpoint | Description |
|---|---|
| `GET /auth/nonce?wallet=0x...` | Generate one-time nonce |
| `POST /auth/login` | Verify SIWE signature → session cookie + JWT |
| `POST /auth/logout` | Clear session cookie |
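
Since a session token can arrive either as a Bearer header or as the `ig_session` cookie, a handler needs to check both places. The sketch below shows one way to do that; the function name and the choice to prefer the `Authorization` header over the cookie are assumptions, not something the project documents.

```typescript
// Hypothetical helper: pull the session token from either transport.
// Assumption: the Authorization header takes precedence over the cookie.
function extractSessionToken(headers: { get(name: string): string | null }): string | null {
  const auth = headers.get("Authorization");
  if (auth) {
    const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
    if (match) return match[1] ?? null;
  }

  const cookie = headers.get("Cookie");
  if (cookie) {
    // Cookie header format: "name1=value1; name2=value2"
    for (const part of cookie.split(";")) {
      const eq = part.indexOf("=");
      if (eq === -1) continue;
      if (part.slice(0, eq).trim() === "ig_session") {
        return part.slice(eq + 1).trim();
      }
    }
  }
  return null; // caller would respond with 401
}
```

The returned string is only the raw token; verifying its HMAC-SHA256 signature and expiry is a separate step.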
| Endpoint | Description |
|---|---|
| `POST /infer` | Run inference (post-billed from balance, SSE stream). Accepts optional `systemPrompt` (max 2000 chars) for persistent instructions. |
| `POST /deposit` | Top-up balance with payment proof |
| `GET /balance` | Token balance + usage stats |
| `GET /history` | Conversation messages + metadata |
| `DELETE /history` | Clear conversation |
| `POST /documents` | Upload document for RAG (embedding cost deducted) |
| `GET /documents` | List uploaded documents |
| `DELETE /documents/:id` | Delete document + Vectorize embeddings |
| `POST /documents/reindex` | Re-upsert all document vectors (fixes metadata indexing) |
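
Because `POST /infer` responds with an SSE stream, a client has to split the incoming text into events and skip comment lines such as the `:keepalive` heartbeats. The parser below is a generic sketch of that step; the `parseSse` name is hypothetical, and the payload carried in each `data:` field is not specified here, so it is returned as an opaque string.

```typescript
// Hypothetical incremental SSE parser. Feed it the buffered text received so
// far; it returns the complete events' data plus the unconsumed remainder.
// Assumption: a chunk never ends between "\r" and "\n" of a CRLF pair.
function parseSse(buffer: string): { data: string[]; rest: string } {
  const data: string[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last block may be an incomplete event

  for (const block of blocks) {
    const lines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment, e.g. ":keepalive"
      if (line.startsWith("data:")) {
        const value = line.slice(5);
        // Per the SSE format, one leading space after the colon is stripped.
        lines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    if (lines.length > 0) data.push(lines.join("\n"));
  }
  return { data, rest };
}
```

A caller would prepend `rest` to the next network chunk before parsing again, so events split across chunks are not lost.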
| Endpoint | Description |
|---|---|
| `GET /admin/wallets` | Paginated list of registered wallets |
| `GET /admin/wallets/:wallet/status` | Detailed wallet status (balance, usage, documents) |
| `GET /admin/stats` | Aggregate statistics (total wallets) |
| `GET /admin/stale` | Identify zero-balance inactive wallets |
x402-compatible clients can skip the nonce/login flow and pass a SIGN-IN-WITH-X header on POST /infer for stateless, single-request wallet authentication. The 402 response advertises supported chains via a sign-in-with-x extension.
dox402 publishes machine-readable specs so AI agents can discover and use the API without out-of-band configuration.
| Endpoint | Description |
|---|---|
| `GET /openapi.json` | OpenAPI 3.1 specification of every REST endpoint |
| `GET /SKILL.md` | Agent-readable usage guide |
| `GET /.well-known/agent.json` | A2A agent card (identity, transport, skill) |
| `GET /.well-known/agents.json` | Multi-step flows (login, infer, balance, etc.) |
| `GET /.well-known/api-catalog` | RFC 9727 linkset (`application/linkset+json`) pointing to OpenAPI / SKILL.md / health |
| `GET /.well-known/agent-skills/index.json` | Cloudflare Agent Skills Discovery v0.2.0 index |
| `GET /robots.txt` | Crawl rules + AI bot policies + Content Signals (`ai-train=no`, `search=yes`, `ai-input=yes`) |
| `GET /sitemap.xml` | Canonical URLs |
- `Accept: text/markdown` → returns `SKILL.md` with `Content-Type: text/markdown; charset=utf-8` and an `x-markdown-tokens` header (Cloudflare Markdown for Agents).
- `Accept: text/html` → HTML response includes RFC 8288 `Link` headers advertising `openapi.json`, `agent.json`, `agents.json`, and `SKILL.md`.
When the homepage loads in a WebMCP-capable browser, it registers 5 tools via navigator.modelContext.registerTool(): connect_wallet, get_balance, send_inference, view_history, open_deposit_ui. No tool auto-spends — deposits still require an explicit user wallet signature.
See public/.well-known/README.md for the rationale on /.well-known/openid-configuration, /.well-known/oauth-authorization-server, /.well-known/oauth-protected-resource, /.well-known/http-message-signatures-directory, /.well-known/mcp/server-card.json, /.well-known/ucp, and /.well-known/acp.json — none of which fit dox402's architecture (wallet-signed auth, per-call API monetization, no MCP endpoint, no product catalog).
| Model | Context Window | Speed |
|---|---|---|
| Llama 3.1 8B | 7,968 tokens | Fast |
| Llama 3.3 70B | 24,000 tokens | Medium |
| Gemma 3 12B | 8,000 tokens | Fast |
| Mistral 7B | 8,000 tokens | Fast |
| DeepSeek R1 32B | 80,000 tokens | Medium |
Total input (prompt + conversation history + RAG file context) is validated against the selected model's context window before inference. Requests exceeding the limit receive a 413 error.
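
The validation described above amounts to summing the three input components and comparing the total to the model's limit. The sketch below uses the limits from the table; the model keys, the `checkContextWindow` name, and the assumption that token counts are computed elsewhere and passed in are all illustrative.

```typescript
// Context window limits from the table above. The keys are hypothetical
// short identifiers, not the actual Workers AI model IDs.
const CONTEXT_WINDOW_TOKENS: Record<string, number> = {
  "llama-3.1-8b": 7968,
  "llama-3.3-70b": 24000,
  "gemma-3-12b": 8000,
  "mistral-7b": 8000,
  "deepseek-r1-32b": 80000,
};

interface InputTokens {
  prompt: number;
  history: number;
  ragContext: number;
}

type ContextCheck =
  | { ok: true; total: number; limit: number }
  | { ok: false; status: 413; total: number; limit: number };

// Returns a 413 result when the combined input exceeds the model's window.
function checkContextWindow(model: string, tokens: InputTokens): ContextCheck {
  const limit = CONTEXT_WINDOW_TOKENS[model];
  if (limit === undefined) throw new Error(`Unknown model: ${model}`);

  const total = tokens.prompt + tokens.history + tokens.ragContext;
  return total > limit
    ? { ok: false, status: 413, total, limit }
    : { ok: true, total, limit };
}
```

Under this reading, a request that lands exactly on the limit is accepted; only totals strictly above it are rejected.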
| Header | Direction | Content |
|---|---|---|
| `PAYMENT-REQUIRED` | Server → Client | base64-encoded PaymentRequired JSON |
| `PAYMENT-SIGNATURE` | Client → Server | base64-encoded PaymentProof JSON |
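
Both headers carry JSON wrapped in base64, so encoding and decoding them is symmetric. The helpers below are a sketch assuming standard (not URL-safe) base64 over UTF-8 bytes; the function names are hypothetical, and the fields of `PaymentRequired` and `PaymentProof` are not listed here, so the helpers stay generic over the payload type.

```typescript
// Encode any JSON-serializable value as base64 (UTF-8 safe).
function encodeHeaderJson(value: unknown): string {
  const bytes = new TextEncoder().encode(JSON.stringify(value));
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] ?? 0);
  }
  return btoa(binary);
}

// Decode a base64 header value back into JSON. The caller supplies the
// expected shape; no runtime validation is performed in this sketch.
function decodeHeaderJson<T>(header: string): T {
  const binary = atob(header);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return JSON.parse(new TextDecoder().decode(bytes)) as T;
}
```

For example, a client might call `decodeHeaderJson` on the `PAYMENT-REQUIRED` value to learn what to pay, then `encodeHeaderJson` on its proof object to build `PAYMENT-SIGNATURE`.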
- Durable Object (`InferenceGate`): one instance per wallet address with embedded SQLite storage. Holds token balance, conversation history, rate limits, and replay-prevention data. All balance updates happen inside `storage.transactionSync()` to prevent race conditions. New DOs are co-located in Eastern North America (`locationHint: 'enam'`) to minimize latency to Base chain RPC providers.
- Worker (`index.ts`): validates wallet address format, authenticates via SIWE session tokens or SIWX headers, routes to the correct DO instance via typed RPC stubs.
- KV Registry (`WALLET_REGISTRY`): global index of all active wallet DO instances, updated on first use via fire-and-forget `ctx.waitUntil()` calls. Powers the admin endpoints for fleet visibility.
- Replay prevention: each payment hash is stored in the `seen_transactions` SQL table with automatic 1-hour TTL cleanup via DO alarms.
- Authentication: SIWE (EIP-4361) proves wallet ownership; HMAC-SHA256 JWTs are delivered as HttpOnly cookies (browser) or Bearer tokens (API). SIWX single-request auth is also supported for x402 clients.
- Verification: Tier 1 structural checks + Tier 2 on-chain RPC receipt verification via `eth_getTransactionReceipt`. Grace mode provides provisional tokens when RPC is unreachable, with automatic alarm-based re-verification.
- Streaming: SSE responses with heartbeat keepalive (`:keepalive` comments every 15s of inactivity) and a 2-minute max-duration guard to prevent runaway streams. Backpressure is handled naturally via `await writer.write()`.
- Billing safeguards: failed AI responses (empty, error JSON, stream errors) are detected and not billed; tokens are only deducted for successful inference.
- RAG (Retrieval-Augmented Generation): per-wallet document knowledge base powered by Cloudflare Vectorize and Workers AI embeddings (`bge-base-en-v1.5`). Documents are chunked (1600 chars, 200 char overlap), embedded, and stored in a shared Vectorize index with per-wallet metadata filtering. Opt-in via `useRag: true` on `/infer`: relevant chunks are retrieved (top-5, cosine similarity ≥ 0.45) and injected as system context. Total input (prompt + history + RAG context) is validated against each model's context window. RAG failure is non-fatal.
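
The RAG parameters above (1600-character chunks, 200-character overlap, top-5 with a 0.45 similarity floor) can be illustrated with two small functions. This is a sketch that assumes plain fixed-size character windows; the actual chunker may split on other boundaries, and `chunkText`, `selectChunks`, and the `Match` shape are hypothetical names.

```typescript
// Split text into fixed-size windows where each chunk repeats the last
// `overlap` characters of the previous one.
function chunkText(text: string, size = 1600, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final window reached the end
  }
  return chunks;
}

interface Match {
  id: string;
  score: number; // cosine similarity reported by the vector index
}

// Keep only matches at or above the similarity floor, best first, capped at topK.
function selectChunks(matches: Match[], topK = 5, minScore = 0.45): Match[] {
  return matches
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

With the defaults, a 3000-character document yields two chunks that share characters 1400 through 1599, so a sentence straddling the boundary appears intact in at least one of them.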