A small Node.js + TypeScript service that simulates a usage-based LLM API platform: deployment lifecycle, authenticated request handling with metering, and aggregated usage/billing.
Stack: Express, TypeScript, MongoDB (Mongoose), Redis (ioredis), Docker Compose, Vitest.
docker compose up --build

This starts MongoDB, Redis, and the API on port 3000. Wait for the log line `screener api listening on :3000`.
docker-compose.yml does not publish mongo/redis to the host by default (avoids conflicts with locally-installed instances). For Option B, either:
- Use your existing local mongo + redis on default ports, or
- Uncomment the `ports:` lines under `mongo` and `redis` in `docker-compose.yml`, then:
docker compose up -d mongo redis
npm install
cp .env.example .env # optional, defaults are fine
npm run dev

npm test

Tests use mongodb-memory-server and ioredis-mock, so they run hermetically without Docker.
# 1. Create a deployment
curl -s -X POST http://localhost:3000/deployments \
-H 'content-type: application/json' \
-d '{"model":"model-a"}'
# -> { "deployment_id": "...", "status": "provisioning" }
# 2. Wait ~10s, then fetch — status flips to "ready" and api_key/endpoint_url appear
curl -s http://localhost:3000/deployments/<id>
# 3. Call the completion endpoint
curl -s -X POST http://localhost:3000/v1/<id>/completions \
-H "Authorization: Bearer <api_key>" \
-H "content-type: application/json" \
-d '{"prompt":"hello world"}'
# 4. View usage
curl -s "http://localhost:3000/usage?api_key=<api_key>&from=2026-01-01T00:00:00Z&to=2027-01-01T00:00:00Z&group_by=day"
# 5. Terminate
curl -s -X DELETE http://localhost:3000/deployments/<id>

Two collections: one for mutable lifecycle state, one for immutable usage events.
| field | type | notes |
|---|---|---|
| `_id` | ObjectId | returned as `deployment_id` |
| `model` | `"model-a" \| "model-b"` | enum-validated |
| `status` | `"provisioning" \| "ready" \| "terminated"` | |
| `endpoint_url` | `string \| null` | populated when ready |
| `api_key` | `string \| null` | populated when ready, `sk_<48 hex>` |
| `created_at`, `ready_at`, `terminated_at` | `Date \| null` | |
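Index: `{ api_key: 1 }`, sparse and unique. Sparse so that multiple provisioning rows without a key can coexist; unique because the api_key is the auth identity. Note that a MongoDB sparse index skips only documents where the field is absent, so this relies on `api_key` being left unset, not stored as an explicit `null`, while provisioning.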
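The lifecycle in the `status` column can be read as a small state machine. The sketch below is illustrative only: it assumes `terminated` is final and that a `provisioning` deployment may be terminated before it becomes ready, and `canTransition` and `markReady` are hypothetical helpers, not names from the codebase.

```typescript
// Hypothetical types mirroring the deployments table above.
type Status = "provisioning" | "ready" | "terminated";

interface Deployment {
  model: "model-a" | "model-b";
  status: Status;
  endpoint_url: string | null;
  api_key: string | null;
  created_at: Date;
  ready_at: Date | null;
  terminated_at: Date | null;
}

// Assumed transitions: terminated is final; provisioning can be cancelled.
const TRANSITIONS: Record<Status, readonly Status[]> = {
  provisioning: ["ready", "terminated"],
  ready: ["terminated"],
  terminated: [],
};

function canTransition(from: Status, to: Status): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns a copy of the deployment in the "ready" state.
// The caller supplies the generated key and endpoint URL.
function markReady(
  d: Deployment,
  apiKey: string,
  endpointUrl: string,
  now: Date,
): Deployment {
  if (!canTransition(d.status, "ready")) {
    throw new Error(`cannot go from ${d.status} to ready`);
  }
  return {
    ...d,
    status: "ready",
    api_key: apiKey,
    endpoint_url: endpointUrl,
    ready_at: now,
  };
}
```

Keeping the allowed transitions in one table makes it harder for a late provisioning timer to flip an already terminated deployment back to `ready`.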
| field | type | notes |
|---|---|---|
| `_id` | ObjectId | |
| `api_key` | string | event owner; also the rate-limit / billing key |
| `deployment_id` | ObjectId | |
| `model` | `"model-a" \| "model-b"` | denormalized so `/usage?group_by=model` doesn't need a join |
| `input_tokens`, `output_tokens` | number | |
| `timestamp` | Date | |
Index: { api_key: 1, timestamp: 1 } — covers the /usage?api_key&from&to range scan and supports the per-day/per-model $group.
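To make the query shape concrete, here is an in-memory sketch of what the `/usage` aggregation computes: filter by `api_key` and time range (the `$match` the index serves), then group by UTC day or by model (the `$group`). It is not the service's Mongo pipeline; `aggregateUsage`, the `UsageBucket` shape, and the half-open `[from, to)` range are assumptions made for illustration.

```typescript
// Hypothetical in-memory shape of a usage event (mirrors the table above).
interface UsageEvent {
  api_key: string;
  model: "model-a" | "model-b";
  input_tokens: number;
  output_tokens: number;
  timestamp: Date;
}

interface UsageBucket {
  key: string; // "YYYY-MM-DD" (UTC) when grouping by day, else the model name
  requests: number;
  input_tokens: number;
  output_tokens: number;
}

function aggregateUsage(
  events: UsageEvent[],
  apiKey: string,
  from: Date,
  to: Date,
  groupBy: "day" | "model",
): UsageBucket[] {
  const buckets = new Map<string, UsageBucket>();
  for (const e of events) {
    const t = e.timestamp.getTime();
    // Assumption: half-open range [from, to).
    if (e.api_key !== apiKey || t < from.getTime() || t >= to.getTime()) continue;

    const key =
      groupBy === "day" ? e.timestamp.toISOString().slice(0, 10) : e.model;
    const b =
      buckets.get(key) ?? { key, requests: 0, input_tokens: 0, output_tokens: 0 };
    b.requests += 1;
    b.input_tokens += e.input_tokens;
    b.output_tokens += e.output_tokens;
    buckets.set(key, b);
  }
  // Sort by key so the output order is deterministic.
  return Array.from(buckets.values()).sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : 0,
  );
}
```

Because `model` is stored on each event, the `group_by=model` branch needs nothing beyond the event itself.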
- Denormalized `model` on the event. Aggregations by model would otherwise require a `$lookup` into `deployments`. A 7-byte string per event is a fair price for a single-collection group.
- `api_key` is the natural tenant key on the hot path. Rate limiting, auth, metering, and billing all key off it. The compound `(api_key, timestamp)` index covers the only access pattern `/usage` performs.
The current implementation writes one Mongo doc per completion synchronously inside the request handler. That's fine for a test, but at ~10k inserts/sec it would buckle: write amplification, index churn, latency variance, and every request stalling on the metering write.
What I'd change, layered:
- Move metering off the hot path. The completion handler should publish a usage event to a buffered, fire-and-forget channel and return. The simplest version is an in-process queue + small batched writer; the real version is a message bus (Kafka, Redpanda, NATS JetStream, or even Redis Streams) keyed by `api_key` so events for one tenant land on the same partition. Decoupling means a slow Mongo doesn't slow down user requests, and we can buffer through write spikes.
- Use a store designed for append-only time-series. I'd keep Mongo for deployments (lifecycle state) and route usage events into ClickHouse, BigQuery, or TimescaleDB. Columnar stores compress event streams ~10–50x and aggregate millions of rows in milliseconds, which is exactly what `/usage` needs. Mongo aggregations work but become a real cost center as event volume grows.
- Pre-aggregate at ingest. Most billing queries don't need event-level granularity. A second consumer rolls events into per-(api_key, hour, model) buckets in a separate `usage_rollups` collection/table. `/usage` reads from rollups for spans longer than a few hours and from raw events for "last hour" queries. This cuts read cost dramatically.
- Rate-limit upstream of the worker. The Redis fixed-window rate limiter already protects the metering write; tighten it (and allow per-customer overrides) so abusive keys can't flood the bus.
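The "in-process queue + small batched writer" variant of the first item could look roughly like the sketch below. `BatchedUsageWriter`, `flushBatch`, and the batch-size and delay defaults are hypothetical, not taken from the current code. `flushBatch` stands in for whatever bulk insert or bus publish is used, and this version simply logs and drops a batch whose write fails.

```typescript
// Hypothetical event shape; only the fields needed for the sketch.
interface UsageEvent {
  api_key: string;
  input_tokens: number;
  output_tokens: number;
  timestamp: Date;
}

// Stand-in for a bulk insert or a publish to a message bus.
type FlushBatch = (batch: UsageEvent[]) => Promise<void>;

class BatchedUsageWriter {
  private buffer: UsageEvent[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly flushBatch: FlushBatch,
    private readonly maxBatch = 500, // assumed default
    private readonly maxDelayMs = 200, // assumed default
  ) {}

  // Called from the request handler; never awaits the sink.
  record(event: UsageEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) {
      void this.flush();
    } else if (this.timer === null) {
      this.timer = setTimeout(() => {
        void this.flush();
      }, this.maxDelayMs);
    }
  }

  // Drains the buffer. Also call this on shutdown so buffered events
  // are not lost when the process exits.
  async flush(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    try {
      await this.flushBatch(batch);
    } catch (err) {
      // A production writer would retry or spill to disk instead.
      console.error("usage flush failed; batch dropped", err);
    }
  }
}
```

The trade-off is durability: anything still in the buffer is lost on a crash, which is the main reason the bus-backed version is the "real" one.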
- Persist provisioning across restarts. A real job queue instead of an in-process `setTimeout`. Today, if the API restarts during the 10s window, the deployment is stuck in `provisioning` forever.
- Sliding-window rate limit. Fixed-window allows a 2× burst across boundary edges. A Redis sorted-set sliding window or Lua-implemented token bucket fixes that.
- Pagination + range capping on `/usage`. A query for a year of events would return one row per day. For larger group cardinalities (e.g., per-deployment), add `limit`/`offset`.
- Structured logging + metrics. pino + a `/metrics` endpoint (Prometheus) with counters for each status code and the metering write latency.
- More tests. Concurrency on rate-limit, cache invalidation flow, malformed body fuzzing, and the 409 path when a deployment terminates between auth-cache hit and DB write.
- OpenAPI spec. A single source of truth for the contract.
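As an illustration of the sliding-window item, the sketch below keeps per-key request timestamps in memory, standing in for a Redis sorted set. The comments name the Redis commands each step would map to. `SlidingWindowLimiter` is a hypothetical name, and a real Redis version would need the trim, count, and add steps to run atomically (for example in a Lua script).

```typescript
// In-memory stand-in for one Redis sorted set per api_key, where each
// member is the timestamp (ms) of an accepted request.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly limit = 100,
    private readonly windowMs = 60000,
  ) {}

  // Returns true and records the request if it fits in the trailing window.
  allow(apiKey: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop entries that fell out of the window (ZREMRANGEBYSCORE).
    const recent = (this.hits.get(apiKey) ?? []).filter((t) => t > cutoff);
    // Count what is left (ZCARD) and admit only below the limit.
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs); // ZADD
    this.hits.set(apiKey, recent);
    return allowed;
  }
}
```

With a fixed window, 100 requests at second 59 and another 100 at second 61 would all pass; here the second burst is rejected until the first one ages out of the trailing 60 seconds.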
- In-process `setTimeout` for provisioning. Lossy on restart. Chosen because a job queue would have doubled the surface area for a screener and the spec said "simulate." Documented above.
- Plain-text `api_key`. Spec doesn't require hashing; hashing would add a lookup-by-hash path and make caching slightly more interesting. Skipped.
- Fixed-window rate limit. Slightly burstier than sliding-window across the minute boundary. The token-bucket equivalent is one Lua script away, but the simpler `INCR + EXPIRE` is correct enough and easier to read.
- `mongodb-memory-server` + `ioredis-mock` in tests. Trades a one-time binary download for fully self-contained tests (no Docker needed for `npm test`). Faster iteration during development; reviewer doesn't need to run Docker just to run tests.
- Single Express app, not split into services. Three routes, one process. Splitting metering off (per the scaling answer) is the next step, but premature for the screener volume.
- Cache invalidation on `DELETE`, not on every status read. The `dep:<api_key>` cache has a 60s TTL, and `DELETE` actively invalidates it. For provisioning→ready transitions, no cache exists yet (api_key is null while provisioning), so no invalidation needed there.
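The cache behaviour in the last item (60s TTL plus active invalidation on `DELETE`) can be sketched with a small in-memory TTL cache. This is a stand-in for the Redis-backed cache, not the service's code: `TtlCache`, `CachedDeployment`, and the explicit `nowMs` parameter (used here to keep the example deterministic) are assumptions.

```typescript
// Hypothetical cached value: what auth needs to know about a deployment.
interface CachedDeployment {
  deployment_id: string;
  status: "ready" | "terminated";
}

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  set(key: string, value: V, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Called from the DELETE handler so a terminated deployment stops
  // authenticating immediately instead of after the TTL runs out.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Key format from the text above: dep:<api_key>.
const cacheKey = (apiKey: string): string => `dep:${apiKey}`;
```

Without the explicit invalidation, a terminated deployment could keep serving completions for up to the full TTL.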
I used Claude for:
- Drafting the spec from the PDF and pressure-testing the data-model.
- Generating the initial scaffolding (package.json, tsconfig, Dockerfile, docker-compose), then editing.
- Drafting the test cases.
- Documentation.
I read and understand every line. I made the architectural choices (status state machine, sparse-unique api_key, fixed-window rate limit, denormalized model on events).
| Method | Path | Notes |
|---|---|---|
| POST | `/deployments` | body `{ "model": "model-a" \| "model-b" }`, returns `{ deployment_id, status }` |
| GET | `/deployments/:id` | returns lifecycle state; includes `endpoint_url` + `api_key` when ready |
| DELETE | `/deployments/:id` | marks terminated, idempotent |
| POST | `/v1/:deployment_id/completions` | requires `Authorization: Bearer <api_key>`, body `{ "prompt": string }` |
| GET | `/usage` | query: `api_key`, `from` (ISO), `to` (ISO), `group_by=day\|model` (default `day`) |
| GET | `/health` | `{ ok: true }` |
| Code | When |
|---|---|
| 200 | success |
| 400 | malformed body |
| 401 | missing/invalid Bearer token |
| 403 | token belongs to a different deployment |
| 409 | deployment is provisioning or terminated |
| 429 | more than 100 requests in the current minute (per `api_key`) |
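The table can also be read as a decision function for the completions endpoint. The sketch below is one possible ordering of the checks (auth, ownership, lifecycle status, rate limit, body validation). That order is an assumption, since the table does not say which check wins when several apply, and `RequestFacts` and `completionStatus` are hypothetical names.

```typescript
// Hypothetical summary of everything the handler has looked up.
interface RequestFacts {
  // Deployment the Bearer token resolves to; null if missing or invalid.
  tokenDeploymentId: string | null;
  // The :deployment_id from the request path.
  pathDeploymentId: string;
  status: "provisioning" | "ready" | "terminated";
  // Requests seen for this api_key in the current minute, including this one.
  requestsThisMinute: number;
  body: unknown;
}

type CompletionStatus = 200 | 400 | 401 | 403 | 409 | 429;

function hasStringPrompt(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { prompt?: unknown }).prompt === "string"
  );
}

// Assumed order of checks; the first failing check decides the code.
function completionStatus(r: RequestFacts): CompletionStatus {
  if (r.tokenDeploymentId === null) return 401;
  if (r.tokenDeploymentId !== r.pathDeploymentId) return 403;
  if (r.status !== "ready") return 409;
  if (r.requestsThisMinute > 100) return 429;
  if (!hasStringPrompt(r.body)) return 400;
  return 200;
}
```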
src/
  app.ts                                     express app factory
  index.ts                                   boot (connect mongo+redis, listen)
  config.ts                                  env vars
  db/{mongo,redis}.ts                        connections (redis swaps to mock under NODE_ENV=test)
  models/{deployment,usageEvent}.ts
  routes/{deployments,completions,usage}.ts
  middleware/{auth,rateLimit}.ts
  services/{provisioning,pricing}.ts
tests/
  setup.ts                                   mongodb-memory-server bootstrap, per-test cleanup
  helpers.ts                                 app factory + create-ready-deployment helper
  lifecycle.test.ts                          provisioning -> ready -> terminated
  completions.test.ts                        auth, status, rate-limit, usage event recorded
  usage.test.ts                              aggregation correctness (day + model) and bad input
docker-compose.yml                           app + mongo + redis
Dockerfile                                   multi-stage build
docs/superpowers/specs/                      design doc
Time spent: spec (45+ minutes), scaffolding and code (30+ minutes), tests and README (30+ minutes).