Screener — Usage-Based API Platform

A small Node.js + TypeScript service that simulates a usage-based LLM API platform: deployment lifecycle, authenticated request handling with metering, and aggregated usage/billing.

Stack: Express, TypeScript, MongoDB (Mongoose), Redis (ioredis), Docker Compose, Vitest.

1. Setup & run

Option A — fully containerized (recommended for review)

docker compose up --build

This starts MongoDB, Redis, and the API on port 3000. Wait for the log line `screener api listening on :3000`.

Option B — local dev with hot reload

docker-compose.yml does not publish the mongo/redis ports to the host by default, to avoid conflicts with locally installed instances. For Option B, either:

Use your existing local mongo + redis on their default ports (27017 and 6379), or
Uncomment the ports: lines under mongo and redis in docker-compose.yml.

Then:

docker compose up -d mongo redis   # only if using the compose-managed mongo + redis
npm install
cp .env.example .env               # optional, defaults are fine
npm run dev

Run the tests

npm test

Tests use mongodb-memory-server and ioredis-mock, so they run hermetically without docker.

Smoke test (curl)

# 1. Create a deployment
curl -s -X POST http://localhost:3000/deployments \
  -H 'content-type: application/json' \
  -d '{"model":"model-a"}'
# -> { "deployment_id": "...", "status": "provisioning" }

# 2. Wait ~10s, then fetch — status flips to "ready" and api_key/endpoint_url appear
curl -s http://localhost:3000/deployments/<id>

# 3. Call the completion endpoint
curl -s -X POST http://localhost:3000/v1/<id>/completions \
  -H "Authorization: Bearer <api_key>" \
  -H "content-type: application/json" \
  -d '{"prompt":"hello world"}'

# 4. View usage
curl -s "http://localhost:3000/usage?api_key=<api_key>&from=2026-01-01T00:00:00Z&to=2027-01-01T00:00:00Z&group_by=day"

# 5. Terminate
curl -s -X DELETE http://localhost:3000/deployments/<id>
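Step 2 relies on a fixed ~10s wait. When scripting the smoke test, polling is more robust. Below is a minimal TypeScript sketch. It assumes Node 18+ (for the global `fetch`) and that `GET /deployments/:id` returns at least `status`, `api_key`, and `endpoint_url`. `waitForReady` and `httpFetcher` are hypothetical helpers, not part of the service.

```typescript
type DeploymentStatus = "provisioning" | "ready" | "terminated";

// Only the fields the smoke test needs; the real response may contain more.
interface DeploymentView {
  status: DeploymentStatus;
  api_key?: string | null;
  endpoint_url?: string | null;
}

type Fetcher = (id: string) => Promise<DeploymentView>;

// Fetch a deployment from a running instance (Node 18+ has a global fetch).
function httpFetcher(baseUrl = "http://localhost:3000"): Fetcher {
  return async (id) => {
    const res = await fetch(`${baseUrl}/deployments/${id}`);
    if (!res.ok) throw new Error(`GET /deployments/${id} returned ${res.status}`);
    return (await res.json()) as DeploymentView;
  };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the deployment is ready; fail fast if it was terminated or the timeout passes.
async function waitForReady(
  id: string,
  getDeployment: Fetcher,
  { intervalMs = 1000, timeoutMs = 30_000 } = {},
): Promise<DeploymentView> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const dep = await getDeployment(id);
    if (dep.status === "ready") return dep;
    if (dep.status === "terminated") throw new Error(`deployment ${id} was terminated`);
    if (Date.now() >= deadline) throw new Error(`deployment ${id} not ready after ${timeoutMs} ms`);
    await sleep(intervalMs);
  }
}
```

For example, `const dep = await waitForReady(id, httpFetcher());` followed by using `dep.api_key` in the Authorization header. The fetcher is injected so the polling logic can be exercised without a running server.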

2. Data model

Two collections — one for mutable lifecycle state, one for immutable usage events.

`deployments`

| field | type | notes |
| --- | --- | --- |
| `_id` | ObjectId | returned as `deployment_id` |
| `model` | `"model-a" \| "model-b"` | enum-validated |
| `status` | `"provisioning" \| "ready" \| "terminated"` | |
| `endpoint_url` | string \| null | populated when `ready` |
| `api_key` | string \| null | populated when `ready`, `sk_<48 hex>` |
| `created_at`, `ready_at`, `terminated_at` | Date \| null | |

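Index: { api_key: 1 }, sparse and unique. Unique because the api_key is the auth identity; sparse so that provisioning rows, which have no key yet, can coexist. Caveat: a sparse index only skips documents where the field is absent. A document that stores `api_key: null` is still indexed, so a second provisioning row with an explicit null would violate the unique constraint. Either leave the field unset until the deployment is ready, or replace `sparse` with a partial index (`partialFilterExpression: { api_key: { $type: "string" } }`).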
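A minimal sketch of the two pieces this index depends on, assuming the field names in the table. The first piece generates a key in the documented `sk_<48 hex>` format. The second declares the unique index as a partial index, so documents without a string key (missing or `null`) are never indexed. `generateApiKey` and `apiKeyIndex` are hypothetical names. The Mongoose `Schema#index` and driver `createIndex` calls in the comments are real APIs, but how the service actually declares its index is not shown here.

```typescript
import { randomBytes } from "node:crypto";

// 24 random bytes -> 48 lowercase hex characters, prefixed with "sk_".
function generateApiKey(): string {
  return `sk_${randomBytes(24).toString("hex")}`;
}

// Unique index on api_key covering only documents whose api_key is a string.
// Unlike { sparse: true }, this also skips documents with an explicit api_key: null,
// so any number of provisioning rows can coexist.
//   Mongoose:    deploymentSchema.index(apiKeyIndex.keys, apiKeyIndex.options)
//   Node driver: db.collection("deployments").createIndex(apiKeyIndex.keys, apiKeyIndex.options)
// Worth confirming with explain() that auth lookups by api_key actually use it.
const apiKeyIndex = {
  keys: { api_key: 1 },
  options: {
    unique: true,
    partialFilterExpression: { api_key: { $type: "string" } },
  },
};
```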

`usage_events`

| field | type | notes |
| --- | --- | --- |
| `_id` | ObjectId | |
| `api_key` | string | event owner; also the rate-limit / billing key |
| `deployment_id` | ObjectId | |
| `model` | `"model-a" \| "model-b"` | denormalized so `/usage?group_by=model` doesn't need a join |
| `input_tokens`, `output_tokens` | number | |
| `timestamp` | Date | |

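Index: { api_key: 1, timestamp: 1 }. It serves the /usage?api_key&from&to range scan: an equality match on api_key followed by a range on timestamp. The per-day/per-model $group then runs over the matched documents. The index narrows the $group's input, but it does not cover the query, because the token counts are not in the index.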
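To make the index/aggregation relationship concrete, here is a sketch of the pipeline `/usage` might run. It builds plain pipeline objects so it can be inspected without a database. `buildUsagePipeline`, the output field names (`requests`, `input_tokens`, `output_tokens`), grouping days in UTC, and treating `to` as exclusive are all assumptions, not the service's confirmed behavior.

```typescript
type GroupBy = "day" | "model";

// Hypothetical builder for the /usage aggregation.
// Stage 1 ($match) is the part the { api_key: 1, timestamp: 1 } index serves.
// Stage 2 ($group) sums over whatever the match returned.
function buildUsagePipeline(apiKey: string, from: Date, to: Date, groupBy: GroupBy = "day") {
  const groupKey =
    groupBy === "day"
      ? { $dateToString: { format: "%Y-%m-%d", date: "$timestamp", timezone: "UTC" } }
      : "$model";

  return [
    { $match: { api_key: apiKey, timestamp: { $gte: from, $lt: to } } },
    {
      $group: {
        _id: groupKey,
        requests: { $sum: 1 },
        input_tokens: { $sum: "$input_tokens" },
        output_tokens: { $sum: "$output_tokens" },
      },
    },
    { $sort: { _id: 1 } },
  ];
}
```

With Mongoose, the result could be passed to `aggregate(...)` on the usage-events model, possibly with a cast to `PipelineStage[]`.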

Why this shape

Denormalized model on the event. Aggregations by model would otherwise require a $lookup into deployments. A 7-character string ("model-a") per event is a fair price for a single-collection group.
api_key is the natural tenant key on the hot path. Rate limiting, auth, metering, and billing all key off it. The compound (api_key, timestamp) index serves the only access pattern /usage performs.

3. Scaling the metering pipeline to 10,000 req/sec

The current implementation writes one Mongo doc per completion synchronously inside the request handler. That's fine at screener volume, but at ~10k inserts/sec it would buckle: write amplification, index churn on every insert, and latency variance, with every request stalling on the metering write.

What I'd change, layered:

Move metering off the hot path. The completion handler should publish a usage event to a buffered, fire-and-forget channel and return. The simplest version is an in-process queue + small batched writer; the real version is a message bus (Kafka, Redpanda, NATS JetStream, or even Redis Streams) keyed by api_key so events for one tenant land on the same partition. Decoupling means a slow Mongo doesn't slow down user requests, and we can buffer through write spikes.
Use a store designed for append-only time-series. I'd keep Mongo for deployments (lifecycle state) and route usage events into ClickHouse, BigQuery, or TimescaleDB. Columnar stores compress event streams ~10–50x and aggregate millions of rows in milliseconds — exactly what /usage needs. Mongo aggregations work but become a real cost center as event volume grows.
Pre-aggregate at ingest. Most billing queries don't need event-level granularity. A second consumer rolls events into per-(api_key, hour, model) buckets in a separate usage_rollups collection/table. /usage reads from rollups for spans > a few hours and from raw events for "last hour" queries. This cuts read cost dramatically.
Rate-limit upstream of the worker. The Redis fixed-window limiter already protects the metering write; tighten it (and allow per-customer overrides) so abusive keys can't flood the bus.
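The "simplest version" from the first point, an in-process queue plus a batched writer, could look roughly like the sketch below. `UsageBatcher`, the batch size, and the flush interval are hypothetical. The `sink` is whatever performs the bulk write, for example Mongoose's `insertMany` with `{ ordered: false }`. Events still in the buffer are lost if the process crashes, which is the main reason the real version uses a durable message bus.

```typescript
// Shape mirrors the usage_events table; deployment_id is a string here for simplicity.
interface UsageEventRecord {
  api_key: string;
  deployment_id: string;
  model: "model-a" | "model-b";
  input_tokens: number;
  output_tokens: number;
  timestamp: Date;
}

type Sink = (batch: UsageEventRecord[]) => Promise<void>;

// Hypothetical batcher: the handler calls enqueue() and returns immediately;
// events are written when the buffer fills or the periodic timer fires.
class UsageBatcher {
  private buffer: UsageEventRecord[] = [];
  private readonly sink: Sink;
  private readonly maxBatch: number;
  private readonly timer: ReturnType<typeof setInterval>;

  constructor(sink: Sink, maxBatch = 500, flushIntervalMs = 250) {
    this.sink = sink;
    this.maxBatch = maxBatch;
    // Periodic flush so low traffic is still written promptly.
    this.timer = setInterval(() => void this.flush(), flushIntervalMs);
    this.timer.unref?.(); // Node: don't keep the process alive just for this timer
  }

  enqueue(event: UsageEventRecord): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    // Swap the buffer so new events accumulate while this batch is written.
    const batch = this.buffer;
    this.buffer = [];
    try {
      await this.sink(batch);
    } catch (err) {
      // Sketch only: put the batch back. Production code needs bounded retries and a buffer cap.
      this.buffer = batch.concat(this.buffer);
      console.error("usage flush failed", err);
    }
  }

  async close(): Promise<void> {
    clearInterval(this.timer);
    await this.flush();
  }
}
```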

4. What I'd do differently with more time

Persist provisioning across restarts. A real job queue instead of an in-process setTimeout. Today, if the API restarts during the 10s window, the deployment is stuck in provisioning forever.
Sliding-window rate limit. A fixed window allows up to 2× the limit in a burst that straddles a window boundary (100 requests at the end of one minute plus 100 at the start of the next). A Redis sorted-set sliding window or a Lua-implemented token bucket fixes that.
Pagination + range capping on /usage. A year of events grouped by day is only ~365 rows, but the query still scans every raw event in that range; capping the range bounds the scan. For higher-cardinality groupings (e.g., per-deployment), add limit/offset.
Structured logging + metrics. pino + a /metrics endpoint (Prometheus) with a counter per status code and a histogram of metering write latency.
More tests. Concurrent requests against the rate limiter, the cache invalidation flow, malformed-body fuzzing, and the 409 path when a deployment terminates between the auth-cache hit and the DB write.
OpenAPI spec. A single source of truth for the contract.
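The difference between the current fixed-window limiter and a sliding window is easiest to see side by side. The sketch below uses in-memory maps in place of Redis, with comments naming the Redis commands each step would correspond to. The class names and key format are hypothetical. In Redis, the sliding-window trim, count, and add must run atomically (MULTI or a Lua script), and each sorted-set member must be unique (e.g., timestamp plus a random suffix).

```typescript
// Fixed window: what INCR + EXPIRE does. One counter per (key, window index).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counters = new Map<string, { window: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    const entry = this.counters.get(key);
    // A new window starts a fresh counter (in Redis, a new key such as rl:<api_key>:<window> with EXPIRE).
    if (!entry || entry.window !== window) {
      this.counters.set(key, { window, count: 1 });
      return this.limit >= 1;
    }
    entry.count += 1; // INCR
    return entry.count <= this.limit;
  }
}

// Sliding window log: what a sorted set of request timestamps per key does.
class SlidingWindowLogLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly logs = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that fell out of the window (ZREMRANGEBYSCORE key -inf cutoff).
    const log = (this.logs.get(key) ?? []).filter((t) => t > cutoff);
    // Count what is left (ZCARD key); reject without recording if the window is full.
    if (log.length >= this.limit) {
      this.logs.set(key, log);
      return false;
    }
    // Record this request (ZADD key nowMs <unique member>, plus PEXPIRE key windowMs).
    log.push(nowMs);
    this.logs.set(key, log);
    return true;
  }
}
```

With the 100 requests/minute limit from the status-code table, the fixed window admits all 200 requests sent just before and just after a minute boundary, while the sliding window admits 100.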

5. Trade-offs

In-process setTimeout for provisioning. Lossy on restart. Chosen because a job queue would have doubled the surface area for a screener and the spec said "simulate." Documented above.
Plain-text api_key. Spec doesn't require hashing; hashing would add a lookup-by-hash path and make caching slightly more interesting. Skipped.
Fixed-window rate limit. Slightly burstier than sliding-window across the minute boundary. The token-bucket equivalent is one Lua script away, but the simpler INCR + EXPIRE is correct enough and easier to read.
mongodb-memory-server + ioredis-mock in tests. Trades a one-time binary download for fully self-contained tests (no docker needed for npm test). Faster iteration during development; reviewer doesn't need to run docker just to run tests.
Single Express app, not split into services. Three routes, one process. Splitting metering off (per the scaling answer) is the next step, but premature for the screener volume.
Cache invalidation on DELETE, not on every status read. The dep:<api_key> cache has a 60s TTL, and DELETE actively invalidates it. For provisioning→ready transitions, no cache exists yet (api_key is null while provisioning), so no invalidation needed there.
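Here is a sketch of the read-through cache and DELETE invalidation described in the last point. It uses an in-memory stand-in for Redis `GET` / `SET ... EX` / `DEL` so it runs without a server. The `dep:<api_key>` key and the 60s TTL come from the text. `TtlStore`, `resolveDeployment`, `invalidateDeployment`, and the cached fields are hypothetical.

```typescript
// In-memory stand-in for the Redis calls involved, with an injectable clock.
class TtlStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): string | null { // GET
    const e = this.entries.get(key);
    if (!e) return null;
    if (e.expiresAt <= this.now()) {
      this.entries.delete(key);
      return null;
    }
    return e.value;
  }

  set(key: string, value: string, ttlSeconds: number): void { // SET key value EX ttl
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  del(key: string): void { // DEL
    this.entries.delete(key);
  }
}

interface CachedDeployment {
  deployment_id: string;
  status: "provisioning" | "ready" | "terminated";
}

const CACHE_TTL_SECONDS = 60;

// Read-through: serve from cache, else load from the database and cache the result.
async function resolveDeployment(
  store: TtlStore,
  apiKey: string,
  loadFromDb: (apiKey: string) => Promise<CachedDeployment | null>,
): Promise<CachedDeployment | null> {
  const cacheKey = `dep:${apiKey}`;
  const hit = store.get(cacheKey);
  if (hit !== null) return JSON.parse(hit) as CachedDeployment;
  const dep = await loadFromDb(apiKey);
  if (dep) store.set(cacheKey, JSON.stringify(dep), CACHE_TTL_SECONDS);
  return dep;
}

// Called by DELETE /deployments/:id after the status update, so the next request sees "terminated".
function invalidateDeployment(store: TtlStore, apiKey: string): void {
  store.del(`dep:${apiKey}`);
}
```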

6. AI assistance

I used Claude for:

Drafting the spec from the PDF and pressure-testing the data-model.
Generating the initial scaffolding (package.json, tsconfig, Dockerfile, docker-compose), then editing.
Drafting the test cases.
Documentation.

I read and understand every line. I made the architectural choices (status state machine, sparse-unique api_key, fixed-window rate limit, denormalized model on events).

API reference

| Method | Path | Notes |
| --- | --- | --- |
| `POST` | `/deployments` | body `{ "model": "model-a" \| "model-b" }`, returns `{ deployment_id, status }` |
| `GET` | `/deployments/:id` | returns lifecycle state; includes `endpoint_url` + `api_key` when `ready` |
| `DELETE` | `/deployments/:id` | marks `terminated`, idempotent |
| `POST` | `/v1/:deployment_id/completions` | requires `Authorization: Bearer <api_key>`, body `{ "prompt": string }` |
| `GET` | `/usage` | query: `api_key`, `from` (ISO), `to` (ISO), `group_by=day\|model` (default `day`) |
| `GET` | `/health` | `{ ok: true }` |
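The `/usage` query parameters above imply some validation, which `usage.test.ts` exercises as "bad input". A hypothetical parser sketch follows. The specific rules (rejecting unparseable dates, requiring `from` < `to`, rejecting unknown `group_by` values) are assumptions rather than the service's documented behavior.

```typescript
interface UsageQuery {
  apiKey: string;
  from: Date;
  to: Date;
  groupBy: "day" | "model";
}

type ParseResult = { ok: true; value: UsageQuery } | { ok: false; error: string };

// Hypothetical validator for GET /usage query parameters; group_by defaults to "day".
function parseUsageQuery(q: Record<string, string | undefined>): ParseResult {
  const { api_key, from, to, group_by = "day" } = q;
  if (!api_key) return { ok: false, error: "api_key is required" };

  const fromDate = new Date(from ?? "");
  const toDate = new Date(to ?? "");
  if (Number.isNaN(fromDate.getTime()) || Number.isNaN(toDate.getTime())) {
    return { ok: false, error: "from and to must be ISO-8601 timestamps" };
  }
  if (fromDate.getTime() >= toDate.getTime()) {
    return { ok: false, error: "from must be earlier than to" };
  }
  if (group_by !== "day" && group_by !== "model") {
    return { ok: false, error: "group_by must be day or model" };
  }

  const groupBy = group_by === "model" ? "model" : "day";
  return { ok: true, value: { apiKey: api_key, from: fromDate, to: toDate, groupBy } };
}
```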

Status codes on `/v1/.../completions`

| Code | When |
| --- | --- |
| `200` | success |
| `400` | malformed body |
| `401` | missing/invalid Bearer token |
| `403` | token belongs to a different deployment |
| `409` | deployment is `provisioning` or `terminated` |
| `429` | > 100 requests in the current minute (per api_key) |

Project layout

src/
  app.ts                   express app factory
  index.ts                 boot (connect mongo+redis, listen)
  config.ts                env vars
  db/{mongo,redis}.ts      connections (redis swaps to mock under NODE_ENV=test)
  models/{deployment,usageEvent}.ts
  routes/{deployments,completions,usage}.ts
  middleware/{auth,rateLimit}.ts
  services/{provisioning,pricing}.ts
tests/
  setup.ts                 mongodb-memory-server bootstrap, per-test cleanup
  helpers.ts               app factory + create-ready-deployment helper
  lifecycle.test.ts        provisioning -> ready -> terminated
  completions.test.ts      auth, status, rate-limit, usage event recorded
  usage.test.ts            aggregation correctness (day + model) and bad input
docker-compose.yml         app + mongo + redis
Dockerfile                 multi-stage build
docs/superpowers/specs/    design doc

Time spent

Spec (45+ minutes), scaffolding and code (30+ minutes), tests and README (30+ minutes).
