omprakashk22/assessment_2

Screener — Usage-Based API Platform

A small Node.js + TypeScript service that simulates a usage-based LLM API platform: deployment lifecycle, authenticated request handling with metering, and aggregated usage/billing.

Stack: Express, TypeScript, MongoDB (Mongoose), Redis (ioredis), Docker Compose, Vitest.

1. Setup & run

Option A — fully containerized (recommended for review)

docker compose up --build

This starts MongoDB, Redis, and the API on port 3000. Wait for the log line "screener api listening on :3000" before sending requests.

Option B — local dev with hot reload

docker-compose.yml does not publish mongo/redis to the host by default (avoids conflicts with locally-installed instances). For Option B, either:

  • Use your existing local mongo + redis on default ports, or
  • Uncomment the ports: lines under mongo and redis in docker-compose.yml, then:
docker compose up -d mongo redis
npm install
cp .env.example .env       # optional, defaults are fine
npm run dev

Run the tests

npm test

Tests use mongodb-memory-server and ioredis-mock, so they run hermetically without docker.

Smoke test (curl)

# 1. Create a deployment
curl -s -X POST http://localhost:3000/deployments \
  -H 'content-type: application/json' \
  -d '{"model":"model-a"}'
# -> { "deployment_id": "...", "status": "provisioning" }

# 2. Wait ~10s, then fetch — status flips to "ready" and api_key/endpoint_url appear
curl -s http://localhost:3000/deployments/<id>

# 3. Call the completion endpoint
curl -s -X POST http://localhost:3000/v1/<id>/completions \
  -H "Authorization: Bearer <api_key>" \
  -H "content-type: application/json" \
  -d '{"prompt":"hello world"}'

# 4. View usage
curl -s "http://localhost:3000/usage?api_key=<api_key>&from=2026-01-01T00:00:00Z&to=2027-01-01T00:00:00Z&group_by=day"

# 5. Terminate
curl -s -X DELETE http://localhost:3000/deployments/<id>

2. Data model

Two collections — one for mutable lifecycle state, one for immutable usage events.

deployments

field                                type                                       notes
_id                                  ObjectId                                   returned as deployment_id
model                                "model-a" | "model-b"                      enum-validated
status                               "provisioning" | "ready" | "terminated"
endpoint_url                         string | null                              populated when ready
api_key                              string | null                              populated when ready, sk_<48 hex>
created_at, ready_at, terminated_at  Date | null

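Index: { api_key: 1 }, sparse and unique. Unique because the api_key is the auth identity; sparse so multiple provisioning rows without a key can coexist. Note that a MongoDB sparse index only skips documents where the field is absent, so provisioning rows must omit api_key rather than store an explicit null. If null is stored, a partial index with partialFilterExpression { api_key: { $type: "string" } } gives the same guarantee.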
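
A minimal sketch of the lifecycle the table describes, assuming api_key and endpoint_url are minted at the provisioning -> ready transition. The names Deployment, generateApiKey, markReady, markTerminated and the baseUrl parameter are hypothetical and not taken from the repository; only the sk_<48 hex> format and the /v1/:deployment_id/completions path come from this README.

```typescript
import { randomBytes } from "node:crypto";

type Model = "model-a" | "model-b";
type Status = "provisioning" | "ready" | "terminated";

interface Deployment {
  id: string;
  model: Model;
  status: Status;
  endpoint_url: string | null;
  api_key: string | null;
  created_at: Date;
  ready_at: Date | null;
  terminated_at: Date | null;
}

// 24 random bytes encode to 48 hex characters, matching sk_<48 hex>.
function generateApiKey(): string {
  return `sk_${randomBytes(24).toString("hex")}`;
}

// provisioning -> ready: the only transition that populates credentials.
// Any other starting state is returned unchanged (assumption: a deployment
// terminated during provisioning must never become ready).
function markReady(d: Deployment, baseUrl: string, now: Date = new Date()): Deployment {
  if (d.status !== "provisioning") return d;
  return {
    ...d,
    status: "ready",
    api_key: generateApiKey(),
    endpoint_url: `${baseUrl}/v1/${d.id}/completions`,
    ready_at: now,
  };
}

// any state -> terminated; repeating the call is a no-op (idempotent DELETE).
function markTerminated(d: Deployment, now: Date = new Date()): Deployment {
  if (d.status === "terminated") return d;
  return { ...d, status: "terminated", terminated_at: now };
}
```

Returning the same object when no transition applies keeps both functions idempotent, which mirrors the idempotent DELETE in the API reference.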

usage_events

field                        type                   notes
_id                          ObjectId
api_key                      string                 event owner, also the rate-limit / billing key
deployment_id                ObjectId
model                        "model-a" | "model-b"  denormalized so /usage?group_by=model doesn't need a join
input_tokens, output_tokens  number
timestamp                    Date

Index: { api_key: 1, timestamp: 1 }. It serves the /usage?api_key&from&to range scan that feeds the per-day/per-model $group.

Why this shape

  • Denormalized model on the event. Aggregations by model would otherwise require a $lookup into deployments. A 7-byte string per event is a fair price for a single-collection group.
  • api_key is the natural tenant key on the hot path. Rate limiting, auth, metering, and billing all key off it. The compound (api_key, timestamp) index covers the only access pattern /usage performs.
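
To illustrate why the denormalized model keeps /usage a single-collection operation, here is an in-memory sketch of the filter-then-group step. It is not the service's code: the real implementation would express this as a Mongo $match + $group, and aggregateUsage, UsageRow, the output field names, and the half-open [from, to) range are all assumptions.

```typescript
interface UsageEvent {
  api_key: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  timestamp: Date;
}

interface UsageRow {
  group: string; // "YYYY-MM-DD" (UTC) for group_by=day, model name for group_by=model
  requests: number;
  input_tokens: number;
  output_tokens: number;
}

function aggregateUsage(
  events: UsageEvent[],
  apiKey: string,
  from: Date,
  to: Date,
  groupBy: "day" | "model" = "day",
): UsageRow[] {
  const rows = new Map<string, UsageRow>();
  for (const e of events) {
    // Equivalent of $match on (api_key, timestamp), the indexed pair.
    const t = e.timestamp.getTime();
    if (e.api_key !== apiKey || t < from.getTime() || t >= to.getTime()) continue;

    // The group key comes straight from the event: no lookup into deployments.
    const group = groupBy === "day" ? e.timestamp.toISOString().slice(0, 10) : e.model;
    const row = rows.get(group) ?? { group, requests: 0, input_tokens: 0, output_tokens: 0 };
    row.requests += 1;
    row.input_tokens += e.input_tokens;
    row.output_tokens += e.output_tokens;
    rows.set(group, row);
  }
  // Stable, locale-independent ordering by group key.
  return Array.from(rows.values()).sort((a, b) => (a.group < b.group ? -1 : a.group > b.group ? 1 : 0));
}
```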

3. Scaling the metering pipeline to 10,000 req/sec

The current implementation writes one Mongo doc per completion synchronously inside the request handler. That's fine for a test, but at ~10k inserts/sec it would buckle: write amplification, index churn, latency variance, and every request stalling on the metering write.

What I'd change, layered:

  1. Move metering off the hot path. The completion handler should publish a usage event to a buffered, fire-and-forget channel and return. The simplest version is an in-process queue + small batched writer; the real version is a message bus (Kafka, Redpanda, NATS JetStream, or even Redis Streams) keyed by api_key so events for one tenant land on the same partition. Decoupling means a slow Mongo doesn't slow down user requests, and we can buffer through write spikes.

  2. Use a store designed for append-only time-series. I'd keep Mongo for deployments (lifecycle state) and route usage events into ClickHouse, BigQuery, or TimescaleDB. Columnar stores compress event streams ~10–50x and aggregate millions of rows in milliseconds — exactly what /usage needs. Mongo aggregations work but become a real cost center as event volume grows.

  3. Pre-aggregate at ingest. Most billing queries don't need event-level granularity. A second consumer rolls events into per-(api_key, hour, model) buckets in a separate usage_rollups collection/table. /usage reads from rollups for spans > a few hours and from raw events for "last hour" queries. This cuts read cost dramatically.

  4. Rate-limit upstream of the worker. The Redis rate limiter (fixed-window today) already protects the metering write. Tighten it, and allow per-customer overrides, so abusive keys can't flood the bus.
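
The "in-process queue + small batched writer" from step 1 could look roughly like the sketch below. BatchedWriter, maxBatch, maxDelayMs, and the sink callback are hypothetical names; in the real service the sink might wrap a bulk insert or a bus producer. Buffered events are lost if the process dies before a flush, which is the gap a durable message bus closes.

```typescript
type Sink<T> = (batch: T[]) => void | Promise<void>;

class BatchedWriter<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly sink: Sink<T>;
  private readonly maxBatch: number;
  private readonly maxDelayMs: number;

  constructor(sink: Sink<T>, maxBatch = 500, maxDelayMs = 200) {
    this.sink = sink;
    this.maxBatch = maxBatch;
    this.maxDelayMs = maxDelayMs;
  }

  // Called from the request handler: pushes to memory and never awaits the store.
  enqueue(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) {
      this.flush(); // size-based flush
    } else if (this.timer === null) {
      // time-based flush so a quiet period doesn't strand events in memory
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  // Hands everything buffered so far to the sink as one batch.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    const onError = (err: unknown): void => console.error("usage batch write failed", err);
    try {
      // Fire-and-forget: failures are logged, never surfaced to the request path.
      const result = this.sink(batch);
      if (result instanceof Promise) result.catch(onError);
    } catch (err) {
      onError(err);
    }
  }
}
```

A completion handler would call enqueue(event) and return immediately. A production version would also flush on shutdown signals and bound the buffer size so a stalled sink cannot exhaust memory.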

4. What I'd do differently with more time

  • Persist provisioning across restarts. A real job queue instead of an in-process setTimeout. Today, if the API restarts during the 10s window, the deployment is stuck in provisioning forever.
  • Sliding-window rate limit. Fixed-window allows a 2× burst across boundary edges. A Redis sorted-set sliding window or Lua-implemented token bucket fixes that.
  • Pagination + range capping on /usage. A query for a year of events would return one row per day. For larger group cardinalities (e.g., per-deployment), add limit/offset.
  • Structured logging + metrics. pino + a /metrics endpoint (Prometheus) with counters for each status code and the metering write latency.
  • More tests. Concurrency on rate-limit, cache invalidation flow, malformed body fuzzing, and the 409 path when a deployment terminates between auth-cache hit and DB write.
  • OpenAPI spec. A single source of truth for the contract.
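
The sliding-window bullet above could be sketched as follows. This is an in-memory stand-in, not the Redis version: the comments name the sorted-set commands each step would map to, and SlidingWindowLimiter with its parameters is hypothetical. The 100-per-minute default comes from the 429 rule in the API reference.

```typescript
class SlidingWindowLimiter {
  // api_key -> timestamps (ms) of accepted requests inside the window
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(apiKey: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop entries that slid out of the window (ZREMRANGEBYSCORE in Redis).
    const recent = (this.hits.get(apiKey) ?? []).filter((t) => t > cutoff);
    // Count what is left (ZCARD) and admit only if under the limit.
    const allowed = recent.length < this.limit;
    // Record accepted requests only (ZADD); rejected ones don't extend the penalty.
    if (allowed) recent.push(nowMs);
    this.hits.set(apiKey, recent);
    return allowed;
  }
}
```

Because the window is measured back from each request rather than aligned to the clock, no 60-second span can contain more than the limit. The cost is one stored timestamp per accepted request, and in Redis the three steps need a Lua script or MULTI to stay atomic.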

5. Trade-offs

  • In-process setTimeout for provisioning. Lossy on restart. Chosen because a job queue would have doubled the surface area for a screener and the spec said "simulate." Documented above.
  • Plain-text api_key. Spec doesn't require hashing; hashing would add a lookup-by-hash path and make caching slightly more interesting. Skipped.
  • Fixed-window rate limit. Slightly burstier than sliding-window across the minute boundary. The token-bucket equivalent is one Lua script away, but the simpler INCR + EXPIRE is correct enough and easier to read.
  • mongodb-memory-server + ioredis-mock in tests. Trades a one-time binary download for fully self-contained tests (no docker needed for npm test). Faster iteration during development; reviewer doesn't need to run docker just to run tests.
  • Single Express app, not split into services. Three routes, one process. Splitting metering off (per the scaling answer) is the next step, but premature for the screener volume.
  • Cache invalidation on DELETE, not on every status read. The dep:<api_key> cache has a 60s TTL, and DELETE actively invalidates it. For provisioning→ready transitions, no cache exists yet (api_key is null while provisioning), so no invalidation needed there.
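
To make the fixed-window trade-off concrete, the sketch below models INCR + EXPIRE in memory and shows the boundary burst. It assumes windows are aligned to the clock minute, which may not match how the repository builds its Redis keys; FixedWindowLimiter is a hypothetical name.

```typescript
class FixedWindowLimiter {
  // api_key -> counter for the window it was last seen in
  private readonly counters = new Map<string, { window: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(apiKey: string, nowMs: number = Date.now()): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    const entry = this.counters.get(apiKey);
    // Same window: increment (INCR). New window: start again at 1, which is
    // what key expiry (EXPIRE) achieves in Redis.
    const count = entry !== undefined && entry.window === window ? entry.count + 1 : 1;
    this.counters.set(apiKey, { window, count });
    return count <= this.limit;
  }
}

// Counts accepted requests when `n` requests all arrive at time `atMs`.
function countAllowed(limiter: FixedWindowLimiter, apiKey: string, n: number, atMs: number): number {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (limiter.allow(apiKey, atMs)) accepted++;
  }
  return accepted;
}
```

With the default limit, 150 requests at the last millisecond of one minute and 150 at the first millisecond of the next are each capped at 100, so 200 requests pass within about a millisecond. That is the 2× burst the trade-off accepts.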

6. AI assistance

I used Claude for:

  • Drafting the spec from the PDF and pressure-testing the data-model.
  • Generating the initial scaffolding (package.json, tsconfig, Dockerfile, docker-compose), then editing.
  • Drafting the test cases.
  • Documentation.

I read and understand every line. I made the architectural choices (status state machine, sparse-unique api_key, fixed-window rate limit, denormalized model on events).

API reference

Method  Path                            Notes
POST    /deployments                    body { "model": "model-a" | "model-b" }, returns { deployment_id, status }
GET     /deployments/:id                returns lifecycle state; includes endpoint_url + api_key when ready
DELETE  /deployments/:id                marks terminated, idempotent
POST    /v1/:deployment_id/completions  requires Authorization: Bearer <api_key>, body { "prompt": string }
GET     /usage                          query: api_key, from (ISO), to (ISO), group_by=day|model (default day)
GET     /health                         returns { ok: true }

Status codes on /v1/.../completions

Code  When
200   success
400   malformed body
401   missing/invalid Bearer token
403   token belongs to a different deployment
409   deployment is provisioning or terminated
429   more than 100 requests in the current minute (per api_key)
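
The table above can be read as a decision function. The sketch below is one plausible ordering of the checks (auth, then ownership, then status, then rate limit, then body). The README does not state the order the service applies, so the precedence is an assumption, as are the names CompletionCheck and completionStatus.

```typescript
type Status = "provisioning" | "ready" | "terminated";

interface CompletionCheck {
  bearerToken: string | null;                     // parsed from the Authorization header
  keyOwner: { id: string; status: Status } | null; // deployment the token resolves to, if any
  pathDeploymentId: string;                       // :deployment_id from the URL
  requestsThisMinute: number;                     // per api_key, counting this request
  bodyIsValid: boolean;                           // body is { "prompt": string }
}

type CompletionStatus = 200 | 400 | 401 | 403 | 409 | 429;

function completionStatus(c: CompletionCheck): CompletionStatus {
  // 401: no token, or a token that matches no deployment
  if (c.bearerToken === null || c.keyOwner === null) return 401;
  // 403: a valid token, but for a different deployment than the URL names
  if (c.keyOwner.id !== c.pathDeploymentId) return 403;
  // 409: the deployment exists but is not serving traffic
  if (c.keyOwner.status !== "ready") return 409;
  // 429: over the per-key limit for the current minute
  if (c.requestsThisMinute > 100) return 429;
  // 400: authenticated and within limits, but the body is malformed
  if (!c.bodyIsValid) return 400;
  return 200;
}
```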

Project layout

src/
  app.ts                   express app factory
  index.ts                 boot (connect mongo+redis, listen)
  config.ts                env vars
  db/{mongo,redis}.ts      connections (redis swaps to mock under NODE_ENV=test)
  models/{deployment,usageEvent}.ts
  routes/{deployments,completions,usage}.ts
  middleware/{auth,rateLimit}.ts
  services/{provisioning,pricing}.ts
tests/
  setup.ts                 mongodb-memory-server bootstrap, per-test cleanup
  helpers.ts               app factory + create-ready-deployment helper
  lifecycle.test.ts        provisioning -> ready -> terminated
  completions.test.ts      auth, status, rate-limit, usage event recorded
  usage.test.ts            aggregation correctness (day + model) and bad input
docker-compose.yml         app + mongo + redis
Dockerfile                 multi-stage build
docs/superpowers/specs/    design doc

Time spent

Spec (45+ minutes), scaffolding and code (30+ minutes), tests and README (30+ minutes).
