A production-style real-time chess platform built from scratch as a backend engineering learning project. A high-performance C++ WebSocket game server runs authoritative gameplay while a FastAPI control plane handles auth, matchmaking, and persistence. Services communicate via gRPC. The full stack is containerized and observable via Prometheus and Grafana.
| Layer | Technology |
|---|---|
| Frontend | Next.js (TypeScript) |
| Control Plane | FastAPI (Python) — REST + gRPC server |
| Game Server | C++ with uWebSockets — authoritative move validation |
| Service-to-service | gRPC (protobuf) |
| Primary DB | PostgreSQL |
| Cache / Ephemeral | Redis |
| Observability | Prometheus + Grafana |
| Infra | Docker Compose |
Authentication is a manual JWT flow: short-lived access tokens (never stored) and long-lived refresh tokens stored as bcrypt hashes in Postgres, which enables revocation. Refresh tokens are rotated on every refresh. All auth endpoints are rate limited in Redis with a fixed-window algorithm, keyed by IP or user ID.
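
The fixed-window idea can be sketched as below. This is a minimal in-memory illustration, not the project's code: the class name `FixedWindowLimiter`, the injectable `clock`, and the example key format are assumptions. With Redis, the same pattern is commonly built from an `INCR` on a per-window key plus an `EXPIRE` set when the key is first created.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory stand-in for a Redis fixed-window counter (hypothetical helper)."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # Maps (key, window index) -> request count.
        # Redis would expire old keys; this sketch never evicts them.
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        """Count one request for `key` and report whether it is within the limit."""
        window = int(self.clock() // self.window_seconds)
        bucket = (key, window)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit
```

A caller would key the limiter by IP for anonymous endpoints (for example `"ip:203.0.113.7"`) and by user ID once the caller is authenticated. A fixed window permits short bursts across a window boundary, which is the usual trade-off for its simplicity.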
Matchmaking uses one Redis queue per time control, and players are paired in arrival order. When a match is found, short-lived signed match tickets are issued, and the game server verifies a ticket before seating the player. A polling endpoint lets the first queued player retrieve their ticket from Redis.
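
A short-lived signed ticket might look like the sketch below. The source says only that tickets are signed and short-lived; the HMAC-SHA256 scheme, the `payload.signature` format, the claim names, and the `SECRET` value here are all assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"example-secret-not-for-production"  # assumption: shared signing key


def issue_ticket(match_id: str, user_id: str, ttl_seconds: int,
                 now: Optional[float] = None) -> str:
    """Return 'payload.signature', both URL-safe base64 (hypothetical format)."""
    now = time.time() if now is None else now
    claims = {"match_id": match_id, "user_id": user_id, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_ticket(ticket: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = ticket.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong part count or malformed base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None
    return claims
```

The signature is checked before the payload is parsed, so the verifier only decodes JSON that the issuer actually produced.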
Authoritative C++ WebSocket server — move validation runs entirely server-side using a chess library. Clients are never trusted. Turn enforcement, full legal move generation, and game-over detection (checkmate, stalemate, insufficient material, fifty-move rule, threefold repetition) are all handled server-side. Both players receive every move broadcast in real time.
The C++ game server communicates with the FastAPI control plane via gRPC. VerifyMatchTicket is called before seating any player. ReportGameEnd fires on game completion, persisting the result and PGN to Postgres automatically. CreateResumeToken and ResolveResumeToken handle the reconnect flow.
When a player disconnects mid-game, their room stays alive. A resume token issued at join time is stored in Redis and consumed on reconnect (GETDEL — single use). The reconnecting player is re-seated into the existing room and immediately sent the current board position and full move history to re-sync.
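
The single-use property of the resume token can be sketched as follows. This is an in-memory illustration under stated assumptions: `ResumeTokenStore` and its method names are hypothetical, and a real Redis key would normally also carry a TTL, which this sketch omits. `dict.pop` plays the role of `GETDEL` by reading and deleting in one step.

```python
import secrets
from typing import Dict, Optional, Tuple


class ResumeTokenStore:
    """In-memory stand-in for the Redis-backed token store (hypothetical helper)."""

    def __init__(self) -> None:
        # token -> (room_id, user_id)
        self._tokens: Dict[str, Tuple[str, str]] = {}

    def create(self, room_id: str, user_id: str) -> str:
        """Issue an unguessable token at join time."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (room_id, user_id)
        return token

    def resolve(self, token: str) -> Optional[Tuple[str, str]]:
        """Consume the token; a second call with the same token returns None."""
        return self._tokens.pop(token, None)
```

Because the token is deleted on first use, a replayed or leaked token cannot seat a second connection into the same room.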
Per-connection WebSocket rate limiting (10 messages/sec) is tracked in memory per socket. A 1 KB message size limit is enforced before JSON parsing to prevent CPU-burn attacks. An append-only audit_logs table in Postgres has DB-level rules that block UPDATE and DELETE. Captured events include signup, login, failed login, logout, match start, and game end.
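
The per-connection checks can be sketched as below. The real server is C++; this Python version only illustrates the ordering of the checks. `ConnectionGuard` is a hypothetical name, and the one-second fixed window is an assumption, since the source states the limit but not the algorithm.

```python
import json
from typing import Optional

MAX_MESSAGE_BYTES = 1024   # 1 KB limit from the text
MAX_MESSAGES_PER_SEC = 10  # per-connection limit from the text


class ConnectionGuard:
    """Per-socket state: one instance per WebSocket connection (hypothetical)."""

    def __init__(self) -> None:
        self.window_start = 0.0
        self.count = 0

    def accept(self, raw: bytes, now: float) -> Optional[dict]:
        """Return the parsed message, or None if it should be dropped."""
        # Cheapest check first: reject oversized frames before any parsing.
        if len(raw) > MAX_MESSAGE_BYTES:
            return None
        # Fixed one-second window (assumption, see lead-in).
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > MAX_MESSAGES_PER_SEC:
            return None
        # Only now pay the cost of JSON parsing.
        try:
            msg = json.loads(raw)
        except ValueError:  # covers malformed JSON and invalid UTF-8
            return None
        return msg if isinstance(msg, dict) else None
```

Checking the byte length first means an attacker cannot force the server to parse large payloads, and checking the rate before parsing bounds the parsing work per socket per second.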
FastAPI is auto-instrumented via prometheus-fastapi-instrumentator. The C++ game server exposes custom Prometheus metrics on a dedicated port: connected socket count, active game count, total moves processed, move handling latency histogram, and error counts labeled by reason. Grafana dashboards are provisioned automatically on startup — no manual import required.
Load test: 100 total games, up to 50 games in flight, 37 moves each (3,700 total moves), measured end-to-end from WebSocket send to broadcast receipt by both players.
| Metric | Result |
|---|---|
| p50 | 1.20 ms |
| p95 | 5.06 ms |
| p99 | 7.09 ms |
| min | 0.38 ms |
| max | 13.96 ms |
| mean | 1.79 ms |
| Games completed | 100 / 100 |
| Moves measured | 3,700 / 3,700 |
See loadtests/ for how to reproduce these numbers.
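
For readers interpreting the table, summary statistics of this kind can be derived from raw per-move latencies roughly as follows. This sketch uses the nearest-rank percentile definition, which is an assumption; the scripts in loadtests/ may use a different method, and the function names here are hypothetical.

```python
from typing import Dict, List


def percentile(sorted_samples: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not sorted_samples:
        raise ValueError("no samples")
    n = len(sorted_samples)
    # Integer ceiling of p * n / 100, clamped to at least rank 1.
    rank = max(1, (p * n + 99) // 100)
    return sorted_samples[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Compute the same columns as the results table from raw samples."""
    s = sorted(latencies_ms)
    return {
        "p50": percentile(s, 50),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
        "min": s[0],
        "max": s[-1],
        "mean": sum(s) / len(s),
    }
```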
Prerequisites: Docker + Docker Compose
# 1. Boot the full stack
docker compose --env-file infra/.env -f infra/docker-compose.yml up -d
# 2. Apply the database schema (first time only)
docker compose -f infra/docker-compose.yml exec -T postgres psql \
-U <POSTGRES_USER> -d <POSTGRES_DB> < infra/tables.sql
# 3. Services
# Frontend: http://localhost:3000
# API: http://localhost:8000
# Game server: ws://localhost:9001/ws
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3001 (admin / your GRAFANA_PASSWORD)

Copy infra/.env.example to infra/.env and fill in the required secrets before running.
Secure-Chess/
  proto/
    gamecontrol.proto    # gRPC contract between C++ and Python
  services/
    api/                 # FastAPI control plane (Python)
    gameserver/          # C++ WebSocket server
  frontend/              # Next.js client
  infra/
    docker-compose.yml
    tables.sql           # initial schema
    migrations/          # subsequent schema changes
    grafana/             # provisioned dashboards + datasources
    prometheus.yml       # scrape config
  scripts/               # codegen + deliverable test scripts
  loadtests/             # latency load test + rate limit tests
  docs/
    progress.md