A verbal chess web app. Players speak their moves out loud; the server validates and relays. Two modes: easy (board visible) and blindfold (only opponent's last move shown). Lichess-style time controls (5+0, 10+0). Open source, ~100 expected users at launch.
| Area | Decision | Why |
|---|---|---|
| Transport between players | WebSockets, not WebRTC | Opponents must not hear each other, so peer audio is unwanted. WebRTC buys nothing here. |
| Speech-to-Text | Deepgram Nova-3 streaming (server-side), Web Speech API as fallback | Only vendor with strong per-turn vocabulary biasing ("keyterm prompting"), the single biggest accuracy lever for "knight vs night" and "Nf3 vs enough three". Sub-300 ms partials, ~$0.0043/min. |
| Move parsing | Go voice normalizer + notnil/chess legality | The Go server normalizes common spoken move forms, then validates against the authoritative game state. |
| Board UI | react-chessboard 5.x | Actively maintained, built for React, MIT; supports last-move highlight + disabling drag (needed for "voice-only"). chessground is GPL-3.0; only use it if we accept GPL for the whole app. |
| Engine | Stockfish 18 (WASM), v2 feature only | Not needed for PvP. Used later for blunder analysis in match replay, an optional hint feature, and a future bot mode. |
| Rating | Glicko-1, start at 1200 | Pure Elo converges too slowly; Glicko-2's volatility math needs more games than we'll have. Glicko-1 is the right complexity for our scale. 1200 gives headroom in both directions and converges faster than 800 with a thin user base. |
| Matchmaking | Per-pool expanding-window queue (±50 every 10 s, cap ±400) | 4 pools = mode × time control. Pools will be thin, so offer opt-in time-control expansion + a bot fallback after 60 s. |
| Database | MongoDB Atlas (per spec) | Schema flexibility for match documents + native TTL indexes for the 7-week eviction. |
| Auth | Clerk (per spec) | Stores Clerk userId as the foreign key everywhere. |
| Real-time server | Go WebSocket server on Railway | Vercel is ideal for the web app, while Railway can run the long-lived WebSocket process. Goroutines and mutex-protected game actors fit the launch concurrency model. |
| Mid-game disconnects | 10 s reconnect grace, then opponent wins | Chess.com-style behavior: brief grace for flaky mobile/network drops, but the waiting player is not held hostage. If the game never progressed past move 0, record it as a resignation-style early exit. |
| Voice privacy | No peer audio. STT happens server-side; opponent gets parsed text (+ optional TTS) | Hard requirement from spec. Also simplifies anti-cheat (server sees the audio path). |
| OSS split | Public app monorepo + private infra repo | Mirrors Lichess (lila public, ops private). |
System architecture:

- Browser (Next.js client, Clerk auth)
  - Lobby / Matchmaking
  - Game UI (react-chessboard)
  - Mic capture (own turn): `getUserMedia` → Opus 20 ms → WS audio frames
- Next.js API (Vercel), reached over HTTPS:
  - profile
  - history list
  - replay fetch
  - leaderboard
- Game server (Railway: Go WS), reached over `WS /game`:
  - matchmaking
  - per-game state
  - notnil/chess
  - clocks
  - ratings
- STT worker (Railway worker), reached over `WS /audio`:
  - holds a Deepgram socket per game
  - legal-move keyterms each turn
  - voice parser
  - hands results to the game server (in-process call or WS callback)
- MongoDB Atlas, written by the Next.js API and the game server:
  - users, ratings, games (TTL 7w), match summaries
The STT worker and Game server can be the same Go process for v1 (simpler). Split later if STT load grows.
Per turn:

- The game server announces "your turn, white" to player A over the `/game` WS. Player B's UI just shows the clock and "waiting".
- Player A's browser starts streaming from `getUserMedia` (the mic was pre-authorized at game start), sending Opus chunks over the `/audio` WS to the STT worker.
- The game server computes the legal moves from this position (`ValidMoves()` in notnil/chess, rendered as SAN) and builds a Deepgram keyterm list:
  - Every legal SAN: `["Nf3", "Bxe5", "O-O", "O-O-O", "e4", ...]`
  - Every legal move in natural-language form via chess-nlp: `["knight to f3", "bishop takes e5", "castles kingside", ...]`
  - NATO phonetic names for files: "alpha" = a, "bravo" = b, "charlie" = c, "delta" = d, "echo" = e, "foxtrot" = f, "golf" = g, "hotel" = h. (Lichess found that this fixes a large share of file-recognition errors.)
  - Common chess words: "check", "mate", "takes", "captures", "promotes", "queen", etc.
- The game server pushes this keyterm list to the STT worker.
- The STT worker sends `KeytermPrompt` to Deepgram for this turn's session.
- As Deepgram returns interim transcripts, the worker forwards them to player A's UI for live feedback ("knight to e..." → "knight to e4").
- On `is_final` + endpointing, the worker runs:
  - Normalization (lowercase, "night" → "knight", "be" → "b", "ate" → "8", strip filler).
  - chess-nlp `textToSan(normalized)` → candidate SAN.
  - notnil/chess legality validation.
- If valid: emit `move-confirmed` to the game server, which updates state, broadcasts to both players, swaps the turn, and switches the clocks. The opponent's UI shows the move as text and (optionally) speaks it via `SpeechSynthesis`.
- If invalid or unparseable: increment `illegalCount[playerId]` and reply to player A with "Couldn't parse, try again" or "Illegal move, 1 of 3". On the 3rd strike, player A loses, the game ends, and the result is broadcast.
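A minimal Go sketch of the two string-handling steps above, assuming the legal SAN list and its spoken forms are already available (from notnil/chess and chess-nlp, not shown). `buildKeyterms`, `normalize`, the filler list, the number-word mapping and the square-joining regex are illustrative assumptions, not the vendored parser:

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
	"unicode"
)

// natoFiles maps NATO phonetic words to board files, as in the keyterm plan.
var natoFiles = map[string]string{
	"alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
	"echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h",
}

// chessWords is an illustrative vocabulary list; grow it from the real corpus.
var chessWords = []string{"check", "mate", "takes", "captures", "promotes", "queen", "castles"}

// tokenMap holds spoken-form fixes. "night", "be" and "ate" come from the plan;
// the number words are an assumed addition.
var tokenMap = map[string]string{
	"night": "knight", "be": "b", "ate": "8",
	"one": "1", "two": "2", "three": "3", "four": "4",
	"five": "5", "six": "6", "seven": "7", "eight": "8",
}

// fillers is a hypothetical filler-word list.
var fillers = map[string]bool{"um": true, "uh": true, "er": true, "like": true}

// squareRe joins a lone file letter and a rank digit that STT split apart ("f 3" -> "f3").
var squareRe = regexp.MustCompile(`\b([a-h]) ([1-8])\b`)

// buildKeyterms merges this turn's legal SAN moves, their spoken forms,
// NATO file names and chess vocabulary into one de-duplicated list.
func buildKeyterms(legalSAN, spoken []string) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(term string) {
		if term != "" && !seen[term] {
			seen[term] = true
			out = append(out, term)
		}
	}
	for _, s := range legalSAN {
		add(s)
	}
	for _, s := range spoken {
		add(s)
	}
	nato := make([]string, 0, len(natoFiles))
	for w := range natoFiles {
		nato = append(nato, w)
	}
	sort.Strings(nato) // map iteration order is random; keep the output stable
	for _, w := range nato {
		add(w)
	}
	for _, w := range chessWords {
		add(w)
	}
	return out
}

// normalize lowercases a final transcript, strips punctuation and fillers,
// maps NATO words and common mishearings, then rejoins split squares.
func normalize(transcript string) string {
	words := strings.FieldsFunc(strings.ToLower(transcript), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var kept []string
	for _, w := range words {
		if fillers[w] {
			continue
		}
		if f, ok := natoFiles[w]; ok {
			w = f
		} else if m, ok := tokenMap[w]; ok {
			w = m
		}
		kept = append(kept, w)
	}
	return squareRe.ReplaceAllString(strings.Join(kept, " "), "${1}${2}")
}

func main() {
	fmt.Println(normalize("Um, Night to F three")) // knight to f3
	fmt.Println(normalize("delta two delta four")) // d2 d4
	fmt.Println(len(buildKeyterms([]string{"e4", "Nf3"}, []string{"knight to f3"})))
}
```

The normalized string is what would then go to `textToSan`; keeping normalization as plain token rewriting makes it easy to test against the recorded "how players talk" corpus.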
Mic handling:

- Call `getUserMedia` once at game start (avoids mid-game permission prompts).
- Toggle `track.enabled` based on turn (server-confirmed, not a local guess).
- Manual audio toggle setting: when enabled, the mic is only hot while the user holds a push-to-talk key (default: spacebar). Default off (automatic per-turn).
When the turn flips to the local player, the UI needs to be unmistakable. The active-player surface has three layers:

- Whole-screen state change: the board border / page accent shifts to the player's color, the opponent's panel dims, and the active player's clock gets a subtle pulse. The user should know it's their turn from peripheral vision alone.
- Voice capsule (live mic component): a prominent pill-shaped component at the bottom-center of the screen with:
  - A live audio waveform visualizing the user's mic input in real time (driven by an `AnalyserNode` from the Web Audio API, ~60 fps canvas render of frequency bins).
  - The live interim transcript from Deepgram, above or inside the capsule, updating as the user speaks: "knight..." → "knight to..." → "knight to e4". The partial renders in a muted color and snaps to a confident color on `is_final`.
  - Subtle prompt copy when idle: "Your move: say it out loud" or, for manual-audio users, "Hold space and speak".
  - Color-coded states: idle (gray) → listening (active color, animated waveform) → parsing (brief shimmer) → confirmed (green flash) or rejected (red flash + "couldn't parse, try again (1 of 3)").
- Opponent's screen: the inverse. Their voice capsule is collapsed/dimmed, their clock is static, and a "waiting on opponent…" affordance appears. They should never wonder whose turn it is.

In blindfold mode, the voice capsule and turn-state cues become the entire UI (no board), so they need to carry even more visual weight: a full-width capsule, a larger waveform, larger clocks.

The capsule component lives in `/app/components/voice-capsule.tsx` and is the centerpiece of the in-game experience.
- Feature-detect browser speech recognition (`SpeechRecognition` or `webkitSpeechRecognition`; Firefox does not enable it by default). If Deepgram is down or the user opts out, use browser-side `SpeechRecognition` with the same normalization → Go voice parser → notnil/chess layer. Worse accuracy, no biasing.
users
_id: ObjectId
clerkUserId: string (unique index)
username: string (display name)
createdAt: Date
settings: {
manualAudio: boolean (default false)
ttsAnnouncements: boolean (default true)
preferredColor: "white" | "black" | "random"
}
ratings
_id: ObjectId
userId: ObjectId (ref users) β index
mode: "easy" | "blindfold" β index
rating: number (default 1200)
rd: number (Glicko deviation, default 350)
games: number (default 0)
updatedAt: Date
// compound unique index: (userId, mode)
games // TTL 7 weeks
_id: ObjectId
mode: "easy" | "blindfold"
timeControl: { initial: 300|600, increment: 0 } // seconds
white: { userId, ratingBefore, ratingAfter }
black: { userId, ratingBefore, ratingAfter }
result: "white" | "black" | "draw" | null
termination: "checkmate" | "resignation" | "timeout" | "illegal_strikes" | "disconnect" | "draw_*"
pgn: string // standard PGN, replayable by chess libraries
moves: [ // for replay UI
{
san: string, // "Nf3"
uci: string, // "g1f3"
raw: string, // "knight to f three" (what player said, for transparency)
msFromStart: number,
whiteClockMs: number,
blackClockMs: number
}
]
illegalCount: { white: number, black: number }
startedAt: Date
endedAt: Date
expiresAt: Date // TTL index: db.games.createIndex({expiresAt:1},{expireAfterSeconds:0})
queue // ephemeral, in-memory on game server, not persisted
// { userId, mode, timeControl, rating, joinedAt }
- `moves[]` is stored separately from the PGN. PGN gives you a portable replay, but the per-move clock state and the raw verbal transcript are valuable to users (replay debugging, transparency that we heard them right) and are not part of standard PGN.
- TTL on `games`: MongoDB's `expireAfterSeconds: 0` on an `expiresAt` field is the canonical per-document eviction pattern; set `expiresAt = endedAt + 7 weeks` at game completion.
- Rating is stored separately from `users`: this allows independent per-mode updates without rewriting user docs, and lets you fetch leaderboards via a `(mode, rating)` index without loading user docs.
Starting rating = 1200, separate per mode.

Glicko-1 parameters:

- Rating period: per game (simpler than batching; fine at our scale).
- New player: `rating=1200`, `RD=350`.
- After each game, update both players using the Glicko-1 formulas (q = ln(10)/400 ≈ 0.0057565).
- RD grows back toward 350 during inactivity (RD′ = min(√(RD² + c²·t), 350), where c is a tuning constant and t the elapsed inactivity), so confidence in inactive players' ratings erodes.

Implementation: a single `rateGame(white, black, result)` function in the game server, ~50 LOC. No external library needed; we'll write and test it directly.
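A sketch of `rateGame` using the standard Glicko-1 update, treating each game as its own rating period. The type and helper names are assumptions, and `inflateRD` implements the inactivity rule with a constant `c` we would still need to choose:

```go
package main

import (
	"fmt"
	"math"
)

// q = ln(10)/400 ≈ 0.0057565
const q = math.Ln10 / 400

type Rating struct {
	R  float64 // rating
	RD float64 // rating deviation
}

type Outcome struct {
	Opp   Rating
	Score float64 // 1 = win, 0.5 = draw, 0 = loss
}

// g dampens the impact of an opponent whose rating is uncertain.
func g(rd float64) float64 {
	return 1 / math.Sqrt(1+3*q*q*rd*rd/(math.Pi*math.Pi))
}

// expected is p's expected score against opp.
func expected(p, opp Rating) float64 {
	return 1 / (1 + math.Pow(10, -g(opp.RD)*(p.R-opp.R)/400))
}

// update applies one Glicko-1 rating period to p using pre-period ratings.
func update(p Rating, games []Outcome) Rating {
	if len(games) == 0 {
		return p
	}
	var invD2, sum float64 // invD2 accumulates 1/d²
	for _, o := range games {
		gj := g(o.Opp.RD)
		e := expected(p, o.Opp)
		invD2 += q * q * gj * gj * e * (1 - e)
		sum += gj * (o.Score - e)
	}
	denom := 1/(p.RD*p.RD) + invD2
	return Rating{R: p.R + q/denom*sum, RD: math.Sqrt(1 / denom)}
}

// rateGame updates both players from their pre-game ratings.
// whiteScore is 1, 0.5 or 0.
func rateGame(white, black Rating, whiteScore float64) (Rating, Rating) {
	return update(white, []Outcome{{Opp: black, Score: whiteScore}}),
		update(black, []Outcome{{Opp: white, Score: 1 - whiteScore}})
}

// inflateRD grows RD back toward 350 after t units of inactivity.
func inflateRD(rd, c, t float64) float64 {
	return math.Min(math.Sqrt(rd*rd+c*c*t), 350)
}

func main() {
	w, b := rateGame(Rating{1200, 350}, Rating{1200, 350}, 1)
	fmt.Printf("white %.0f (RD %.0f), black %.0f (RD %.0f)\n", w.R, w.RD, b.R, b.RD)
}
```

Glickman's published worked example (a 1500/200 player who beats a 1400/30 and loses to a 1550/100 and a 1700/300 in one period) comes out near 1464 / 151.4 with this `update`, which makes a convenient unit test.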
Four pools: (easy, 5 min), (easy, 10 min), (blindfold, 5 min), (blindfold, 10 min).

Algorithm (per pool, runs in the game server, polls every 1 s):

- Sort waiting users by `joinedAt`.
- For each user, compute the current acceptance window: ±(50 + 50 × floor(secondsWaited / 10)), capped at ±400.
- Pair the longest-waiting user with the closest-rating opponent within the window.
- After 30 s waiting, prompt the user "no match yet, also search 10 min?" (opt-in expansion across time-control pools, never across modes).
- After 60 to 90 s, offer a "play against bot" option (uses Stockfish at a difficulty matched to their rating; v2 feature).

No external matchmaking lib. ~80 LOC. Revisit at 10k+ DAU.
- Lobby: the user clicks "Play easy 5+0" and joins the queue. The lobby shows "queuing… N players online".
- Match found: both clients receive `game:start` with `gameId`, color, and opponent username/rating.
- Pre-game: both clients call `getUserMedia` and join the game WS room. The server starts white's clock.
- Turn cycle: voice pipeline (Section 3). On every move, the server updates state, persists the move incrementally to MongoDB, and broadcasts to both players.
- End conditions: checkmate / stalemate / draw (detected by notnil/chess), timeout (server timer), resignation (say "I resign" or click the button), or 3 illegal strikes.
- Post-game: show the result, rating delta, and a link to the replay. Both `ratings` rows are updated, and the `games` doc is finalized with `expiresAt`.
Use a Chess.com-style grace period rather than keeping disconnected games alive indefinitely:

- If a player's `/game` WebSocket drops after `game:start`, the server detaches that socket and marks the player as disconnected.
- The disconnected player gets 10 seconds to reconnect with the same Clerk/guest identity and send `game:resume` for the active `gameId`.
- During the grace window, the opponent remains in-game and sees a reconnecting/abandoned-game countdown. The authoritative game clock can keep running, but the disconnect adjudication timer is separate, so a network drop does not create an unbounded wait.
- If the player reconnects in time, the server reattaches the new socket, sends a fresh full `game:state` snapshot (FEN, clocks, all moves, illegal counts), and clears the disconnect timer.
- If the grace window expires, the opponent wins automatically. Use a distinct disconnect/abandonment termination for games with at least one move; if no moves have been played, record the result as a resignation-style early exit.
- This v1 policy is single-process only. Multi-server deployments need sticky routing or shared game/session state before reconnect can be reliable across instances.
- No board rendered.
- Show: clocks, your color, the last opponent move (as text, optionally spoken aloud once), move count, and "your turn / opponent's turn".
- Optional setting: speak the full move list aloud on demand (button: "repeat moves").
- Fetch the `games` doc by `_id` (auth: only if the user was a player).
- Reconstruct each position by stepping through `moves[]` and applying them to a fresh game instance from the chess library.
- UI: chessboard + forward/back buttons + move list (chess.com style). Highlight the last move.
- "Show what I said" toggle reveals the `raw` transcript per move.
Two repos:

Public app monorepo:

- Next.js app (`/app`)
- Game server (`/server`)
- Shared types/game logic (`/shared`)
- Docker `compose.yml` for local dev: Next + game server + Mongo + (optionally) a stub STT worker that uses the Web Speech API only
- `.env.example` with every key documented (`CLERK_PUBLISHABLE_KEY`, `MONGODB_URI`, `DEEPGRAM_API_KEY`, ...) and dummy values
- `README.md`: "run locally in 3 commands"
- `SECURITY.md`, `CONTRIBUTING.md`, `docs/self-hosting.md`
- License: AGPL-3.0 (matches Lichess; prevents proprietary fork-and-host without contributing back).

Private infra repo:
- Terraform/Pulumi: Atlas cluster, Railway service, Vercel project, DNS, Cloudflare
- Production env files (encrypted with SOPS or stored only in Railway/Vercel dashboards)
- Runbooks, on-call docs
- Anti-cheat thresholds, abuse-response procedures (publishing these = publishing the bypass)
Secrets:

- `.env.local` is gitignored.
- Local dev: dummy keys work for everything except STT; provide a Web-Speech-API-only fallback so contributors don't need a Deepgram key.
- Production secrets live in the Vercel and Railway env stores, never in either repo.
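A sketch of how the Go server might read the keys named in `.env.example`, assuming an empty `DEEPGRAM_API_KEY` means "Web Speech only". The plan doesn't say how dummy values are marked, so a real loader may need a different check; `ServerConfig`, `STTMode` and `loadConfig` are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ServerConfig holds the env vars the Go game server reads.
type ServerConfig struct {
	MongoURI       string
	DeepgramAPIKey string
	STTMode        string // "deepgram" or "webspeech"
}

// loadConfig takes a lookup function: os.Getenv in production, a map in tests.
func loadConfig(getenv func(string) string) (ServerConfig, error) {
	cfg := ServerConfig{
		MongoURI:       getenv("MONGODB_URI"),
		DeepgramAPIKey: getenv("DEEPGRAM_API_KEY"),
		STTMode:        "deepgram",
	}
	if cfg.MongoURI == "" {
		return ServerConfig{}, errors.New("MONGODB_URI is required")
	}
	if cfg.DeepgramAPIKey == "" {
		// Contributors without a Deepgram key fall back to browser Web Speech.
		cfg.STTMode = "webspeech"
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("STT mode:", cfg.STTMode)
}
```

Passing the lookup function in keeps secrets out of test fixtures and makes the fallback path trivial to exercise in CI.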
| Layer | Choice | License |
|---|---|---|
| Web framework | Next.js 15 (App Router) | MIT |
| Auth | Clerk | proprietary SaaS |
| Database | MongoDB Atlas | SSPL (server) / various drivers |
| Real-time server | Go WebSocket server on Railway | BSD-style stdlib + Gorilla WebSocket |
| Board UI | react-chessboard 5.x | MIT |
| Game logic | notnil/chess | MIT |
| Move NL parser | chess-nlp (vendored + extended) | MIT |
| Speech-to-Text | Deepgram Nova-3 + Web Speech API fallback | SaaS / browser-native |
| Future engine | Stockfish 18 (WASM) | GPL-3.0 (load only as analysis; isolate behind API to keep main app license clean) |
| Hosting | Vercel (web) + Railway (game/STT server) + MongoDB Atlas | n/a |
| CI | GitHub Actions | n/a |
- Next.js scaffold, Clerk auth, MongoDB connection, basic profile page.
- Public/private repo structure, `.env.example`, README, local Docker compose.
- Game server (Go + WebSockets), one game between two browser tabs.
- notnil/chess integration, react-chessboard, clocks, move-by-move broadcast.
- `games` doc persisted, replay UI works.
- Mic capture, push-to-server audio WS, Deepgram integration, keyterm prompting per turn.
- Go voice parsing + normalization + notnil/chess validation.
- 3-strikes illegal move rule.
- Voice capsule component: live waveform (Web Audio `AnalyserNode` + canvas) + live interim transcript + turn-state colors.
- Whole-screen "your turn" affordance (board accent, opponent dim, clock pulse).
- TTS for opponent's move (browser `SpeechSynthesis`).
- Blindfold UI (no board): the voice capsule becomes the centerpiece, with a larger waveform.
- Glicko-1 implementation, rating updates, per-mode rating display.
- Profile page shows rating, history list.
- 4 pools, expanding window, opt-in pool expansion.
- "N players online" indicator.
- Resign flow, draw offers.
- Manual audio toggle setting.
- Web Speech API fallback path (Firefox/no-key).
- Match-history replay with raw-transcript toggle.
- 7-week TTL verified working.
- Self-hosting docs.
- Stockfish blunder analysis in replay.
- Bot opponent mode (matchmaking fallback).
- Mobile app (React Native / Expo, sharing the `/shared` package).
- Spectator mode (read-only WS subscribers).
- Tournament/arena mode.
- Sound packs, themes.
- chess-nlp is unmaintained. Plan: vendor it into `/shared/voice-parser` and add tests against a corpus of "how chess players actually talk" (we'll need to build this; start by recording ourselves for 30 minutes). We will likely need to extend its grammar.
- Deepgram cost at scale. ~$0.005/min × ~2 s/move × 40 moves/game × 100 users × heavy daily play ≈ $10 to $30/mo. Cheap at our size; monitor as usage grows.
- Safari/Firefox parity. Web Speech API fallback covers Firefox poorly. Document this clearly; consider whisper.cpp WASM in v2.
- Anti-cheat. Verbal chess is harder to bot than text chess (you'd need TTS → audio → capture), but engine assistance is still possible (the player listens to an engine and speaks its move). Detection is deferred to v2; record the heuristics we use in the private infra repo.
- The "delta 2 to delta 4" problem. Moves named by a single square (e.g. "e4") are easily misrecognized by STT engines, especially with accented speech. Lichess solved this by requiring NATO phonetic + source square for pawn moves. We should support both NATO and natural phrasing ("knight to e4"), and confirm ambiguous parses back to the user ("did you mean Nf3 or Nh3? say 'first' or 'second'").
- Latency budget. Target end-to-end ≤ 1 s. Typical: mic-to-Deepgram (~50 ms) + Deepgram processing (~300 ms) + parse+validate (<10 ms) + WS hop to opponent (~50 ms) + render (~50 ms) ≈ 460 ms. Achievable; watch the worst case (Deepgram cold start, network jitter).
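The cost estimate above counts ~2 s of speech per move. If audio is forwarded to Deepgram for a player's whole turn and billing follows streamed audio duration (worth confirming against current pricing), the bound is clock time instead: with no increment, a game cannot use more than both players' initial time. A rough per-game calculation, assuming "40 moves/game" means 40 moves per side; `sttCostPerGame` is a hypothetical helper:

```go
package main

import "fmt"

// sttCostPerGame returns dollars for one game, given a per-minute price and
// the number of audio seconds streamed to the STT provider.
func sttCostPerGame(pricePerMin, streamedSeconds float64) float64 {
	return pricePerMin * streamedSeconds / 60
}

func main() {
	const price = 0.005 // $/min, the figure used in the estimate above

	speech := sttCostPerGame(price, 2*80) // 2 s × 80 utterances (40 per side)
	blitz := sttCostPerGame(price, 2*300) // 5+0: both clocks fully used
	rapid := sttCostPerGame(price, 2*600) // 10+0: both clocks fully used

	fmt.Printf("speech only: $%.4f/game\n", speech)          // $0.0133/game
	fmt.Printf("5+0, mic hot all turn: $%.4f/game\n", blitz)  // $0.0500/game
	fmt.Printf("10+0, mic hot all turn: $%.4f/game\n", rapid) // $0.1000/game
}
```

Both figures stay small at ~100 users, but the clock-bound case is roughly 4× the speech-only estimate for 5+0, so it is the number to watch as usage grows.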
Once this plan is approved, I'd start M1: scaffold Next.js + Clerk + Mongo + the two-repo structure, and write the .env.example + README so the OSS scaffold is right from commit #1.
Users should be able to create a private invite link and send it to a friend. The rough v1 flow is:
- Player A chooses mode + time control and clicks Create friend link.
- Server creates an in-memory invite ID and returns it to Player A.
- Player A shares `/play?invite=<inviteId>`.
- Player B opens the link, authenticates with Clerk, and joins that invite.
- The server starts a normal `GameActor` with the selected mode/time control. The game uses the same rules, voice path, clocks, illegal-move strikes, rating updates, and match history persistence as matchmaking games.
For v1, invites can be ephemeral and stored in the game server process. Later, move them to Mongo or Redis if we need multi-instance servers, invite expiration after restarts, or pending invite pages.
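A sketch of the ephemeral invite store, assuming invites live in a mutex-guarded map inside the single game-server process. Single-use consumption, the self-join check and the 128-bit hex ID are assumptions not stated in the plan; all names are hypothetical:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// Invite is a pending friend game held only in the game server's memory.
type Invite struct {
	ID         string
	CreatorID  string // Clerk userId of Player A
	Mode       string // "easy" or "blindfold"
	InitialSec int    // 300 or 600
}

var (
	ErrInviteNotFound = errors.New("invite not found or already used")
	ErrOwnInvite      = errors.New("cannot join your own invite")
)

type InviteStore struct {
	mu      sync.Mutex
	invites map[string]Invite
}

func NewInviteStore() *InviteStore {
	return &InviteStore{invites: make(map[string]Invite)}
}

// Create stores a new invite under an unguessable 128-bit random ID.
func (s *InviteStore) Create(creatorID, mode string, initialSec int) (Invite, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return Invite{}, err
	}
	inv := Invite{ID: hex.EncodeToString(buf), CreatorID: creatorID, Mode: mode, InitialSec: initialSec}
	s.mu.Lock()
	s.invites[inv.ID] = inv
	s.mu.Unlock()
	return inv, nil
}

// Join consumes the invite under the lock, so one link starts at most one game.
func (s *InviteStore) Join(id, joinerID string) (Invite, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	inv, ok := s.invites[id]
	if !ok {
		return Invite{}, ErrInviteNotFound
	}
	if inv.CreatorID == joinerID {
		return Invite{}, ErrOwnInvite
	}
	delete(s.invites, id)
	return inv, nil
}

func main() {
	store := NewInviteStore()
	inv, err := store.Create("user_a", "blindfold", 600)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("share /play?invite=" + inv.ID)
	if _, err := store.Join(inv.ID, "user_b"); err == nil {
		fmt.Println("start GameActor:", inv.Mode, inv.InitialSec)
	}
}
```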
Production STT:

- Choose STT provider for production, likely Deepgram Nova-3.
- Add server-side STT streaming instead of dev transcript input.
- Add chess keyterm prompting for piece names, squares, captures, castling, promotion.
- Handle interim transcript updates in the voice capsule.
- Add timeout/retry/error states for failed recognition.
- Add privacy note: opponent never receives audio.

Production Infra:

- Decide hosting split: web app, WebSocket server, MongoDB Atlas.
- Add production env var docs.
- Add deployment docs for self-hosting.
- Add Dockerfile or deployment config for server.
- Add health checks for web, server, Mongo, and STT.
- Add basic observability: logs, request IDs, game IDs.

Scaled Matchmaking:

- Replace in-memory queues with Redis or Mongo-backed queues.
- Add rating-window expansion over time.
- Add reconnect handling for dropped WebSocket clients: 10 s same-identity `game:resume`, full state snapshot on success.
- Add abandoned-game cleanup: after the reconnect grace expires, award the opponent a win; record move-0 disconnects as resignation-style exits.
- Add multi-server game routing strategy.
- Add bot fallback when queue wait is too long.

UI Polish:

- Finish Lichess-like spacing and color consistency.
- Polish mobile `/play` layout.
- Improve blindfold mode screen.
- Add clearer turn state and illegal move warnings.
- Improve history/replay visuals.
- Add loading/empty/error states across dashboard pages.

Deployment Hardening:

- Add rate limits for auth, game actions, transcript events, and invites.
- Validate all WebSocket messages with schemas.
- Add origin checks for WebSocket connections.
- Add production-safe Clerk JWT verification config.
- Add Mongo indexes migration/check command to setup docs.
- Add CI for typecheck, tests, and lint.
- Add smoke tests for bot game, guest game, and replay.