Interview Coach

Real-time voice pipeline for conversational AI, built as a mock interview app.

The hard problem isn't interview prep. It's engineering a low-latency voice loop: capture speech, reason about it, and speak back fast enough that the conversation feels natural. This project is a working implementation of that pipeline.

How It Works

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Speech-to-  │────▶│  LLM Engine  │────▶│  Text-to-   │
│  Text (STT)  │     │  (Reasoning) │     │  Speech(TTS)│
└─────────────┘     └──────────────┘     └─────────────┘
  OpenAI Whisper      GPT-4.1-mini        OpenAI TTS
  ↕ fallback          ↕ fallback          ↕ fallback
  Deepgram            xAI / Gemini        ElevenLabs
  local Whisper CLI   deterministic       browser synth
                      engine

Voice capture: Browser MediaRecorder captures audio, sends to /api/interview/sessions/:id/stt for server-side transcription. Falls back to Web Speech API if server STT is unavailable.

Turn generation: A deterministic state machine (engine.ts) decides turn type (kickoff, question, follow-up, wrap-up) based on session state, time/question budgets, and answer quality heuristics. The LLM rewrites for natural phrasing, but the state machine controls conversation flow.

Speech output: Assistant text hits /api/interview/sessions/:id/tts, returns audio streamed to an <audio> element. Falls back to browser speechSynthesis.

Fallback chain: Every layer degrades gracefully. No API keys? Browser voice + deterministic engine. LLM down? Engine handles turn logic alone. Server TTS fails? Client auto-switches to browser synthesis with a UI notice.

Why It's Fast

The LLM isn't driving conversation structure. The deterministic state machine computes the next turn type in microseconds. The LLM only rewrites for phrasing (capped at 320 tokens, 30s timeout). If the LLM is slow or fails, the deterministic turn ships immediately. Worst-case latency is STT time + TTS time. The LLM is never on the critical path.

Architecture

Layer	Tech	Role
Frontend	Next.js 16, React 19	Session UI, voice controls, transcript
Voice	OpenAI Whisper / TTS, Deepgram, ElevenLabs	Server-side STT + TTS with provider chains
Reasoning	GPT-4.1-mini + deterministic engine	Turn generation, scorecard evaluation
State	SQLite (WAL mode)	Sessions, messages, scores, provider telemetry
Testing	Vitest + Playwright	36 tests: unit, integration, e2e

Run Locally

npm install
cp .env.example .env.local   # add OPENAI_API_KEY at minimum
npm run dev                   # http://localhost:3000/setup

Works without API keys in degraded mode (browser voice + deterministic engine).

npm test              # unit + integration (Vitest)
npm run test:e2e      # smoke tests (Playwright)

Stack

TypeScript · Next.js 16 · React 19 · SQLite · OpenAI (Whisper + GPT-4.1-mini + TTS) · Vitest · Playwright

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
app		app
docs		docs
lib		lib
public		public
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
components.json		components.json
eslint.config.mjs		eslint.config.mjs
next.config.ts		next.config.ts
package-lock.json		package-lock.json
package.json		package.json
playwright.config.ts		playwright.config.ts
postcss.config.mjs		postcss.config.mjs
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Interview Coach

How It Works

Why It's Fast

Architecture

Run Locally

Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Interview Coach

How It Works

Why It's Fast

Architecture

Run Locally

Stack

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages