Real-time public speaking coach powered by Gemini Live API with native audio streaming, deployed on Google Cloud Run.
Live demo: https://ate-the-mic-test-316301290609.us-west1.run.app
Ate the Mic is a real-time AI speaking coach that listens to you talk, watches your body language through your camera, and gives you live feedback — just like having a coach in the room. It uses the Gemini Live API's bidirectional audio streaming to create a natural, conversational coaching experience where the AI can interrupt you mid-sentence to point out filler words or encourage you to slow down.
| Mode | What happens |
|---|---|
| Coach | Supportive mentor — tips on pacing, eye contact, filler words, confidence |
| Heckler | Adversarial audience — tests your composure under pressure |
| Game (Hard) | "Just a Minute" challenge — speak for 60 seconds without filler words or the coach calls you OUT |
| Beginner | Gentle version of Game mode — more forgiving, encouraging feedback |
- Live bidirectional audio — talk naturally, get spoken feedback in real-time (not turn-based)
- Audio transcription — Gemini's `outputAudioTranscription` converts coach speech to text for metrics
- Session metrics — confidence score, filler word count, and vocal energy meter update live during sessions
- Session history — dashboard tracks scores, confidence, and mode across sessions
- Progress tracking — charts show improvement over time
- Invite-only access — admin creates invitation tokens, users authenticate via token
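To make the metrics bullet concrete: filler-word counting over the transcript can be done with simple token matching. This is an illustrative sketch, not the project's actual `liveSessionMetrics.ts` logic; the filler list and function name are assumptions.

```typescript
// Count filler words in a transcript chunk. Illustrative sketch only —
// the real metric logic lives in liveSessionMetrics.ts, and this filler
// list is an assumption, not the project's actual one.
const FILLER_WORDS = new Set(["um", "uh", "like", "you know", "basically"]);

function countFillers(transcript: string): number {
  // Normalize to lowercase words; multi-word fillers are checked as bigrams.
  const tokens = transcript
    .toLowerCase()
    .replace(/[^a-z' ]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
  let count = 0;
  for (let i = 0; i < tokens.length; i++) {
    if (FILLER_WORDS.has(tokens[i])) count++;
    else if (i + 1 < tokens.length && FILLER_WORDS.has(`${tokens[i]} ${tokens[i + 1]}`)) count++;
  }
  return count;
}
```

A per-minute rate derived from this count is what a "filler word meter" would typically display.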
```mermaid
graph TB
  subgraph "Browser Client"
    UI["React SPA<br/>Tailwind CSS"]
    MIC["Microphone<br/>16kHz PCM"]
    CAM["Camera<br/>JPEG frames"]
    AUDIO_OUT["Audio Playback<br/>24kHz PCM"]
    GA["Google Analytics 4"]
  end
  subgraph "Google Cloud Run"
    EXPRESS["Express Server<br/>Node.js + TypeScript"]
    WS["WebSocket<br/>/api/live"]
    AUTH["Session Auth<br/>Signed Cookies"]
    RATE["Rate Limiting<br/>+ Helmet CSP"]
    DB["SQLite<br/>(ephemeral /tmp)"]
    HEALTH["/api/healthz"]
  end
  subgraph "Google Cloud"
    SM["Secret Manager<br/>API keys, tokens"]
    CR["Cloud Run<br/>us-west1"]
  end
  subgraph "Gemini Live API"
    GEMINI["gemini-2.5-flash-native-audio<br/>bidiGenerateContent"]
    TRANSCRIPT["outputAudioTranscription"]
  end
  UI -->|HTTPS| EXPRESS
  MIC -->|Base64 PCM| WS
  CAM -->|Base64 JPEG| WS
  WS -->|Audio + Video frames| GEMINI
  GEMINI -->|Audio stream| WS
  TRANSCRIPT -->|Text transcript| WS
  WS -->|Audio + Text| UI
  UI --> AUDIO_OUT
  EXPRESS --> AUTH
  EXPRESS --> RATE
  EXPRESS --> DB
  SM --> EXPRESS
  CR --> EXPRESS
  UI --> GA
  EXPRESS --> HEALTH
```
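The "Base64 PCM" edge from the microphone implies converting the Float32 samples the Web Audio API produces into 16-bit little-endian PCM before base64-encoding them for the WebSocket. A minimal sketch (the project's actual capture code is in `LiveCoach.tsx`; this function name is illustrative):

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM,
// then base64-encode for the WebSocket payload. Illustrative sketch —
// uses Node's Buffer for brevity; a browser would use btoa instead.
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
  }
  return Buffer.from(buf).toString("base64");
}
```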
```mermaid
sequenceDiagram
  participant User
  participant Browser
  participant Server as Express Server<br/>(Cloud Run)
  participant Gemini as Gemini Live API
  User->>Browser: Click "Start Coaching"
  Browser->>Server: WebSocket /api/live<br/>(cookie auth)
  Server->>Server: Validate signed cookie
  Server->>Gemini: ai.live.connect()<br/>model: gemini-2.5-flash-native-audio
  Gemini-->>Server: Session opened
  Server-->>Browser: { type: "opened" }
  Browser->>Browser: Start mic + camera capture
  loop Real-time coaching
    User->>Browser: Speaks into mic
    Browser->>Server: { type: "audio", data: base64 PCM }
    Server->>Gemini: sendRealtimeInput(audio)
    Note over User,Browser: Camera sends JPEG frames at 1 FPS
    Browser->>Server: { type: "video", data: base64 JPEG }
    Server->>Gemini: sendRealtimeInput(video)
    Gemini-->>Server: Audio response + text transcript
    Server-->>Browser: { type: "model", audioData, text }
    Browser->>Browser: Play audio, parse text for metrics
    Browser->>User: Spoken feedback + live metrics update
  end
  User->>Browser: Click "Stop"
  Browser->>Server: { type: "stop" }
  Browser->>Server: POST /api/sessions (save results)
  Server->>Gemini: Close live session
```
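The client→server envelope implied by the loop above can be sketched as a tagged union plus a dispatcher. The type and method names here are illustrative (the real definitions live in `types.ts` and `server.ts`), and the `session` shape only loosely mirrors the `@google/genai` live session:

```typescript
// Client → server WebSocket envelope implied by the sequence diagram.
// Illustrative sketch — not the project's actual type definitions.
type ClientMessage =
  | { type: "audio"; data: string }  // base64 16kHz PCM chunk
  | { type: "video"; data: string }  // base64 JPEG frame (~1 FPS)
  | { type: "stop" };

// Route one parsed message to the appropriate live-session call.
function dispatch(
  msg: ClientMessage,
  session: { sendAudio(b64: string): void; sendVideo(b64: string): void; close(): void }
): string {
  switch (msg.type) {
    case "audio": session.sendAudio(msg.data); return "audio";
    case "video": session.sendVideo(msg.data); return "video";
    case "stop":  session.close();             return "stop";
  }
}
```

A discriminated union like this lets TypeScript verify the switch is exhaustive, so an unhandled message type fails at compile time rather than at runtime.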
| Layer | Technology | Purpose |
|---|---|---|
| AI Model | Gemini 2.5 Flash Native Audio | Bidirectional live audio coaching via bidiGenerateContent |
| SDK | @google/genai (Google GenAI SDK) | Server-side Gemini Live API client |
| Cloud | Google Cloud Run | Source-based deployment, auto-scaling |
| Secrets | Google Cloud Secret Manager | API key, session secret, admin token |
| Server | Express + TypeScript + WebSocket | API routes, auth middleware, Gemini proxy |
| Client | React + Tailwind CSS + Vite | SPA with live audio/video capture |
| Analytics | Google Analytics 4 | Usage tracking and user journey |
| Database | SQLite (better-sqlite3) | Users, sessions, invitations |
| Security | Helmet, rate limiting, signed cookies | CSP, HSTS, brute-force protection |
- Go to https://ate-the-mic-test-316301290609.us-west1.run.app
- Sign in with:
  - Email: judge@judge.com
  - Access Token: K4BTRYGE
- Click Start Coaching and allow mic/camera access
- Start speaking — the AI coach will respond in real-time with spoken feedback
- Try Heckler mode for an adversarial experience, or Game for a 60-second challenge
- Stop the session to see your score, then check your Dashboard for session history
```bash
# Prerequisites: Node.js 22+
git clone <repo-url>
cd atethemic
npm install
cp .env.example .env.local
# Set GEMINI_API_KEY in .env.local (needs Gemini Live API access)
npm run dev
# Open http://localhost:3000
```

```bash
# Set your GCP project
gcloud config set project YOUR_PROJECT_ID

# Create the API key secret
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini_api_key --data-file=-

# Deploy from source
gcloud run deploy ate-the-mic \
  --source . \
  --region us-west1 \
  --allow-unauthenticated \
  --set-env-vars "ADMIN_EMAIL=you@example.com,SQLITE_DB_PATH=/tmp/ate_the_mic.db" \
  --update-secrets="GEMINI_API_KEY=gemini_api_key:latest"
```

This is not a demo — the deployment includes production-grade security:
- Helmet — CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy
- Rate limiting — 20 auth attempts per 15 minutes
- Signed HTTP-only cookies — session auth, no client-side tokens in API calls
- Secret Manager — all secrets stored in Google Cloud Secret Manager
- Gzip compression — all responses compressed
- Immutable asset caching — hashed bundles cached 1 year
- WebSocket connection cap — max 2 concurrent sessions per user
- Graceful shutdown — SIGTERM closes WebSockets and DB before exit
- Global error handler — no stack traces leak to clients
- Request body limit — 1MB max payload
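The signed-cookie item above can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the project's actual middleware (which lives in `server.ts`); the `"<id>.<signature>"` cookie format is an assumption:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a session id as "<id>.<hmac-hex>" and verify it on each request.
// Illustrative sketch — the cookie format here is an assumption, not the
// project's actual scheme.
function signSession(id: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(id).digest("hex");
  return `${id}.${sig}`;
}

function verifySession(cookie: string, secret: string): string | null {
  const dot = cookie.lastIndexOf(".");
  if (dot < 0) return null;
  const id = cookie.slice(0, dot);
  const sig = Buffer.from(cookie.slice(dot + 1));
  const expected = Buffer.from(createHmac("sha256", secret).update(id).digest("hex"));
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (sig.length !== expected.length || !timingSafeEqual(sig, expected)) return null;
  return id;
}
```

`timingSafeEqual` is what prevents an attacker from forging signatures via response-time measurements; a plain `===` comparison would leak how many leading characters matched.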
- Gemini model — `gemini-2.5-flash-native-audio-preview-09-2025` via the Live API
- Google GenAI SDK — `@google/genai` for `ai.live.connect()` with `bidiGenerateContent`
- Google Cloud services — Cloud Run (us-west1) + Secret Manager
- Live Agent — bidirectional audio streaming with natural interruption support
- Multimodal — audio input + video input (camera) + audio output + text transcription
```
server.ts                  # Express server, Gemini Live proxy, auth, API routes
src/
  App.tsx                  # Landing page, mode selection, navigation
  components/
    LiveCoach.tsx          # Live coaching UI, audio/video capture, metrics
    ProfileView.tsx        # Dashboard, session history, progress charts
    SignIn.tsx             # Token-based authentication
    AdminPanel.tsx         # Invitation management
  services/
    geminiLive.ts          # WebSocket client for live coaching
    analytics.ts           # Google Analytics 4 event tracking
  contexts/
    AuthContext.tsx        # Session auth state management
  liveSessionMetrics.ts    # Confidence scoring, filler detection, game rules
  types.ts                 # Shared TypeScript types
```