Ate the Mic — AI Public Speaking Coach

Real-time public speaking coach powered by Gemini Live API with native audio streaming, deployed on Google Cloud Run.

Live demo: https://ate-the-mic-test-316301290609.us-west1.run.app

What It Does

Ate the Mic is a real-time AI speaking coach that listens to you talk, watches your body language through your camera, and gives you live feedback — just like having a coach in the room. It uses the Gemini Live API's bidirectional audio streaming to create a natural, conversational coaching experience where the AI can interrupt you mid-sentence to point out filler words or encourage you to slow down.

Four Coaching Modes

Mode	What happens
Coach	Supportive mentor — tips on pacing, eye contact, filler words, confidence
Heckler	Adversarial audience — tests your composure under pressure
Game (Hard)	"Just a Minute" challenge — speak for 60 seconds without filler words or the coach calls you OUT
Beginner	Gentle version of Game mode — more forgiving, encouraging feedback
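The Game and Beginner modes come down to a simple rule check over a 60-second round. The sketch below shows one way to express it, assuming detected filler words arrive as millisecond offsets from the start of the round. The `judgeRound` name and the per-mode filler allowances (zero for Game, two for Beginner) are hypothetical illustrations, not values taken from `liveSessionMetrics.ts`.

```typescript
// Hypothetical sketch of the "Just a Minute" rule check.
// The allowances below are illustrative assumptions, not the app's real values.
type GameMode = "game" | "beginner";

interface GameResult {
  outcome: "OUT" | "PASSED" | "IN_PROGRESS";
  fillerCount: number;
}

const ROUND_MS = 60_000; // 60-second round, from the mode description

// Assumed: Game (Hard) calls OUT on the first filler; Beginner tolerates two.
const FILLER_ALLOWANCE: Record<GameMode, number> = { game: 0, beginner: 2 };

function judgeRound(mode: GameMode, fillerTimesMs: number[], elapsedMs: number): GameResult {
  // Only fillers inside the round window (and not in the future) count.
  const windowEnd = Math.min(elapsedMs, ROUND_MS);
  const inRound = fillerTimesMs.filter((t) => t >= 0 && t <= windowEnd);

  if (inRound.length > FILLER_ALLOWANCE[mode]) {
    return { outcome: "OUT", fillerCount: inRound.length };
  }
  if (elapsedMs >= ROUND_MS) {
    return { outcome: "PASSED", fillerCount: inRound.length };
  }
  return { outcome: "IN_PROGRESS", fillerCount: inRound.length };
}
```

For example, `judgeRound("game", [12_500], 20_000)` ends the round with `OUT`, while the same filler in Beginner mode would still let the speaker finish the minute.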

Key Features

Live bidirectional audio — talk naturally, get spoken feedback in real-time (not turn-based)
Audio transcription — Gemini's outputAudioTranscription converts coach speech to text for metrics
Session metrics — the confidence score, filler word count, and vocal energy meter update live during each session
Session history — dashboard tracks scores, confidence, and mode across sessions
Progress tracking — charts show improvement over time
Invite-only access — admin creates invitation tokens, users authenticate via token
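A filler word count can be as simple as phrase matching on a transcript chunk. This is a minimal sketch assuming plain-text transcript input; the `FILLER_WORDS` list and the `countFillers` and `totalFillers` helpers are hypothetical, and the README does not say whether the app counts fillers from the user's speech or from the coach's transcribed callouts.

```typescript
// Hypothetical filler list; naive matching cannot tell filler "like" from the
// verb, so it is deliberately left out.
const FILLER_WORDS: readonly string[] = ["um", "uh", "erm", "you know", "basically", "sort of"];

// Count each filler word or phrase in a transcript chunk.
function countFillers(transcript: string): Map<string, number> {
  // Lowercase, turn punctuation into spaces, collapse whitespace, and pad with
  // spaces so every word is delimited on both sides ("Um," -> " um ").
  const normalized =
    " " +
    transcript
      .toLowerCase()
      .replace(/[^a-z'\s]/g, " ")
      .replace(/\s+/g, " ")
      .trim() +
    " ";

  const counts = new Map<string, number>();
  for (const filler of FILLER_WORDS) {
    // Matching " um " (with spaces) keeps "um" from matching inside "umbrella".
    const needle = ` ${filler} `;
    let count = 0;
    let from = normalized.indexOf(needle);
    while (from !== -1) {
      count += 1;
      // Resume on the trailing space so back-to-back fillers ("um um") both count.
      from = normalized.indexOf(needle, from + needle.length - 1);
    }
    if (count > 0) counts.set(filler, count);
  }
  return counts;
}

// Total across all fillers, e.g. to drive a live filler counter.
function totalFillers(counts: Map<string, number>): number {
  return Array.from(counts.values()).reduce((sum, n) => sum + n, 0);
}
```

Because the matching is word-based, it works on partial transcript chunks as they stream in; the caller only needs to add each chunk's total to a running count.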

Architecture

graph TB
    subgraph "Browser Client"
        UI["React SPA<br/>Tailwind CSS"]
        MIC["Microphone<br/>16kHz PCM"]
        CAM["Camera<br/>JPEG frames"]
        AUDIO_OUT["Audio Playback<br/>24kHz PCM"]
        GA["Google Analytics 4"]
    end

    subgraph "Google Cloud Run"
        EXPRESS["Express Server<br/>Node.js + TypeScript"]
        WS["WebSocket<br/>/api/live"]
        AUTH["Session Auth<br/>Signed Cookies"]
        RATE["Rate Limiting<br/>+ Helmet CSP"]
        DB["SQLite<br/>(ephemeral /tmp)"]
        HEALTH["/api/healthz"]
    end

    subgraph "Google Cloud"
        SM["Secret Manager<br/>API keys, tokens"]
        CR["Cloud Run<br/>us-west1"]
    end

    subgraph "Gemini Live API"
        GEMINI["gemini-2.5-flash-native-audio<br/>bidiGenerateContent"]
        TRANSCRIPT["outputAudioTranscription"]
    end

    UI -->|HTTPS| EXPRESS
    MIC -->|Base64 PCM| WS
    CAM -->|Base64 JPEG| WS
    WS -->|Audio + Video frames| GEMINI
    GEMINI -->|Audio stream| WS
    TRANSCRIPT -->|Text transcript| WS
    WS -->|Audio + Text| UI
    UI --> AUDIO_OUT
    EXPRESS --> AUTH
    EXPRESS --> RATE
    EXPRESS --> DB
    SM --> EXPRESS
    CR --> EXPRESS
    UI --> GA
    EXPRESS --> HEALTH
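The diagram implies two audio conversions in the browser: microphone samples become 16 kHz PCM sent as base64, and the 24 kHz PCM returned by the model is decoded back to floating-point samples for playback. A minimal sketch, assuming signed 16-bit little-endian mono PCM and capture that already runs at 16 kHz (resampling is not shown). All function names are hypothetical.

```typescript
// Float32 samples in [-1, 1] (Web Audio format) -> 16-bit little-endian PCM bytes.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((s, i) => {
    const clamped = Math.max(-1, Math.min(1, s));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    view.setInt16(i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  });
  return new Uint8Array(view.buffer);
}

// 16-bit little-endian PCM bytes -> Float32 samples, e.g. for an AudioBuffer.
function pcm16ToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// Base64 helpers using the btoa/atob globals available in browsers and Node 16+.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}
```

Encoding a mic chunk is then `bytesToBase64(floatToPcm16(chunk))`, and playback is `pcm16ToFloat(base64ToBytes(audioData))`, with the resulting samples played at 24 kHz.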

Live Session Data Flow

sequenceDiagram
    participant User
    participant Browser
    participant Server as Express Server<br/>(Cloud Run)
    participant Gemini as Gemini Live API

    User->>Browser: Click "Start Coaching"
    Browser->>Server: WebSocket /api/live<br/>(cookie auth)
    Server->>Server: Validate signed cookie
    Server->>Gemini: ai.live.connect()<br/>model: gemini-2.5-flash-native-audio
    Gemini-->>Server: Session opened
    Server-->>Browser: { type: "opened" }
    Browser->>Browser: Start mic + camera capture

    loop Real-time coaching
        User->>Browser: Speaks into mic
        Browser->>Server: { type: "audio", data: base64 PCM }
        Server->>Gemini: sendRealtimeInput(audio)
        Note over User,Browser: Camera sends JPEG frames at 1 FPS
        Browser->>Server: { type: "video", data: base64 JPEG }
        Server->>Gemini: sendRealtimeInput(video)
        Gemini-->>Server: Audio response + text transcript
        Server-->>Browser: { type: "model", audioData, text }
        Browser->>Browser: Play audio, parse text for metrics
        Browser->>User: Spoken feedback + live metrics update
    end

    User->>Browser: Click "Stop"
    Browser->>Server: { type: "stop" }
    Browser->>Server: POST /api/sessions (save results)
    Server->>Gemini: Close live session
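The sequence diagram defines a small JSON protocol over `/api/live`. One way to type it is a discriminated union on `type`, with a parser that drops malformed frames before anything is forwarded to Gemini. The message shapes follow the diagram; the optional fields, validation rules, and helper names are assumptions.

```typescript
// Client -> server messages, as drawn in the sequence diagram.
type ClientMessage =
  | { type: "audio"; data: string } // base64 16 kHz PCM
  | { type: "video"; data: string } // base64 JPEG frame
  | { type: "stop" };

// Server -> client messages (assumed: audioData and text may arrive separately).
type ServerMessage =
  | { type: "opened" }
  | { type: "model"; audioData?: string; text?: string };

// Parse an incoming frame; returns null for anything malformed so the server
// can drop it instead of forwarding it.
function parseClientMessage(raw: string): ClientMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof msg !== "object" || msg === null) return null;

  const { type, data } = msg as { type?: unknown; data?: unknown };
  if (type === "stop") return { type: "stop" };
  if ((type === "audio" || type === "video") && typeof data === "string" && data.length > 0) {
    return type === "audio" ? { type: "audio", data } : { type: "video", data };
  }
  return null;
}

function encodeServerMessage(msg: ServerMessage): string {
  return JSON.stringify(msg);
}
```

With the union in place, a `switch (msg.type)` on the server lets TypeScript check that every message kind is handled.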

Tech Stack

Layer	Technology	Purpose
AI Model	Gemini 2.5 Flash Native Audio	Bidirectional live audio coaching via `bidiGenerateContent`
SDK	`@google/genai` (Google GenAI SDK)	Server-side Gemini Live API client
Cloud	Google Cloud Run	Source-based deployment, auto-scaling
Secrets	Google Cloud Secret Manager	API key, session secret, admin token
Server	Express + TypeScript + WebSocket	API routes, auth middleware, Gemini proxy
Client	React + Tailwind CSS + Vite	SPA with live audio/video capture
Analytics	Google Analytics 4	Usage tracking and user journey
Database	SQLite (better-sqlite3)	Users, sessions, invitations
Security	Helmet, rate limiting, signed cookies	CSP, HSTS, brute-force protection

Try It — Quick Start for Judges

Option A: Use the live deployment

1. Go to https://ate-the-mic-test-316301290609.us-west1.run.app
2. Sign in with:
   - Email: judge@judge.com
   - Access Token: K4BTRYGE
3. Click Start Coaching and allow mic/camera access.
4. Start speaking; the AI coach will respond in real time with spoken feedback.
5. Try Heckler mode for an adversarial experience, or Game for a 60-second challenge.
6. Stop the session to see your score, then check your Dashboard for session history.

Option B: Run locally

# Prerequisites: Node.js 22+
git clone <repo-url>
cd atethemic
npm install
cp .env.example .env.local
# Set GEMINI_API_KEY in .env.local (needs Gemini Live API access)
npm run dev
# Open http://localhost:3000

Option C: Deploy your own instance

# Set your GCP project
gcloud config set project YOUR_PROJECT_ID

# Create the API key secret
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini_api_key --data-file=-

# Deploy from source
gcloud run deploy ate-the-mic \
  --source . \
  --region us-west1 \
  --allow-unauthenticated \
  --set-env-vars "ADMIN_EMAIL=you@example.com,SQLITE_DB_PATH=/tmp/ate_the_mic.db" \
  --update-secrets="GEMINI_API_KEY=gemini_api_key:latest"
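The deploy command passes configuration through environment variables, and `--update-secrets` exposes the Secret Manager value to the container as `GEMINI_API_KEY`. A hypothetical startup loader that fails fast when the key is missing might look like this. The `loadConfig` name and the defaults are assumptions, except that Cloud Run injects a `PORT` variable the server is expected to listen on.

```typescript
// Hypothetical config loader; variable names come from this README.
interface AppConfig {
  geminiApiKey: string;
  adminEmail: string | undefined;
  sqliteDbPath: string;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const geminiApiKey = env.GEMINI_API_KEY?.trim();
  if (!geminiApiKey) {
    // Fail at startup rather than on the first live session.
    throw new Error("GEMINI_API_KEY is not set");
  }

  // Cloud Run sets PORT; 3000 (the local dev URL above) is an assumed fallback.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    geminiApiKey,
    adminEmail: env.ADMIN_EMAIL,
    // Assumed local default; the deploy command points this at /tmp.
    sqliteDbPath: env.SQLITE_DB_PATH ?? "./ate_the_mic.db",
    port,
  };
}
```

On the server this would be called once at boot, for example `loadConfig(process.env)`.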

Production Hardening

Beyond serving as a live demo, the deployment includes production-grade hardening:

Helmet — CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy
Rate limiting — 20 auth attempts / 15 min
Signed HTTP-only cookies — session auth, no client-side tokens in API calls
Secret Manager — all secrets stored in Google Cloud Secret Manager
Gzip compression — all responses compressed
Immutable asset caching — hashed bundles cached 1 year
WebSocket connection cap — max 2 concurrent sessions per user
Graceful shutdown — SIGTERM closes WebSockets and DB before exit
Global error handler — no stack traces leak to clients
Request body limit — 1MB max payload
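Two of the limits above are easy to state as code: 20 authentication attempts per 15-minute window, and at most 2 concurrent live sessions per user. The README does not say how these are implemented, so the following is a dependency-free sketch with hypothetical class names, using a fixed-window counter keyed by an arbitrary string such as the client IP.

```typescript
// Fixed-window counter: at most `limit` attempts per key per `windowMs`.
// Note: entries are never evicted here; a real server would prune old keys.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the attempt is allowed, false if the key is over its limit.
  tryConsume(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now }); // start a new window
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Per-user cap on concurrent live WebSocket sessions.
class SessionCap {
  private readonly active = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number) {
    this.maxPerUser = maxPerUser;
  }

  // Call when a live socket opens; false means the connection should be refused.
  acquire(userId: string): boolean {
    const n = this.active.get(userId) ?? 0;
    if (n >= this.maxPerUser) return false;
    this.active.set(userId, n + 1);
    return true;
  }

  // Call exactly once when that socket closes.
  release(userId: string): void {
    const n = this.active.get(userId) ?? 0;
    if (n <= 1) this.active.delete(userId);
    else this.active.set(userId, n - 1);
  }
}

// Instances matching the numbers listed above.
const authLimiter = new FixedWindowLimiter(20, 15 * 60 * 1000);
const liveSessions = new SessionCap(2);
```

A fixed window is the simplest choice but allows a burst at a window boundary; a sliding window or token bucket smooths that out at the cost of more state.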

Challenge Criteria Checklist

Gemini model — gemini-2.5-flash-native-audio-preview-09-2025 via Live API
Google GenAI SDK — @google/genai for ai.live.connect() with bidiGenerateContent
Google Cloud service — Cloud Run (us-west1) + Secret Manager
Live Agent — bidirectional audio streaming with natural interruption support
Multimodal — audio input + video input (camera) + audio output + text transcription

Repository Structure

server.ts                 # Express server, Gemini Live proxy, auth, API routes
src/
  App.tsx                 # Landing page, mode selection, navigation
  components/
    LiveCoach.tsx         # Live coaching UI, audio/video capture, metrics
    ProfileView.tsx       # Dashboard, session history, progress charts
    SignIn.tsx            # Token-based authentication
    AdminPanel.tsx        # Invitation management
  services/
    geminiLive.ts         # WebSocket client for live coaching
    analytics.ts          # Google Analytics 4 event tracking
  contexts/
    AuthContext.tsx       # Session auth state management
  liveSessionMetrics.ts   # Confidence scoring, filler detection, game rules
  types.ts                # Shared TypeScript types
