taylorparsons/atethemic

Ate the Mic — AI Public Speaking Coach

Real-time public speaking coach powered by Gemini Live API with native audio streaming, deployed on Google Cloud Run.

Live demo: https://ate-the-mic-test-316301290609.us-west1.run.app


What It Does

Ate the Mic is a real-time AI speaking coach that listens to you talk, watches your body language through your camera, and gives you live feedback — just like having a coach in the room. It uses the Gemini Live API's bidirectional audio streaming to create a natural, conversational coaching experience where the AI can interrupt you mid-sentence to point out filler words or encourage you to slow down.

Four Coaching Modes

| Mode | What happens |
|---|---|
| Coach | Supportive mentor: tips on pacing, eye contact, filler words, and confidence |
| Heckler | Adversarial audience: tests your composure under pressure |
| Game (Hard) | "Just a Minute" challenge: speak for 60 seconds without filler words or the coach calls you OUT |
| Beginner | Gentle version of Game mode, with more forgiving, encouraging feedback |
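The Game and Beginner modes share one rule: keep talking for 60 seconds without filler words. A hypothetical sketch of that check follows. `evaluateRound`, `RoundState`, and the per-mode filler allowances (zero for Game (Hard), two for Beginner) are assumptions made for illustration, not the rules coded in liveSessionMetrics.ts.

```typescript
// Hypothetical "Just a Minute" rule check; allowances are assumed values.
type GameMode = "game" | "beginner";

interface RoundState {
  mode: GameMode;
  elapsedSeconds: number; // seconds spoken so far in this round
  fillerCount: number;    // filler words detected so far
}

type RoundOutcome = "in_progress" | "called_out" | "survived";

const ROUND_SECONDS = 60;

// Assumed: Hard tolerates no fillers, Beginner is more forgiving.
const FILLER_ALLOWANCE: Record<GameMode, number> = {
  game: 0,
  beginner: 2,
};

function evaluateRound(state: RoundState): RoundOutcome {
  // Exceeding the allowance ends the round immediately.
  if (state.fillerCount > FILLER_ALLOWANCE[state.mode]) {
    return "called_out";
  }
  // Otherwise the speaker wins once the full minute has elapsed.
  return state.elapsedSeconds >= ROUND_SECONDS ? "survived" : "in_progress";
}
```

Keeping the allowance in a lookup table means another difficulty level would be one new entry rather than a new branch.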

Key Features

  • Live bidirectional audio — talk naturally, get spoken feedback in real-time (not turn-based)
  • Audio transcription — Gemini's outputAudioTranscription converts coach speech to text for metrics
  • Session metrics — confidence score, filler word count, vocal energy meter update live during sessions
  • Session history — dashboard tracks scores, confidence, and mode across sessions
  • Progress tracking — charts show improvement over time
  • Invite-only access — admin creates invitation tokens, users authenticate via token
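The filler word count and confidence score can both be derived from transcript text. A minimal sketch, assuming a fixed filler list and a simple filler-rate penalty; `FILLERS`, `countFillers`, `confidenceScore`, and the weighting are hypothetical and not taken from the project's scoring code.

```typescript
// Hypothetical filler list; a real coach would tune this.
const FILLERS = ["um", "uh", "erm", "like", "you know", "basically", "literally"];

// One case-insensitive regex; \b keeps "like" from matching inside "likely".
const FILLER_RE = new RegExp(`\\b(?:${FILLERS.join("|")})\\b`, "gi");

function countFillers(transcript: string): number {
  return transcript.match(FILLER_RE)?.length ?? 0;
}

// Assumed scoring: 100 minus a penalty proportional to the filler rate,
// so a 20% filler rate (1 in 5 words) drives the score to zero.
function confidenceScore(transcript: string): number {
  const words = transcript.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 0;
  const fillerRate = countFillers(transcript) / words.length;
  return Math.round(100 * Math.max(0, 1 - fillerRate * 5));
}
```

For example, one filler in a ten-word transcript gives a filler rate of 0.1 and a score of 50 under this assumed weighting.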

Architecture

```mermaid
graph TB
    subgraph "Browser Client"
        UI["React SPA<br/>Tailwind CSS"]
        MIC["Microphone<br/>16kHz PCM"]
        CAM["Camera<br/>JPEG frames"]
        AUDIO_OUT["Audio Playback<br/>24kHz PCM"]
        GA["Google Analytics 4"]
    end

    subgraph "Google Cloud Run"
        EXPRESS["Express Server<br/>Node.js + TypeScript"]
        WS["WebSocket<br/>/api/live"]
        AUTH["Session Auth<br/>Signed Cookies"]
        RATE["Rate Limiting<br/>+ Helmet CSP"]
        DB["SQLite<br/>(ephemeral /tmp)"]
        HEALTH["/api/healthz"]
    end

    subgraph "Google Cloud"
        SM["Secret Manager<br/>API keys, tokens"]
        CR["Cloud Run<br/>us-west1"]
    end

    subgraph "Gemini Live API"
        GEMINI["gemini-2.5-flash-native-audio<br/>bidiGenerateContent"]
        TRANSCRIPT["outputAudioTranscription"]
    end

    UI -->|HTTPS| EXPRESS
    MIC -->|Base64 PCM| WS
    CAM -->|Base64 JPEG| WS
    WS -->|Audio + Video frames| GEMINI
    GEMINI -->|Audio stream| WS
    TRANSCRIPT -->|Text transcript| WS
    WS -->|Audio + Text| UI
    UI --> AUDIO_OUT
    EXPRESS --> AUTH
    EXPRESS --> RATE
    EXPRESS --> DB
    SM --> EXPRESS
    CR --> EXPRESS
    UI --> GA
    EXPRESS --> HEALTH
```
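The microphone path in the diagram carries 16kHz PCM as Base64. Browser audio APIs hand you Float32 samples in [-1, 1], so something has to convert them. A sketch, assuming the samples are already 16kHz mono and that the target is little-endian 16-bit PCM (the raw audio input format the Gemini Live API documents); `floatTo16BitPcmBase64` is a hypothetical helper name, not a function from this repository.

```typescript
// Convert Float32 samples (as produced by the Web Audio API) into
// little-endian 16-bit PCM, then Base64-encode for a JSON WebSocket message.
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] so clipped input saturates instead of wrapping.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // int16 spans -32768..32767, so negative and positive scale differently.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  // btoa expects a "binary string" with one character per byte.
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

The resulting string is what would go in the `data` field of an `audio` message; at 16kHz, each 100 ms chunk is 1,600 samples, or 3,200 bytes before Base64.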

Live Session Data Flow

```mermaid
sequenceDiagram
    participant User
    participant Browser
    participant Server as Express Server<br/>(Cloud Run)
    participant Gemini as Gemini Live API

    User->>Browser: Click "Start Coaching"
    Browser->>Server: WebSocket /api/live<br/>(cookie auth)
    Server->>Server: Validate signed cookie
    Server->>Gemini: ai.live.connect()<br/>model: gemini-2.5-flash-native-audio
    Gemini-->>Server: Session opened
    Server-->>Browser: { type: "opened" }
    Browser->>Browser: Start mic + camera capture

    loop Real-time coaching
        User->>Browser: Speaks into mic
        Browser->>Server: { type: "audio", data: base64 PCM }
        Server->>Gemini: sendRealtimeInput(audio)
        Note over User,Browser: Camera sends JPEG frames at 1 FPS
        Browser->>Server: { type: "video", data: base64 JPEG }
        Server->>Gemini: sendRealtimeInput(video)
        Gemini-->>Server: Audio response + text transcript
        Server-->>Browser: { type: "model", audioData, text }
        Browser->>Browser: Play audio, parse text for metrics
        Browser->>User: Spoken feedback + live metrics update
    end

    User->>Browser: Click "Stop"
    Browser->>Server: { type: "stop" }
    Browser->>Server: POST /api/sessions (save results)
    Server->>Gemini: Close live session
```
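The sequence diagram implies a small JSON protocol over /api/live. One way to type it on both ends is a discriminated union. This is a sketch: the message shapes mirror the diagram (`opened`, `audio`, `video`, `model`, `stop`), but the type names, `parseClientMessage`, and `encodeServerMessage` are hypothetical, and server.ts may validate messages differently.

```typescript
// Messages the browser sends to the server (shapes taken from the diagram).
type ClientMessage =
  | { type: "audio"; data: string } // Base64 16kHz PCM
  | { type: "video"; data: string } // Base64 JPEG frame
  | { type: "stop" };

// Messages the server sends back to the browser.
type ServerMessage =
  | { type: "opened" }
  | { type: "model"; audioData?: string; text?: string };

// Validate untrusted socket input before acting on it; null means "drop it".
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as { type?: unknown; data?: unknown };
  if (msg.type === "stop") return { type: "stop" };
  // Media messages must carry a Base64 string payload.
  if (typeof msg.data !== "string") return null;
  if (msg.type === "audio") return { type: "audio", data: msg.data };
  if (msg.type === "video") return { type: "video", data: msg.data };
  return null;
}

function encodeServerMessage(msg: ServerMessage): string {
  return JSON.stringify(msg);
}
```

Sharing these types between the client and server would let the compiler flag a renamed field on either side.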

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| AI Model | Gemini 2.5 Flash Native Audio | Bidirectional live audio coaching via bidiGenerateContent |
| SDK | @google/genai (Google GenAI SDK) | Server-side Gemini Live API client |
| Cloud | Google Cloud Run | Source-based deployment, auto-scaling |
| Secrets | Google Cloud Secret Manager | API key, session secret, admin token |
| Server | Express + TypeScript + WebSocket | API routes, auth middleware, Gemini proxy |
| Client | React + Tailwind CSS + Vite | SPA with live audio/video capture |
| Analytics | Google Analytics 4 | Usage and user-journey tracking |
| Database | SQLite (better-sqlite3) | Users, sessions, invitations |
| Security | Helmet, rate limiting, signed cookies | CSP, HSTS, brute-force protection |
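The Server row describes Express acting as a Gemini proxy. A sketch of the translation step from a browser message to a Live API realtime input, assuming the `sendRealtimeInput` parameter shape of the @google/genai SDK (`{ audio: { data, mimeType } }` or `{ video: { data, mimeType } }`). The SDK is deliberately not imported so the mapping can be tested on its own; `toRealtimeInput`, `InlineBlob`, and `RealtimeInput` are hypothetical local names.

```typescript
// Local stand-in for the SDK's Blob shape: Base64 data plus a MIME type.
interface InlineBlob {
  data: string;
  mimeType: string;
}

// Assumed to mirror the subset of sendRealtimeInput parameters used here.
type RealtimeInput = { audio: InlineBlob } | { video: InlineBlob };

type MediaMessage =
  | { type: "audio"; data: string }
  | { type: "video"; data: string };

function toRealtimeInput(msg: MediaMessage): RealtimeInput {
  if (msg.type === "audio") {
    // Raw 16-bit PCM; the rate parameter tells the API it is 16kHz.
    return { audio: { data: msg.data, mimeType: "audio/pcm;rate=16000" } };
  }
  // A single JPEG camera frame (the client sends about one per second).
  return { video: { data: msg.data, mimeType: "image/jpeg" } };
}
```

Assuming a `session` obtained from `ai.live.connect()`, forwarding would then be `session.sendRealtimeInput(toRealtimeInput(msg))`. Keeping the mapping pure makes the proxy logic testable without a live Gemini connection.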

Try It — Quick Start for Judges

Option A: Use the live deployment

  1. Go to https://ate-the-mic-test-316301290609.us-west1.run.app
  2. Sign in with:
    • Email: judge@judge.com
    • Access Token: K4BTRYGE
  3. Click Start Coaching and allow mic/camera access
  4. Start speaking — the AI coach will respond in real-time with spoken feedback
  5. Try Heckler mode for an adversarial experience, or Game for a 60-second challenge
  6. Stop the session to see your score, then check your Dashboard for session history

Option B: Run locally

```bash
# Prerequisites: Node.js 22+
git clone <repo-url>
cd atethemic
npm install
cp .env.example .env.local
# Set GEMINI_API_KEY in .env.local (needs Gemini Live API access)
npm run dev
# Open http://localhost:3000
```
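Local runs read their settings from .env.local. A minimal sketch of validating them at startup so a missing key fails immediately rather than on the first coaching session. GEMINI_API_KEY, ADMIN_EMAIL, and SQLITE_DB_PATH come from this README; `loadConfig`, `AppConfig`, and the defaults (port 3000 to match the URL above, a local `./ate_the_mic.db` file) are assumptions. Cloud Run supplies its own `PORT` variable, which this sketch honors.

```typescript
interface AppConfig {
  geminiApiKey: string;
  adminEmail?: string;
  sqliteDbPath: string;
  port: number;
}

// Fail fast at startup instead of on the first coaching session.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const geminiApiKey = env.GEMINI_API_KEY?.trim();
  if (!geminiApiKey) {
    throw new Error("GEMINI_API_KEY is not set; add it to .env.local");
  }
  // Cloud Run injects PORT; fall back to 3000 for local development.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    geminiApiKey,
    adminEmail: env.ADMIN_EMAIL,
    // Hypothetical local default; the Cloud Run deploy below sets /tmp.
    sqliteDbPath: env.SQLITE_DB_PATH ?? "./ate_the_mic.db",
    port,
  };
}
```

On the server this would be called once as `loadConfig(process.env)`, after the .env.local file has been loaded.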

Option C: Deploy your own instance

```bash
# Set your GCP project
gcloud config set project YOUR_PROJECT_ID

# Create the API key secret
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini_api_key --data-file=-

# Deploy from source
gcloud run deploy ate-the-mic \
  --source . \
  --region us-west1 \
  --allow-unauthenticated \
  --set-env-vars "ADMIN_EMAIL=you@example.com,SQLITE_DB_PATH=/tmp/ate_the_mic.db" \
  --update-secrets="GEMINI_API_KEY=gemini_api_key:latest"
```

Production Hardening

This is more than a demo: the deployment includes production-grade security hardening:

  • Helmet — CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy
  • Rate limiting — 20 auth attempts / 15 min
  • Signed HTTP-only cookies — session auth, no client-side tokens in API calls
  • Secret Manager — all secrets stored in Google Cloud Secret Manager
  • Gzip compression — all responses compressed
  • Immutable asset caching — hashed bundles cached 1 year
  • WebSocket connection cap — max 2 concurrent sessions per user
  • Graceful shutdown — SIGTERM closes WebSockets and DB before exit
  • Global error handler — no stack traces leak to clients
  • Request body limit — 1MB max payload
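Two of these limits reduce to simple counters. A sketch of both, assuming in-memory state within a single Cloud Run instance (several instances would each keep their own counts); `FixedWindowLimiter` and `ConnectionCap` are hypothetical classes and not necessarily how server.ts enforces these limits. The numbers match the list above: 20 auth attempts per 15 minutes and 2 concurrent sessions per user.

```typescript
// Fixed-window limiter: at most `limit` hits per key in each window.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for tests.
  hit(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      // First hit, or the previous window expired: start a new window.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}

// Caps the number of concurrent live sessions per user.
class ConnectionCap {
  private readonly open = new Map<string, number>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  // Call when a WebSocket upgrade is requested; false means reject it.
  tryAcquire(userId: string): boolean {
    const n = this.open.get(userId) ?? 0;
    if (n >= this.max) return false;
    this.open.set(userId, n + 1);
    return true;
  }

  // Call from the WebSocket "close" handler.
  release(userId: string): void {
    const n = this.open.get(userId) ?? 0;
    if (n <= 1) this.open.delete(userId);
    else this.open.set(userId, n - 1);
  }
}

const authLimiter = new FixedWindowLimiter(20, 15 * 60 * 1000);
const liveCap = new ConnectionCap(2);
```

A fixed window is the simplest choice; it allows short bursts at window boundaries, which is usually acceptable for login throttling.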

Challenge Criteria Checklist

  • Gemini model: gemini-2.5-flash-native-audio-preview-09-2025 via the Live API
  • Google GenAI SDK: @google/genai for ai.live.connect() with bidiGenerateContent
  • Google Cloud service: Cloud Run (us-west1) + Secret Manager
  • Live Agent: bidirectional audio streaming with natural interruption support
  • Multimodal: audio input + video input (camera) + audio output + text transcription

Repository Structure

```
server.ts                 # Express server, Gemini Live proxy, auth, API routes
src/
  App.tsx                 # Landing page, mode selection, navigation
  components/
    LiveCoach.tsx         # Live coaching UI, audio/video capture, metrics
    ProfileView.tsx       # Dashboard, session history, progress charts
    SignIn.tsx            # Token-based authentication
    AdminPanel.tsx        # Invitation management
  services/
    geminiLive.ts         # WebSocket client for live coaching
    analytics.ts          # Google Analytics 4 event tracking
  contexts/
    AuthContext.tsx       # Session auth state management
  liveSessionMetrics.ts   # Confidence scoring, filler detection, game rules
  types.ts                # Shared TypeScript types
```
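AdminPanel.tsx manages invitations, and tokens like the judge token above are short uppercase codes. A sketch of generating and checking one with `node:crypto`, assuming 8-character tokens drawn from an alphabet that drops look-alike characters (0/O, 1/I) and a case-insensitive check. The alphabet, `generateInviteToken`, and `tokensMatch` are all assumptions, not the project's actual token code.

```typescript
import { randomInt, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Assumed alphabet: uppercase letters and digits minus 0/O and 1/I.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

// Cryptographically random token for an admin-issued invitation.
function generateInviteToken(length = 8): string {
  let token = "";
  for (let i = 0; i < length; i++) {
    // randomInt(max) returns an unbiased integer in [0, max).
    token += ALPHABET.charAt(randomInt(ALPHABET.length));
  }
  return token;
}

// Constant-time comparison, so response timing does not reveal how many
// leading characters of a guess were correct. Assumed: input is trimmed
// and uppercased before comparing.
function tokensMatch(expected: string, provided: string): boolean {
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(provided.trim().toUpperCase(), "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

With 32 symbols and 8 characters there are 32^8 (about 1.1 trillion) possible tokens, which, combined with the auth rate limit, makes guessing impractical.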

About

Real-time AI speaking coach built on Gemini Live API. Helps users overcome speaking anxiety and level up public speaking through live, interactive feedback.
