ZackAkil/gemini-live-api-twilio-phone

Gemini Live API via Phone call (with Twilio)

This project demonstrates a real-time voice conversation with Gemini over an ordinary phone call, using Twilio for telephony and Google's Gemini Multimodal Live API (via Vertex AI) for the conversation.

Architecture

  1. Twilio: Handles the phone call and streams audio to this server via WebSocket.
  2. FastAPI Server: Receives audio from Twilio, transcodes it, and sends it to Gemini.
  3. Gemini (Vertex AI): Processes the audio and returns a generated audio response.
  4. FastAPI Server: Receives audio from Gemini, transcodes it back, and sends it to Twilio.
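
Steps 2 and 4 largely come down to unwrapping and re-wrapping Twilio's WebSocket messages. The sketch below assumes the Twilio Media Streams JSON framing, in which a `media` event carries base64-encoded audio. The helper names `extract_audio` and `build_media_message` are hypothetical and are not taken from this repository.

```python
import base64
import json
from typing import Optional


def extract_audio(raw_message: str) -> Optional[bytes]:
    """Return the mu-law bytes from a Twilio 'media' event, else None."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None  # events such as 'start' or 'stop' carry no audio
    return base64.b64decode(message["media"]["payload"])


def build_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap outgoing mu-law bytes in a JSON 'media' frame for Twilio."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```

In a server like the one described here, `extract_audio` would sit on the receive side of the Twilio WebSocket and `build_media_message` on the send side, with the transcoding and the Gemini session in between.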

Prerequisites

  • Python 3.10+
  • A Google Cloud Project with Vertex AI API and Cloud Run API enabled.
  • Google Cloud CLI installed and authenticated (gcloud auth login).
  • A Twilio Account and a purchased phone number.

Setup & Deployment

  1. Install Dependencies (Local Development):

    pip install -r requirements.txt

  2. Google Cloud Authentication:

    gcloud auth login
    gcloud config set project YOUR_PROJECT_ID

  3. Deploy to Cloud Run:

    Deploy the container directly to Cloud Run, which provides HTTPS and a public URL for the service.

    ./deploy.sh

    • If prompted to enable APIs (Cloud Build, Cloud Run), say yes.
    • Once the script finishes, it prints a Service URL (e.g., https://call-me-live-api-xyz.a.run.app).

Twilio Configuration

  1. Go to the Twilio Console.

  2. Navigate to Voice > TwiML > TwiML Apps.

  3. Create a new TwiML App (or update an existing one).

  4. Set the Voice Request URL to your Cloud Run URL with the /incoming-call path:

    https://YOUR-CLOUD-RUN-URL.a.run.app/incoming-call

  5. Configure your Twilio Phone Number to use this TwiML App.
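
When Twilio requests /incoming-call, the server has to answer with TwiML that tells Twilio to open a media stream back to it. A minimal sketch of that response, assuming the standard `<Connect><Stream>` TwiML verbs: the function name `build_twiml` and the WebSocket route `/media-stream` are hypothetical, since this README does not name the actual route.

```python
from xml.sax.saxutils import quoteattr


def build_twiml(host: str, stream_path: str = "/media-stream") -> str:
    """Return TwiML that connects the call to a WebSocket on the same host.

    stream_path is a hypothetical route name, not confirmed by this README.
    """
    stream_url = f"wss://{host}{stream_path}"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)} />"  # quoteattr adds the quotes
        "</Connect></Response>"
    )
```

The host would normally come from the incoming request, so the same code works for any Cloud Run Service URL without hard-coding it.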

Usage

  1. Call your Twilio phone number.
  2. Speak to Gemini!

Technical Details

  • Hosting: Google Cloud Run (Serverless Container).
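  • Audio Format: Twilio uses G.711 μ-law (mulaw) at 8000 Hz, with 8 bits per sample. Gemini uses 16-bit linear PCM and returns audio at 24000 Hz.
  • Transcoding: Python's audioop module converts between these formats in real time. audioop was deprecated in Python 3.11 and removed from the standard library in 3.13, so newer interpreters need a backport such as audioop-lts.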
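
To make the format conversion concrete, here is a pure-Python stand-in for what audioop does. It is a sketch, not this project's code: it uses the textbook G.711 μ-law arithmetic and a naive 3-to-1 average for the 24000 Hz to 8000 Hz step, whereas audioop's rate converter interpolates and keeps state between chunks. All function names are hypothetical, and upsampling toward Gemini's input rate is left out.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def ulaw_decode(byte: int) -> int:
    """Decode one mu-law byte into a signed 16-bit sample."""
    byte = ~byte & 0xFF
    magnitude = (((byte & 0x0F) << 3) + BIAS) << ((byte & 0x70) >> 4)
    return BIAS - magnitude if byte & 0x80 else magnitude - BIAS


def ulaw_encode(sample: int) -> int:
    """Encode one signed 16-bit sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    # Find the segment: the highest set bit between bit 14 and bit 7.
    while exponent > 0 and not magnitude & (1 << (exponent + 7)):
        exponent -= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def mulaw8k_to_pcm16(mulaw: bytes) -> bytes:
    """Twilio to Gemini: 8-bit mu-law to 16-bit little-endian PCM (rate unchanged)."""
    samples = [ulaw_decode(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


def pcm24k_to_mulaw8k(pcm: bytes) -> bytes:
    """Gemini to Twilio: average each group of 3 samples, then mu-law encode."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    usable = count - count % 3  # drop a trailing partial group
    return bytes(
        ulaw_encode(sum(samples[i:i + 3]) // 3) for i in range(0, usable, 3)
    )
```

The sizes follow from the sample rates: 20 ms of Gemini output is 480 samples (960 bytes) at 24000 Hz, which becomes 160 μ-law bytes at 8000 Hz.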
