This project demonstrates a real-time voice conversation over the phone using Twilio and Google's Gemini Multimodal Live API (via Vertex AI).
- Twilio: Handles the phone call and streams audio to this server via WebSocket.
- FastAPI Server: Receives audio from Twilio, transcodes it, and sends it to Gemini.
- Gemini (Vertex AI): Processes the audio and returns a generated audio response.
- FastAPI Server: Receives audio from Gemini, transcodes it back, and sends it to Twilio.
- Python 3.10+
- A Google Cloud Project with the Vertex AI API and Cloud Run API enabled.
- Google Cloud CLI installed and authenticated (gcloud auth login).
- A Twilio Account and a purchased phone number.
- Install Dependencies (Local Development):
  pip install -r requirements.txt
- Google Cloud Authentication:
  gcloud auth login
  gcloud config set project YOUR_PROJECT_ID
- Deploy to Cloud Run:
  We will deploy the container directly to Cloud Run, which handles SSL and the public URL for us.
  ./deploy.sh
- If prompted to enable APIs (Cloud Build, Cloud Run), say yes.
- Once finished, it will output a Service URL (e.g., https://call-me-live-api-xyz.a.run.app).
- Go to the Twilio Console.
- Navigate to Voice > TwiML > TwiML Apps.
- Create a new TwiML App (or update an existing one).
- Set the Voice Request URL to your Cloud Run URL with the /incoming-call path: https://YOUR-CLOUD-RUN-URL.a.run.app/incoming-call
- Configure your Twilio Phone Number to use this TwiML App.
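When Twilio requests the Voice Request URL, the server has to answer with TwiML that tells Twilio where to open the audio WebSocket. The sketch below shows one plausible shape for that response, using Twilio's `<Connect><Stream>` verb. The function name `build_twiml` and the WebSocket path `/media-stream` are assumptions for illustration; the path this project actually uses is not stated here.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr


def build_twiml(service_url: str, ws_path: str = "/media-stream") -> str:
    """Build a TwiML document that connects the call to a WebSocket stream.

    service_url is the Cloud Run Service URL; ws_path is a hypothetical
    WebSocket route on the same host.
    """
    host = urlparse(service_url).netloc
    stream_url = f"wss://{host}{ws_path}"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)} />"
        "</Connect></Response>"
    )
```

The `wss://` scheme works because Cloud Run serves the container over HTTPS, so the same host accepts secure WebSocket upgrades.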
- Call your Twilio phone number.
- Speak to Gemini!
- Hosting: Google Cloud Run (Serverless Container).
- Audio Format: Twilio streams G.711 mu-law at 8000 Hz. Gemini works with linear 16-bit PCM and returns audio at 24000 Hz.
- Transcoding: The audioop library is used to convert between these formats in real time.
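To make the transcoding step concrete, here is a dependency-free sketch of the 8000 Hz mu-law to 24000 Hz PCM conversion and back. It is a stand-in for what `audioop.ulaw2lin`, `audioop.lin2ulaw`, and `audioop.ratecv` do, not this project's code: `audioop` was removed from the standard library in Python 3.13, so the sketch uses the standard G.711 mu-law formulas directly. The function names are hypothetical, and the resampling is deliberately naive (3x linear interpolation up, averaging groups of three down) with no state carried between chunks.

```python
import struct

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits after adding the bias


def ulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte into a signed 16-bit sample."""
    byte = ~byte & 0xFF
    magnitude = (((byte & 0x0F) << 3) + BIAS) << ((byte & 0x70) >> 4)
    return BIAS - magnitude if byte & 0x80 else magnitude - BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def twilio_to_pcm24k(ulaw: bytes) -> bytes:
    """8 kHz mu-law -> 24 kHz little-endian 16-bit PCM (3x linear interpolation)."""
    samples = [ulaw_to_linear(b) for b in ulaw]
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        out.extend(cur + (nxt - cur) * k // 3 for k in range(3))
    return struct.pack(f"<{len(out)}h", *out)


def pcm24k_to_twilio(pcm: bytes) -> bytes:
    """24 kHz little-endian 16-bit PCM -> 8 kHz mu-law (average each group of 3)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(
        linear_to_ulaw(sum(samples[i : i + 3]) // 3)
        for i in range(0, count - count % 3, 3)
    )
```

Each mu-law byte becomes three 16-bit samples, so a chunk grows sixfold in size on the way to Gemini and shrinks by the same factor on the way back.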