A production-grade voice AI agent capable of holding natural sales conversations with under 500 ms of latency.
Tech Stack:
- Inference: Cerebras (Llama 3.3 70B) - very fast token generation.
- Transport: LiveKit - real-time WebRTC infrastructure.
- STT: Deepgram - Nova-2 speech-to-text.
- TTS: Cartesia - Sonic low-latency text-to-speech.
- VAD: Silero - voice activity detection.
The agent runs a continuous loop:
- Listen: Ingest the audio stream via LiveKit and transcribe it with Deepgram.
- Think: Generate a reply to the transcribed query with Llama 3.3 on the Cerebras Wafer-Scale Engine.
- Speak: Stream the reply text to Cartesia for low-latency audio synthesis.
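The loop can be sketched in plain Python as follows. This is only an illustration of the control flow, not the notebook's code: `listen`, `think`, `speak`, and `run_turn` are hypothetical stand-ins for the LiveKit/Deepgram, Cerebras, and Cartesia calls, and the transcript and reply are canned strings. The point it shows is that each token is handed to the speech stage as soon as it arrives, so the latency that matters is the time to the first token rather than the time to the full reply.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def listen() -> str:
    """Hypothetical stand-in for LiveKit audio ingestion plus Deepgram STT."""
    await asyncio.sleep(0)  # a real agent would await a final transcript here
    return "How is the premium plan billed?"  # canned transcript


async def think(query: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM call; yields canned tokens."""
    for token in ["The", " premium", " plan", " is", " billed", " monthly."]:
        await asyncio.sleep(0)  # a real agent would await the next token here
        yield token


async def speak(chunk: str, sink: List[str]) -> None:
    """Hypothetical stand-in for streaming TTS; records chunks instead of audio."""
    sink.append(chunk)


async def run_turn(sink: List[str]) -> Tuple[str, Optional[float]]:
    """Run one listen/think/speak turn.

    Returns the transcript and the seconds from end of transcript to first token.
    """
    query = await listen()
    started = time.perf_counter()
    first_token_latency: Optional[float] = None
    async for token in think(query):
        if first_token_latency is None:
            first_token_latency = time.perf_counter() - started
        # Forward each token immediately instead of waiting for the full reply.
        await speak(token, sink)
    return query, first_token_latency


if __name__ == "__main__":
    spoken: List[str] = []
    transcript, latency = asyncio.run(run_turn(spoken))
    print("Heard:", transcript)
    print("Said:", "".join(spoken))
```

Inside a notebook, where an event loop is already running, `await run_turn(spoken)` in a cell replaces the `asyncio.run(...)` call.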
This project is designed to run in Google Colab or a local Python environment.
- Install dependencies (including the Cartesia plugin for TTS):
  pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
- Set API Keys:
  - CEREBRAS_API_KEY
  - LIVEKIT_API_KEY
  - DEEPGRAM_API_KEY
  - CARTESIA_API_KEY
- Run the notebook virtual_sales_agent.ipynb.
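Before running the notebook, it can help to confirm that the four keys listed above are actually visible to Python. The check below is a minimal sketch that assumes the keys are supplied as environment variables, which this README does not state explicitly. The helper name `missing_keys` is made up for illustration, and your LiveKit setup may need further settings that are not listed here.

```python
import os
from typing import List, Mapping

# The four key names listed in this README.
REQUIRED_KEYS = (
    "CEREBRAS_API_KEY",
    "LIVEKIT_API_KEY",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or empty in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

In Colab or a local session, a key can be set for the current process with `os.environ["CEREBRAS_API_KEY"] = "..."` before the check runs.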
I used Antigravity to automate this ReadMe!