A conversational AI assistant with multi-turn dialogue, real-time voice input, LLM-powered tool calling, and smart home integration.
- 🎤 Real-time Voice Input - AssemblyAI streaming transcription
- 🧠 Dual LLM Support - Groq (qwen3-32b) or Gemini with easy switching
- 🔧 Tool Calling - LLM autonomously uses weather, music, IoT, and web search tools
- 🏠 Smart Home Control - Voice-controlled lights and devices
- 🎵 Music Playback - YouTube Music integration with queue management
- 🌤️ Weather Queries - Real-time weather with 3-day forecast
- 🔍 Web Search - Tavily-powered search for current information
- 🗣️ Text-to-Speech - Rime TTS with low-latency PCM streaming

    ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
    │    Frontend     │────▶│     Backend     │────▶│ Modal Classifier│
    │  (Astro/React)  │     │    (FastAPI)    │     │   (XLM-R NLU)   │
    └─────────────────┘     └─────────────────┘     └─────────────────┘
             │                       │
             │                       ├──▶ Groq/Gemini LLM (Tool Calling)
             │                       ├──▶ WeatherAPI
             │                       ├──▶ YouTube Music API
             │                       ├──▶ Tavily Search API
             │                       └──▶ Rime TTS API
             │
             └──────────────▶ AssemblyAI (Real-time STT)

- Docker & Docker Compose
- API keys: AssemblyAI and Groq or Gemini (required); WeatherAPI, Tavily, Rime (optional)
1. Clone and configure:

       git clone <repo-url>
       cd VCNI
       cp .env.example .env
       # Edit .env with your API keys

2. Run with Docker:

       docker compose up --build

3. Access:
   - Frontend: http://localhost:4321
   - Backend API: http://localhost:8000/docs
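
To confirm the backend container came up, a small check against the `/health` endpoint can help. The sketch below is an illustration, not part of the repository: the helper name `backend_is_healthy` is hypothetical, and it only inspects the HTTP status code because the response body format is not documented here.

```python
import urllib.error
import urllib.request

BACKEND_URL = "http://localhost:8000"  # backend address from the Access step


def backend_is_healthy(base_url: str = BACKEND_URL, timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `backend_is_healthy()` after `docker compose up --build` should return `True` once the FastAPI app is accepting requests.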
| Variable | Required | Description |
|---|---|---|
| `ASSEMBLYAI_API_KEY` | Yes | Real-time speech-to-text |
| `GROQ_API_KEY` | Yes* | Groq LLM for tool calling |
| `GEMINI_API_KEY` | Yes* | Google Gemini (alternative LLM) |
| `LLM_PROVIDER` | No | groq or gemini (default: groq) |
| `WEATHERAPI_KEY` | No | Weather data |
| `TAVILY_API_KEY` | No | Web search |
| `RIME_API_KEY` | No | Text-to-speech |

*At least one LLM provider required
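
The rules in the table can be checked before starting the stack. The following is a minimal sketch, not the project's own validation code: the function name `check_env` is hypothetical, and it assumes the key for the selected `LLM_PROVIDER` must be present, which is slightly stricter than "at least one LLM provider".

```python
import os
from typing import Mapping

# Which API key each provider needs, taken from the table above.
LLM_KEYS = {"groq": "GROQ_API_KEY", "gemini": "GEMINI_API_KEY"}


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of configuration problems (empty if the config looks usable)."""
    problems = []
    if not env.get("ASSEMBLYAI_API_KEY"):
        problems.append("ASSEMBLYAI_API_KEY is required")

    # LLM_PROVIDER defaults to groq when unset or empty.
    provider = (env.get("LLM_PROVIDER") or "groq").strip().lower()
    if provider not in LLM_KEYS:
        problems.append(f"LLM_PROVIDER must be groq or gemini, got {provider!r}")
    elif not env.get(LLM_KEYS[provider]):
        # ASSUMPTION: the selected provider's key must be set.
        problems.append(f"{LLM_KEYS[provider]} is required when LLM_PROVIDER={provider}")
    return problems
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.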

    # In .env
    LLM_PROVIDER=groq    # Use Groq with tool calling
    LLM_PROVIDER=gemini  # Use Gemini

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/api/assemblyai/token` | Get STT token |
| POST | `/api/nlu/process` | Process text through NLU |
| POST | `/api/tts/stream` | Stream TTS audio |
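
As an illustration of calling the NLU endpoint from a script, here is a hedged sketch using only the Python standard library. The request schema is not documented in this README, so the JSON body with a `text` field and the JSON response are assumptions; the helper names `build_nlu_request` and `process_text` are hypothetical. The interactive docs at `/docs` show the actual schema.

```python
import json
import urllib.request

BACKEND_URL = "http://localhost:8000"


def build_nlu_request(text: str, base_url: str = BACKEND_URL) -> urllib.request.Request:
    """Build a POST request for /api/nlu/process."""
    # ASSUMPTION: the endpoint accepts a JSON object with a "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/nlu/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_text(text: str, base_url: str = BACKEND_URL) -> dict:
    """Send the text to the backend and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_nlu_request(text, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the first function testable without a running backend.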

    VCNI/
    ├── backend/
    │   ├── app/
    │   │   ├── services/         # Weather, Music, IoT, Groq, Search
    │   │   ├── controller.py     # Main orchestration
    │   │   ├── tool_executor.py  # LLM tool execution
    │   │   └── main.py           # FastAPI app
    │   └── Dockerfile
    ├── frontend/
    │   ├── src/
    │   │   ├── pages/api/        # API proxy routes
    │   │   ├── store/            # Zustand + VoiceClient
    │   │   └── components/       # React widgets
    │   └── Dockerfile
    ├── ML/
    │   └── inference/            # Modal classifier
    ├── docker-compose.yml
    └── .env.example

| Intent | UI Mode | Description |
|---|---|---|
| `weather_query` | weather | Get weather info |
| `play_music` | music | Play music |
| `iot_hue_*` | smart_home | Control devices |
| `qa_factoid` | ai_response | General questions |
| `general_greet` | ai_response | Greetings |
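
The intent-to-mode mapping above, including the `iot_hue_*` wildcard, can be sketched as a small lookup. This is an illustration rather than the project's code: `ui_mode_for` is a hypothetical name, the fallback to `ai_response` for unlisted intents is an assumption, and `iot_hue_lighton` in the usage note is only an example intent name.

```python
from fnmatch import fnmatchcase

# (intent pattern, UI mode) pairs copied from the table above.
INTENT_UI_MODES = [
    ("weather_query", "weather"),
    ("play_music", "music"),
    ("iot_hue_*", "smart_home"),
    ("qa_factoid", "ai_response"),
    ("general_greet", "ai_response"),
]


def ui_mode_for(intent: str, default: str = "ai_response") -> str:
    """Return the UI mode for an intent, matching patterns in table order."""
    for pattern, mode in INTENT_UI_MODES:
        if fnmatchcase(intent, pattern):
            return mode
    # ASSUMPTION: intents not listed in the table fall back to a plain AI response.
    return default
```

For example, `ui_mode_for("iot_hue_lighton")` matches the `iot_hue_*` row and yields `smart_home`.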
MIT