This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
A LiveKit-based voice AI agent for the SAM 2 No Code Finetuning web application. The agent assists users via real-time voice conversation, answering questions about LoRA fine-tuning, manufacturing processes, and the web app's training configuration. It uses RAG (MongoDB Atlas Vector Search + Voyage AI embeddings) and web search (Parallel API) as tool-augmented knowledge sources.
This repo is a git submodule of the parent sam2finetuning project and connects to the same Neon PostgreSQL database used by the Next.js frontend for user/session management.
# Install dependencies (uses uv, not pip)
uv sync --locked
# Run the agent locally (connects to LiveKit)
uv run python3 -m src.agent start
# Pre-download ML models (Silero VAD, etc.)
uv run python3 -m src.agent download-files
# Container build and run (uses podman, not docker)
make build # Build container image
make run # Run container (mounts ./src for live reload)
make attach # Shell into running container
make remove # Remove container
make rmi # Remove image

- Job request (`on_request`): Validates user metadata from the LiveKit room, checks that the user exists in Neon DB and hasn't exceeded `SESSION_TIME_LIMIT_SECONDS`
- Session start (`entrypoint`): Creates `AgentSession` with Deepgram STT → OpenAI GPT-4.1-nano LLM → Cartesia TTS pipeline, starts Lemonslice avatar, connects MongoDB for RAG
- Conversation: The `Assistant` agent handles voice interaction with two function tools: `search_knowledge_base` (RAG first) and `web_search` (Parallel API fallback)
- Session end: Writes usage to DB, generates conversation summary via OpenAI, emails it to the user via Resend, then tears down the LiveKit room
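
The job-request gate in the lifecycle above can be sketched in plain Python. This is a minimal illustration, not the code in `src/agent.py`: `UsageRecord`, `remaining_seconds`, `should_accept_job`, and the `used_seconds` field are hypothetical names, and the real check presumably reads the user's usage from Neon through `VoiceAgentUsageDatabase`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRecord:
    """Hypothetical stand-in for a user's usage row in the database."""
    user_id: str
    used_seconds: float


def remaining_seconds(record: Optional[UsageRecord], limit_seconds: float) -> float:
    """Return the session time left; 0.0 if the user is unknown."""
    if record is None:
        return 0.0
    # Clamp at zero so an over-limit user never yields a negative budget.
    return max(0.0, limit_seconds - record.used_seconds)


def should_accept_job(record: Optional[UsageRecord], limit_seconds: float = 120.0) -> bool:
    """Accept only known users who still have session time left."""
    return remaining_seconds(record, limit_seconds) > 0.0
```

The sketch rejects a user who has used exactly the limit; whether the actual agent treats that boundary as exceeded is an assumption here.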
| File | Purpose |
|---|---|
| `src/agent.py` | Main entrypoint: `Assistant` class, job lifecycle, session time limit logic |
| `src/rag.py` | MongoDB vector database wrapper and RAG class (embed, upload, query, rerank) |
| `src/usersession.py` | SQLModel ORM models (`User`, `UserSession`) and `VoiceAgentUsageDatabase` async context manager |
| `src/email.py` | Post-session email: summary generation (OpenAI), HTML templating, sending (Resend) |
| `src/prompts.py` | System prompt for the voice agent |
| `src/observability.py` | Logfire/OpenTelemetry setup, shared with LiveKit's tracer |
- Neon PostgreSQL (`DATABASE_URL`): User accounts and voice session usage tracking via SQLModel/asyncpg
- MongoDB Atlas (`MONGO_URI`): RAG vector store (`sam2webappdocs.vectorstore`) with Voyage AI contextualized embeddings (1024-dim, dotProduct similarity)
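
For orientation, a query against the vector store might be built as an Atlas `$vectorSearch` aggregation pipeline like the one below. This is a sketch under assumptions, not the code in `src/rag.py`: the index name `vector_index`, the field names `embedding` and `text`, and the helper `build_vector_search_pipeline` are all hypothetical. Only the 1024-dimension check comes from the description above.

```python
from typing import Any

EMBEDDING_DIM = 1024  # matches the 1024-dim embeddings described above


def build_vector_search_pipeline(
    query_vector: list[float],
    limit: int = 5,
    num_candidates: int = 100,
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",           # assumed embedding field name
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline for an Atlas Vector Search query."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(query_vector)}")
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep the (assumed) text field and expose the similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With pymongo, the result would typically be passed to `collection.aggregate(pipeline)` on the `vectorstore` collection.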
| Service | Env Vars |
|---|---|
| LiveKit | LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET |
| Deepgram (STT) | DEEPGRAM_API_KEY |
| OpenAI (LLM + summaries) | OPENAI_API_KEY |
| Cartesia (TTS) | CARTESIA_API_KEY, CARTESIA_VOICE_ID, CARTESIA_PRONUNCIATION_DICT_ID |
| Lemonslice (Avatar) | LEMONSLICE_API_KEY |
| Voyage AI (Embeddings) | VOYAGE_API_KEY, VOYAGE_EMBEDDING_MODEL |
| MongoDB Atlas | MONGO_URI |
| Neon PostgreSQL | DATABASE_URL |
| Parallel (Web search) | PARALLEL_API_KEY |
| Resend (Email) | RESEND_API_KEY |
| Logfire (Observability) | LOGFIRE_TOKEN |
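
A quick way to catch a missing key before starting the agent is to check the variables from the table up front. The helper below is a hypothetical sketch, not part of the repo, and it assumes every variable in the table is required, which may not hold for optional ones such as `CARTESIA_PRONUNCIATION_DICT_ID`.

```python
import os
from typing import Mapping

# Names copied from the table above; treating all of them as required is an assumption.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY", "CARTESIA_VOICE_ID", "CARTESIA_PRONUNCIATION_DICT_ID",
    "LEMONSLICE_API_KEY",
    "VOYAGE_API_KEY", "VOYAGE_EMBEDDING_MODEL",
    "MONGO_URI",
    "DATABASE_URL",
    "PARALLEL_API_KEY",
    "RESEND_API_KEY",
    "LOGFIRE_TOKEN",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` after the environment has been loaded from `.env.local` returns an empty list when everything is set.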
- agent-starter-embed/ — LiveKit agent embed starter (frontend widget for embedding the voice agent in the web app)
- Python 3.13+, managed by `uv`
- All env vars loaded from `.env.local` (gitignored)
- Container runtime is podman (not docker); see `Makefile`
- Session time limit controlled by `SESSION_TIME_LIMIT_SECONDS` env var (default: 120s)