CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A LiveKit-based voice AI agent for the SAM 2 No Code Finetuning web application. The agent assists users via real-time voice conversation, answering questions about LoRA fine-tuning, manufacturing processes, and the web app's training configuration. It uses RAG (MongoDB Atlas Vector Search + Voyage AI embeddings) and web search (Parallel API) as tool-augmented knowledge sources.

This repo is a git submodule of the parent sam2finetuning project and connects to the same Neon PostgreSQL database used by the Next.js frontend for user/session management.

Common Commands

# Install dependencies (uses uv, not pip)
uv sync --locked

# Run the agent locally (connects to LiveKit)
uv run python3 -m src.agent start

# Pre-download ML models (Silero VAD, etc.)
uv run python3 -m src.agent download-files

# Container build and run (uses podman, not docker)
make build        # Build container image
make run          # Run container (mounts ./src for live reload)
make attach       # Shell into running container
make remove       # Remove container
make rmi          # Remove image

Architecture

Agent Lifecycle

Job request (`on_request`): Validates the user metadata from the LiveKit room, then checks that the user exists in the Neon DB and has not exceeded `SESSION_TIME_LIMIT_SECONDS`.
Session start (`entrypoint`): Creates an `AgentSession` with a Deepgram STT → OpenAI GPT-4.1-nano LLM → Cartesia TTS pipeline, starts the Lemonslice avatar, and connects to MongoDB for RAG.
Conversation: The `Assistant` agent handles voice interaction with two function tools: `search_knowledge_base` (RAG, tried first) and `web_search` (Parallel API, used as a fallback).
Session end: Writes usage to the DB, generates a conversation summary via OpenAI, emails it to the user via Resend, then tears down the LiveKit room.
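
The job-request checks above can be sketched in plain Python. This is a minimal illustration, not the code in `src/agent.py`: the names `AdmissionDecision`, `decide_admission`, and `remaining_seconds` are hypothetical, and rejecting at exactly the limit (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmissionDecision:
    """Outcome of the (hypothetical) job-request check."""
    accept: bool
    reason: str


def decide_admission(user_exists: bool, seconds_used: float,
                     limit_seconds: float) -> AdmissionDecision:
    """Accept a job only for known users who still have session time left.

    Assumption: a user who has used exactly `limit_seconds` is rejected.
    """
    if not user_exists:
        return AdmissionDecision(False, "unknown user")
    if seconds_used >= limit_seconds:
        return AdmissionDecision(False, "session time limit reached")
    return AdmissionDecision(True, "ok")


def remaining_seconds(seconds_used: float, limit_seconds: float) -> float:
    """Time budget left for the session, never negative."""
    return max(0.0, limit_seconds - seconds_used)
```

Keeping the decision as a pure function of the user lookup result and the usage counter makes it easy to test without a LiveKit room or a database connection.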

Key Files

| File | Purpose |
| --- | --- |
| `src/agent.py` | Main entrypoint: `Assistant` class, job lifecycle, session time limit logic |
| `src/rag.py` | `MongoDB` vector database wrapper and `RAG` class (embed, upload, query, rerank) |
| `src/usersession.py` | SQLModel ORM models (`User`, `UserSession`) and `VoiceAgentUsageDatabase` async context manager |
| `src/email.py` | Post-session email: summary generation (OpenAI), HTML templating, sending (Resend) |
| `src/prompts.py` | System prompt for the voice agent |
| `src/observability.py` | Logfire/OpenTelemetry setup, shared with LiveKit's tracer |

Data Stores

Neon PostgreSQL (DATABASE_URL): User accounts and voice session usage tracking via SQLModel/asyncpg
MongoDB Atlas (MONGO_URI): RAG vector store (sam2webappdocs.vectorstore) with Voyage AI contextualized embeddings (1024-dim, dotProduct similarity)
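
As an illustration of how a vector store like the one above is typically queried, the sketch below builds an Atlas `$vectorSearch` aggregation pipeline. It is not taken from `src/rag.py`: the function name, the index name `vector_index`, and the field names `embedding` and `text` are assumptions. Only the 1024-dimension requirement comes from this document.

```python
from typing import Any

EMBEDDING_DIM = 1024  # dimension stated in this document


def build_vector_search_pipeline(
    query_vector: list[float],
    *,
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",           # assumed embedding field
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline for an Atlas Vector Search query."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep the (assumed) text field and expose the similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With pymongo, such a pipeline would be passed to `collection.aggregate(pipeline)` on the `vectorstore` collection of the `sam2webappdocs` database.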

External Services

| Service | Env Vars |
| --- | --- |
| LiveKit | `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET` |
| Deepgram (STT) | `DEEPGRAM_API_KEY` |
| OpenAI (LLM + summaries) | `OPENAI_API_KEY` |
| Cartesia (TTS) | `CARTESIA_API_KEY`, `CARTESIA_VOICE_ID`, `CARTESIA_PRONUNCIATION_DICT_ID` |
| Lemonslice (Avatar) | `LEMONSLICE_API_KEY` |
| Voyage AI (Embeddings) | `VOYAGE_API_KEY`, `VOYAGE_EMBEDDING_MODEL` |
| MongoDB Atlas | `MONGO_URI` |
| Neon PostgreSQL | `DATABASE_URL` |
| Parallel (Web search) | `PARALLEL_API_KEY` |
| Resend (Email) | `RESEND_API_KEY` |
| Logfire (Observability) | `LOGFIRE_TOKEN` |

Submodule

agent-starter-embed/ — LiveKit agent embed starter (frontend widget for embedding the voice agent in the web app)

Environment

Python 3.13+, managed by uv
All env vars loaded from .env.local (gitignored)
Container runtime is podman (not docker) — see Makefile
Session time limit controlled by SESSION_TIME_LIMIT_SECONDS env var (default: 120s)
