CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A LiveKit-based voice AI agent for the SAM 2 No Code Finetuning web application. The agent assists users via real-time voice conversation, answering questions about LoRA fine-tuning, manufacturing processes, and the web app's training configuration. It uses RAG (MongoDB Atlas Vector Search + Voyage AI embeddings) and web search (Parallel API) as tool-augmented knowledge sources.

This repo is a git submodule of the parent sam2finetuning project and connects to the same Neon PostgreSQL database used by the Next.js frontend for user/session management.

Common Commands

# Install dependencies (uses uv, not pip)
uv sync --locked

# Run the agent locally (connects to LiveKit)
uv run python3 -m src.agent start

# Pre-download ML models (Silero VAD, etc.)
uv run python3 -m src.agent download-files

# Container build and run (uses podman, not docker)
make build        # Build container image
make run          # Run container (mounts ./src for live reload)
make attach       # Shell into running container
make remove       # Remove container
make rmi          # Remove image

Architecture

Agent Lifecycle

  1. Job request (on_request): Validates user metadata from the LiveKit room, then checks that the user exists in the Neon database and has not exceeded SESSION_TIME_LIMIT_SECONDS
  2. Session start (entrypoint): Creates AgentSession with Deepgram STT → OpenAI GPT-4.1-nano LLM → Cartesia TTS pipeline, starts Lemonslice avatar, connects MongoDB for RAG
  3. Conversation: The Assistant agent handles voice interaction with two function tools: search_knowledge_base (RAG first) and web_search (Parallel API fallback)
  4. Session end: Writes usage to DB, generates conversation summary via OpenAI, emails it to the user via Resend, then tears down the LiveKit room
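The time-limit gate in step 1 can be sketched as a small pure function. This is an illustration only, not the code in src/agent.py: UsageRecord, remaining_seconds, and should_accept_job are hypothetical names, and it assumes usage is tracked as cumulative seconds per user and that a user with no time left is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical stand-in for a user's usage row in the Neon database."""

    user_id: str
    seconds_used: float


def remaining_seconds(record: UsageRecord | None, limit_seconds: float) -> float:
    """Return the voice time still available; 0.0 means nothing is left."""
    if record is None:
        # Assumption: a user missing from the database gets no session time.
        return 0.0
    return max(0.0, limit_seconds - record.seconds_used)


def should_accept_job(record: UsageRecord | None, limit_seconds: float) -> bool:
    """Accept the job only if the user exists and has time remaining."""
    return remaining_seconds(record, limit_seconds) > 0.0
```

Returning the remaining time, rather than only a boolean, would also let the session start step schedule its own shutdown, if the real agent works that way.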

Key Files

File | Purpose
--- | ---
src/agent.py | Main entrypoint: Assistant class, job lifecycle, session time limit logic
src/rag.py | MongoDB vector database wrapper and RAG class (embed, upload, query, rerank)
src/usersession.py | SQLModel ORM models (User, UserSession) and VoiceAgentUsageDatabase async context manager
src/email.py | Post-session email: summary generation (OpenAI), HTML templating, sending (Resend)
src/prompts.py | System prompt for the voice agent
src/observability.py | Logfire/OpenTelemetry setup, shared with LiveKit's tracer

Data Stores

  • Neon PostgreSQL (DATABASE_URL): User accounts and voice session usage tracking via SQLModel/asyncpg
  • MongoDB Atlas (MONGO_URI): RAG vector store (sam2webappdocs.vectorstore) with Voyage AI contextualized embeddings (1024-dim, dotProduct similarity)
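A query against the vector store would typically go through an Atlas $vectorSearch aggregation stage. The sketch below only builds the pipeline, so it runs without a database. The index name "vector_index" and the field names "embedding" and "text" are assumptions, not values taken from src/rag.py. The dotProduct similarity is configured in the Atlas index definition, so it does not appear in the query.

```python
from typing import Any, Sequence

EMBEDDING_DIM = 1024  # matches the 1024-dim embeddings noted above


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    limit: int = 5,
    num_candidates: int = 100,
    index_name: str = "vector_index",  # assumed index name
    path: str = "embedding",  # assumed field holding the stored vectors
) -> list[dict[str, Any]]:
    """Build an Atlas $vectorSearch pipeline for an embedded query."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}"
        )
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep the chunk text (assumed field name) and the similarity score.
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a MongoDB client, and assuming sam2webappdocs is the database and vectorstore the collection, the pipeline would be passed to that collection's aggregate method.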

External Services

Service | Env Vars
--- | ---
LiveKit | LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET
Deepgram (STT) | DEEPGRAM_API_KEY
OpenAI (LLM + summaries) | OPENAI_API_KEY
Cartesia (TTS) | CARTESIA_API_KEY, CARTESIA_VOICE_ID, CARTESIA_PRONUNCIATION_DICT_ID
Lemonslice (Avatar) | LEMONSLICE_API_KEY
Voyage AI (Embeddings) | VOYAGE_API_KEY, VOYAGE_EMBEDDING_MODEL
MongoDB Atlas | MONGO_URI
Neon PostgreSQL | DATABASE_URL
Parallel (Web search) | PARALLEL_API_KEY
Resend (Email) | RESEND_API_KEY
Logfire (Observability) | LOGFIRE_TOKEN
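Because the agent depends on this many services, a startup check for missing variables can fail fast with a clear message. The sketch below is not part of the repository. It assumes every variable listed above is required, which may not hold (for example, the Cartesia pronunciation dictionary could be optional), and missing_env_vars is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Variable names copied from the services table above.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
    "CARTESIA_VOICE_ID",
    "CARTESIA_PRONUNCIATION_DICT_ID",
    "LEMONSLICE_API_KEY",
    "VOYAGE_API_KEY",
    "VOYAGE_EMBEDDING_MODEL",
    "MONGO_URI",
    "DATABASE_URL",
    "PARALLEL_API_KEY",
    "RESEND_API_KEY",
    "LOGFIRE_TOKEN",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty, in table order."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not source.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Passing a mapping instead of reading os.environ directly keeps the check easy to test.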

Submodule

  • agent-starter-embed/ — LiveKit agent embed starter (frontend widget for embedding the voice agent in the web app)

Environment

  • Python 3.13+, managed by uv
  • All env vars loaded from .env.local (gitignored)
  • Container runtime is podman (not docker) — see Makefile
  • Session time limit controlled by SESSION_TIME_LIMIT_SECONDS env var (default: 120s)
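Reading the session limit with its 120-second default could look like the sketch below. It is an illustration under assumptions, not the logic in src/agent.py: the function name is hypothetical, and treating non-positive values as an error is a design choice made here.

```python
import os
from typing import Mapping, Optional

DEFAULT_SESSION_TIME_LIMIT_SECONDS = 120  # default stated above


def session_time_limit_seconds(env: Optional[Mapping[str, str]] = None) -> int:
    """Read SESSION_TIME_LIMIT_SECONDS, falling back to the default if unset."""
    source = os.environ if env is None else env
    raw = source.get("SESSION_TIME_LIMIT_SECONDS", "").strip()
    if not raw:
        return DEFAULT_SESSION_TIME_LIMIT_SECONDS
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        # Assumption: a zero or negative limit is a configuration mistake.
        raise ValueError("SESSION_TIME_LIMIT_SECONDS must be positive")
    return value
```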