feat: multi-provider voice transcription (Parakeet local GPU, Mistral, OpenAI) #132
Adds voice message transcription support with three backends:

- `parakeet` (default): local NVIDIA NeMo Parakeet TDT 0.6B v3, runs on GPU, no API key or cloud cost required
- `mistral`: Mistral Voxtral cloud API
- `openai`: OpenAI Whisper cloud API

New settings:

- `ENABLE_VOICE_PROCESSING` (bool, default `false`)
- `VOICE_PROVIDER` (`mistral` | `openai` | `parakeet`, default `parakeet`)
- `FFMPEG_PATH` (optional explicit path, falls back to `PATH`)
- `VOICE_MAX_FILE_SIZE_MB` (default 20)
- `MISTRAL_API_KEY` / `OPENAI_API_KEY` (for cloud providers)

Optional dependency groups added to `pyproject.toml`:

- `[voice]` for the `mistral` and `openai` cloud providers
- `[parakeet]` for local GPU transcription via NeMo

The Parakeet model (~600 MB) is downloaded and cached automatically on first use. Audio is converted ogg→wav via ffmpeg before transcription.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## PR Review Summary

### What looks good

### Issues / questions

### Suggested tests (if needed)

### Verdict

— Friday, AI assistant to @RichardAtCT
## Summary
- The `VOICE_PROVIDER` setting selects the backend; cloud providers require their respective API keys
- The optional dependency groups `[voice]` and `[parakeet]` keep heavy GPU deps out of the default install
- The `agentic_voice` handler in the orchestrator transcribes the message and passes it to Claude as text, preserving session context

## New settings
| Setting | Default | Notes |
| --- | --- | --- |
| `ENABLE_VOICE_PROCESSING` | `false` | |
| `VOICE_PROVIDER` | `parakeet` | one of `parakeet`, `mistral`, or `openai` |
| `FFMPEG_PATH` | | optional; falls back to `ffmpeg` on `PATH` |
| `VOICE_MAX_FILE_SIZE_MB` | `20` | |
| `MISTRAL_API_KEY` | | required for the `mistral` provider |
| `OPENAI_API_KEY` | | required for the `openai` provider |

## Install

Install the optional dependency groups as needed: `[voice]` for the cloud providers, `[parakeet]` for local GPU transcription via NeMo.
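For illustration, a cloud-provider setup might look like this in the bot's environment file. Only the variable names come from the table above; the values and the `.env` convention itself are illustrative assumptions.

```ini
ENABLE_VOICE_PROCESSING=true
VOICE_PROVIDER=mistral
MISTRAL_API_KEY=your-api-key-here      # only needed for the mistral provider
VOICE_MAX_FILE_SIZE_MB=20
# FFMPEG_PATH=/usr/local/bin/ffmpeg    # optional; defaults to ffmpeg on PATH
```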
## Test plan
- `VOICE_PROVIDER=parakeet` — transcription appears, then Claude responds
- `VOICE_PROVIDER=mistral` (requires `MISTRAL_API_KEY`)
- `VOICE_PROVIDER=openai` (requires `OPENAI_API_KEY`)
- A file larger than `VOICE_MAX_FILE_SIZE_MB` is rejected with a clear error
- `ENABLE_VOICE_PROCESSING=false` — voice messages are silently ignored (no handler registered)
- A missing optional dependency raises a `RuntimeError` with install instructions

🤖 Generated with Claude Code
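The last test-plan item — failing fast with install instructions when an optional dependency group is missing — can be sketched like this. The probed module names and the `PROVIDERS` table are assumptions for illustration, not the PR's actual code.

```python
import importlib

# provider -> (module to probe, extras group that provides it); names assumed
PROVIDERS = {
    "parakeet": ("nemo", "[parakeet]"),
    "mistral": ("mistralai", "[voice]"),
    "openai": ("openai", "[voice]"),
}


def require_provider(provider: str) -> str:
    """Validate VOICE_PROVIDER and verify its optional dependency imports.

    Raises ValueError for an unknown provider, and RuntimeError with install
    instructions when the required extras group is not installed.
    """
    if provider not in PROVIDERS:
        raise ValueError(f"unknown VOICE_PROVIDER: {provider!r}")
    module, extra = PROVIDERS[provider]
    try:
        importlib.import_module(module)
    except ImportError as exc:
        raise RuntimeError(
            f"VOICE_PROVIDER={provider} requires the {extra} extras group; "
            f"install it with: pip install '<package>{extra}'"
        ) from exc
    return module
```

Probing the import at startup, rather than at first voice message, surfaces the misconfiguration immediately with an actionable message.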