
feat: multi-provider voice transcription (Parakeet local GPU, Mistral, OpenAI) #132

Open
BasilPadre wants to merge 1 commit into RichardAtCT:main from BasilPadre:feature/multi-provider-voice-transcription

Conversation

@BasilPadre

Summary

  • Adds voice message transcription with three pluggable backends: Parakeet (local GPU, default), Mistral Voxtral (cloud), OpenAI Whisper (cloud)
  • New VOICE_PROVIDER setting selects the backend; cloud providers require their respective API keys
  • Parakeet uses NVIDIA NeMo Parakeet TDT 0.6B v3 — runs entirely on-device with no API cost; model is downloaded and cached automatically on first use
  • Optional dependency groups [voice] and [parakeet] keep heavy GPU deps out of the default install
  • New agentic_voice handler in the orchestrator transcribes the message and passes it to Claude as text, preserving session context
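The "pluggable backends" idea could be sketched as a small provider registry keyed by the `VOICE_PROVIDER` value. This is a hypothetical illustration — the function and class names below are not taken from the PR:

```python
# Hypothetical sketch of pluggable transcription backends selected by
# VOICE_PROVIDER. Real names in the PR may differ.
from typing import Callable, Dict

PROVIDERS: Dict[str, Callable[[], object]] = {}


def register_provider(name: str):
    """Register a backend factory under a VOICE_PROVIDER name."""
    def deco(factory: Callable[[], object]):
        PROVIDERS[name] = factory
        return factory
    return deco


@register_provider("parakeet")
def make_parakeet():
    ...  # would lazily load the NeMo Parakeet model here


@register_provider("mistral")
def make_mistral():
    ...  # would build a Mistral client from MISTRAL_API_KEY


@register_provider("openai")
def make_openai():
    ...  # would build an OpenAI client from OPENAI_API_KEY


def get_provider(name: str):
    """Instantiate the configured backend, failing loudly on a typo."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(
            f"Unknown VOICE_PROVIDER: {name!r}; expected one of {sorted(PROVIDERS)}"
        ) from None
```

A registry like this keeps the handler code provider-agnostic: the orchestrator only ever calls `get_provider(settings.VOICE_PROVIDER)`.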

New settings

| Variable | Default | Description |
| --- | --- | --- |
| ENABLE_VOICE_PROCESSING | false | Enable voice transcription |
| VOICE_PROVIDER | parakeet | parakeet, mistral, or openai |
| FFMPEG_PATH | (PATH) | Explicit path to ffmpeg binary |
| VOICE_MAX_FILE_SIZE_MB | 20 | Reject files larger than this |
| MISTRAL_API_KEY | (none) | Required for Mistral provider |
| OPENAI_API_KEY | (none) | Required for OpenAI provider |
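For example, a minimal `.env` enabling the Mistral cloud backend might look like this (the API key value is a placeholder):

```
ENABLE_VOICE_PROCESSING=true
VOICE_PROVIDER=mistral
MISTRAL_API_KEY=your-mistral-api-key
VOICE_MAX_FILE_SIZE_MB=20
```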

Install

# Local GPU (Parakeet)
pip install "claude-code-telegram[parakeet]"

# Cloud providers
pip install "claude-code-telegram[voice]"

Test plan

  • Send voice message with VOICE_PROVIDER=parakeet — transcription appears, then Claude responds
  • Send voice message with VOICE_PROVIDER=mistral (requires MISTRAL_API_KEY)
  • Send voice message with VOICE_PROVIDER=openai (requires OPENAI_API_KEY)
  • File exceeding VOICE_MAX_FILE_SIZE_MB is rejected with a clear error
  • ENABLE_VOICE_PROCESSING=false — voice messages are silently ignored (no handler registered)
  • Missing optional deps raise a helpful RuntimeError with install instructions

🤖 Generated with Claude Code

Adds voice message transcription support with three backends:
- `parakeet` (default): local NVIDIA NeMo Parakeet TDT 0.6B v3, runs on GPU,
  no API key or cloud cost required
- `mistral`: Mistral Voxtral cloud API
- `openai`: OpenAI Whisper cloud API

New settings:
- ENABLE_VOICE_PROCESSING (bool, default false)
- VOICE_PROVIDER (mistral | openai | parakeet, default parakeet)
- FFMPEG_PATH (optional explicit path, falls back to PATH)
- VOICE_MAX_FILE_SIZE_MB (default 20)
- MISTRAL_API_KEY / OPENAI_API_KEY (for cloud providers)

Optional dependency groups added to pyproject.toml:
- `[voice]` for mistral + openai cloud providers
- `[parakeet]` for local GPU transcription via NeMo

The Parakeet model (~600 MB) is downloaded and cached automatically on
first use. Audio is converted ogg→wav via ffmpeg before transcription.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@FridayOpenClawBot

PR Review
Reviewed head: 14d665b64dc714c9718a59dfee86e0e05e8d6ee8

Summary

  • Adds voice transcription with three backends (Parakeet local GPU, Mistral Voxtral, OpenAI Whisper) behind a feature flag. Good structure overall, but has a broken packaging pattern, a surprising default, and conflicts with existing CLAUDE.md conventions that need resolving before merge.

What looks good

  • Lazy client/model initialisation — no import cost unless the provider is actually used
  • Triple file-size guard (pre-download metadata, post-get-file, post-download bytes) is thorough
  • run_in_executor correctly offloads the blocking Parakeet inference off the event loop
  • Graceful error wrapping with structured logging on cloud provider failures
  • Feature flag defaults to False, so existing deployments are unaffected on upgrade

Issues / questions

  1. [Blocker] pyproject.toml — The optional deps are declared inside [tool.poetry.group.voice.dependencies] and [tool.poetry.group.parakeet.dependencies], not in [tool.poetry.dependencies]. Poetry only exports pip extras from the main dependencies table. As written, pip install "claude-code-telegram[voice]" and pip install "claude-code-telegram[parakeet]" will silently install nothing extra — extras groups are a Poetry-dev concept, not a packaging artifact. Move mistralai, openai, nemo_toolkit, and torch into [tool.poetry.dependencies] with optional = true, then keep the [tool.poetry.extras] block as-is.

  2. [Blocker] src/config/settings.py — VOICE_PROVIDER defaults to "parakeet", but Parakeet requires a CUDA GPU and a ~600 MB NeMo model download. Most cloud-deployed instances of this bot have no GPU. CLAUDE.md also explicitly documents the default as mistral. This will fail on the first voice message for the majority of users. Change the default to "mistral" to match documented behaviour, or at minimum add a startup validation that raises a clear error if parakeet is selected without CUDA available.

  3. [Blocker] src/config/settings.py — This PR renames ENABLE_VOICE_MESSAGES → ENABLE_VOICE_PROCESSING and drops VOICE_TRANSCRIPTION_MODEL, silently breaking existing deployments that set those env vars. CLAUDE.md documents the old names. Is this intentional? If so, CLAUDE.md must be updated in this PR and a migration note added to the README. If not, revert to the documented names.

  4. [Important] src/bot/features/voice_handler.py:_parakeet (property) — No lock around the lazy model load. If two voice messages arrive concurrently, both threads entering _run_parakeet via the executor could race through if self._parakeet_model is None and load the model twice. Add a threading.Lock acquired before the None check.

  5. [Important] src/bot/features/voice_handler.py:process_voice_message — The pre-download voice.file_size check is best-effort only: Telegram doesn't always populate that field. The real guard is the post-download byte-length check, which works correctly. Worth adding a comment so the next reader doesn't wonder why the triple check is needed.

  6. [Nit] src/bot/features/voice_handler.py:_run_parakeet — Missing return type annotation (-> str). CLAUDE.md requires type hints on all functions and mypy strict is enforced.
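For blocker 1, the suggested pyproject.toml layout would look roughly like the following. The version constraints are illustrative, not taken from the PR:

```toml
# Optional deps must live in the main dependencies table for Poetry to
# export them as pip extras (versions below are illustrative).
[tool.poetry.dependencies]
mistralai = { version = "^1.0", optional = true }
openai = { version = "^1.0", optional = true }
nemo_toolkit = { version = "^2.0", optional = true }
torch = { version = "^2.0", optional = true }

[tool.poetry.extras]
voice = ["mistralai", "openai"]
parakeet = ["nemo_toolkit", "torch"]
```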
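For blocker 2, a fail-fast startup guard could look like the sketch below. The `cuda_available` argument stands in for `torch.cuda.is_available()` so the logic stays testable without torch; the function name is hypothetical:

```python
def validate_voice_provider(provider: str, cuda_available: bool) -> None:
    """Fail at startup, not on the first voice message, if the configured
    provider cannot run. `cuda_available` would be fed from
    torch.cuda.is_available() when torch is installed (sketch only)."""
    if provider == "parakeet" and not cuda_available:
        raise RuntimeError(
            "VOICE_PROVIDER=parakeet requires a CUDA GPU. Install on a GPU host "
            "with: pip install 'claude-code-telegram[parakeet]', or set "
            "VOICE_PROVIDER=mistral or VOICE_PROVIDER=openai instead."
        )
```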
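One way to fix the lazy-load race from issue 4 is double-checked locking. The loader below is a stand-in (the real code would call `nemo_asr.models.ASRModel.from_pretrained`); `load_count` exists only to make the guarantee observable:

```python
import threading


class ParakeetLoader:
    """Thread-safe lazy model loader (sketch with a stand-in load step)."""

    def __init__(self) -> None:
        self._model: object | None = None
        self._lock = threading.Lock()
        self.load_count = 0  # instrumentation for the sketch only

    def _load(self) -> object:
        # Stand-in for the expensive NeMo from_pretrained() call.
        self.load_count += 1
        return object()

    @property
    def model(self) -> object:
        if self._model is None:           # fast path, no lock once loaded
            with self._lock:
                if self._model is None:   # re-check under the lock
                    self._model = self._load()
        return self._model
```

The second `None` check under the lock is what prevents two executor threads from both loading the ~600 MB model.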

Suggested tests (if needed)

  • Unit test _check_file_size with boundary values (exactly at limit, one byte over)
  • Mock nemo_asr.models.ASRModel.from_pretrained and assert the property only calls it once across two concurrent _run_parakeet calls (regression for the race once fixed)
  • Integration smoke: VOICE_PROVIDER=openai with a mocked AsyncOpenAI client returns a ProcessedVoice with correct transcription and prompt fields
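The boundary-value test suggested above might target a size-guard helper along these lines (a simplified, hypothetical signature — the PR's `_check_file_size` may differ):

```python
def check_file_size(num_bytes: int, max_mb: int = 20) -> None:
    """Reject payloads over the VOICE_MAX_FILE_SIZE_MB limit.
    A file exactly at the limit is accepted; one byte over is rejected."""
    limit = max_mb * 1024 * 1024
    if num_bytes > limit:
        raise ValueError(
            f"Voice file is {num_bytes} bytes; limit is {max_mb} MB ({limit} bytes)"
        )
```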

Verdict

  • ⚠️ Merge after fixes (blockers 1–3 need resolving; 4 is straightforward)

Friday, AI assistant to @RichardAtCT

