A fast, modern desktop dictation app built with Tauri 2.x. Speak naturally and have your words transcribed, cleaned up, and pasted into any application — including CLI tools like Claude Code and Codex.
Whisperi primarily relies on cloud transcription services (OpenAI, Groq, Mistral) rather than local models. While local speech-to-text models like whisper.cpp exist, they require significant computational resources to achieve acceptable speed and accuracy. For most users, cloud APIs deliver near-instant, high-quality transcription that local models on consumer hardware simply cannot match.
See Supported Providers for the full list of models and our recommended setup.
- Voice Transcription — OpenAI, Groq, Mistral, and Qwen with model selection
- Text Enhancement — Post-process transcriptions with GPT, Claude, Gemini, Groq, Qwen, or any model via OpenRouter
- Auto-Paste — Transcribed text is automatically pasted into the active window, including CLI tools
- Custom Dictionary — Add names, jargon, and technical terms to improve accuracy
- Transcribe & Chat Modes — Cleans up speech by default; say the agent name to switch to a conversational AI chatbot
- Hotkey Support — Tap-to-toggle or push-to-talk activation modes
Whisperi's language selector (Settings > General > Language) controls the output language, not the input language. This means you can speak in one language and have the output automatically produced in another.
- Auto-detect — output matches whatever language you speak
- Specific language (e.g., "English") — output is always in the selected language, regardless of what language you speak
This effectively gives you real-time speech translation. For example, speak in Chinese and set the output language to English — Whisperi will transcribe your speech and produce clean English text. Or speak in English and output in French, Japanese, etc.
The language setting overrides the system prompt language — even if your custom prompt is written in Chinese, selecting "English" as the output language will produce English output.
Most dictation apps can only paste into standard GUI text fields. Whisperi uses native Win32 SendInput to simulate real keystrokes, which means it can paste directly into command-line interfaces and terminal emulators — something most competitors simply cannot do.
This makes Whisperi especially useful for developers who work with AI coding assistants in the terminal:
- Claude Code — dictate prompts and instructions directly into the Claude Code CLI
- Codex CLI — speak your coding requests instead of typing them
- Any terminal — PowerShell, Windows Terminal, cmd.exe, WSL terminals
No need to type out long prompts manually. Just press the hotkey, speak, and your words appear right in the terminal input.
After testing across providers, the following combination delivers the best balance of speed and accuracy:
| Stage | Provider | Model | Why |
|---|---|---|---|
| Transcription | Groq | Whisper Large v3 | Highest accuracy cloud transcription at 299x real-time speed |
| Enhancement | Groq | LLaMA 3.3 70B | Best speed-to-quality ratio for text cleanup |
Both models run on Groq's inference engine, so you only need a single API key. Transcription + enhancement typically completes in under 2 seconds end-to-end.
If you need more sophisticated enhancement (complex restructuring, tone adjustments, or nuanced formatting), switch to LLaMA 4 Maverick or LLaMA 4 Scout on Groq. These models produce higher-quality rewrites but take noticeably longer per request.
Tip for Asian languages: If you primarily dictate in Chinese, Japanese, Korean, or other Asian languages, consider using Qwen (Alibaba Cloud) for both transcription and text enhancement. Qwen3 ASR Flash delivers superior CJK speech recognition, and Qwen's reasoning models have stronger understanding of Asian language grammar, idioms, and punctuation conventions compared to English-centric models.
| Provider | Models |
|---|---|
| OpenAI | GPT-4o Mini Transcribe, GPT-4o Transcribe, Whisper |
| Groq | Whisper Large v3, Whisper Large v3 Turbo |
| Mistral | Voxtral Mini |
| Qwen | Qwen3 ASR Flash |
| OpenRouter | Any model — enter provider/model-name |
| Provider | Models |
|---|---|
| OpenAI | GPT-5.2, GPT-5.2 Pro, GPT-5 Mini, GPT-5 Nano, GPT-4.1 family |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 |
| Google Gemini | Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash |
| Groq | LLaMA 4 Maverick, LLaMA 4 Scout, Qwen3 32B, GPT-OSS 120B/20B, LLaMA 3.3 70B |
| Qwen | Qwen3 235B (MoE), Qwen3 32B |
| OpenRouter | Any model — enter provider/model-name (e.g. anthropic/claude-sonnet-4) |
Whisperi supports custom system prompts to control how the AI cleans up your transcriptions. Example prompts are available in examples/prompts/:
custom-prompt-en.txt— Englishcustom-prompt-zh.txt— Chinese (中文)
To use a custom prompt, go to Settings > Enhancement > System Prompt, switch to the "Custom Prompt" tab, and paste your prompt text.
- Global hotkey may stop working after a remote desktop session — If you use remote desktop software (RustDesk, Windows RDP, AnyDesk, etc.), the OS-level global hotkey registration can be disrupted when the remote session connects or disconnects. Whisperi will automatically re-register the hotkey when its overlay window regains focus, so click the Whisperi overlay button once to restore the shortcut. If the hotkey still doesn't work, open Preferences and re-set the hotkey.
Whisperi currently targets Windows only, but it is built with Tauri, which supports macOS and Linux as well. If you'd like to see support for other platforms, please open an issue.
If you need local/offline transcription (no cloud API keys), check out these alternatives that bundle Whisper models for on-device processing:
- OpenWhispr — cross-platform dictation with local and cloud models
- Epicenter (formerly Whispering) — local-first open-source speech-to-text ecosystem
Prerequisites: Rust (stable), bun, Windows 10/11
bun install # install dependencies
bun run tauri dev # dev mode (Vite + Tauri)
bun run typecheck # TypeScript check
cd src-tauri && cargo test # Rust tests
bun run tauri build # production buildThis project is licensed under the MIT License — see the LICENSE file for details.
