
A fast, modern desktop dictation app. Speak naturally and have your words transcribed, cleaned up, and pasted into any application.


xarthurx/whisperi


Whisperi


Built on Windows, for Windows.


Speech → Voice Transcription → Text Enhancement → Output

A fast, modern desktop dictation app built with Tauri 2.x. Speak naturally and have your words transcribed, cleaned up, and pasted into any application — including CLI tools like Claude Code and Codex.
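The four-stage flow (Speech → Voice Transcription → Text Enhancement → Output) can be pictured as a chain of async functions. This is a minimal sketch, not Whisperi's code: the `Stages` interface, `detectMode`, and `runPipeline` are hypothetical names. It also assumes that Chat mode is triggered when the configured agent name appears in the transcript as a whole word.

```typescript
// Hypothetical stage interface; Whisperi's real internals may differ.
type Mode = "transcribe" | "chat";

interface Stages {
  transcribe: (audio: Uint8Array) => Promise<string>; // Voice Transcription
  enhance: (text: string, mode: Mode) => Promise<string>; // Text Enhancement
  paste: (text: string) => Promise<void>; // Output
}

// Assumption: chat mode is selected when the agent name appears as a whole word.
function detectMode(transcript: string, agentName: string): Mode {
  // Escape regex metacharacters so names like "C.A.T." match literally.
  const escaped = agentName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(transcript) ? "chat" : "transcribe";
}

async function runPipeline(audio: Uint8Array, agentName: string, stages: Stages): Promise<string> {
  const raw = await stages.transcribe(audio);
  const mode = detectMode(raw, agentName);
  const output = await stages.enhance(raw, mode);
  await stages.paste(output);
  return output;
}

// Usage with stub stages (no network calls).
const stubStages: Stages = {
  transcribe: async () => "hey jarvis what does this error mean",
  enhance: async (text, mode) => `[${mode}] ${text}`,
  paste: async () => {},
};

runPipeline(new Uint8Array(0), "Jarvis", stubStages).then((out) => console.log(out));
// prints: [chat] hey jarvis what does this error mean
```

Keeping each stage behind its own function type makes it easy to swap providers (for example, Groq for transcription and Claude for enhancement) without touching the rest of the flow.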

Why Cloud-First?

Whisperi primarily relies on cloud transcription services (OpenAI, Groq, Mistral) rather than local models. While local speech-to-text models like whisper.cpp exist, they require significant computational resources to achieve acceptable speed and accuracy. For most users, cloud APIs deliver near-instant, high-quality transcription that local models on consumer hardware simply cannot match.

See Supported Providers for the full list of models and our recommended setup.

Features

Overlay button states: Idle → Recording → Processing

  • Voice Transcription — OpenAI, Groq, Mistral, and Qwen with model selection
  • Text Enhancement — Post-process transcriptions with GPT, Claude, Gemini, Groq, Qwen, or any model via OpenRouter
  • Auto-Paste — Transcribed text is automatically pasted into the active window, including CLI tools
  • Custom Dictionary — Add names, jargon, and technical terms to improve accuracy
  • Transcribe & Chat Modes — Cleans up speech by default; say the agent name to switch to a conversational AI chatbot
  • Hotkey Support — Tap-to-toggle or push-to-talk activation modes

Language & Translation

Whisperi's language selector (Settings > General > Language) controls the output language, not the input language. This means you can speak in one language and have the output automatically produced in another.

  • Auto-detect — output matches whatever language you speak
  • Specific language (e.g., "English") — output is always in the selected language, regardless of what language you speak

This effectively gives you real-time speech translation. For example, speak in Chinese and set the output language to English — Whisperi will transcribe your speech and produce clean English text. Or speak in English and output in French, Japanese, etc.

The language setting overrides the system prompt language — even if your custom prompt is written in Chinese, selecting "English" as the output language will produce English output.
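One way to get this precedence is to append an explicit output-language rule after the user's prompt, so the last instruction the model reads names the target language. This is a hedged sketch: `buildSystemPrompt` and the rule wording are hypothetical, and Whisperi may implement the override differently. The usage line uses a Chinese base prompt meaning "Please clean up this speech transcription."

```typescript
// "auto" keeps the speaker's language; any other value is a language name such as "English".
function buildSystemPrompt(basePrompt: string, outputLanguage: string): string {
  const languageRule =
    outputLanguage === "auto"
      ? "Write the output in the same language the speaker used."
      : `Write the output in ${outputLanguage}, translating if necessary, ` +
        "regardless of the language of the speech or of these instructions.";
  // The rule goes last so it overrides the language the base prompt is written in.
  return `${basePrompt.trim()}\n\n${languageRule}`;
}

const examplePrompt = buildSystemPrompt("请清理这段语音转写文本。", "English");
console.log(examplePrompt);
```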

Whisperi Settings Window

Paste Anywhere — Including CLI Tools

Most dictation apps can only paste into standard GUI text fields. Whisperi uses native Win32 SendInput to simulate real keystrokes, which means it can paste directly into command-line interfaces and terminal emulators — something most competitors simply cannot do.

This makes Whisperi especially useful for developers who work with AI coding assistants in the terminal:

  • Claude Code — dictate prompts and instructions directly into the Claude Code CLI
  • Codex CLI — speak your coding requests instead of typing them
  • Any terminal — PowerShell, Windows Terminal, cmd.exe, WSL terminals

No need to type out long prompts manually. Just press the hotkey, speak, and your words appear right in the terminal input.
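The text above says only that Whisperi calls Win32 `SendInput` to simulate keystrokes; it does not say whether it sends Ctrl+V or types each character. The sketch below assumes the character-typing variant and shows only the data side. Each UTF-16 code unit becomes a key-down/key-up pair flagged `KEYEVENTF_UNICODE` (0x0004), with `KEYEVENTF_KEYUP` (0x0002) added on the release. In a real app these records would fill `INPUT`/`KEYBDINPUT` structs passed to `SendInput` from the Rust side. `textToUnicodeInputs` and `KeyInput` are hypothetical names.

```typescript
// Flag values from the Win32 KEYBDINPUT documentation (winuser.h).
const KEYEVENTF_KEYUP = 0x0002;
const KEYEVENTF_UNICODE = 0x0004;

// Mirrors the KEYBDINPUT fields that matter for Unicode typing (hypothetical type).
interface KeyInput {
  wVk: number; // virtual-key code; must be 0 when KEYEVENTF_UNICODE is set
  wScan: number; // the UTF-16 code unit to type
  dwFlags: number;
}

function textToUnicodeInputs(text: string): KeyInput[] {
  const inputs: KeyInput[] = [];
  for (let i = 0; i < text.length; i++) {
    // charCodeAt yields UTF-16 code units, so characters outside the BMP
    // (e.g. emoji) become two surrogate events, the usual SendInput approach.
    const unit = text.charCodeAt(i);
    inputs.push({ wVk: 0, wScan: unit, dwFlags: KEYEVENTF_UNICODE });
    inputs.push({ wVk: 0, wScan: unit, dwFlags: KEYEVENTF_UNICODE | KEYEVENTF_KEYUP });
  }
  return inputs;
}

console.log(textToUnicodeInputs("ok").length); // prints 4
```

Newlines need care in terminals: typed as characters, they may submit a prompt early, so an implementation might strip them or handle them separately.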

Supported Providers

Recommended Models

After testing across providers, the following combination delivers the best balance of speed and accuracy:

| Stage | Provider | Model | Why |
|---|---|---|---|
| Transcription | Groq | Whisper Large v3 | Highest-accuracy cloud transcription at 299x real-time speed |
| Enhancement | Groq | LLaMA 3.3 70B | Best speed-to-quality ratio for text cleanup |

Both models run on Groq's inference engine, so you only need a single API key. Transcription + enhancement typically completes in under 2 seconds end-to-end.
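Because Groq exposes an OpenAI-compatible API, both stages of this setup reduce to plain HTTP requests that share one key. The sketch below only builds the requests without sending them. It assumes Groq's public model IDs `whisper-large-v3` and `llama-3.3-70b-versatile`. The function names, and passing Custom Dictionary terms through the transcription `prompt` field, are illustrative assumptions rather than confirmed Whisperi behavior. For scale: at 299x real-time, a 30-second clip needs roughly 30 / 299 ≈ 0.1 s of transcription compute, so most of the end-to-end time is likely spent on network round-trips and the enhancement call.

```typescript
// Runs under Bun or Node 18+, where Request, FormData and Blob are globals.
const GROQ_API = "https://api.groq.com/openai/v1";

// Stage 1: Whisper Large v3 transcription (multipart upload).
function buildTranscriptionRequest(apiKey: string, wav: Uint8Array, dictionary: string[]): Request {
  const form = new FormData();
  form.append("file", new Blob([wav], { type: "audio/wav" }), "speech.wav");
  form.append("model", "whisper-large-v3");
  if (dictionary.length > 0) {
    // Assumption: dictionary terms are sent as a spelling hint via `prompt`.
    form.append("prompt", dictionary.join(", "));
  }
  // No explicit Content-Type: the runtime adds the multipart boundary itself.
  return new Request(`${GROQ_API}/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

// Stage 2: LLaMA 3.3 70B cleanup via chat completions.
function buildEnhancementRequest(apiKey: string, systemPrompt: string, transcript: string): Request {
  return new Request(`${GROQ_API}/chat/completions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: transcript },
      ],
    }),
  });
}
```

Sending is then a single `await fetch(request)` per stage, with the transcription text from stage 1 passed as `transcript` to stage 2.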

If you need more sophisticated enhancement (complex restructuring, tone adjustments, or nuanced formatting), switch to LLaMA 4 Maverick or LLaMA 4 Scout on Groq. These models produce higher-quality rewrites but take noticeably longer per request.

Tip for Asian languages: If you primarily dictate in Chinese, Japanese, Korean, or other Asian languages, consider using Qwen (Alibaba Cloud) for both transcription and text enhancement. Qwen3 ASR Flash delivers superior CJK speech recognition, and Qwen's reasoning models have stronger understanding of Asian language grammar, idioms, and punctuation conventions compared to English-centric models.

Voice Transcription

| Provider | Models |
|---|---|
| OpenAI | GPT-4o Mini Transcribe, GPT-4o Transcribe, Whisper |
| Groq | Whisper Large v3, Whisper Large v3 Turbo |
| Mistral | Voxtral Mini |
| Qwen | Qwen3 ASR Flash |
| OpenRouter | Any model (enter provider/model-name) |

Text Enhancement

| Provider | Models |
|---|---|
| OpenAI | GPT-5.2, GPT-5.2 Pro, GPT-5 Mini, GPT-5 Nano, GPT-4.1 family |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 |
| Google Gemini | Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash |
| Groq | LLaMA 4 Maverick, LLaMA 4 Scout, Qwen3 32B, GPT-OSS 120B/20B, LLaMA 3.3 70B |
| Qwen | Qwen3 235B (MoE), Qwen3 32B |
| OpenRouter | Any model (enter provider/model-name, e.g. anthropic/claude-sonnet-4) |
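The OpenRouter option expects an identifier in `provider/model-name` form, such as `anthropic/claude-sonnet-4`. A small validator like the hypothetical `parseOpenRouterModel` below can catch malformed input before any request is sent. OpenRouter takes the full string as the `model` value, so the split is only for validation and display. Rejecting whitespace is an assumption about valid IDs.

```typescript
interface OpenRouterModel {
  provider: string; // e.g. "anthropic"
  model: string; // e.g. "claude-sonnet-4"
  id: string; // full identifier, sent unchanged as the `model` field
}

function parseOpenRouterModel(input: string): OpenRouterModel | null {
  const id = input.trim();
  if (/\s/.test(id)) return null; // assumption: valid IDs contain no spaces
  const slash = id.indexOf("/");
  // Require non-empty text on both sides of the first slash.
  if (slash <= 0 || slash === id.length - 1) return null;
  return { provider: id.slice(0, slash), model: id.slice(slash + 1), id };
}

console.log(parseOpenRouterModel("anthropic/claude-sonnet-4"));
console.log(parseOpenRouterModel("claude-sonnet-4")); // prints null
```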

Example Prompts

Whisperi supports custom system prompts to control how the AI cleans up your transcriptions. Example prompts are available in the examples/prompts/ directory.

To use a custom prompt, go to Settings > Enhancement > System Prompt, switch to the "Custom Prompt" tab, and paste your prompt text.

Known Issues

  • Global hotkey may stop working after a remote desktop session — If you use remote desktop software (RustDesk, Windows RDP, AnyDesk, etc.), the OS-level global hotkey registration can be disrupted when the remote session connects or disconnects. Whisperi will automatically re-register the hotkey when its overlay window regains focus, so click the Whisperi overlay button once to restore the shortcut. If the hotkey still doesn't work, open Preferences and re-set the hotkey.

Other Platforms

Whisperi currently targets Windows only, but it is built with Tauri, which supports macOS and Linux as well. If you'd like to see support for other platforms, please open an issue.

If you need local/offline transcription (no cloud API keys), check out these alternatives that bundle Whisper models for on-device processing:

Contributing

Prerequisites: Rust (stable), Bun, Windows 10/11

bun install                                       # install dependencies
bun run tauri dev                                 # dev mode (Vite + Tauri)
bun run typecheck                                 # TypeScript check
cargo test --manifest-path src-tauri/Cargo.toml   # Rust tests (stays in the repo root)
bun run tauri build                               # production build

License

This project is licensed under the MIT License — see the LICENSE file for details.
