Whisperi

Built on Windows, for Windows.

A fast, modern desktop dictation app built with Tauri 2.x. Speak naturally and have your words transcribed, cleaned up, and pasted into any application — including CLI tools like Claude Code and Codex.

Why Cloud-First?

Whisperi primarily relies on cloud transcription services (OpenAI, Groq, Mistral) rather than local models. While local speech-to-text models like whisper.cpp exist, they require significant computational resources to achieve acceptable speed and accuracy. For most users, cloud APIs deliver near-instant, high-quality transcription that local models on consumer hardware simply cannot match.

See Supported Providers for the full list of models and our recommended setup.

Features

Voice Transcription — OpenAI, Groq, Mistral, and Qwen with model selection
Text Enhancement — Post-process transcriptions with GPT, Claude, Gemini, Groq, Qwen, or any model via OpenRouter
Auto-Paste — Transcribed text is automatically pasted into the active window, including CLI tools
Custom Dictionary — Add names, jargon, and technical terms to improve accuracy
Transcribe & Chat Modes — Cleans up speech by default; say the agent name to switch to a conversational AI chatbot
Hotkey Support — Tap-to-toggle or push-to-talk activation modes

Language & Translation

Whisperi's language selector (Settings > General > Language) controls the output language, not the input language. This means you can speak in one language and have the output automatically produced in another.

Auto-detect — output matches whatever language you speak
Specific language (e.g., "English") — output is always in the selected language, regardless of what language you speak

This effectively gives you real-time speech translation. For example, speak in Chinese and set the output language to English — Whisperi will transcribe your speech and produce clean English text. Or speak in English and output in French, Japanese, etc.

The language setting overrides the system prompt language — even if your custom prompt is written in Chinese, selecting "English" as the output language will produce English output.

Paste Anywhere — Including CLI Tools

Most dictation apps can only paste into standard GUI text fields. Whisperi uses native Win32 SendInput to simulate real keystrokes, which means it can paste directly into command-line interfaces and terminal emulators — something most competitors simply cannot do.

This makes Whisperi especially useful for developers who work with AI coding assistants in the terminal:

Claude Code — dictate prompts and instructions directly into the Claude Code CLI
Codex CLI — speak your coding requests instead of typing them
Any terminal — PowerShell, Windows Terminal, cmd.exe, WSL terminals

No need to type out long prompts manually. Just press the hotkey, speak, and your words appear right in the terminal input.

Supported Providers

Recommended Models

After testing across providers, the following combination delivers the best balance of speed and accuracy:

Stage	Provider	Model	Why
Transcription	Groq	Whisper Large v3	Highest accuracy cloud transcription at 299x real-time speed
Enhancement	Groq	LLaMA 3.3 70B	Best speed-to-quality ratio for text cleanup

Both models run on Groq's inference engine, so you only need a single API key. Transcription + enhancement typically completes in under 2 seconds end-to-end.

If you need more sophisticated enhancement (complex restructuring, tone adjustments, or nuanced formatting), switch to LLaMA 4 Maverick or LLaMA 4 Scout on Groq. These models produce higher-quality rewrites but take noticeably longer per request.

Tip for Asian languages: If you primarily dictate in Chinese, Japanese, Korean, or other Asian languages, consider using Qwen (Alibaba Cloud) for both transcription and text enhancement. Qwen3 ASR Flash delivers superior CJK speech recognition, and Qwen's reasoning models have stronger understanding of Asian language grammar, idioms, and punctuation conventions compared to English-centric models.

Voice Transcription

Provider	Models
OpenAI	GPT-4o Mini Transcribe, GPT-4o Transcribe, Whisper
Groq	Whisper Large v3, Whisper Large v3 Turbo
Mistral	Voxtral Mini
Qwen	Qwen3 ASR Flash
OpenRouter	Any model — enter `provider/model-name`

Text Enhancement

Provider	Models
OpenAI	GPT-5.2, GPT-5.2 Pro, GPT-5 Mini, GPT-5 Nano, GPT-4.1 family
Anthropic	Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5
Google Gemini	Gemini 3 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash
Groq	LLaMA 4 Maverick, LLaMA 4 Scout, Qwen3 32B, GPT-OSS 120B/20B, LLaMA 3.3 70B
Qwen	Qwen3 235B (MoE), Qwen3 32B
OpenRouter	Any model — enter `provider/model-name` (e.g. `anthropic/claude-sonnet-4`)

Example Prompts

Whisperi supports custom system prompts to control how the AI cleans up your transcriptions. Example prompts are available in examples/prompts/:

custom-prompt-en.txt — English
custom-prompt-zh.txt — Chinese (中文)

To use a custom prompt, go to Settings > Enhancement > System Prompt, switch to the "Custom Prompt" tab, and paste your prompt text.

Known Issues

Global hotkey may stop working after a remote desktop session — If you use remote desktop software (RustDesk, Windows RDP, AnyDesk, etc.), the OS-level global hotkey registration can be disrupted when the remote session connects or disconnects. Whisperi will automatically re-register the hotkey when its overlay window regains focus, so click the Whisperi overlay button once to restore the shortcut. If the hotkey still doesn't work, open Preferences and re-set the hotkey.

Other Platforms

Whisperi currently targets Windows only, but it is built with Tauri, which supports macOS and Linux as well. If you'd like to see support for other platforms, please open an issue.

If you need local/offline transcription (no cloud API keys), check out these alternatives that bundle Whisper models for on-device processing:

OpenWhispr — cross-platform dictation with local and cloud models
Epicenter (formerly Whispering) — local-first open-source speech-to-text ecosystem

Contributing

Prerequisites: Rust (stable), bun, Windows 10/11

bun install              # install dependencies
bun run tauri dev        # dev mode (Vite + Tauri)
bun run typecheck        # TypeScript check
cd src-tauri && cargo test   # Rust tests
bun run tauri build      # production build

License

This project is licensed under the MIT License — see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
.github		.github
docs		docs
examples/prompts		examples/prompts
public		public
scripts		scripts
src-tauri		src-tauri
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
bun.lock		bun.lock
index.html		index.html
package.json		package.json
tsconfig.json		tsconfig.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Whisperi

Why Cloud-First?

Features

Language & Translation

Paste Anywhere — Including CLI Tools

Supported Providers

Recommended Models

Voice Transcription

Text Enhancement

Example Prompts

Known Issues

Other Platforms

Contributing

License

About

Uh oh!

Releases 24

Packages

Languages

License

xarthurx/whisperi

Folders and files

Latest commit

History

Repository files navigation

Whisperi

Why Cloud-First?

Features

Language & Translation

Paste Anywhere — Including CLI Tools

Supported Providers

Recommended Models

Voice Transcription

Text Enhancement

Example Prompts

Known Issues

Other Platforms

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 24

Packages 0

Languages

Packages