A lightweight, local AI voice transcription tool built with Tauri v2, React, and Rust. Whisper Flow runs OpenAI's Whisper models locally on your machine, ensuring privacy and speed without sending audio data to the cloud.
Note: Whisper Flow is currently available for macOS (Apple Silicon/Intel) only.
Download the latest version from GitHub Releases
- 🔒 Local & Private: All transcription happens on-device using quantized Whisper models (ggml).
- 👻 Native Floating Hint: A beautiful, completely transparent floating capsule indicates recording status without blocking your workflow.
  - Built with native macOS `NSWindow` APIs for true transparency and click-through capability.
- ⚡️ Global Shortcuts: Toggle recording instantly from anywhere (Default: `Shift + Command + A`).
- 📂 Drag & Drop: Drag audio/video files directly into the app to transcribe them.
- 📋 Auto-Copy: Transcribed text is automatically copied to your clipboard.
- 🎯 System Integration:
  - Native microphone access.
  - Accessibility API integration for global input monitoring.
  - macOS "Ghost Window" mode (ignores mouse events, fully transparent background).
- Frontend: React, TypeScript, Vite, Modular CSS
- Backend: Rust (Tauri v2), `objc2` (for macOS native window management), `cpal` (Audio), `whisper.cpp` (Model inference)
- State Management: React Hooks + Tauri Event System
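As a rough illustration of how React Hooks and backend events can share one piece of state, the sketch below models the recording lifecycle as a pure reducer of the kind that could be passed to React's `useReducer`. The status values, event names, and the `reduce` function are hypothetical and chosen for illustration; the app's actual state shape and event names may differ.

```typescript
// Hypothetical status values; the real app may track more (or different) states.
type Status = "idle" | "recording" | "transcribing";

interface AppState {
  status: Status;
  lastTranscript: string | null;
}

// Hypothetical event names standing in for whatever the backend emits.
type AppEvent =
  | { type: "shortcut-toggled" }
  | { type: "transcription-finished"; text: string };

const initialState: AppState = { status: "idle", lastTranscript: null };

// Pure transition function: same inputs always give the same next state.
function reduce(state: AppState, event: AppEvent): AppState {
  if (event.type === "shortcut-toggled") {
    // The shortcut starts a recording, or stops one and hands it to transcription.
    if (state.status === "idle") return { ...state, status: "recording" };
    if (state.status === "recording") return { ...state, status: "transcribing" };
    return state; // Ignore the shortcut while a transcription is in progress.
  }
  // event.type === "transcription-finished"
  if (state.status !== "transcribing") return state; // Stale or unexpected result.
  return { status: "idle", lastTranscript: event.text };
}
```

In a component, a listener for backend events would dispatch these events into the reducer, and both the main window and the floating hint could render from `status`.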
- Node.js (v18+)
- Rust (latest stable)
- macOS (tested on Sonoma/Sequoia)
- Clone the repository

      git clone https://github.com/yourusername/whisper-flow.git
      cd whisper-flow

- Install dependencies

      npm install

- Set Up Binaries & Models
  - Models: The app automatically downloads the Whisper model on first run.
  - Binaries: You must manually place the required binaries in `src-tauri/binaries/`:
    - `ffmpeg`: Download a standalone FFmpeg binary (aarch64 for Apple Silicon, x64 for Intel).
    - `whisper-cli`: Build or download the `whisper-cpp` CLI.
    - Naming: Name each file for your architecture, e.g., `ffmpeg-aarch64-apple-darwin` and `whisper-cli-aarch64-apple-darwin`.
- Run in Development Mode

      npm run tauri dev

- Build for Production

      npm run tauri build

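The binary naming rule from the setup step can be sketched as a small helper that appends a target triple to each tool name. This is an illustration, not part of the project: `MAC_TARGET_TRIPLES`, `sidecarFileName`, and `expectedBinaries` are hypothetical names, the architecture keys follow Node's `process.arch` values, and the Intel triple `x86_64-apple-darwin` is assumed from the standard Rust target name, since the examples above only show the Apple Silicon variant.

```typescript
// Maps Node-style architecture names to Rust target triples for macOS.
// The Intel entry is an assumption based on the standard Rust target name.
const MAC_TARGET_TRIPLES: Record<string, string> = {
  arm64: "aarch64-apple-darwin", // Apple Silicon
  x64: "x86_64-apple-darwin", // Intel
};

// Builds the file name a binary is expected to have, e.g. "ffmpeg-aarch64-apple-darwin".
function sidecarFileName(tool: string, arch: string): string {
  const triple = MAC_TARGET_TRIPLES[arch];
  if (triple === undefined) {
    throw new Error(`Unsupported macOS architecture: ${arch}`);
  }
  return `${tool}-${triple}`;
}

// Lists the paths the setup step asks you to create for one architecture.
function expectedBinaries(arch: string): string[] {
  return ["ffmpeg", "whisper-cli"].map(
    (tool) => `src-tauri/binaries/${sidecarFileName(tool, arch)}`
  );
}

// expectedBinaries("arm64") returns:
//   ["src-tauri/binaries/ffmpeg-aarch64-apple-darwin",
//    "src-tauri/binaries/whisper-cli-aarch64-apple-darwin"]
```

A check like this could run before `npm run tauri dev` to report a missing or misnamed binary early, rather than failing at build time.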
On the first launch, the app will request the following permissions:
- Microphone: To record audio.
- Accessibility: To listen for global shortcuts (e.g., `Shift+Cmd+A`) even when the app is in the background.
- Multi-Window System: Uses a dedicated lightweight `hint.html` entry point for the floating window to minimize resource usage.
- Zero-Style-Leakage: The floating window uses a separate CSS stack (`hint.css`) and native DOM injection to prevent main window styles from affecting the transparent background.
- Rust-Native Transparency: Leverages `objc2` and Cocoa APIs to manipulate the underlying `NSWindow`, ensuring a "glass-like" effect that standard webviews cannot achieve alone.
MIT License