Whisper Flow 🎙️

A lightweight, local AI voice transcription tool built with Tauri v2, React, and Rust. Whisper Flow runs OpenAI's Whisper models locally on your machine, ensuring privacy and speed without sending audio data to the cloud.

📥 Download

Note: Whisper Flow is currently available for macOS only (Apple Silicon and Intel).

Download the latest version from GitHub Releases

✨ Features

🔒 Local & Private: All transcription happens on-device using quantized Whisper models in GGML format.
👻 Native Floating Hint: A beautiful, completely transparent floating capsule indicates recording status without blocking your workflow.
- Built with native macOS NSWindow APIs for true transparency and click-through capability.
⚡️ Global Shortcuts: Toggle recording instantly from anywhere (Default: Shift + Command + A).
📂 Drag & Drop: Drag audio/video files directly into the app to transcribe them.
📋 Auto-Copy: Transcribed text is automatically copied to your clipboard.
🎯 System Integration:
- Native microphone access.
- Accessibility API integration for global input monitoring.
- macOS "Ghost Window" mode (ignores mouse events, fully transparent background).

🛠️ Tech Stack

Frontend: React, TypeScript, Vite, Modular CSS
Backend: Rust (Tauri v2), objc2 (for macOS native window management), cpal (Audio), whisper.cpp (Model inference)
State Management: React Hooks + Tauri Event System

Getting Started

Prerequisites

Node.js (v18+)
Rust (latest stable)
macOS (tested on Sonoma/Sequoia)

Installation

Clone the repository

```
git clone https://github.com/yourusername/whisper-flow.git
cd whisper-flow
```

Install dependencies
```
npm install
```
Setup Binaries & Models
- Models: The app will automatically download the Whisper model on first run.
- Binaries: You must manually place the required binaries in src-tauri/binaries/:
  - ffmpeg: Download a standalone FFmpeg binary (aarch64 for Apple Silicon, x86_64 for Intel).
  - whisper-cli: Build or download the whisper.cpp CLI.
  - Naming: Suffix each file with the Rust target triple for your architecture, e.g., ffmpeg-aarch64-apple-darwin and whisper-cli-aarch64-apple-darwin on Apple Silicon.
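
The naming rule above can be sketched as a small helper. This is only an illustration: `sidecarFileName` and `MacArch` are hypothetical names that are not part of the project, and the Intel triple `x86_64-apple-darwin` is the standard Rust target for Intel Macs rather than something this README states explicitly.

```typescript
// Node.js-style architecture identifiers for the two supported Mac CPU types.
type MacArch = "arm64" | "x64";

// Rust target triples used as file-name suffixes.
// Assumption: Intel Macs use the standard x86_64-apple-darwin triple.
const MAC_TARGET_TRIPLES: Record<MacArch, string> = {
  arm64: "aarch64-apple-darwin",
  x64: "x86_64-apple-darwin",
};

// Hypothetical helper: builds the file name expected in src-tauri/binaries/.
function sidecarFileName(baseName: string, arch: MacArch): string {
  return `${baseName}-${MAC_TARGET_TRIPLES[arch]}`;
}

console.log(sidecarFileName("ffmpeg", "arm64"));      // ffmpeg-aarch64-apple-darwin
console.log(sidecarFileName("whisper-cli", "arm64")); // whisper-cli-aarch64-apple-darwin
```

If you are unsure which triple applies to your machine, `rustc -vV` prints it on the `host:` line.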
Run in Development Mode
```
npm run tauri dev
```
Build for Production
```
npm run tauri build
```

🧩 Permissions

On first launch, the app requests the following permissions:

Microphone: To record audio.
Accessibility: To listen for global shortcuts (e.g., Shift+Cmd+A) even when the app is in the background.

🏗️ Architecture Highlights

Multi-Window System: Uses a dedicated lightweight hint.html entry point for the floating window to minimize resource usage.
Zero-Style-Leakage: The floating window uses a separate stylesheet (hint.css) and native DOM injection so that main-window styles cannot affect its transparent background.
Rust-Native Transparency: Leverages objc2 and Cocoa APIs to manipulate the underlying NSWindow, producing a "glass-like" effect that a standard webview cannot achieve on its own.
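
For the multi-window setup, a second HTML entry point is typically declared in the Vite build configuration. The following is a minimal sketch, assuming a standard Vite + React setup with `@vitejs/plugin-react`; the project's actual vite.config.ts may differ, and the `main` and `hint` keys are illustrative names.

```typescript
// vite.config.ts (sketch, not the project's actual configuration)
import { resolve } from "node:path";
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  build: {
    rollupOptions: {
      // Each entry becomes its own page, so the hint window
      // loads only the code referenced from hint.html.
      input: {
        main: resolve(__dirname, "index.html"),
        hint: resolve(__dirname, "hint.html"),
      },
    },
  },
});
```

Keeping hint.html as a separate entry means its bundle does not pull in the main window's components or styles unless they are explicitly imported.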

📝 License

MIT License
