A native macOS menu bar app that turns your voice into clean, formatted text — anywhere your cursor is. Press a hotkey, speak, and watch the cleaned-up text stream into your active text field. Like WispFlow, but open source and bring-your-own-OpenAI-key.
- Global hotkey — ⌥Space from any app (configurable)
- Floating overlay — translucent pill that appears over your work without stealing focus
- Live waveform — visual feedback as you speak
- Whisper transcription — OpenAI Whisper API; accent-aware, multilingual
- GPT formatting — GPT-4o-mini cleans grammar, removes filler words, preserves your voice and intent
- Streaming responses — formatted text appears token-by-token in the overlay as GPT generates it
- Smart text injection — types at your cursor via Accessibility API, falls back to clipboard paste in apps that don't expose native text fields (Cursor, VS Code, Chrome, Slack, etc.)
- Silence auto-stop — configurable pause length (default 3 s) before recording auto-stops
- Silence detection — empty recordings are dropped before they reach Whisper, so you never get ghost transcripts like "Thank you" or "You" from silence hallucinations
- Edit-before-inject (optional) — preview the formatted text in an editable field; ⌘↵ to inject, Esc to cancel
- Recent transcripts — last 10 transcripts available from the menu bar for one-click copy
- Custom system prompt — fully editable in Settings
- Launch at login — toggleable
- Clipboard-history hygiene — uses the org.nspasteboard.TransientType convention so text injections aren't recorded by Raycast / Paste / Maccy / Pastebot
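The silence-detection feature in the list above could be approximated with a simple energy check before anything is uploaded. This is a minimal sketch, not VoiceScribe's actual implementation: the function names `rms` and `isEffectivelySilent` and the -50 dBFS default threshold are assumptions chosen for illustration.

```swift
import Foundation

/// Root-mean-square level of normalised PCM samples (each in -1.0...1.0).
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Returns true when the recording's overall level is below `thresholdDB`
/// (decibels relative to full scale). The default value is an illustrative guess.
func isEffectivelySilent(_ samples: [Float], thresholdDB: Float = -50) -> Bool {
    let level = rms(samples)
    guard level > 0 else { return true }   // no samples, or all zeros
    let levelDB = 20 * log10(level)
    return levelDB < thresholdDB
}
```

A caller would run this over the decoded samples and skip the Whisper request whenever it returns true.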
- macOS 13 (Ventura) or later
- Xcode 15+ installed (the Swift compiler + frameworks are needed; you never need to open Xcode itself)
- An OpenAI API key — covers both Whisper transcription and GPT-4o-mini formatting
git clone https://github.com/opeoyeleke/voicescribe.git
cd voicescribe/VoiceScribe
# (Recommended, one-time) Create a stable signing identity in your login
# keychain so the macOS Accessibility grant survives across rebuilds.
./scripts/setup-signing.sh
# Build a release .app bundle
./scripts/build.sh
# Move it wherever you want and launch
mv .build/release/VoiceScribe.app ~/Applications/
open ~/Applications/VoiceScribe.app
On first launch:
- Grant Microphone access when prompted
- Open System Settings → Privacy & Security → Accessibility → enable VoiceScribe
- Click the menu bar icon → Settings… → API tab → paste your OpenAI key
- Press ⌥Space anywhere and start speaking
- Place your cursor in any text field (any app)
- Press ⌥Space
- Speak naturally
- Press ⌥Space again, or pause for ~3 s — recording auto-stops
- Whisper transcribes → GPT formats → text appears at your cursor
If you've enabled Preview before injecting in Settings, an editable overlay appears after formatting. ⌘↵ injects, Esc cancels.
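The flow described above (transcribe, format, optional preview, inject) can be sketched as a small pipeline of closures. The `Pipeline` type and its closure signatures are hypothetical; the real RecordingCoordinator is likely asynchronous and UI-driven, so treat this only as an outline of the ordering and the early exits.

```swift
import Foundation

/// Hypothetical stage signatures, synchronous for brevity.
struct Pipeline {
    var transcribe: (URL) throws -> String       // audio file in, raw transcript out
    var format: (String) throws -> String        // raw transcript in, cleaned text out
    var preview: ((String) -> String?)? = nil    // edited text, or nil if the user cancelled
    var inject: (String) -> Bool                 // true if the text reached the target app

    /// Runs the stages in order. Returns the injected text, or nil when nothing
    /// was injected (empty transcript, cancelled preview, or failed injection).
    func run(audio: URL) throws -> String? {
        let raw = try transcribe(audio).trimmingCharacters(in: .whitespacesAndNewlines)
        guard !raw.isEmpty else { return nil }
        var text = try format(raw)
        if let preview = preview {
            guard let edited = preview(text) else { return nil }   // Esc cancels
            text = edited
        }
        return inject(text) ? text : nil
    }
}
```

Leaving `preview` as nil corresponds to the default auto-inject behaviour; setting it models the Preview before injecting option.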
Open via the menu bar icon → Settings…
- General
  - Hotkey (default ⌥Space)
  - Auto-inject text into focused app
  - Auto-stop after silence (1.0–6.0 s)
  - Preview before injecting
  - Launch at login
- API
  - OpenAI API key
  - Formatting model (GPT-4o-mini or GPT-4o)
  - Test Connection button
- Prompt
  - Edit the system prompt that GPT receives
  - Reset to default
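To illustrate how the Auto-stop after silence setting might behave, here is a rough sketch of a silence timer that clamps its timeout to the 1.0–6.0 s range offered in Settings. The `SilenceAutoStop` type, its method names, and the 0.02 level threshold are assumptions for illustration; the app's AudioRecorderService may work differently.

```swift
import Foundation

/// Tracks how long the input level has stayed below a threshold.
/// Names and the 0.02 default threshold are illustrative, not taken from the app.
struct SilenceAutoStop {
    static let allowedRange: ClosedRange<TimeInterval> = 1.0...6.0

    let timeout: TimeInterval
    let levelThreshold: Float
    private var silentFor: TimeInterval = 0

    init(timeout: TimeInterval = 3.0, levelThreshold: Float = 0.02) {
        // Clamp to the range the Settings control offers.
        self.timeout = min(max(timeout, Self.allowedRange.lowerBound),
                           Self.allowedRange.upperBound)
        self.levelThreshold = levelThreshold
    }

    /// Feed one meter reading and the time since the previous reading.
    /// Returns true once continuous silence has lasted at least `timeout`.
    mutating func update(level: Float, elapsed: TimeInterval) -> Bool {
        if level >= levelThreshold {
            silentFor = 0            // speech resets the countdown
        } else {
            silentFor += elapsed
        }
        return silentFor >= timeout
    }
}
```

The recorder would call `update` on each level-meter tick and stop recording the first time it returns true.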
VoiceScribe/
├── Package.swift                       SPM manifest (one dep: KeyboardShortcuts)
├── Resources/
│   ├── Info.plist                      Bundle metadata, LSUIElement, permission strings
│   └── AppIcon.icns                    App icon (regenerate with scripts/generate-icon.sh)
├── scripts/
│   ├── build.sh                        Release build → .build/release/VoiceScribe.app
│   ├── setup-signing.sh                One-time: self-signed cert in login keychain
│   ├── generate-icon.swift             CoreGraphics icon renderer
│   └── generate-icon.sh                Wraps Swift renderer + sips + iconutil
└── Sources/VoiceScribe/
    ├── App/
    │   ├── VoiceScribeApp.swift        @main entry; placeholder Settings scene
    │   ├── AppDelegate.swift           Menu bar, hotkey, settings window, recent submenu
    │   └── OverlayWindow.swift         Borderless NSWindow; key only in edit-confirm mode
    ├── Models/
    │   └── AppState.swift              Observable state + @AppStorage settings
    ├── Services/
    │   ├── AudioRecorderService.swift  AVAudioRecorder → 16 kHz mono PCM .wav, level meter, silence auto-stop
    │   ├── WhisperService.swift        POST .wav to OpenAI Whisper API → raw transcript
    │   ├── GPTFormatterService.swift   POST transcript to OpenAI Chat API → clean text (streaming + non-streaming)
    │   ├── TextInjectionService.swift  AXUIElement injection, clipboard fallback (transient pasteboard type)
    │   └── RecordingCoordinator.swift  Owns the pipeline: record → transcribe → format → inject
    └── Views/
        ├── OverlayView.swift           SwiftUI floating pill UI
        └── SettingsView.swift          Tabbed Settings: General / API / Prompt
VoiceScribe currently sends language=en to Whisper. To change this, edit WhisperService.swift: either adjust the language init parameter or expose it in Settings.
Settings → Prompt tab. The default prompt instructs GPT to act as a transcription editor (not a chatbot), preserve greetings, and never answer questions or invent content. Reset to default if you want to pick up prompt updates after a git pull.
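As a rough picture of how the system prompt and the transcript might be combined, the sketch below builds a Chat Completions request body with the prompt as the system message and the transcript as the user message. The type and function names are hypothetical and the app's GPTFormatterService may structure this differently; only the JSON field names (model, messages, role, content, stream) come from OpenAI's documented request format.

```swift
import Foundation

struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

struct ChatRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

/// Puts the transcript in a user message so the system prompt can treat it
/// as text to edit rather than a question to answer.
func makeFormattingRequest(systemPrompt: String,
                           transcript: String,
                           model: String = "gpt-4o-mini",
                           stream: Bool = true) -> ChatRequest {
    ChatRequest(model: model,
                messages: [ChatMessage(role: "system", content: systemPrompt),
                           ChatMessage(role: "user", content: transcript)],
                stream: stream)
}
```

Encoding the result with `JSONEncoder` yields the request body; an edited prompt from Settings would simply be passed as `systemPrompt`.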
The Whisper API call passes a prompt parameter to discourage common training-set hallucinations ("Thank you", "Subscribe to my channel"). Tune it in buildMultipartBody() in WhisperService.swift.
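For orientation, here is one way a multipart body carrying the language and prompt fields could be assembled. This is a stand-in written for illustration, not the app's buildMultipartBody(): the function name, parameters, and boundary handling are assumptions, while the form field names (file, model, language, prompt) follow OpenAI's transcription endpoint.

```swift
import Foundation

/// Builds a multipart/form-data body for a transcription request.
/// Passing nil for `language` omits the field, which lets the API auto-detect.
func makeTranscriptionBody(audio: Data,
                           filename: String,
                           language: String?,
                           prompt: String?,
                           boundary: String) -> Data {
    var body = Data()
    func append(_ string: String) { body.append(Data(string.utf8)) }

    // One plain-text form field.
    func field(_ name: String, _ value: String) {
        append("--\(boundary)\r\n")
        append("Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n")
        append("\(value)\r\n")
    }

    field("model", "whisper-1")
    if let language = language { field("language", language) }   // ISO-639-1 code, e.g. "en"
    if let prompt = prompt { field("prompt", prompt) }

    // The audio file part, followed by the closing boundary.
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n")
    append("Content-Type: audio/wav\r\n\r\n")
    body.append(audio)
    append("\r\n--\(boundary)--\r\n")
    return body
}
```

The same boundary string has to appear in the request's Content-Type header (`multipart/form-data; boundary=...`).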
swift build
cp .build/debug/VoiceScribe .build/VoiceScribe.app/Contents/MacOS/
codesign --force --deep --sign "VoiceScribe Dev" .build/VoiceScribe.app
open .build/VoiceScribe.app
Run setup-signing.sh once before using these commands. Without it, substitute --sign - for ad-hoc signing, but be aware that ad-hoc rebuilds invalidate the macOS Accessibility grant on every iteration.
./scripts/build.sh
Produces a signed .app at .build/release/VoiceScribe.app. Uses "VoiceScribe Dev" if available (run setup-signing.sh first), otherwise falls back to ad-hoc with a warning.
./scripts/generate-icon.sh
Renders a fresh Resources/AppIcon.icns from the Swift CoreGraphics script. Edit colours / SF Symbol in scripts/generate-icon.swift.
This repo is set up for bring-your-own-build: you clone, run the build script, and use the resulting .app. No signed or notarised build is published on GitHub Releases.
If you do distribute a signed .app to others, two things matter:
- Ad-hoc signed (--sign -) builds can be opened by anyone, but each macOS install treats them as unidentified, requiring a right-click → Open the first time. Accessibility grants are tied to the code identity, so updates from someone else's machine require re-granting.
- Properly notarised distribution requires the Apple Developer Program ($99/yr) so you can sign with a Developer ID certificate and submit the bundle to Apple's notarisation service. Once notarised, downloads work without any Gatekeeper friction.
- Audio is sent to OpenAI's Whisper API for transcription
- The raw transcript is sent to OpenAI's Chat Completions API for formatting
- Both endpoints are reached directly from your machine via URLSession; no third-party servers are involved
- Your OpenAI API key is stored in macOS UserDefaults (com.opeoyeleke.voicescribe), never transmitted anywhere except OpenAI
- Recent transcripts are kept locally in UserDefaults (last 10 entries; clearable from the menu)
- Refer to OpenAI's data usage policy for what they do with API audio and text
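The local transcript history described above (last 10 entries, clearable) could look roughly like the following. The `RecentTranscripts` type and the storage key are hypothetical; only the use of UserDefaults and the cap of 10 come from this README.

```swift
import Foundation

/// Keeps the most recent transcripts, newest first, capped at `limit`.
/// The type name and storage key are illustrative, not the app's own.
struct RecentTranscripts {
    let defaults: UserDefaults
    let key = "recentTranscripts"
    let limit = 10

    /// Stored transcripts, newest first (empty if nothing has been saved).
    var all: [String] { defaults.stringArray(forKey: key) ?? [] }

    /// Inserts at the front and drops anything beyond the cap.
    func add(_ text: String) {
        var items = all
        items.insert(text, at: 0)
        defaults.set(Array(items.prefix(limit)), forKey: key)
    }

    /// Removes the whole history, as the menu's clear action would.
    func clear() { defaults.removeObject(forKey: key) }
}
```

Because everything goes through UserDefaults, the history stays on the machine and disappears when cleared or when the app's preferences are removed.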
Done:
- Whisper transcription via OpenAI API
- GPT-4o-mini formatting (streaming)
- Native AX text injection + clipboard fallback
- Configurable silence auto-stop
- Edit-before-inject mode
- Recent transcripts menu
- Whisper hallucination biasing
- Stable code-signing (no permission churn on rebuild)
Possible next steps:
- Local Whisper via whisper.cpp for offline / zero-latency transcription
- Per-app prompt profiles (different prompt for code vs prose vs Slack)
- Voice commands ("new paragraph", "delete that", "scratch that")
- Sparkle auto-updater
- Notarised distribution
PRs welcome. Each service has a single responsibility and is easy to swap or extend:
- New transcription backend → conform to the WhisperService interface (file URL in, transcript out)
- New formatter → conform to the GPTFormatterService.format / formatStreaming shape
- Different injection strategy → the TextInjectionService interface is one method: inject(text:) -> Bool
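To make the extension points above concrete, here is a sketch of what the service shapes might look like as protocols, plus a fallback injector that tries several strategies in order. The protocol names `Transcribing` and `TextInjecting` and the `FallbackInjector` type are hypothetical, inferred from the descriptions in the list; the real services may be concrete classes with async methods.

```swift
import Foundation

// Hypothetical protocol shapes inferred from the list above.
protocol Transcribing {
    /// File URL in, transcript out.
    func transcribe(fileURL: URL) throws -> String
}

protocol TextInjecting {
    /// Returns true if the text was delivered to the focused app.
    func inject(text: String) -> Bool
}

/// Tries each strategy in order and stops at the first one that succeeds,
/// e.g. Accessibility injection first, clipboard paste second.
struct FallbackInjector: TextInjecting {
    let strategies: [any TextInjecting]

    func inject(text: String) -> Bool {
        // contains(where:) short-circuits, so later strategies are not
        // attempted once one has reported success.
        strategies.contains { $0.inject(text: text) }
    }
}
```

A new injection strategy would then only need to implement `inject(text:)` and be added to the `strategies` array in the desired order.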
Build and test before opening a PR:
swift build 2>&1 | grep -E "error:|warning:"
If you're adding UI, also do a manual smoke test of the overlay flow (record → format → inject) in at least TextEdit (native AX) and one Electron app (clipboard fallback).
MIT