VoiceScribe

A native macOS menu bar app that turns your voice into clean, formatted text anywhere your cursor is. Press a hotkey, speak, and watch the cleaned-up text stream into the overlay and land in your active text field. Like Wispr Flow, but open source and bring-your-own-OpenAI-key.

Features

Global hotkey — ⌥Space from any app (configurable)
Floating overlay — translucent pill that appears over your work without stealing focus
Live waveform — visual feedback as you speak
Whisper transcription — OpenAI Whisper API; accent-aware, multilingual
GPT formatting — GPT-4o-mini cleans grammar, removes filler words, preserves your voice and intent
Streaming responses — formatted text appears token-by-token in the overlay as GPT generates it
Smart text injection — types at your cursor via Accessibility API, falls back to clipboard paste in apps that don't expose native text fields (Cursor, VS Code, Chrome, Slack, etc.)
Silence auto-stop — configurable pause length (default 3 s) before recording auto-stops
Silence detection — empty recordings are dropped before they reach Whisper, so you never get ghost transcripts like "Thank you" or "You" from silence hallucinations
Edit-before-inject (optional) — preview the formatted text in an editable field; ⌘↵ to inject, Esc to cancel
Recent transcripts — last 10 transcripts available from the menu bar for one-click copy
Custom system prompt — fully editable in Settings
Launch at login — toggleable
Clipboard-history hygiene — uses the org.nspasteboard.TransientType convention so text injections aren't recorded by Raycast / Paste / Maccy / Pastebot

Requirements

macOS 13 (Ventura) or later
Xcode 15+ installed (the Swift compiler + frameworks are needed; you never need to open Xcode itself)
An OpenAI API key — covers both Whisper transcription and GPT-4o-mini formatting

Quick start

git clone https://github.com/opeoyeleke/voicescribe.git
cd voicescribe/VoiceScribe

# (Recommended, one-time) Create a stable signing identity in your login
# keychain so the macOS Accessibility grant survives across rebuilds.
./scripts/setup-signing.sh

# Build a release .app bundle
./scripts/build.sh

# Move it wherever you want and launch
# (~/Applications does not exist by default, so create it first)
mkdir -p ~/Applications
mv .build/release/VoiceScribe.app ~/Applications/
open ~/Applications/VoiceScribe.app

On first launch:

Grant Microphone access when prompted
Open System Settings → Privacy & Security → Accessibility → enable VoiceScribe
Click the menu bar icon → Settings… → API tab → paste your OpenAI key
Press ⌥Space anywhere and start speaking

Usage

Place your cursor in any text field (any app)
Press ⌥Space
Speak naturally
Press ⌥Space again, or pause for ~3 s — recording auto-stops
Whisper transcribes → GPT formats → text appears at your cursor

If you've enabled Preview before injecting in Settings, an editable overlay appears after formatting. ⌘↵ injects, Esc cancels.

Settings

Open via the menu bar icon → Settings…

General
- Hotkey (default ⌥Space)
- Auto-inject text into focused app
- Auto-stop after silence (1.0–6.0 s)
- Preview before injecting
- Launch at login

API
- OpenAI API key
- Formatting model (GPT-4o-mini or GPT-4o)
- Test Connection button

Prompt
- Edit the system prompt that GPT receives
- Reset to default
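
The auto-stop setting can be pictured as a small timer that accumulates silent time and resets whenever speech is heard. The sketch below is not taken from AudioRecorderService.swift. The type name SilenceAutoStop, the -40 dB threshold, and the update(levelDB:elapsed:) method are assumptions made for illustration. Only the 3 s default and the 1.0–6.0 s range come from the settings above.

```swift
import Foundation

/// Hypothetical helper (not the app's actual type): decides when a
/// recording should auto-stop after a sustained pause.
struct SilenceAutoStop {
    /// Pause length in seconds, clamped to the 1.0–6.0 s range from Settings.
    let pauseLength: TimeInterval
    /// Assumed level in dB below which input counts as silence.
    let thresholdDB: Float
    /// Silent time accumulated since the last loud reading.
    private var silentFor: TimeInterval = 0

    init(pauseLength: TimeInterval = 3.0, thresholdDB: Float = -40) {
        self.pauseLength = min(max(pauseLength, 1.0), 6.0)
        self.thresholdDB = thresholdDB
    }

    /// Feed one level-meter reading and the time since the previous one.
    /// Returns true once silence has lasted at least `pauseLength`.
    mutating func update(levelDB: Float, elapsed: TimeInterval) -> Bool {
        if levelDB < thresholdDB {
            silentFor += elapsed
        } else {
            silentFor = 0   // any speech resets the countdown
        }
        return silentFor >= pauseLength
    }
}
```

A recorder could call update(levelDB:elapsed:) on each metering tick, for example with the decibel value that AVAudioRecorder's metering reports, and stop recording when it returns true.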

Architecture

VoiceScribe/
├── Package.swift                     SPM manifest (one dep: KeyboardShortcuts)
├── Resources/
│   ├── Info.plist                    Bundle metadata, LSUIElement, permission strings
│   └── AppIcon.icns                  App icon (regenerate with scripts/generate-icon.sh)
├── scripts/
│   ├── build.sh                      Release build → .build/release/VoiceScribe.app
│   ├── setup-signing.sh              One-time: self-signed cert in login keychain
│   ├── generate-icon.swift           CoreGraphics icon renderer
│   └── generate-icon.sh              Wraps Swift renderer + sips + iconutil
└── Sources/VoiceScribe/
    ├── App/
    │   ├── VoiceScribeApp.swift      @main entry; placeholder Settings scene
    │   ├── AppDelegate.swift         Menu bar, hotkey, settings window, recent submenu
    │   └── OverlayWindow.swift       Borderless NSWindow; key only in edit-confirm mode
    ├── Models/
    │   └── AppState.swift            Observable state + @AppStorage settings
    ├── Services/
    │   ├── AudioRecorderService.swift  AVAudioRecorder → 16 kHz mono PCM .wav, level meter, silence auto-stop
    │   ├── WhisperService.swift        POST .wav to OpenAI Whisper API → raw transcript
    │   ├── GPTFormatterService.swift   POST transcript to OpenAI Chat API → clean text (streaming + non-streaming)
    │   ├── TextInjectionService.swift  AXUIElement injection, clipboard fallback (transient pasteboard type)
    │   └── RecordingCoordinator.swift  Owns the pipeline: record → transcribe → format → inject
    └── Views/
        ├── OverlayView.swift         SwiftUI floating pill UI
        └── SettingsView.swift        Tabbed Settings: General / API / Prompt
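
GPTFormatterService is listed above as supporting streaming. With stream enabled, the Chat Completions API sends server-sent-event lines of the form `data: {json}` and ends with `data: [DONE]`. The following is a minimal sketch of decoding one such line, not the code in GPTFormatterService.swift. The names StreamChunk, StreamEvent, and parseStreamLine are hypothetical.

```swift
import Foundation

/// Subset of a streamed Chat Completions chunk; only the fields used here.
struct StreamChunk: Decodable {
    struct Choice: Decodable {
        struct Delta: Decodable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]
}

/// Result of parsing a single server-sent-event line.
enum StreamEvent: Equatable {
    case token(String)  // a piece of formatted text
    case done           // the "[DONE]" sentinel
    case ignored        // blank lines, comments, chunks without text
}

/// Hypothetical helper: turns one SSE line into a StreamEvent.
func parseStreamLine(_ line: String) -> StreamEvent {
    let prefix = "data: "
    guard line.hasPrefix(prefix) else { return .ignored }
    let payload = String(line.dropFirst(prefix.count))
    if payload == "[DONE]" { return .done }
    guard let data = payload.data(using: .utf8),
          let chunk = try? JSONDecoder().decode(StreamChunk.self, from: data),
          let text = chunk.choices.first?.delta.content,
          !text.isEmpty
    else { return .ignored }
    return .token(text)
}
```

Each .token value would be appended to the overlay text as it arrives, and .done would trigger injection or the preview step.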

Customisation

Change language

VoiceScribe currently sends language=en to Whisper. To change it, edit WhisperService.swift: adjust the language init parameter, or expose it in Settings.

Custom formatting prompt

Settings → Prompt tab. The default prompt instructs GPT to act as a transcription editor (not a chatbot), preserve greetings, and never answer questions or invent content. Reset to default if you want to pick up prompt updates after a git pull.

Different Whisper biasing

The Whisper API call passes a prompt parameter to discourage common training-set hallucinations ("Thank you", "Subscribe to my channel"). Tune it in buildMultipartBody() in WhisperService.swift.
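
For orientation, here is a rough sketch of how the language and prompt fields could sit in a multipart/form-data body for the transcription endpoint. It is a stand-in written for this README, not a copy of the real buildMultipartBody(). The parameter list, the whisper-1 model value, and the audio/wav content type are assumptions about what the app sends.

```swift
import Foundation

/// Hypothetical stand-in for buildMultipartBody(); the real method may differ.
func buildMultipartBody(audio: Data,
                        fileName: String,
                        boundary: String,
                        language: String = "en",
                        prompt: String? = nil) -> Data {
    var body = Data()

    // Append one plain-text form field.
    func addField(_ name: String, _ value: String) {
        body.append(Data("--\(boundary)\r\n".utf8))
        body.append(Data("Content-Disposition: form-data; name=\"\(name)\"\r\n\r\n".utf8))
        body.append(Data("\(value)\r\n".utf8))
    }

    addField("model", "whisper-1")      // assumed model name
    addField("language", language)      // change "en" here for other languages
    if let prompt = prompt, !prompt.isEmpty {
        addField("prompt", prompt)      // biasing text against hallucinations
    }

    // Append the audio file part, then the closing boundary.
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"file\"; filename=\"\(fileName)\"\r\n".utf8))
    body.append(Data("Content-Type: audio/wav\r\n\r\n".utf8))
    body.append(audio)
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    return body
}
```

The same boundary string must also appear in the request's Content-Type header (multipart/form-data; boundary=...).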

Building from source

Dev loop (debug build, fast iteration)

swift build
cp .build/debug/VoiceScribe .build/VoiceScribe.app/Contents/MacOS/
codesign --force --deep --sign "VoiceScribe Dev" .build/VoiceScribe.app
open .build/VoiceScribe.app

This assumes you have run setup-signing.sh once. Without it, substitute --sign - for ad-hoc signing, but be aware that ad-hoc rebuilds invalidate the macOS Accessibility grant on every iteration.

Release build

./scripts/build.sh

Produces a signed .app at .build/release/VoiceScribe.app. The script signs with the "VoiceScribe Dev" identity if it is available (run setup-signing.sh first); otherwise it falls back to ad-hoc signing with a warning.

Regenerate app icon

./scripts/generate-icon.sh

Renders a fresh Resources/AppIcon.icns from the Swift CoreGraphics script. Edit colours / SF Symbol in scripts/generate-icon.swift.

Distribution

This repo is set up for bring-your-own-build: you clone, run the build script, and use the resulting .app. There is no signed/notarised release in the GitHub Releases.

If you do distribute a signed .app to others, two things matter:

Ad-hoc signed (--sign -) builds can be opened by anyone but each macOS install treats them as unidentified, requiring a right-click → Open the first time. Accessibility grants are tied to the code identity, so updates from someone else's machine require re-granting.
Properly notarised distribution requires the Apple Developer Program ($99/yr) so you can sign with a Developer ID certificate and submit the bundle to Apple's notarisation service. Once notarised, downloads work without any Gatekeeper friction.

Privacy

Audio is sent to OpenAI's Whisper API for transcription
The raw transcript is sent to OpenAI's Chat Completions API for formatting
Both endpoints are reached directly from your machine via URLSession; no third-party servers are involved
Your OpenAI API key is stored in macOS UserDefaults (com.opeoyeleke.voicescribe), never transmitted anywhere except OpenAI
Recent transcripts are kept locally in UserDefaults (last 10 entries; clearable from the menu)
Refer to OpenAI's data usage policy for what they do with API audio and text
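
As an illustration of the "last 10 entries" behaviour, a capped, newest-first list backed by UserDefaults could look like the sketch below. The function addingRecent, the type RecentTranscriptStore, and the key name recentTranscripts are hypothetical and are not taken from the app's source.

```swift
import Foundation

/// Pure helper: returns the list with `transcript` first, at most `limit` long.
func addingRecent(_ transcript: String, to items: [String], limit: Int = 10) -> [String] {
    Array(([transcript] + items).prefix(limit))
}

/// Hypothetical persistence wrapper around UserDefaults.
struct RecentTranscriptStore {
    private let defaults: UserDefaults
    private let key: String

    init(defaults: UserDefaults = .standard,
         key: String = "recentTranscripts") {   // assumed key name
        self.defaults = defaults
        self.key = key
    }

    /// Stored transcripts, newest first.
    var all: [String] { defaults.stringArray(forKey: key) ?? [] }

    /// Saves a new transcript and drops anything beyond the newest ten.
    func add(_ transcript: String) {
        defaults.set(addingRecent(transcript, to: all), forKey: key)
    }

    /// Removes all stored transcripts, as the menu's clear action would.
    func clear() { defaults.removeObject(forKey: key) }
}
```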

Roadmap

Done:

Whisper transcription via OpenAI API
GPT-4o-mini formatting (streaming)
Native AX text injection + clipboard fallback
Configurable silence auto-stop
Edit-before-inject mode
Recent transcripts menu
Whisper hallucination biasing
Stable code-signing (no permission churn on rebuild)

Possible next steps:

Local Whisper via whisper.cpp for offline transcription with no network round trip
Per-app prompt profiles (different prompt for code vs prose vs Slack)
Voice commands ("new paragraph", "delete that", "scratch that")
Sparkle auto-updater
Notarised distribution

Contributing

PRs welcome. Each service has a single responsibility and is easy to swap or extend:

New transcription backend → conform to the WhisperService interface (file URL in, transcript out)
New formatter → conform to GPTFormatterService.format / formatStreaming shape
Different injection strategy → the TextInjectionService interface is one method: inject(text:) -> Bool
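
To make those three shapes concrete, here is a sketch of what such interfaces and a few swappable implementations might look like. The protocol names (Transcribing, Formatting, TextInjecting) and the example types are invented for illustration. The real services may be classes rather than protocols and are likely asynchronous. Only the "file URL in, transcript out", format / formatStreaming, and inject(text:) -> Bool shapes come from the list above.

```swift
import Foundation

/// Assumed shape: file URL in, transcript out.
protocol Transcribing {
    func transcribe(fileURL: URL) throws -> String
}

/// Assumed shape mirroring format / formatStreaming (synchronous for brevity).
protocol Formatting {
    func format(_ transcript: String) throws -> String
    func formatStreaming(_ transcript: String, onToken: (String) -> Void) throws
}

/// One method, as described above.
protocol TextInjecting {
    func inject(text: String) -> Bool
}

/// Stub backend, useful in tests: returns a canned transcript.
struct CannedTranscriber: Transcribing {
    let transcript: String
    func transcribe(fileURL: URL) throws -> String { transcript }
}

/// Example formatter that only trims whitespace and streams word by word.
struct TrimmingFormatter: Formatting {
    func format(_ transcript: String) throws -> String {
        transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    func formatStreaming(_ transcript: String, onToken: (String) -> Void) throws {
        for word in try format(transcript).split(separator: " ") {
            onToken(String(word))
        }
    }
}

/// Tries injectors in order and stops at the first that succeeds,
/// e.g. an Accessibility injector followed by a clipboard-paste injector.
struct FallbackInjector: TextInjecting {
    let injectors: [any TextInjecting]
    func inject(text: String) -> Bool {
        injectors.contains { $0.inject(text: text) }
    }
}
```

Keeping each interface this small is what would let a new backend, such as a local transcriber, be dropped in without touching the rest of the pipeline.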

Build and test before opening a PR:

swift build 2>&1 | grep -E "error:|warning:"

If you're adding UI, also do a manual smoke test of the overlay flow (record → format → inject) in at least TextEdit (native AX) and one Electron app (clipboard fallback).

License

MIT
