A fully local, lightweight, automated transcription system for MP3 recordings that runs entirely on your macOS machine using OpenAI's Whisper model.
- 🏠 Fully Local: No cloud uploads, everything runs on your machine
- 👀 Automatic Monitoring: Watches a folder for new MP3 recordings
- ⚡ Real-time Processing: Transcribes files as soon as recording is complete
- 📝 Markdown Output: Generates clean Markdown transcripts
- ⏱️ Timestamps & AI Speaker ID: Timestamps and AI-powered speaker identification
- 🔔 macOS Notifications: Get notified when transcriptions complete
- 🔄 Auto-archiving: Moves processed files to archive folder
- 🔋 Low Resource Usage: Minimal CPU usage when idle
- 🚀 Auto-start: Runs automatically at login via macOS LaunchAgent
- 🛡️ Rock-Solid: Automatic error recovery and restart on failures
- 🎯 VS Code Integration: Designed for seamless analysis with GitHub Copilot
Run the automated installer:

```shell
git clone <repository-url>
cd local-whisper-transcriber
chmod +x install.sh
./install.sh
```

The installer will:
- Install Homebrew (if needed)
- Install `fswatch`, `whisper.cpp`, and `terminal-notifier`
- Install Python 3 (for future speaker identification support)
- Download the selected Whisper model (may take several minutes)
- Install scripts to `~/.local/bin/whisper-transcriber/` (permanent location)
- Create required directories (`~/Recordings`, `~/Recordings/archive`, `~/Recordings/transcripts`)
- Configure the system with timestamps
- Install and start the macOS LaunchAgent service
- Verify everything is working
Note: Speaker identification infrastructure is included but disabled pending pyannote.audio compatibility update.
After installation, you can safely delete the project directory - all scripts are installed to a permanent location.
Simply drop MP3 files into your ~/Recordings folder. The system will automatically:
- Detect the new file instantly (via fswatch)
- Wait for the recording to complete (checks if file is open by any process)
- Verify file stability (10 seconds after file is closed)
- Transcribe using Whisper with timestamps
- Send you a notification when complete (click it to open the transcripts folder)
- Save the transcript as Markdown in `~/Recordings/transcripts/`
- Move the MP3 to `~/Recordings/archive/`
Safe for active recording: The system uses lsof to check if the file is open by any process (like your recording app), and only starts transcription after the file is closed AND stable.
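A minimal sketch of how such a check can be implemented. The helper name `wait_until_stable` is hypothetical; the installed script may structure this differently:

```shell
# Wait until no process holds the file open, then require its size
# to be unchanged across consecutive checks before returning.
wait_until_stable() {
  local f="$1" prev=-1 size

  # lsof exits non-zero once no process has the file open
  while lsof -- "$f" >/dev/null 2>&1; do
    sleep 1
  done

  # Size must be identical across two consecutive checks,
  # spaced FILE_STABILITY_TIME seconds apart
  while :; do
    size=$(stat -f%z "$f" 2>/dev/null || stat -c%s "$f")
    [ "$size" = "$prev" ] && break
    prev=$size
    sleep "${FILE_STABILITY_TIME:-2}"
  done
}
```

The two-stage approach matters: a recording app may close and reopen the file between buffer flushes, so checking size stability after the `lsof` loop guards against transcribing a half-written file.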
When transcription completes:
- 🔔 You'll receive a macOS notification
- 🔊 You'll hear a Glass sound
- 👆 Click the notification to open the transcripts folder in Finder
- 📝 Transcripts appear as Markdown files in `~/Recordings/transcripts/`
First-time notification setup: The first time you get a notification, macOS will ask for permission. Click "Allow" to enable notifications.
- macOS (Apple Silicon recommended)
- Homebrew (installed automatically if missing)
- 4GB+ RAM (for medium Whisper model)
- MP3 recordings (other formats not supported)
The install.sh script handles everything automatically. It installs:
- `fswatch`: Folder monitoring utility (event-driven file detection)
- `whisper.cpp`: Local Whisper implementation (AI transcription)
- `terminal-notifier`: macOS notification utility (completion alerts)
- `python3`: Python runtime for speaker identification
- `pyannote.audio`: AI-powered speaker diarization library
- `torch` & `torchaudio`: Deep learning framework (required by pyannote)
If you prefer manual setup:

1. Install dependencies:

   ```shell
   brew install fswatch whisper-cpp terminal-notifier python3
   python3 -m pip install --user pyannote.audio torch torchaudio
   ```

2. Create directories:

   ```shell
   mkdir -p ~/Recordings ~/Recordings/archive ~/Projects/my-vscode-repo/transcripts
   ```

3. Configure paths in `config.sh`

4. Install the LaunchAgent:

   ```shell
   cp launch_agents/com.local.whisper.plist ~/Library/LaunchAgents/
   launchctl load ~/Library/LaunchAgents/com.local.whisper.plist
   ```
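For reference, a minimal LaunchAgent of the kind this step installs might look like the sketch below. The script path is a placeholder; the `com.local.whisper.plist` shipped with the project is the authoritative version. `KeepAlive` is what gives the service its automatic restart-on-failure behavior, and `RunAtLoad` starts it at login:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.whisper</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/YOURNAME/.local/bin/whisper-transcriber/watch_and_transcribe.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/local-whisper.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/local-whisper.error.log</string>
</dict>
</plist>
```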
Edit `config.sh` to customize:

```shell
# Folder paths
WATCH_FOLDER="$HOME/Recordings"                   # Monitor this folder
ARCHIVE_FOLDER="$HOME/Recordings/archive"         # Move processed files here
TRANSCRIPT_FOLDER="$HOME/Recordings/transcripts"  # Output location

# Whisper model (base/small/medium/large)
WHISPER_MODEL="base"             # Smaller = faster, larger = more accurate

# File detection
FILE_STABILITY_TIME=2            # Seconds to wait before processing

# Transcription options
ENABLE_TIMESTAMPS=true           # Include timestamps in transcript
TIMESTAMP_FORMAT="srt"           # Format: srt, vtt, or txt

# AI Speaker Identification
ENABLE_SPEAKER_DIARIZATION=true  # Enable AI-based speaker identification
MIN_SPEAKERS=""                  # Minimum speakers (optional, auto-detect if empty)
MAX_SPEAKERS=""                  # Maximum speakers (optional, auto-detect if empty)
```

When `ENABLE_TIMESTAMPS=true`, transcripts include timing information:
- srt: SubRip format with numbered segments and timestamps (e.g., `00:01:23,456 --> 00:01:25,789`)
- vtt: WebVTT format for web video players
- txt: Plain text without timestamps
When `ENABLE_SPEAKER_DIARIZATION=true`, the system will use pyannote.audio for AI-powered speaker identification:
- Works with: Mono or stereo recordings (any audio format)
- AI-powered: Uses deep learning to identify different speakers by voice characteristics
- Automatic: Detects number of speakers automatically (or you can specify min/max)
- Output: Labels speakers as "SPEAKER_00", "SPEAKER_01", etc. with timestamps
- Accuracy: State-of-the-art speaker diarization (research-grade quality)
Example with timestamps and AI speaker identification:

```markdown
**[00:00 - 00:03] SPEAKER_00:**
Welcome to today's meeting.

**[00:03 - 00:07] SPEAKER_01:**
Thanks for having me. Let's discuss the project.

**[00:07 - 00:12] SPEAKER_00:**
Great! Let's start with the requirements.
```
Note: First run will download speaker diarization models (~300MB). Subsequent runs use cached models.
| Model | Size | Download | Accuracy | Speed |
|---|---|---|---|---|
| base | ~75MB | Fast | Good | Fastest |
| small | ~250MB | Medium | Better | Fast |
| medium | ~500MB | Slow | High | Medium |
| large | ~1GB | Slowest | Best | Slowest |
Download times depend on your internet connection.
The system uses fswatch to monitor the watch folder for new files. When an MP3 is detected, it waits for the file to stabilize (no changes for FILE_STABILITY_TIME seconds) before processing. This ensures recordings are complete before transcription begins.
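The event loop can be sketched roughly as follows. `handle_new_file` stands in for the real processing routine, whose name in the installed script may differ:

```shell
# Filter: only .mp3 paths are handed off for processing
is_mp3() {
  case "$1" in
    *.mp3) return 0 ;;
    *)     return 1 ;;
  esac
}

# Event-driven loop: fswatch emits one NUL-terminated path per
# filesystem event, so the loop blocks (near-zero CPU) until
# something changes in the watch folder.
# fswatch -0 "$WATCH_FOLDER" | while IFS= read -r -d '' path; do
#   is_mp3 "$path" && handle_new_file "$path"
# done
```

Because `fswatch` is event-driven rather than polling, this design is what keeps idle CPU usage minimal.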
- Input: MP3 file in watch folder
- Processing: `whisper.cpp` transcribes audio to text
- Output: Markdown file with transcript
- Archiving: Original MP3 moved to archive folder
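The steps above can be sketched as a per-file pipeline. The helper names (`transcript_path`, `write_header`, `process_file`) are hypothetical, and the commented whisper-cpp flags are illustrative only; the installed script is authoritative and may convert MP3 to WAV first (e.g. with ffmpeg):

```shell
# Derive the Markdown output path from the MP3 filename
transcript_path() {
  local base; base=$(basename "$1" .mp3)
  printf '%s/%s.md' "${TRANSCRIPT_FOLDER:-$HOME/Recordings/transcripts}" "$base"
}

# Write the transcript's Markdown header: title, source file, date, model
write_header() {  # $1 = mp3 path, $2 = output markdown path
  local base; base=$(basename "$1" .mp3)
  {
    printf '# Audio Transcript: %s\n\n' "$base"
    printf '**File:** %s\n' "$(basename "$1")"
    printf '**Transcribed:** %s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    printf '**Model:** %s\n\n## Transcript\n\n' "${WHISPER_MODEL:-base}"
  } > "$2"
}

# Full pipeline: header, transcription (commented out here), archiving
process_file() {
  local mp3="$1" out
  out=$(transcript_path "$mp3")
  write_header "$mp3" "$out"
  # Real transcription step (flags illustrative):
  # whisper-cpp -m "$HOME/.whisper/ggml-${WHISPER_MODEL}.bin" -f "$mp3" >> "$out"
  mv "$mp3" "${ARCHIVE_FOLDER:-$HOME/Recordings/archive}/"
}
```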
Transcripts are saved as clean Markdown:
```markdown
# Audio Transcript: meeting_notes

**File:** meeting_notes.mp3
**Transcribed:** 2024-01-15 14:30:22
**Model:** base

## Transcript

[Full transcription text here]

---
*Transcribed using Local Whisper Transcriber*
```

Check if the service is running:
```shell
launchctl list | grep com.local.whisper
```

View the logs:

```shell
tail -f /tmp/local-whisper.log
tail -f /tmp/local-whisper.error.log
```

Restart the service:

```shell
launchctl unload ~/Library/LaunchAgents/com.local.whisper.plist
launchctl load ~/Library/LaunchAgents/com.local.whisper.plist
```

- Check logs: `cat /tmp/local-whisper.error.log`
- Verify dependencies: `which fswatch whisper-cpp`
- Check LaunchAgent: `launchctl list com.local.whisper`
- Verify MP3 file integrity
- Check available disk space
- Try smaller Whisper model in config
- Check logs for specific errors
- The service should be idle when no files are being processed
- If constantly using CPU, check for file system issues
- Restart the service (`launchctl unload` then `launchctl load`)
- Ensure MP3 files are placed directly in watch folder (not in subdirectories)
- Check file permissions (files must be readable)
- Verify file isn't still being written to (system waits for file stability)
- Check logs for processing messages: `tail -f /tmp/local-whisper.error.log`
If you don't receive notifications:
1. Check notification permissions:
   - Go to System Settings → Notifications
   - Look for "terminal-notifier" in the list
   - Ensure "Allow Notifications" is enabled

2. Check Focus/Do Not Disturb:
   - Notifications may be silenced by Focus mode
   - Check the Control Center for active Focus modes

3. Verify terminal-notifier is installed:

   ```shell
   which terminal-notifier
   # Should output: /opt/homebrew/bin/terminal-notifier
   ```

4. Test manually:

   ```shell
   terminal-notifier -title "Test" -message "Testing notifications"
   ```

5. Check the logs:

   ```shell
   tail -f /tmp/local-whisper.error.log
   # Look for "Sending notification..." message
   ```
Note: You should at least hear the Glass sound even if visual notifications don't appear.
```
local-whisper-transcriber/
├── README.md                    # This file
├── install.sh                   # Automated installer
├── uninstall.sh                 # Uninstaller
├── watch_and_transcribe.sh      # Main monitoring script (template)
├── config.sh                    # Configuration settings (template)
└── launch_agents/
    └── com.local.whisper.plist  # macOS LaunchAgent configuration (template)
```
```
~/.local/bin/whisper-transcriber/
├── watch_and_transcribe.sh      # Main monitoring script
└── config.sh                    # Configuration with actual paths

~/Library/LaunchAgents/
└── com.local.whisper.plist      # LaunchAgent (points to installed scripts)

~/.whisper/
└── ggml-base.bin                # Downloaded Whisper model

~/Recordings/
├── archive/                     # Processed MP3s moved here
├── transcripts/                 # Generated transcripts appear here
└── [your MP3 files]
```
Run the script manually for testing:

```shell
./watch_and_transcribe.sh
```

Edit `config.sh` and restart the service.

- Modify `watch_and_transcribe.sh` for new functionality
- Update `config.sh` for new settings
- Test changes locally before redeploying
- ✅ No cloud uploads: Everything stays local
- ✅ No data collection: No telemetry or external communication
- ✅ File isolation: Processed files moved to archive
- ✅ Local models: Whisper runs entirely on device
- Base model: ~10-15 seconds per minute of audio
- Small model: ~20-30 seconds per minute
- Medium model: ~40-60 seconds per minute
- Large model: ~80-120 seconds per minute
Processing time depends on:
- Audio length and quality
- CPU performance (Apple Silicon recommended)
- Available RAM
- Whisper model size
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source. See LICENSE file for details.
- Check the logs: `/tmp/local-whisper.log`
- Verify configuration in `config.sh`
- Test with a small MP3 file first
- Ensure adequate free disk space (>2x MP3 file size)
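The 2x rule of thumb can be checked up front with a small helper like this (hypothetical, not part of the installed scripts):

```shell
# Return success if the volume holding $1 has at least 2x the
# file's size free, per the rule of thumb above.
enough_space() {
  local f="$1" size_kb free_kb
  size_kb=$(( ($(stat -f%z "$f" 2>/dev/null || stat -c%s "$f") + 1023) / 1024 ))
  free_kb=$(df -Pk "$(dirname "$f")" | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge $((2 * size_kb)) ]
}
```

Usage: `enough_space ~/Recordings/interview.mp3 || echo "free up disk space first"`.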
Happy transcribing! 🎙️➡️📝