
Local Whisper Transcriber

A fully local, lightweight, automated transcription system for MP3 recordings that runs entirely on your macOS machine using OpenAI's Whisper model.

Features

  • 🏠 Fully Local: No cloud uploads, everything runs on your machine
  • 👀 Automatic Monitoring: Watches a folder for new MP3 recordings
  • ⚡ Real-time Processing: Transcribes files as soon as recording is complete
  • 📝 Markdown Output: Generates clean Markdown transcripts
  • ⏱️ Timestamps & AI Speaker ID: Segment timestamps plus AI-powered speaker identification
  • 🔔 macOS Notifications: Get notified when transcriptions complete
  • 🔄 Auto-archiving: Moves processed files to archive folder
  • 🔋 Low Resource Usage: Minimal CPU usage when idle
  • 🚀 Auto-start: Runs automatically at login via macOS LaunchAgent
  • 🛡️ Rock-Solid: Automatic error recovery and restart on failures
  • 🎯 VS Code Integration: Designed for seamless analysis with GitHub Copilot

Quick Start

1. Install

Run the automated installer:

git clone <repository-url>
cd local-whisper-transcriber
chmod +x install.sh
./install.sh

The installer will:

  • Install Homebrew (if needed)
  • Install fswatch, whisper.cpp, and terminal-notifier
  • Install Python 3 (for future speaker identification support)
  • Download the selected Whisper model (may take several minutes)
  • Install scripts to ~/.local/bin/whisper-transcriber/ (permanent location)
  • Create required directories (~/Recordings, ~/Recordings/archive, ~/Recordings/transcripts)
  • Configure the system with timestamps
  • Install and start the macOS LaunchAgent service
  • Verify everything is working

Note: Speaker identification infrastructure is included but disabled pending pyannote.audio compatibility update.

After installation, you can safely delete the project directory; all scripts are installed to a permanent location.

2. Start Transcribing

Simply drop MP3 files into your ~/Recordings folder. The system will automatically:

  1. Detect the new file instantly (via fswatch)
  2. Wait for the recording to complete (checks whether any process still has the file open)
  3. Verify file stability (no changes for FILE_STABILITY_TIME seconds after the file is closed)
  4. Transcribe using Whisper with timestamps
  5. Save the transcript as Markdown in ~/Recordings/transcripts/
  6. Move the MP3 to ~/Recordings/archive/
  7. Send you a notification when complete (click it to open the transcripts folder)

Safe for active recording: The system uses lsof to check whether the file is open by any process (such as your recording app), and only starts transcription after the file is closed and stable.
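
In sketch form, that check could look like the following (a minimal illustration; the real logic lives in watch_and_transcribe.sh, and the helper name here is hypothetical):

wait_until_closed_and_stable() {
  local f="$1"
  # Block while any process (e.g. the recording app) still has the file open
  while lsof -- "$f" >/dev/null 2>&1; do
    sleep 1
  done
  # Then require the size to stay unchanged for FILE_STABILITY_TIME seconds
  local before after
  before=$(stat -f%z "$f")    # BSD/macOS stat: -f%z prints file size in bytes
  sleep "${FILE_STABILITY_TIME:-2}"
  after=$(stat -f%z "$f")
  [ "$before" = "$after" ]
}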

3. Get Notified & View Transcripts

When transcription completes:

  • 🔔 You'll receive a macOS notification
  • 🔊 You'll hear a Glass sound
  • 👆 Click the notification to open the transcripts folder in Finder
  • 📝 Transcripts appear as Markdown files in ~/Recordings/transcripts/

First-time notification setup: The first time you get a notification, macOS will ask for permission. Click "Allow" to enable notifications.

Requirements

  • macOS (Apple Silicon recommended)
  • Homebrew (installed automatically if missing)
  • 4GB+ RAM (for medium Whisper model)
  • MP3 recordings (other formats not supported)

Installation Details

Automated Installation

The install.sh script handles everything automatically. It installs:

  • fswatch: Folder monitoring utility (event-driven file detection)
  • whisper.cpp: Local Whisper implementation (AI transcription)
  • terminal-notifier: macOS notification utility (completion alerts)
  • python3: Python runtime for speaker identification
  • pyannote.audio: AI-powered speaker diarization library (feature currently disabled; see AI Speaker Identification below)
  • torch & torchaudio: Deep learning framework (required by pyannote)

Manual Installation

If you prefer manual setup:

  1. Install dependencies:

    brew install fswatch whisper-cpp terminal-notifier python3
    python3 -m pip install --user pyannote.audio torch torchaudio
  2. Create directories:

    mkdir -p ~/Recordings ~/Recordings/archive ~/Recordings/transcripts
  3. Configure paths in config.sh

  4. Install LaunchAgent:

    cp launch_agents/com.local.whisper.plist ~/Library/LaunchAgents/
    launchctl load ~/Library/LaunchAgents/com.local.whisper.plist
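
For reference, here is a minimal sketch of what com.local.whisper.plist plausibly contains, inferred from the behaviors described in this README (auto-start at login, restart on failure, the /tmp log paths); the actual template ships in launch_agents/ and the installer fills in real paths:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.whisper</string>
    <key>ProgramArguments</key>
    <array>
        <!-- launchd does not expand ~, so the installer writes an absolute path -->
        <string>/Users/YOUR_USER/.local/bin/whisper-transcriber/watch_and_transcribe.sh</string>
    </array>
    <!-- Start at login -->
    <key>RunAtLoad</key>
    <true/>
    <!-- Restart automatically on failure -->
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/local-whisper.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/local-whisper.error.log</string>
</dict>
</plist>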

Configuration

Edit config.sh to customize:

# Folder paths
WATCH_FOLDER="$HOME/Recordings"           # Monitor this folder
ARCHIVE_FOLDER="$HOME/Recordings/archive" # Move processed files here
TRANSCRIPT_FOLDER="$HOME/Recordings/transcripts" # Output location

# Whisper model (base/small/medium/large)
WHISPER_MODEL="base"                      # Smaller = faster, larger = more accurate

# File detection
FILE_STABILITY_TIME=2                     # Seconds to wait before processing

# Transcription options
ENABLE_TIMESTAMPS=true                    # Include timestamps in transcript
TIMESTAMP_FORMAT="srt"                    # Format: srt, vtt, or txt

# AI Speaker Identification
ENABLE_SPEAKER_DIARIZATION=true           # Enable AI-based speaker identification
MIN_SPEAKERS=""                           # Minimum speakers (optional, auto-detect if empty)
MAX_SPEAKERS=""                           # Maximum speakers (optional, auto-detect if empty)

Transcription Features

Timestamps

When ENABLE_TIMESTAMPS=true, transcripts include timing information:

  • srt: SubRip format with numbered segments and timestamps (e.g., 00:01:23,456 --> 00:01:25,789; see the sample segment below)
  • vtt: WebVTT format for web video players
  • txt: Plain text without timestamps
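
For example, a single segment in srt format looks like:

1
00:01:23,456 --> 00:01:25,789
Welcome to today's meeting.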

AI Speaker Identification ⭐ COMING SOON

When ENABLE_SPEAKER_DIARIZATION=true, the system will use pyannote.audio for AI-powered speaker identification:

  • Works with: Mono or stereo recordings (any audio format)
  • AI-powered: Uses deep learning to identify different speakers by voice characteristics
  • Automatic: Detects number of speakers automatically (or you can specify min/max)
  • Output: Labels speakers as "SPEAKER_00", "SPEAKER_01", etc. with timestamps
  • Accuracy: State-of-the-art speaker diarization (research-grade quality)

Example with timestamps and AI speaker identification:

**[00:00 - 00:03] SPEAKER_00:**
Welcome to today's meeting.

**[00:03 - 00:07] SPEAKER_01:**
Thanks for having me. Let's discuss the project.

**[00:07 - 00:12] SPEAKER_00:**
Great! Let's start with the requirements.

⚠️ Current Status: Speaker identification is temporarily disabled due to a compatibility issue between pyannote.audio (v3.4.0) and PyTorch 2.9+ on Apple Silicon. The infrastructure is ready and will be automatically enabled once pyannote.audio releases a compatible update. Track the issue: https://github.com/pyannote/pyannote-audio/issues

Note: First run will download speaker diarization models (~300MB). Subsequent runs use cached models.

Whisper Models

Model    Size     Download time   Accuracy   Transcription speed
base     ~75MB    Fast            Good       Fastest
small    ~250MB   Medium          Better     Fast
medium   ~500MB   Slow            High       Medium
large    ~1GB     Slowest         Best       Slowest

Download times depend on your internet connection
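
install.sh normally downloads the selected model for you. If you want to switch models later, one option (assuming the ~/.whisper layout shown under Project Structure; models are hosted in the ggerganov/whisper.cpp Hugging Face repo) is:

curl -L -o ~/.whisper/ggml-small.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin

Then set WHISPER_MODEL="small" in config.sh and restart the service.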

How It Works

File Detection

The system uses fswatch to monitor the watch folder for new files. When an MP3 is detected, it waits for the file to stabilize (no changes for FILE_STABILITY_TIME seconds) before processing. This ensures recordings are complete before transcription begins.
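
In sketch form (the real loop lives in watch_and_transcribe.sh; process_file is a hypothetical handler standing in for the stability check and transcription steps):

fswatch -0 --event Created "$WATCH_FOLDER" | while IFS= read -r -d '' path; do
  case "$path" in
    *.mp3) process_file "$path" ;;   # hypothetical handler
  esac
done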

Transcription Process

  1. Input: MP3 file in watch folder
  2. Processing: whisper.cpp transcribes audio to text
  3. Output: Markdown file with transcript
  4. Archiving: Original MP3 moved to archive folder
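
The core transcription step is roughly the following (illustrative only; the exact command lives in watch_and_transcribe.sh, newer Homebrew builds name the binary whisper-cli rather than whisper-cpp, and older builds require converting the MP3 to 16-kHz WAV first):

whisper-cpp -m "$HOME/.whisper/ggml-base.bin" \
            -f "$HOME/Recordings/meeting.mp3" \
            -osrt -of "$HOME/Recordings/transcripts/meeting"
mv "$HOME/Recordings/meeting.mp3" "$HOME/Recordings/archive/"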

Markdown Format

Transcripts are saved as clean Markdown:

# Audio Transcript: meeting_notes

**File:** meeting_notes.mp3
**Transcribed:** 2024-01-15 14:30:22
**Model:** base

## Transcript

[Full transcription text here]

---
*Transcribed using Local Whisper Transcriber*

Monitoring & Logs

Service Status

Check if the service is running:

launchctl list | grep com.local.whisper

The output columns are PID, last exit status, and label; a dash in the PID column means the job is loaded but not currently running.

View Logs

tail -f /tmp/local-whisper.log
tail -f /tmp/local-whisper.error.log

Restart Service

launchctl unload ~/Library/LaunchAgents/com.local.whisper.plist
launchctl load ~/Library/LaunchAgents/com.local.whisper.plist

Troubleshooting

Service Not Starting

  1. Check logs: cat /tmp/local-whisper.error.log
  2. Verify dependencies: which fswatch whisper-cpp
  3. Check LaunchAgent: launchctl list com.local.whisper

Transcription Failing

  1. Verify MP3 file integrity
  2. Check available disk space
  3. Try smaller Whisper model in config
  4. Check logs for specific errors

High CPU Usage

  • The service should be idle when no files are being processed
  • If constantly using CPU, check for file system issues
  • Restart the service: launchctl unload/load

Files Not Processing

  • Ensure MP3 files are placed directly in watch folder (not in subdirectories)
  • Check file permissions (files must be readable)
  • Verify file isn't still being written to (system waits for file stability)
  • Check logs for processing messages: tail -f /tmp/local-whisper.error.log
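
A quick way to confirm the pipeline end to end (the source path is illustrative; any short MP3 will do):

cp ~/Downloads/short-test.mp3 ~/Recordings/
tail -f /tmp/local-whisper.log   # watch for detection and transcription messages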

Notifications Not Appearing

If you don't receive notifications:

  1. Check notification permissions:

    • Go to System Settings → Notifications
    • Look for "terminal-notifier" in the list
    • Ensure "Allow Notifications" is enabled
  2. Check Focus/Do Not Disturb:

    • Notifications may be silenced by Focus mode
    • Check the Control Center for active Focus modes
  3. Verify terminal-notifier is installed:

    which terminal-notifier
    # Should output: /opt/homebrew/bin/terminal-notifier (Apple Silicon) or /usr/local/bin/terminal-notifier (Intel)
  4. Test manually:

    terminal-notifier -title "Test" -message "Testing notifications"
  5. Check the logs:

    tail -f /tmp/local-whisper.error.log
    # Look for "Sending notification..." message

Note: You should at least hear the Glass sound even if visual notifications don't appear.

Project Structure

Source Repository

local-whisper-transcriber/
├── README.md                    # This file
├── install.sh                   # Automated installer
├── uninstall.sh                 # Uninstaller
├── watch_and_transcribe.sh      # Main monitoring script (template)
├── config.sh                    # Configuration settings (template)
└── launch_agents/
    └── com.local.whisper.plist  # macOS LaunchAgent configuration (template)

Installed Location (after running install.sh)

~/.local/bin/whisper-transcriber/
├── watch_and_transcribe.sh      # Main monitoring script
└── config.sh                    # Configuration with actual paths

~/Library/LaunchAgents/
└── com.local.whisper.plist      # LaunchAgent (points to installed scripts)

~/.whisper/
└── ggml-base.bin                # Downloaded Whisper model

~/Recordings/
├── archive/                     # Processed MP3s moved here
├── transcripts/                 # Generated transcripts appear here
└── [your MP3 files]

Development

Local Testing

Run the script manually for testing:

./watch_and_transcribe.sh
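
If the LaunchAgent is already loaded, unload it first so two watchers don't compete for the same files (a suggested flow):

launchctl unload ~/Library/LaunchAgents/com.local.whisper.plist
./watch_and_transcribe.sh    # runs in the foreground; Ctrl-C to stop
launchctl load ~/Library/LaunchAgents/com.local.whisper.plist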

Modifying Configuration

Edit config.sh and restart the service. For an installed setup, the active copy is ~/.local/bin/whisper-transcriber/config.sh (see Project Structure above); reload with the launchctl unload/load commands under Restart Service.

Adding Features

  • Modify watch_and_transcribe.sh for new functionality
  • Update config.sh for new settings
  • Test changes locally before redeploying

Security & Privacy

  • No cloud uploads: Everything stays local
  • No data collection: No telemetry or external communication
  • File isolation: Processed files moved to archive
  • Local models: Whisper runs entirely on device

Performance Notes

  • Base model: ~10-15 seconds per minute of audio
  • Small model: ~20-30 seconds per minute
  • Medium model: ~40-60 seconds per minute
  • Large model: ~80-120 seconds per minute
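
For example, at ~10-15 seconds per minute of audio, a 30-minute recording transcribes in roughly 5-8 minutes with the base model.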

Processing time depends on:

  • Audio length and quality
  • CPU performance (Apple Silicon recommended)
  • Available RAM
  • Whisper model size

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

This project is open source. See LICENSE file for details.

Support

  • Check the logs: /tmp/local-whisper.log
  • Verify configuration in config.sh
  • Test with a small MP3 file first
  • Ensure adequate free disk space (>2x MP3 file size)

Happy transcribing! 🎙️➡️📝
