Transcribe your audio locally with Whisper, the speech-to-text model by OpenAI.
Download the DMG image from the releases page and open it. You can run the application directly from the image or copy it to the Applications folder for quick access.
When you first open the DMG image or the application, you will most likely get an untrusted developer warning. It appears because the project is non-commercial, and I have neither the funding nor the wish to register as an Apple developer. To bypass the warning and use the application:
- open system settings
- navigate to privacy and security settings
- allow the application to run (enter password, if prompted)
To uninstall the application, move the .app file and the ~/.cache/whisper directory to the Trash.
To build the application from source:
- Install uv
- Clone repository: `git clone https://github.com/loginchik/whisper-app.git`, then `cd whisper-app`
- Configure virtual environment: `uv sync --all-groups`
- Build app: `make build_english`

An internet connection is required only to download a model on its first use; all other processing runs locally on your machine.
You need to choose a model and add the audio files to process.
The tiny, base and small models are powerful enough to handle most tasks and can run on almost any PC with at least 4 GB of RAM. For more complicated cases you can try larger models, but keep the resource limits of your machine in mind. You can always check About models or Whisper to study a model's requirements.
Previously used models are almost ready to use, unless you have manually deleted them from the cache directory. Models that need to be downloaded first are marked with a red icon.
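As a rough illustration of matching a model to the machine, the sketch below picks the largest model whose approximate memory need fits a given budget. The figures are the approximate values listed in the Whisper README, not measurements of this application, and `largest_model_that_fits` is a hypothetical helper that is not part of the project.

```python
from typing import Optional

# Approximate memory needs in GB, as listed in the Whisper README.
# Treat them as rough guidance, not as guarantees for this application.
APPROX_MEMORY_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the largest model whose approximate need fits in available_gb."""
    fitting = [name for name, need in APPROX_MEMORY_GB.items() if need <= available_gb]
    # The dict is ordered from smallest to largest, so the last match is the largest.
    return fitting[-1] if fitting else None
```

Leave some headroom for the operating system and other applications when choosing the budget you pass in.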
In this example, tiny and base are available locally; small, medium and large will be downloaded first.
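If you want to check outside the application which models are already cached, a minimal sketch could look like the one below. It assumes that checkpoints sit in the cache directory as `<name>.pt` or `<name>-<version>.pt` files (for example `large-v3.pt`); that naming and the `split_by_cache` helper are assumptions for illustration, not the application's own code.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

MODEL_NAMES = ("tiny", "base", "small", "medium", "large")


def split_by_cache(
    cache_dir: Path, names: Iterable[str] = MODEL_NAMES
) -> Tuple[List[str], List[str]]:
    """Split model names into (available locally, need to be downloaded)."""
    # Collect the file names (without the .pt suffix) found in the cache directory.
    stems = {p.stem for p in cache_dir.glob("*.pt")} if cache_dir.is_dir() else set()

    def is_cached(name: str) -> bool:
        # Assumed naming: "<name>.pt" or "<name>-<version>.pt". This is a loose match.
        return any(stem == name or stem.startswith(name + "-") for stem in stems)

    available: List[str] = []
    to_download: List[str] = []
    for name in names:
        (available if is_cached(name) else to_download).append(name)
    return available, to_download
```

Calling `split_by_cache(Path.home() / ".cache" / "whisper")` checks the cache directory mentioned in this README.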
For each audio file, it is recommended to specify the language so that Whisper starts with relevant context. Presets are predefined task settings (created by ChatGPT) that can help you handle popular tasks. If you are not sure which preset to choose, use universal.
Besides the preset, there are separate options you can set manually:
| Setting | Purpose | Performance |
|---|---|---|
| Word timestamps | Force Whisper to export every word's timing and probability in the resulting Excel file | Decreases performance and increases processing time |
| Prompt | Pass additional context about the audio contents to Whisper | A well-formed prompt usually increases transcription quality |
| Condition on previous text | Feed the text already transcribed from earlier parts of the audio to Whisper as context for the next part | May lead to hallucinations |
| FP16 | Run the model in half precision (16-bit floats) | More energy efficient, but can worsen transcription quality |
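For readers who know the `openai-whisper` Python package, the options in the table correspond to keyword arguments of its `transcribe` method. The sketch below shows one possible mapping. `TranscriptionSettings` and the `transcribe` wrapper are hypothetical names, the defaults mirror the library's defaults rather than the application's, and this is not necessarily how the application passes the options internally.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TranscriptionSettings:
    # Hypothetical container; field names are not taken from the application.
    language: Optional[str] = None           # e.g. "en"; None lets Whisper detect it
    word_timestamps: bool = False            # "Word timestamps"
    prompt: Optional[str] = None             # "Prompt"
    condition_on_previous_text: bool = True  # "Condition on previous text"
    fp16: bool = True                        # "FP16"

    def to_whisper_kwargs(self) -> Dict[str, Any]:
        """Translate the settings into keyword arguments for transcribe()."""
        return {
            "language": self.language,
            "word_timestamps": self.word_timestamps,
            "initial_prompt": self.prompt,
            "condition_on_previous_text": self.condition_on_previous_text,
            "fp16": self.fp16,
        }


def transcribe(path: str, model_name: str, settings: TranscriptionSettings) -> dict:
    """Run Whisper on one file; requires the openai-whisper package."""
    import whisper  # imported lazily so the settings class works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, **settings.to_whisper_kwargs())
```

For example, `transcribe("interview.mp3", "base", TranscriptionSettings(language="en"))` would return a dictionary whose `"text"` key holds the transcript; the file name here is a placeholder.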
While a task is running, most application features are frozen.
Transcribed files are exported to the Downloads folder and can be located with a double click.
This is a non-commercial project for personal use; contributions are welcome. For major changes, please open an issue first to discuss what you would like to change.
If the application crashes a few times in a row, consider it a bug. If you want the bug fixed, open an issue and include a step-by-step description of your actions and the log files from the ~/.cache/whisper/logging directory.
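To make attaching logs to an issue easier, you could bundle the logging directory into a single archive. The sketch below is one way to do it; `bundle_logs` is a hypothetical helper, and it simply zips every file it finds because the names and formats of the log files are not specified here.

```python
import zipfile
from pathlib import Path


def bundle_logs(log_dir: Path, archive_path: Path) -> int:
    """Zip every regular file under log_dir and return the number of files added."""
    # List the files before creating the archive, so the archive never includes itself.
    files = sorted(p for p in log_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to log_dir to keep the archive layout short.
            archive.write(path, arcname=str(path.relative_to(log_dir)))
    return len(files)
```

For example, `bundle_logs(Path.home() / ".cache" / "whisper" / "logging", Path.home() / "Downloads" / "whisper-logs.zip")` writes the archive to the Downloads folder; the archive name is a placeholder. Review the logs before sharing them, since they may mention local file names.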


