Whisper Dictation for Ubuntu (Wayland)

Keyboard-shortcut voice dictation using the OpenAI Whisper model.

Prerequisites

Docker
xclip
arecord (ALSA)
paplay (PulseAudio, optional for sounds)

Installation

./scripts/install.sh

Or install manually:

sudo apt-get install -y docker.io xclip alsa-utils pulseaudio-utils
sudo usermod -aG docker $USER
newgrp docker  # Or log out and back in

Quick Start

make build
make run
make enable-service

# Test recording (3 seconds; hw:1,0 is an example USB mic, run arecord -l to list devices)
arecord -D hw:1,0 -f cd -t wav -d 3 recordings/test.wav

# Test transcription
make test
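
To confirm that the test recording has the expected format before transcribing it, a short check with Python's standard wave module can help. This is a minimal sketch, not part of the project: the function name describe_wav is illustrative, and it assumes the file was recorded with -f cd (16-bit, 44.1 kHz, stereo) as in the arecord example.

```python
import wave


def describe_wav(path):
    """Return basic format information for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from frame count divided by sample rate.
            "duration_s": frames / rate,
        }
```

Calling describe_wav("recordings/test.wav") on a recording made with -f cd should report 2 channels, 2 bytes per sample, and a 44100 Hz sample rate.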

Configuration

WHISPER_LANG (default: auto) - input language hint. Accepts auto, en, ru, english, or russian; auto lets Whisper detect the language.
WHISPER_OUTPUT_LANG - output language (e.g., en or ru). When set, the transcription is forced into this language.
WHISPER_MODEL (default: openai/whisper-base) - model id. Available models: tiny, base, small, medium, large-v3, large-v3-turbo. Restart the daemon/service after changing it.
WHISPER_PORT (default: 8610) - host port for the daemon.
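
The way these variables resolve to effective settings can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function load_config, the alias table, and the treatment of an unset WHISPER_OUTPUT_LANG as "no forced output language" are all hypothetical. Only the variable names and defaults come from this README.

```python
import os

# Defaults as documented in this README.
DEFAULTS = {
    "WHISPER_LANG": "auto",
    "WHISPER_MODEL": "openai/whisper-base",
    "WHISPER_PORT": "8610",
}

# Assumption: long language names map onto the short codes.
LANG_ALIASES = {"english": "en", "russian": "ru"}


def load_config(env=None):
    """Resolve settings from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    lang = env.get("WHISPER_LANG", DEFAULTS["WHISPER_LANG"]).strip().lower()
    output_lang = env.get("WHISPER_OUTPUT_LANG", "").strip().lower()
    return {
        "lang": LANG_ALIASES.get(lang, lang),
        # None means no output language is forced (assumed behaviour).
        "output_lang": LANG_ALIASES.get(output_lang, output_lang) or None,
        "model": env.get("WHISPER_MODEL", DEFAULTS["WHISPER_MODEL"]),
        "port": int(env.get("WHISPER_PORT", DEFAULTS["WHISPER_PORT"])),
    }
```

Passing a plain dict instead of reading os.environ directly keeps the function easy to try out, for example load_config({"WHISPER_LANG": "russian"}).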

Setup Keyboard Shortcut

Go to Settings → Keyboard → View and Customize Shortcuts → Custom Shortcuts.
Add a shortcut:
- Name: "Voice Dictate"
- Command: /home/yuki/Projects/scuffed-whisper/scripts/voice-dictate.sh (replace with the absolute path to scripts/voice-dictate.sh in your checkout)
- Shortcut: Ctrl+Shift+D

Autostart (Systemd User Service)

The service keeps the container and daemon running in the background and starts automatically at the next login.

Enable autostart:

make enable-service

Verify that the service is running and enabled:

systemctl --user status whisper-dictate.service

To disable:

make disable-service

If you change WHISPER_MODEL, WHISPER_LANG, or WHISPER_OUTPUT_LANG, restart the service:

make disable-service
make enable-service

Usage

Press the shortcut to start recording.
Speak.
Press the shortcut again to transcribe and copy the text to the clipboard.
Paste anywhere with Ctrl+V.
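
The same shortcut both starts and stops a recording, which implies some form of toggle state. One common way to do this is a marker file holding the recorder's process id. The sketch below illustrates that idea only; it does not describe how voice-dictate.sh is implemented, and the names toggle, next_action, and the PID file are hypothetical.

```python
from pathlib import Path


def next_action(pid_file: Path) -> str:
    """Return 'stop' if a recording is marked as in progress, else 'start'."""
    return "stop" if pid_file.exists() else "start"


def toggle(pid_file: Path, pid: int = 0) -> str:
    """Flip the recording state and return the action that was taken.

    On 'start' the recorder's pid is written to the marker file; on 'stop'
    the marker is removed. Launching and stopping the recorder process
    itself is left out of this sketch.
    """
    action = next_action(pid_file)
    if action == "start":
        pid_file.write_text(str(pid))
    else:
        pid_file.unlink()
    return action
```

The first call with a given marker file returns "start" and the second returns "stop", mirroring the two shortcut presses.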

How It Works

Recordings are stored in the recordings/ directory.
The Docker container runs Whisper on the CPU only, so no GPU support is needed (works on machines with an AMD integrated GPU).
The HuggingFace cache is persisted at ~/.cache/huggingface to avoid repeated model downloads.
The transcribe script is mounted from the host, so script changes do not require a rebuild.
The daemon keeps the model warm and accepts HTTP transcription requests.
The transcription is written to a temp file, then copied to the clipboard with xclip on the host (Wayland compatible).
Audio and desktop notifications signal the recording, transcribing, and ready states.
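
As a rough picture of the HTTP step, a client could post the recorded audio to the daemon and read the transcript from the response. The README does not document the daemon's API, so the /transcribe path, the raw-WAV request body, and the plain-text response in this sketch are assumptions. Only the address and default port come from this document; check server.py for the real interface.

```python
import urllib.request


def build_request(wav_bytes, host="127.0.0.1", port=8610):
    """Build a POST request carrying WAV audio (hypothetical endpoint)."""
    url = f"http://{host}:{port}/transcribe"  # assumed path, not confirmed
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe(wav_path, timeout=60.0):
    """Send a WAV file to the daemon and return the response body as text."""
    with open(wav_path, "rb") as f:
        request = build_request(f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Because the daemon keeps the model loaded, a request like this avoids the model start-up cost on each dictation.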

File Locations

Recordings: recordings/dictation.wav
Temp transcript: /tmp/whisper-transcript.txt
Container: whisper-app
Daemon: http://127.0.0.1:8610

Troubleshooting

Check service status: systemctl --user status whisper-dictate.service
Check container: docker ps
View logs: docker logs whisper-app
Stop: make stop
Rebuild: make clean && make build
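
When the container is up but dictation still fails, it can help to check whether anything is listening on the daemon port at all. The following is a minimal sketch with a hypothetical helper name, daemon_is_listening. It only tests that a TCP connection succeeds and says nothing about whether the model has loaded.

```python
import socket


def daemon_is_listening(host="127.0.0.1", port=8610, timeout=1.0):
    """Return True if a TCP connection to the daemon port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If this returns False while docker ps shows the container running, the port setting (WHISPER_PORT) is a reasonable first thing to check.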
