Voice dictation for the terminal. Record, transcribe, copy — zero npm dependencies.
Uses Mistral's Voxtral for speech-to-text.
npm install -g dikt
Requires sox for audio recording (not needed for --file):
# macOS
brew install sox
# Ubuntu/Debian
sudo apt install sox
# Arch
sudo pacman -S soxOptional dependencies for --file mode:
- ffmpeg — enables compression, chunked transcription of long files, and broader format support
- yt-dlp — enables transcribing audio from URLs (YouTube, podcasts, etc.)
On first run, dikt will prompt you for your Mistral API key and model preferences:
dikt setup
Config is stored in ~/.config/dikt/config.json.
dikt
This opens an interactive TUI where you can record, transcribe, and copy text.
| Key | Action |
|---|---|
Space |
Start / stop recording |
c / Enter |
Copy transcript to clipboard |
a |
Toggle auto-copy |
h |
Cycle through history |
r |
Re-transcribe last recording |
Esc |
Cancel recording |
s |
Re-run setup |
? |
Show keybindings |
q |
Quit |
# Print transcript to stdout
dikt -q
# Output JSON
dikt --json
# Pipe to another tool
dikt -q | claude
# Wait longer before auto-stopping
dikt -q --silence 5Continuously transcribe, emitting chunks on pauses:
dikt --stream
# Stream as JSON Lines
dikt --stream --json
# Stream as continuous flowing text
dikt --stream -n
# Stream continuously until Ctrl+C
dikt --stream --silence 0Transcribe an existing audio file (wav, mp3, m4a, flac, ogg, webm, aac, wma, and more):
dikt --file meeting.wav
# Save to a file (.json auto-enables JSON output)
dikt --file meeting.wav -o transcript.json
dikt --file meeting.wav -o transcript.txt
# With JSON output
dikt --file recording.mp3 --json
# Transcribe from a URL (requires yt-dlp)
dikt --file https://youtube.com/watch?v=VIDEO_ID
dikt --file https://youtube.com/watch?v=VIDEO_ID -o transcript.txt# Speaker labels
dikt -q --diarize
# Timestamps
dikt -q --timestamps segment
dikt -q --timestamps word
dikt --file lecture.mp3 --timestamps segment
# Combined with JSON
dikt -q --json --diarize| Flag | Description |
|---|---|
--file <path|url> |
Transcribe audio file or URL (via yt-dlp) |
-o, --output <path> |
Write output to file (.json auto-enables JSON) |
--stream |
Stream transcription chunks on pauses |
--json |
Output JSON (single-shot or stream) |
-q, --quiet |
Record once, print transcript to stdout |
--silence <seconds> |
Silence duration before auto-stop (default: 2.0) |
--pause <seconds> |
Pause duration to split stream chunks (default: 1.0) |
--language <code> |
Language code, e.g. en, de, fr (default: auto) |
--timestamps <granularity> |
Add timestamps: segment or word |
--diarize |
Enable speaker identification |
-n, --no-newline |
Join stream chunks without newlines |
--no-color |
Disable colored output |
--no-input |
Fail if config is missing (no wizard) |
--setup |
Run setup wizard |
--update |
Update to latest version |
--version |
Show version |
-h, --help |
Show help |
dikt update
| Variable | Description |
|---|---|
DIKT_API_KEY |
Override API key |
DIKT_MODEL |
Override model (default: voxtral-mini-latest) |
DIKT_LANGUAGE |
Override language (default: auto) |
DIKT_TEMPERATURE |
Override temperature |
DIKT_CONTEXT_BIAS |
Override context bias |
MIT