A voice interface sidecar for LLM clients. Provides text-to-speech output via MCP and speech-to-text input via wake word detection.
- Text-to-Speech - Local TTS using KittenTTS (ONNX, no cloud API)
- Speech-to-Text - Local STT using faster-whisper with Silero VAD
- Wake Word Detection - Trigger dictation with customizable wake word
- MCP Server - SSE transport for LLM client integration
- Text Injection - Dictated text injected into active window
- Barge-In - Interrupt TTS playback by speaking
Debian/Ubuntu:
sudo apt install xdotool portaudio19-dev

Fedora:
sudo dnf install xdotool portaudio-devel

Arch:
sudo pacman -S xdotool portaudio

Python 3.10+ required.
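The prerequisites above can be checked before installing. The following is a minimal sketch, not part of Pythia itself: the function name `check_prerequisites` and its parameters are hypothetical, and it assumes that finding `xdotool` on `PATH` and locating a PortAudio shared library are reasonable proxies for the packages being installed.

```python
import ctypes.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("xdotool",), libraries=("portaudio",)):
    """Return a list of human-readable problems; an empty list means every check passed.

    Hypothetical helper for illustration only.
    """
    problems = []

    # The README states Python 3.10+ is required.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")

    # Command-line tools are looked up on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # Shared libraries are looked up through the platform's library search.
    for library in libraries:
        if ctypes.util.find_library(library) is None:
            problems.append(f"lib{library} not found")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints one line per missing prerequisite and prints nothing when all checks pass.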
# Clone or download
cd PythiaMCP-0.1.0
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

# Start with defaults
python main.py
# Custom port
python main.py --port 8000
# Start with dictation enabled
python main.py --dictation
# List audio devices
python main.py --list-devices
# Verbose logging
python main.py -v # INFO level
python main.py -vv # DEBUG level

| Key | Action |
|---|---|
| q | Quit |
| d | Toggle dictation |
| s | Open settings |
| i | Select input device |
| o | Select output device |
| v | Cycle TTS voice |
| c | Copy MCP config |
| ? | Help |
| Up/Down | Adjust TTS speed |
| Space | Skip TTS |

Add to your LLM client's MCP config:
{
"mcpServers": {
"pythia": {
"url": "http://127.0.0.1:7984/sse"
}
}
}

Settings are stored in ~/.config/pythia/config.json. Use the Settings modal (s) to configure:
- Wake word and position
- Whisper model (tiny.en → large-v3)
- Silence timeout
- TTS speed and voice
- Text injection method
- Barge-in behavior
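To inspect the stored settings outside the Settings modal, the file can be read as ordinary JSON. This is a minimal sketch under two assumptions: that the file holds a single JSON object, and that a missing file simply means no settings have been saved yet. The helper name `load_settings` is hypothetical, and no key names are assumed because the README does not list them.

```python
import json
from pathlib import Path

# Path taken from the README; everything else here is illustrative.
CONFIG_PATH = Path.home() / ".config" / "pythia" / "config.json"


def load_settings(path=CONFIG_PATH):
    """Return the settings as a dict, or an empty dict if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError:
        return {}
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data


if __name__ == "__main__":
    # Print whichever keys are present without assuming their names.
    for key, value in sorted(load_settings().items()):
        print(f"{key} = {value!r}")
```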
- Press d to enable dictation
- Say your wake word (default: "Computer") followed by your message
- Text is transcribed and injected into the active window
Example: "Computer, what is the weather today?" → injects "what is the weather today?"
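The example above implies that the wake word and any punctuation after it are removed before injection. One way this could work is sketched below. It is an illustration, not Pythia's actual implementation: the function `strip_wake_word` is hypothetical, and it handles only a wake word at the start of the transcript, whereas the settings also mention a configurable wake word position.

```python
import re


def strip_wake_word(transcript, wake_word="computer"):
    """Return the text following a leading wake word, or None if it is absent.

    Matching is case-insensitive and requires a whole-word match, so
    "Computers are fun" does not trigger on the wake word "computer".
    """
    # Wake word at the start, then any run of whitespace or punctuation.
    pattern = r"^\s*" + re.escape(wake_word) + r"\b[\s,.!?:;-]*"
    match = re.match(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    return transcript[match.end():].strip()


if __name__ == "__main__":
    print(strip_wake_word("Computer, what is the weather today?"))
    # prints: what is the weather today?
```

Returning `None` when the wake word is missing lets the caller distinguish "no trigger" from "trigger followed by silence", which yields an empty string.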
GPL-3.0 - See LICENSE file.