A voice automation system that allows hands-free operation of Claude Code CLI through speech commands. Uses local Whisper for speech recognition and targets any terminal window through focus-based control.
- 🎤 Local Voice Recognition - Uses faster-whisper for offline speech-to-text processing
- 🎯 Universal Terminal Control - Works with any focused terminal window using xdotool
- 📝 Smart Text Accumulation - Speak your request in parts, then send it with a finalization keyword
- 🔧 Claude CLI Mode Control - Voice commands for plan/auto/interactive modes
- 🔢 Quick Option Selection - Say "option one" through "option five" for numbered choices
- ✅ Voice Confirmations - "yes"/"no" responses for Claude prompts
- 🖥️ Window Targeting - 10-second countdown to capture your target terminal
- 🔇 No API Dependencies - Completely offline speech recognition
- ⚡ Direct Keyboard Simulation - Sends actual keystrokes to focused applications
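The accumulation, option-selection and yes/no features above can be pictured as a small state machine. This is a minimal sketch, not the code in command_processor.py. The finalization phrases ("send it", "submit"), the key payloads ("1" to "5", "y"/"n") and the class name CommandAccumulator are hypothetical, because this README does not list the real phrases (see COMMAND.md for those).

```python
import re
from typing import List, Optional, Tuple

# Hypothetical phrase tables for illustration; the project's real phrases are in COMMAND.md.
FINALIZE_KEYWORDS = ("send it", "submit")
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
CONFIRMATIONS = {"yes": "y", "no": "n"}  # hypothetical key mapping for prompts

# Matches a finalization keyword at the very end, ignoring trailing punctuation.
_FINALIZE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FINALIZE_KEYWORDS)) + r")\W*$", re.IGNORECASE
)


def _normalize(text: str) -> str:
    """Lowercase and drop the punctuation Whisper tends to add."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


class CommandAccumulator:
    """Collects dictated fragments until a finalization keyword is heard."""

    def __init__(self) -> None:
        self.parts: List[str] = []

    def handle(self, transcript: str) -> Optional[Tuple[str, str]]:
        """Return an (action, payload) pair to send, or None to keep listening."""
        text = _normalize(transcript)

        # "option one" .. "option five" -> the matching digit.
        match = re.fullmatch(r"option (one|two|three|four|five)", text)
        if match:
            return ("keys", NUMBER_WORDS[match.group(1)])

        # A bare "yes" / "no" answers a confirmation prompt.
        if text in CONFIRMATIONS:
            return ("keys", CONFIRMATIONS[text])

        # Finalization keyword: strip it, then flush the accumulated request.
        if _FINALIZE_RE.search(transcript):
            last = _FINALIZE_RE.sub("", transcript).strip()
            if last:
                self.parts.append(last)
            request = " ".join(self.parts)
            self.parts.clear()
            return ("type", request) if request else None

        # Anything else is another fragment of the request being dictated.
        if transcript.strip():
            self.parts.append(transcript.strip())
        return None
```

For example, saying "Refactor the parser" and then "and add tests. Send it." yields a single ("type", "Refactor the parser and add tests.") action. Keeping the original casing of each fragment (and matching commands only on the normalized text) preserves Whisper's punctuation in the typed request.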
The system works by:
- Capturing audio from your microphone
- Converting speech to text using local Whisper
- Processing voice commands with intelligent text accumulation
- Targeting any terminal window through xdotool focus capture
- Sending keyboard input directly to the targeted application
This approach works with any terminal application, preserving native functionality while adding voice control.
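The last two steps (targeting a window and sending keystrokes to it) can be approximated with plain xdotool calls. This is a minimal sketch, not the code in universal_terminal_controller.py. The helper names (activate_cmd, type_cmd, key_cmd, send_to_window) are hypothetical. The sketch assumes the target window is activated before typing, because many applications ignore the synthetic events that xdotool type --window produces.

```python
import subprocess
from typing import List


def activate_cmd(window_id: str) -> List[str]:
    # Raise and focus the captured window; --sync waits until it is active.
    return ["xdotool", "windowactivate", "--sync", window_id]


def type_cmd(text: str, delay_ms: int = 12) -> List[str]:
    # "--" ends option parsing so text starting with "-" is typed literally.
    return ["xdotool", "type", "--clearmodifiers", "--delay", str(delay_ms), "--", text]


def key_cmd(*keys: str) -> List[str]:
    # Named X keysyms such as "Return", "Escape" or "shift+Tab".
    return ["xdotool", "key", "--clearmodifiers", *keys]


def send_to_window(window_id: str, text: str, press_enter: bool = True) -> None:
    """Focus the target window, type the text, and optionally submit it."""
    commands = [activate_cmd(window_id), type_cmd(text)]
    if press_enter:
        commands.append(key_cmd("Return"))
    for argv in commands:
        # check=True raises CalledProcessError if xdotool reports a failure.
        subprocess.run(argv, check=True)
```

Building each command as an argument list (rather than a shell string) means dictated text never passes through a shell, so quotes or semicolons in speech cannot be interpreted as commands.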
- Python 3.8 or higher
- Claude Code CLI installed and accessible in your target terminal
- Microphone access
- Linux system with xdotool installed (sudo apt install xdotool)
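A quick preflight check for these prerequisites might look like the following. This is a hypothetical helper, not part of the project. It assumes that an unset DISPLAY or XDG_SESSION_TYPE=wayland signals that xdotool will not be able to reach your terminal.

```python
import os
import shutil
import sys
from typing import Callable, List, Mapping, Optional, Tuple


def preflight(
    env: Mapping[str, str] = os.environ,
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means the basics look OK."""
    problems: List[str] = []
    if version < (3, 8):
        problems.append("Python 3.8 or higher is required")
    if which("xdotool") is None:
        problems.append("xdotool not found on PATH (sudo apt install xdotool)")
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set; xdotool needs an X11 display")
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        problems.append("Wayland session detected; xdotool requires X11")
    return problems
```

Calling preflight() with no arguments checks the real environment. The injectable parameters exist so the checks can be exercised without a display.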
pip install -r requirements.txt
The system uses local Whisper by default, so no API keys are needed!
Whisper model options (tiny is default for speed):
export WHISPER_MODEL_SIZE=tiny # Fastest, still accurate
export WHISPER_MODEL_SIZE=base # Better accuracy, slower
export WHISPER_MODEL_SIZE=small # Even better accuracy
Optional settings:
export AUDIO_DEVICE_INDEX=0 # Specific microphone device
export DEBUG_MODE=true # Enable detailed logging
Start the application:
python main.py
The system will:
- Give you 10 seconds to focus on your target terminal (where Claude CLI is running)
- Capture the focused window for keyboard input targeting
- Begin listening for voice commands
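The countdown-and-capture step can be sketched as follows. This is a hypothetical helper, not the project's code. It assumes the controller stores the numeric window id printed by xdotool getactivewindow. The countdown length is a parameter, defaulting to the 10 seconds named in the feature list.

```python
import subprocess
import time
from typing import Callable


def capture_target_window(
    seconds: int = 10,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Count down, then return the X11 id of whichever window has focus."""
    for remaining in range(seconds, 0, -1):
        print(f"Focus your Claude terminal... {remaining}", flush=True)
        sleep(1)
    # getactivewindow prints the focused window's id as a decimal number.
    result = run(["xdotool", "getactivewindow"], capture_output=True, text=True, check=True)
    window_id = result.stdout.strip()
    if not window_id.isdigit():
        raise RuntimeError(f"unexpected xdotool output: {result.stdout!r}")
    return window_id
```

If commands later land in the wrong place, running xdotool getwindowname with the captured id is a quick way to confirm which window was recorded.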
See COMMAND.md for the complete operational reference.
The system can be configured through environment variables:
- WHISPER_MODEL_SIZE - Model size (tiny, base, small, medium, large)
- WHISPER_DEVICE - Processing device (cpu, cuda, auto)
- WHISPER_COMPUTE_TYPE - Computation type (int8, float16, float32)
- AUDIO_DEVICE_INDEX - Specific microphone device index
- LOG_LEVEL - Logging level (DEBUG, INFO, WARNING, ERROR)
- DEBUG_MODE - Enable debug mode (true/false)
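Reading these variables and handing them to faster-whisper could look roughly like this. It is a sketch under stated assumptions, not config.py. The function names are hypothetical. The fallbacks for device ("auto") and compute type ("int8") are guesses. Letting DEBUG_MODE override LOG_LEVEL is likewise an assumption. Only the WhisperModel(model_size, device=..., compute_type=...) constructor is faster-whisper's real API.

```python
import os
from typing import Any, Dict, Mapping

VALID_SIZES = {"tiny", "base", "small", "medium", "large"}
VALID_DEVICES = {"cpu", "cuda", "auto"}
VALID_COMPUTE = {"int8", "float16", "float32"}
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def settings_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the documented variables, falling back to defaults on bad input."""
    size = env.get("WHISPER_MODEL_SIZE", "tiny").lower()
    device = env.get("WHISPER_DEVICE", "auto").lower()
    compute = env.get("WHISPER_COMPUTE_TYPE", "int8").lower()
    level = env.get("LOG_LEVEL", "INFO").upper()
    debug = env.get("DEBUG_MODE", "false").lower() in {"1", "true", "yes"}
    index = env.get("AUDIO_DEVICE_INDEX")
    return {
        "model_size": size if size in VALID_SIZES else "tiny",
        "device": device if device in VALID_DEVICES else "auto",
        "compute_type": compute if compute in VALID_COMPUTE else "int8",
        "audio_device_index": int(index) if index and index.isdigit() else None,
        # Assumption: DEBUG_MODE=true forces DEBUG regardless of LOG_LEVEL.
        "log_level": "DEBUG" if debug else (level if level in VALID_LEVELS else "INFO"),
    }


def load_whisper(settings: Dict[str, Any]):
    # Imported lazily so the settings logic works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        settings["model_size"],
        device=settings["device"],
        compute_type=settings["compute_type"],
    )
```

A typical startup would call settings_from_env(), pass settings["log_level"] to logging.basicConfig(level=...), and then call load_whisper(settings).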
Create voice_commander.config for persistent settings (JSON format):
{
"audio": {
"sample_rate": 16000,
"silence_threshold": 0.01
},
"whisper": {
"model": "base",
"use_local": true
},
"feedback": {
"enable_audio": true,
"voice_feedback": false
}
}
Project structure:
src/
├── __init__.py
├── config.py # Configuration management
├── voice_commander.py # Main orchestrator
├── audio_capture.py # Real-time audio capture
├── speech_to_text.py # Whisper integration
├── command_processor.py # Command parsing and text accumulation
├── universal_terminal_controller.py # xdotool-based terminal control
└── feedback_system.py # User feedback
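The voice_commander.config file described earlier is JSON, so loading it with defaults for missing keys might be sketched as follows. This is an illustration, not the contents of config.py. DEFAULTS simply copies the example file. The helper names (deep_merge, load_config) are hypothetical. This README does not say whether environment variables or the file take precedence.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Defaults copied from the example voice_commander.config in this README.
DEFAULTS: Dict[str, Any] = {
    "audio": {"sample_rate": 16000, "silence_threshold": 0.01},
    "whisper": {"model": "base", "use_local": True},
    "feedback": {"enable_audio": True, "voice_feedback": False},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str = "voice_commander.config") -> Dict[str, Any]:
    """Load the JSON file if present; keys it omits fall back to DEFAULTS."""
    file = Path(path)
    if not file.exists():
        return copy.deepcopy(DEFAULTS)
    with file.open(encoding="utf-8") as fh:
        return deep_merge(DEFAULTS, json.load(fh))
```

Merging recursively lets a config file override a single value, such as audio.silence_threshold, without restating the rest of its section.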
For development, install the package in editable mode:
pip install -e .
Audio not being captured:
- Check microphone permissions
- Verify audio device in system settings
- Try different AUDIO_DEVICE_INDEX values
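To find candidate AUDIO_DEVICE_INDEX values, you can list the devices that can record. This sketch assumes the project captures audio through a PortAudio binding, so the index means the same thing there as in the python-sounddevice library used here. That library is an assumption, not necessarily in requirements.txt. The helper names are hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def input_devices(devices: Iterable[Mapping]) -> List[Tuple[int, str]]:
    """Return (index, name) for every device that has input channels."""
    return [
        (index, str(dev["name"]))
        for index, dev in enumerate(devices)
        if dev.get("max_input_channels", 0) > 0
    ]


def print_input_devices() -> None:
    # Assumption: python-sounddevice is installed (pip install sounddevice).
    import sounddevice as sd

    for index, name in input_devices(sd.query_devices()):
        print(f"AUDIO_DEVICE_INDEX={index}  {name}")
```

With sounddevice installed, python -m sounddevice prints the full device list as well, including output-only devices.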
xdotool not working:
- Install xdotool: sudo apt install xdotool
- Verify that an X11 display is available
- Check whether you are running Wayland (xdotool requires X11)
Commands not reaching Claude:
- Ensure the target terminal was focused during the 10-second countdown
- Check that Claude CLI is running in the captured window
- Restart the application to recapture the target window
Low recognition accuracy:
- Speak clearly at moderate pace
- Reduce background noise
- Use exact phrases from GUIDE.md for best results
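Background noise matters partly because of the silence_threshold setting in voice_commander.config. A common way to use such a threshold is an RMS gate, sketched below. It is an assumption that the project works this way. The function names are hypothetical, and samples are taken to be floats in [-1.0, 1.0]. If the threshold works like this, raising it slightly can stop steady background hum from being sent to Whisper. Separately, faster-whisper's transcribe() accepts vad_filter=True, which uses Silero VAD to drop non-speech segments.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[float], silence_threshold: float = 0.01) -> bool:
    """Treat a chunk as speech only if its RMS level exceeds the threshold."""
    return rms(samples) > silence_threshold
```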
Enable debug mode for detailed logging:
export DEBUG_MODE=true
python main.py
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details.
- OpenAI Whisper for speech recognition
- Anthropic Claude for the amazing CLI tool
- The Python community for excellent libraries