Summary
Add voice mode support to CodeMie CLI, allowing users to interact with AI coding agents using voice input instead of (or in addition to) text input.
Motivation
Voice input provides a hands-free, natural interaction mode that can improve developer productivity — especially during code reviews, brainstorming, and when navigating complex codebases. The project already has a sound effects system (--sounds flag) for audio output on hook events, but lacks any voice/microphone input capability.
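Requirements
Core Functionality
- Voice mode can be toggled on and off (/voice command)
Audio Recording
- Record through the system's audio tools (rec, arecord, etc.)
- Point users to an install step when no recording tool is found (e.g. brew install sox)
Speech-to-Text Integration
CLI Integration
- Add a --voice flag to agent session commands
- Add a /voice slash command to toggle voice mode within a session
- Transcribed text is submitted as a prompt (UserPromptSubmit hook)
Configuration
- voice.enabled: enable/disable voice mode
- voice.provider: STT provider (openai, local-whisper)
- voice.language: preferred language for transcription
- voice.silenceTimeout: seconds of silence before auto-stop
- voice.confirmBeforeSend: show transcription before sending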
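To make the configuration surface concrete, here is a minimal sketch of how the voice.* settings (enabled, provider, language, silenceTimeout, confirmBeforeSend) might be typed and merged with defaults. The names VoiceConfig, DEFAULT_VOICE_CONFIG, and resolveVoiceConfig are hypothetical, and every default value is an assumption rather than something this proposal fixes.

```typescript
// Hypothetical shape for the voice.* profile settings.
// Key names mirror this proposal; the defaults below are assumptions.
type SttProvider = "openai" | "local-whisper";

interface VoiceConfig {
  enabled: boolean;           // voice.enabled
  provider: SttProvider;      // voice.provider
  language: string;           // voice.language
  silenceTimeout: number;     // voice.silenceTimeout, in seconds
  confirmBeforeSend: boolean; // voice.confirmBeforeSend
}

const DEFAULT_VOICE_CONFIG: VoiceConfig = {
  enabled: false,
  provider: "openai",
  language: "en",
  silenceTimeout: 2,
  confirmBeforeSend: true,
};

// Merge per-profile overrides over the defaults and reject invalid values.
function resolveVoiceConfig(profile: Partial<VoiceConfig> = {}): VoiceConfig {
  const merged: VoiceConfig = { ...DEFAULT_VOICE_CONFIG, ...profile };
  if (merged.provider !== "openai" && merged.provider !== "local-whisper") {
    throw new Error(`Unsupported voice.provider: ${String(merged.provider)}`);
  }
  if (!Number.isFinite(merged.silenceTimeout) || merged.silenceTimeout <= 0) {
    throw new Error("voice.silenceTimeout must be a positive number of seconds");
  }
  return merged;
}
```

A profile would then only need to list the keys it changes, for example resolveVoiceConfig({ enabled: true, silenceTimeout: 3 }).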
Technical Considerations
- Existing audio infra: The project has audio player detection in src/agents/plugins/claude/sounds-installer.ts; recording detection should follow the same pattern
- OpenAI SDK: Already included as a dependency with openai/resources/audio (transcription, translation, speech APIs)
- Architecture: Should follow the plugin-based 5-layer architecture (CLI → Registry → Plugin → Core → Utils)
- Cross-platform: Must work on macOS (SoX/rec), Linux (arecord, SoX), and Windows WSL
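The cross-platform point above could be sketched as a small detection routine. This is an illustration only: it does not reproduce the actual logic in sounds-installer.ts, and the names Recorder, CANDIDATES, and detectRecorder, the 16 kHz mono settings, and the Linux install hints are all assumptions. WSL reports itself as linux, so it takes the Linux path here.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical description of one recording tool.
interface Recorder {
  command: string;
  args: (outFile: string, silenceSeconds: number) => string[];
  installHint: string;
}

// SoX `rec`: mono 16 kHz, stops after `silenceSeconds` of silence.
const SOX_REC: Recorder = {
  command: "rec",
  args: (out, s) =>
    ["-q", "-c", "1", "-r", "16000", out,
     "silence", "1", "0.1", "1%", "1", s.toFixed(1), "1%"],
  installHint: "install SoX (e.g. brew install sox)",
};

// ALSA `arecord`: mono 16 kHz WAV; it has no silence detection of its own,
// so the caller would have to stop it (assumption for this sketch).
const ARECORD: Recorder = {
  command: "arecord",
  args: (out) => ["-q", "-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", out],
  installHint: "install alsa-utils (e.g. sudo apt install alsa-utils)",
};

// Candidates per platform, in order of preference.
const CANDIDATES: Partial<Record<string, Recorder[]>> = {
  darwin: [SOX_REC],
  linux: [ARECORD, SOX_REC],
};

// `which` exits with status 0 when the command is on PATH.
function commandExists(command: string): boolean {
  return spawnSync("which", [command], { stdio: "ignore" }).status === 0;
}

// Return the first available recorder or throw an actionable error.
function detectRecorder(
  platform: string = process.platform,
  exists: (command: string) => boolean = commandExists,
): Recorder {
  const candidates = CANDIDATES[platform] ?? [];
  const found = candidates.find((r) => exists(r.command));
  if (found) return found;
  if (candidates.length === 0) {
    throw new Error(`Voice input is not supported on platform "${platform}"`);
  }
  const hints = candidates.map((r) => r.installHint).join(" or ");
  throw new Error(`No audio recording tool found: ${hints}`);
}
```

Passing the platform and the existence check as parameters keeps the selection logic testable without a microphone or any installed tools.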
Out of Scope (for initial version)
- Text-to-speech responses (agent speaking back)
- Real-time streaming transcription
- Multi-language auto-detection
- Wake word activation
Acceptance Criteria
- User can start a voice session with codemie chat --voice or toggle with /voice
- Audio is recorded, transcribed, and sent as a prompt to the agent
- Clear error messages when audio tools are missing
- Voice settings are configurable per profile
- Works on macOS and Linux
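The first two criteria describe a flow of toggle, record, transcribe, and send, which might look roughly like the sketch below. VoiceSession and VoiceDeps are hypothetical names, and all I/O is injected so that the flow stays independent of any particular recorder or STT provider. With the OpenAI provider, the transcribe dependency could plausibly wrap the SDK's audio.transcriptions.create call, though that wiring is not shown here.

```typescript
// Hypothetical dependencies; each would be backed by real I/O in the CLI.
interface VoiceDeps {
  record: () => Promise<string>;                        // resolves to an audio file path
  transcribe: (audioPath: string) => Promise<string>;   // speech-to-text
  confirm: (text: string) => Promise<boolean>;          // show transcription, ask to send
  sendPrompt: (text: string) => Promise<void>;          // hand the prompt to the agent
}

class VoiceSession {
  private voiceOn: boolean;
  private readonly deps: VoiceDeps;
  private readonly confirmBeforeSend: boolean;

  constructor(deps: VoiceDeps, startInVoiceMode = false, confirmBeforeSend = true) {
    this.deps = deps;
    this.voiceOn = startInVoiceMode; // true when started with --voice
    this.confirmBeforeSend = confirmBeforeSend;
  }

  get isVoiceOn(): boolean {
    return this.voiceOn;
  }

  // Typed input: "/voice" toggles voice mode, anything else is a normal prompt.
  async handleInput(line: string): Promise<string> {
    if (line.trim() === "/voice") {
      this.voiceOn = !this.voiceOn;
      return this.voiceOn ? "voice mode on" : "voice mode off";
    }
    await this.deps.sendPrompt(line);
    return "sent";
  }

  // One voice turn: record, transcribe, optionally confirm, then send.
  async captureVoicePrompt(): Promise<string> {
    if (!this.voiceOn) return "voice mode off";
    const audioPath = await this.deps.record();
    const text = (await this.deps.transcribe(audioPath)).trim();
    if (text.length === 0) return "nothing transcribed";
    if (this.confirmBeforeSend && !(await this.deps.confirm(text))) {
      return "discarded";
    }
    await this.deps.sendPrompt(text);
    return "sent";
  }
}
```

Returning a short status string from each step is only a convenience for this sketch; a real implementation would more likely emit events or log through the CLI's existing output layer.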