Aalap is a Python voice-assistant dialogue manager that combines wake word detection, VAD, streaming ASR, and TTS playback in a single loop. It is built around a threaded state machine and is usable both as a CLI tool and a library component.
- Maturity: early-stage (v0.1.0); APIs may change before 1.0.
- Maintenance: active development.
- Supported Python: 3.9+.
- Platforms: intended for Linux and Windows; audio-device behavior and native dependency installation still need validation across more environments.
- Wake word detection via openWakeWord with score threshold, patience, debounce, and custom model support.
- Voice activity detection using Silero VAD.
- Streaming ASR in a worker process using faster-whisper.
- Offline TTS with Piper or online TTS via gTTS.
- Shared input/output audio backend with sounddevice and barge-in handling.
- Optional transcript audio capture with ffmpeg.
- Programmatic triggers and status/transcript callbacks.
- Python 3.9+
- PortAudio (required by sounddevice)
- ffmpeg is recommended for transcript audio saving and for MP3 decoding via pydub
Install PortAudio and ffmpeg for your OS:
# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y libportaudio2 portaudio19-dev ffmpeg

# macOS (Homebrew)
brew install portaudio ffmpeg

# Windows (Chocolatey)
choco install portaudio ffmpeg

# Linux
python3 -m pip install "git+https://github.com/MnAkash/aalap.git"

# Windows
python -m pip install "git+https://github.com/MnAkash/aalap.git"

Dependencies are listed in requirements.txt.
git clone https://github.com/MnAkash/aalap.git
cd aalap
python -m pip install -e .

After installation, run:

aalap

This uses the defaults defined in aalap/dialogue_manager.py.
Show available CLI flags with:
aalap --help

You can override common settings directly from the CLI:

aalap --model base.en --asr-timeout 7 --tts-backend piper --piper-voice amy

On Windows, wrap any startup code that constructs and runs DialogManager in an if __name__ == "__main__": guard because the package uses multiprocessing.
import multiprocessing as mp
import queue
import time

from aalap import DialogManager


def main() -> None:
    # Callbacks hand data to the main thread through queues.
    transcript_q: queue.Queue[str] = queue.Queue()
    status_q: queue.Queue[str] = queue.Queue()

    def on_transcript(text: str) -> None:
        transcript_q.put(text)

    def on_status(status: str) -> None:
        status_q.put(status)

    def my_policy(user_text: str) -> str:
        # Replace with your LLM or rules. Return a reply string.
        return f"You said: {user_text}"

    manager = DialogManager(
        model="base.en",
        device="auto",
        tts_backend="piper",
        wakeword_keywords="hey_jarvis",
        wakeword_model_paths=None,
        wakeword_score_thresh=0.45,
        wakeword_patience_frames=2,
        wakeword_debounce_ms=900,
        wakeword_vad_threshold=0.0,
        on_transcript=on_transcript,
        on_status=on_status,
        external_policy=my_policy,
    )
    manager.start()
    try:
        # Keep the main thread alive until Ctrl+C.
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass
    finally:
        manager.stop()


if __name__ == "__main__":
    try:
        mp.set_start_method("spawn", force=True)
    except RuntimeError:
        pass
    main()

A fuller example is in examples/simple_dialogue.py.
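The example above pushes transcripts and statuses onto queues but its main loop never reads them. Here is a minimal sketch of draining such a queue without blocking, which could be called once per loop iteration. The `drain` helper is hypothetical and not part of aalap; the queue below is a stand-in for the `transcript_q` filled by `on_transcript`.

```python
import queue
from typing import List


def drain(q: "queue.Queue[str]") -> List[str]:
    """Return every item currently waiting in q without blocking."""
    items: List[str] = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            # Nothing left right now; hand back what we collected.
            return items


# Stand-in for the transcript_q filled by the on_transcript callback.
transcript_q: "queue.Queue[str]" = queue.Queue()
transcript_q.put("turn on the lights")
transcript_q.put("what time is it")

pending = drain(transcript_q)
print(pending)
```

In the main loop this would replace the bare `time.sleep(0.1)` with a drain of each queue followed by the sleep, so the callbacks stay cheap and all handling happens on the main thread.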
The DialogManager exposes a few useful control methods in aalap/dialogue_manager.py:
- trigger_wakeword() to start listening programmatically
- deactivate_wakeword_session() to force the session back to IDLE
- speak(text) to enqueue TTS output directly
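As a rough illustration of how these control methods might be wired to an external input such as a hotkey or a text console, the sketch below maps simple text commands onto them. `handle_command` and its command words ("listen", "cancel", "say") are hypothetical; only the three method names come from the list above, and the manager is duck-typed so any object exposing them works.

```python
from typing import Any


def handle_command(manager: Any, command: str) -> bool:
    """Map a text command to a DialogManager control method.

    Returns True if the command was recognized, False otherwise.
    """
    verb, _, rest = command.strip().partition(" ")
    if verb == "listen":
        manager.trigger_wakeword()
    elif verb == "cancel":
        manager.deactivate_wakeword_session()
    elif verb == "say" and rest:
        # Everything after "say " is passed through as the text to speak.
        manager.speak(rest)
    else:
        return False
    return True
```

With a running manager, `handle_command(manager, "say Hello there")` would enqueue the text for TTS, and `handle_command(manager, "listen")` would start a session without the wake word.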
When you pass on_status, the callback receives the dialog state string emitted by the state machine in aalap/dialogue_manager.py:
- IDLE: waiting for wake word or programmatic trigger
- LISTENING: waiting for user speech to start recording
- RECORDING: capturing user speech
- TRANSCRIBING: running ASR on the captured audio
- THINKING: waiting on the external policy to return a reply
- SPEAKING: playing back TTS audio
- WAKEWORD_TRIGGER: wake word fired and session is activating
- SYSTEM_TRIGGER: programmatic trigger fired and session is activating
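One possible use of these status strings is measuring how long each turn spends in each state, for example to see whether TRANSCRIBING or THINKING dominates latency. The sketch below is an assumption-laden illustration: `StateTimer` is a hypothetical helper, not part of aalap, and it assumes the callback is invoked once per state change with one of the strings listed above.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Optional


class StateTimer:
    """Accumulate how long the dialog spends in each reported state."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state: Optional[str] = None
        self._since = 0.0
        self.totals: DefaultDict[str, float] = defaultdict(float)

    def on_status(self, status: str) -> None:
        now = self._clock()
        # Credit the elapsed time to the state we are leaving.
        if self._state is not None:
            self.totals[self._state] += now - self._since
        self._state = status
        self._since = now
```

The bound method can be passed as the callback, as in `DialogManager(..., on_status=timer.on_status)`. If callbacks turn out to arrive from a worker thread, guarding `on_status` with a `threading.Lock` would be a sensible precaution.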
Most knobs are in aalap/dialogue_manager.py and exposed through the DialogManager constructor.
- Wake word: wakeword_keywords, wakeword_model_paths (see aalap/wakeword.py)
- Wake-word trigger policy: wakeword_score_thresh, wakeword_patience_frames, wakeword_debounce_ms
- Wake-word VAD gate: wakeword_vad_threshold enables openWakeWord's internal Silero VAD gating for wake-word scoring. Set 0 to disable.
- Wake-word debug: wakeword_debug, save_wakeword_debug_audio, wakeword_debug_audio_dir
- VAD: vad_silero_threshold, vad_silero_window_ms, vad_silero_min_speech_ms, vad_silero_min_silence_ms
- ASR: model, device (uses faster-whisper)
- TTS: tts_backend, piper_language, piper_voice, piper_quality
- Timing: silence_ms_after_speech, no_speech_timeout, post_tts_mute
- Debug audio capture: save_transcript_audio, transcript_audio_dir
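To give a feel for how the three wake-word trigger policy knobs could interact, here is a small sketch of one plausible interpretation: fire only after the score stays at or above the threshold for `patience_frames` consecutive frames, then ignore further triggers for `debounce_ms`. This is an assumption for illustration, not Aalap's actual implementation (see aalap/dialogue_manager.py for that); `TriggerGate` is a hypothetical name.

```python
from typing import Optional


class TriggerGate:
    """Threshold + patience + debounce gate over per-frame wake-word scores."""

    def __init__(self, score_thresh: float = 0.45,
                 patience_frames: int = 2,
                 debounce_ms: float = 900.0) -> None:
        self.score_thresh = score_thresh
        self.patience_frames = patience_frames
        self.debounce_ms = debounce_ms
        self._streak = 0  # consecutive frames at or above the threshold
        self._last_fire_ms: Optional[float] = None

    def update(self, score: float, now_ms: float) -> bool:
        """Feed one frame's score; return True if the wake word should fire."""
        self._streak = self._streak + 1 if score >= self.score_thresh else 0
        if self._streak < self.patience_frames:
            return False
        recently_fired = (
            self._last_fire_ms is not None
            and now_ms - self._last_fire_ms < self.debounce_ms
        )
        if recently_fired:
            return False
        self._last_fire_ms = now_ms
        self._streak = 0
        return True
```

Under this reading, raising the patience value trades a little latency for fewer false triggers on short noise bursts, while the debounce window keeps one utterance of the wake word from firing twice.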
By default, the built-in "hey_jarvis" model is downloaded automatically. If you provide custom wake words, you must also supply a matching model path for each one, with the file named <wakeword>.onnx.
Model downloads are cached under ~/.cache/aalap (see aalap/wakeword.py).
Piper voices are fetched from rhasspy/piper-voices and cached under ~/.cache/aalap/piper (see aalap/tts_piper.py).
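Since downloaded models accumulate in the cache, it can be handy to see how much disk space each part uses. The sketch below only reads file sizes under the cache directory named above; `cache_usage` is a hypothetical helper, not part of aalap, and it makes no assumptions about the layout beyond the top-level path.

```python
from pathlib import Path
from typing import Dict


def cache_usage(root: Path) -> Dict[str, int]:
    """Return total bytes per top-level entry under root (empty if missing)."""
    usage: Dict[str, int] = {}
    if not root.is_dir():
        return usage
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Sum every regular file below this subdirectory.
            usage[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        else:
            usage[entry.name] = entry.stat().st_size
    return usage


if __name__ == "__main__":
    for name, size in cache_usage(Path.home() / ".cache" / "aalap").items():
        print(f"{name}: {size / 1e6:.1f} MB")
```

Deleting an entry should simply cause it to be downloaded again on next use, though that behavior is an assumption worth confirming against aalap/wakeword.py and aalap/tts_piper.py.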
List available devices with:
python -m aalap.list_soundDevices

See aalap/list_soundDevices.py.
- gTTS requires network access and depends on MP3 decoding via pydub.
- Transcript audio saving uses ffmpeg (see _save_audio_debug in aalap/dialogue_manager.py).
- If openWakeWord is not installed or fails to load, wake word detection is disabled and only programmatic triggers are available.
- faster-whisper downloads ASR models from the Hugging Face Hub.
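Because several of these notes depend on optional tools being present, a quick preflight check before starting the assistant can save some debugging. The following is a minimal sketch: `preflight` is a hypothetical helper, not part of aalap, and the module names are the usual import names for these packages, which is an assumption about how aalap imports them.

```python
import importlib.util
import shutil
from typing import Dict


def preflight() -> Dict[str, bool]:
    """Report which optional tools and packages appear to be available."""
    # ffmpeg is an executable, so look for it on PATH.
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}
    # Assumed import names for the Python dependencies mentioned above.
    for module in ("sounddevice", "openwakeword", "faster_whisper", "gtts", "pydub"):
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing `openwakeword` entry would match the case described above where only programmatic triggers are available; note that this check only confirms a package is installed, not that it loads correctly.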
Apache 2.0. See LICENSE.
This is an open-source project and contributions are welcome via pull requests. Please open an issue first for major changes so we can align on scope and approach.