Aalap

Aalap is a Python voice-assistant dialogue manager that combines wake word detection, voice activity detection (VAD), streaming automatic speech recognition (ASR), and text-to-speech (TTS) playback in a single loop. It is built around a threaded state machine and can be used both as a CLI tool and as a library component.

Project status

Maturity: early-stage (v0.1.0); APIs may change before 1.0.
Maintenance: active development.
Supported Python: 3.9+.
Platforms: intended for Linux and Windows; audio-device behavior and native dependency installation still need validation across more environments.

Features

Wake word detection via openWakeWord with score threshold, patience, debounce, and custom model support.
Voice activity detection using Silero VAD.
Streaming ASR in a worker process using faster-whisper.
Offline TTS with Piper or online TTS via gTTS.
Shared input/output audio backend with sounddevice and barge-in handling.
Optional transcript audio capture with ffmpeg.
Programmatic triggers and status/transcript callbacks.

Installation

Requirements

Python 3.9+
PortAudio (required by sounddevice)
ffmpeg (recommended; used for transcript audio saving and for MP3 decoding via pydub)

System packages

Install PortAudio and ffmpeg for your OS:

# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y libportaudio2 portaudio19-dev ffmpeg

# macOS (Homebrew)
brew install portaudio ffmpeg

# Windows (Chocolatey)
choco install portaudio ffmpeg

Install with pip (no clone)

# Linux
python3 -m pip install "git+https://github.com/MnAkash/aalap.git"

# Windows
python -m pip install "git+https://github.com/MnAkash/aalap.git"

Dependencies are listed in requirements.txt.

Install from source

git clone https://github.com/MnAkash/aalap.git
cd aalap
python -m pip install -e .

Quickstart (CLI)

After installation, run:

aalap

This uses the defaults defined in aalap/dialogue_manager.py.

Show available CLI flags with:

aalap --help

You can override common settings directly from the CLI:

aalap --model base.en --asr-timeout 7 --tts-backend piper --piper-voice amy

On Windows, wrap any startup code that constructs and runs DialogManager in an if __name__ == "__main__": guard, because the package uses multiprocessing.

Quickstart (Python)

import multiprocessing as mp
import time
import queue
from aalap import DialogManager

def main() -> None:
    transcript_q: queue.Queue[str] = queue.Queue()
    status_q: queue.Queue[str] = queue.Queue()

    def on_transcript(text: str) -> None:
        transcript_q.put(text)

    def on_status(status: str) -> None:
        status_q.put(status)

    def my_policy(user_text: str) -> str:
        # Replace with your LLM or rules. Return a reply string.
        return f"You said: {user_text}"

    manager = DialogManager(
        model="base.en",
        device="auto",
        tts_backend="piper",
        wakeword_keywords="hey_jarvis",
        wakeword_model_paths=None,
        wakeword_score_thresh=0.45,
        wakeword_patience_frames=2,
        wakeword_debounce_ms=900,
        wakeword_vad_threshold=0.0,
        on_transcript=on_transcript,
        on_status=on_status,
        external_policy=my_policy,
    )
    manager.start()

    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass
    finally:
        manager.stop()

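if __name__ == "__main__":
    # Use the "spawn" start method on every platform (it is already the
    # default on Windows). The guard above is required for spawn to work.
    try:
        mp.set_start_method("spawn", force=True)
    except RuntimeError:
        pass
    main()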

A fuller example is in examples/simple_dialogue.py.
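
The external_policy callable in the quickstart only echoes the user. As a slightly richer sketch, assuming only what the quickstart shows (the policy receives the transcript as a string and returns the reply string), a keyword-based policy with an echo fallback could look like this. The names RULES and rule_policy and the rule table itself are illustrative and not part of aalap; the substring matching is deliberately naive.

```python
from datetime import datetime
from typing import Callable, Dict

# Hypothetical rule table: keyword -> function producing the reply.
# Not part of aalap; replace with your own rules or an LLM call.
RULES: Dict[str, Callable[[], str]] = {
    "time": lambda: f"It is {datetime.now():%H:%M}.",
    "hello": lambda: "Hello! How can I help?",
}


def rule_policy(user_text: str) -> str:
    """Return a reply for the transcript, falling back to an echo."""
    normalized = user_text.lower()
    for keyword, make_reply in RULES.items():
        # Naive substring match; the first matching rule wins.
        if keyword in normalized:
            return make_reply()
    return f"You said: {user_text}"
```

To try it, pass external_policy=rule_policy to DialogManager in place of my_policy.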

Runtime control

The DialogManager exposes a few useful control methods in aalap/dialogue_manager.py:

trigger_wakeword() to start listening programmatically
deactivate_wakeword_session() to force the session back to IDLE
speak(text) to enqueue TTS output directly
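
As a sketch of how these methods might be combined, the helpers below enqueue a spoken prompt, open a listening session without the wake word, and cancel a session. They assume only the three method names listed above, called with the arguments shown there. The return types, and whether speak() returns before playback finishes, are assumptions. ControllableManager, prompt_and_listen, and cancel_session are hypothetical names.

```python
from typing import Protocol


class ControllableManager(Protocol):
    """The subset of DialogManager methods this sketch relies on.

    Return types are assumed to be None.
    """

    def speak(self, text: str) -> None: ...
    def trigger_wakeword(self) -> None: ...
    def deactivate_wakeword_session(self) -> None: ...


def prompt_and_listen(manager: ControllableManager, prompt: str) -> None:
    """Enqueue a spoken prompt, then start listening programmatically."""
    manager.speak(prompt)
    manager.trigger_wakeword()


def cancel_session(manager: ControllableManager) -> None:
    """Force the session back to IDLE, for example from a UI cancel button."""
    manager.deactivate_wakeword_session()
```

Typing against a Protocol keeps the helpers testable with a stub object, without audio hardware.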

Status callback

When you pass on_status, the callback receives the dialog state string emitted by the state machine in aalap/dialogue_manager.py:

IDLE: waiting for wake word or programmatic trigger
LISTENING: waiting for user speech to start recording
RECORDING: capturing user speech
TRANSCRIBING: running ASR on the captured audio
THINKING: waiting on the external policy to return a reply
SPEAKING: playing back TTS audio
WAKEWORD_TRIGGER: wake word fired and session is activating
SYSTEM_TRIGGER: programmatic trigger fired and session is activating
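
Because the callback receives plain strings, it is easy to build small observers on top of it. The following is a minimal sketch of a callable that records the current status and accumulates how long the manager stayed in each one. StatusTimer is a hypothetical helper, and the lock reflects an assumption that the callback may be invoked from a worker thread.

```python
import threading
import time
from collections import defaultdict
from typing import DefaultDict, Optional


class StatusTimer:
    """Accumulate wall-clock seconds spent in each reported status."""

    def __init__(self) -> None:
        # Assumption: the callback may run on a non-main thread.
        self._lock = threading.Lock()
        self._current: Optional[str] = None
        self._since = time.monotonic()
        self.seconds: DefaultDict[str, float] = defaultdict(float)

    def __call__(self, status: str) -> None:
        now = time.monotonic()
        with self._lock:
            # Close out the previous status before switching.
            if self._current is not None:
                self.seconds[self._current] += now - self._since
            self._current = status
            self._since = now

    @property
    def current(self) -> Optional[str]:
        with self._lock:
            return self._current
```

An instance can be passed directly as on_status=StatusTimer(). Time in the status that is still active is only added once the next status arrives.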

Configuration highlights

Most knobs are in aalap/dialogue_manager.py and exposed through the DialogManager constructor.

Wake word: wakeword_keywords, wakeword_model_paths (see aalap/wakeword.py)
Wake-word trigger policy: wakeword_score_thresh, wakeword_patience_frames, wakeword_debounce_ms
Wake-word VAD gate: wakeword_vad_threshold enables openWakeWord's internal Silero VAD gating for wake-word scoring. Set it to 0 to disable the gate.
Wake-word debug: wakeword_debug, save_wakeword_debug_audio, wakeword_debug_audio_dir
VAD: vad_silero_threshold, vad_silero_window_ms, vad_silero_min_speech_ms, vad_silero_min_silence_ms
ASR: model, device (uses faster-whisper)
TTS: tts_backend, piper_language, piper_voice, piper_quality
Timing: silence_ms_after_speech, no_speech_timeout, post_tts_mute
Debug audio capture: save_transcript_audio, transcript_audio_dir
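
With this many keyword arguments, a misspelled name is easy to miss. One possible pattern, sketched below, is to keep a base profile and validate overrides against the option names listed above before passing them to the constructor. The base values are copied from the Python quickstart. KNOWN_OPTIONS, BASE_PROFILE, and build_manager_kwargs are hypothetical helpers, and the set covers only the options named in this section.

```python
from typing import Any, Dict

# Option names taken from the list above (callbacks are passed separately).
KNOWN_OPTIONS = {
    "wakeword_keywords", "wakeword_model_paths",
    "wakeword_score_thresh", "wakeword_patience_frames", "wakeword_debounce_ms",
    "wakeword_vad_threshold",
    "wakeword_debug", "save_wakeword_debug_audio", "wakeword_debug_audio_dir",
    "vad_silero_threshold", "vad_silero_window_ms",
    "vad_silero_min_speech_ms", "vad_silero_min_silence_ms",
    "model", "device",
    "tts_backend", "piper_language", "piper_voice", "piper_quality",
    "silence_ms_after_speech", "no_speech_timeout", "post_tts_mute",
    "save_transcript_audio", "transcript_audio_dir",
}

# Values copied from the Python quickstart; adjust for your setup.
BASE_PROFILE: Dict[str, Any] = {
    "model": "base.en",
    "device": "auto",
    "tts_backend": "piper",
    "wakeword_keywords": "hey_jarvis",
    "wakeword_score_thresh": 0.45,
    "wakeword_patience_frames": 2,
    "wakeword_debounce_ms": 900,
    "wakeword_vad_threshold": 0.0,
}


def build_manager_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the base profile, rejecting unknown option names."""
    unknown = sorted(set(overrides) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown DialogManager option(s): {', '.join(unknown)}")
    return {**BASE_PROFILE, **overrides}
```

The result would then be unpacked into the constructor, for example DialogManager(**build_manager_kwargs(piper_voice="amy"), on_status=on_status), assuming the constructor accepts the same voice name as the --piper-voice CLI flag.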

Wake word models

By default, the built-in "hey_jarvis" model is downloaded automatically. If you provide custom wake words, you must supply matching model paths and name them <wakeword>.onnx.

Model downloads are cached under ~/.cache/aalap (see aalap/wakeword.py).
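
The <wakeword>.onnx naming rule can be checked before the manager starts. The sketch below builds the expected path for each custom wake word and fails early if a file is missing. resolve_wakeword_models is a hypothetical helper, and it is an assumption that wakeword_model_paths accepts a list of path strings in the same order as the wake words.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_wakeword_models(model_dir: Path, wakewords: Iterable[str]) -> List[str]:
    """Return <model_dir>/<wakeword>.onnx for each wake word.

    Raises FileNotFoundError listing every model file that is missing.
    """
    paths: List[str] = []
    missing: List[str] = []
    for word in wakewords:
        candidate = model_dir / f"{word}.onnx"
        if candidate.is_file():
            paths.append(str(candidate))
        else:
            missing.append(candidate.name)
    if missing:
        raise FileNotFoundError(
            f"Missing wake word model(s) in {model_dir}: {', '.join(missing)}"
        )
    return paths
```

For a custom wake word such as "hey_robot" (an illustrative name), the helper expects a file named hey_robot.onnx in the given directory.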

Piper voice models

Piper voices are fetched from rhasspy/piper-voices and cached under ~/.cache/aalap/piper (see aalap/tts_piper.py).

Audio device selection

List available devices with:

python -m aalap.list_soundDevices

See aalap/list_soundDevices.py.

Notes

gTTS requires network access and depends on MP3 decoding via pydub.
Transcript audio saving uses ffmpeg (see _save_audio_debug in aalap/dialogue_manager.py).
If openWakeWord is not installed or fails to load, wake word detection is disabled and only programmatic triggers are available.
faster-whisper downloads ASR models from the Hugging Face Hub.
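
Since several of these dependencies are optional and fail in different ways, a small preflight check can report what is available before the manager starts. This is a standard-library sketch, not something aalap provides. check_optional_dependencies is a hypothetical helper, and the default module names are assumed to be the import names of the corresponding packages.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def check_optional_dependencies(
    modules: Iterable[str] = (
        "openwakeword", "faster_whisper", "sounddevice", "gtts", "pydub",
    ),
    executables: Iterable[str] = ("ffmpeg",),
) -> Dict[str, bool]:
    """Map each top-level module or executable name to whether it was found."""
    report: Dict[str, bool] = {}
    for name in modules:
        # find_spec locates the module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which searches the directories on PATH.
        report[exe] = shutil.which(exe) is not None
    return report
```

A False entry for openwakeword, for instance, would suggest that only programmatic triggers will be available.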

License

Apache 2.0. See LICENSE.

Collaboration

This is an open-source project and contributions are welcome via pull requests. Please open an issue first for major changes so we can align on scope and approach.
