Aalap

Aalap is a Python voice-assistant dialogue manager that combines wake word detection, VAD, streaming ASR, and TTS playback in a single loop. It is built around a threaded state machine and is usable both as a CLI tool and a library component.

Project status

  • Maturity: early-stage (v0.1.0); APIs may change before 1.0.
  • Maintenance: active development.
  • Supported Python: 3.9+.
  • Platforms: intended for Linux and Windows; audio-device behavior and native dependency installation still need validation across more environments.

Features

  • Wake word detection via openWakeWord with score threshold, patience, debounce, and custom model support.
  • Voice activity detection using Silero VAD.
  • Streaming ASR in a worker process using faster-whisper.
  • Offline TTS with Piper or online TTS via gTTS.
  • Shared input/output audio backend with sounddevice and barge-in handling.
  • Optional transcript audio capture with ffmpeg.
  • Programmatic triggers and status/transcript callbacks.

Installation

Requirements

  • Python 3.9+
  • PortAudio (required by sounddevice)
  • ffmpeg is recommended for transcript audio saving and for MP3 decoding via pydub

System packages

Install PortAudio and ffmpeg for your OS:

# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y libportaudio2 portaudio19-dev ffmpeg
# macOS (Homebrew)
brew install portaudio ffmpeg
# Windows (Chocolatey)
choco install portaudio ffmpeg
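
After installing the system packages, a quick preflight check can confirm that they are discoverable. The sketch below is not part of aalap; `check_system_deps` is a hypothetical helper that uses only the Python standard library.

```python
import ctypes.util
import shutil
from typing import Dict


def check_system_deps() -> Dict[str, bool]:
    """Report whether ffmpeg and the PortAudio shared library can be located."""
    return {
        # ffmpeg is looked up on PATH, the same way a shell would find it.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_library returns None when no matching shared library is found.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

A "not found" result for PortAudio is only a hint: some sounddevice wheels bundle their own copy of the library, in which case audio may work even though the system-wide lookup fails.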

Install with pip (no clone)

# Linux
python3 -m pip install "git+https://github.com/MnAkash/aalap.git"
# Windows
python -m pip install "git+https://github.com/MnAkash/aalap.git"

Dependencies are listed in requirements.txt.

Install from source

git clone https://github.com/MnAkash/aalap.git
cd aalap
python -m pip install -e .

Quickstart (CLI)

After installation, run:

aalap

This uses the defaults defined in aalap/dialogue_manager.py.

Show available CLI flags with:

aalap --help

You can override common settings directly from the CLI:

aalap --model base.en --asr-timeout 7 --tts-backend piper --piper-voice amy

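On Windows, wrap any startup code that constructs and runs DialogManager in an if __name__ == "__main__": guard. The package uses multiprocessing, and Windows starts worker processes by re-importing the main module, so unguarded startup code would run again in each worker.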

Quickstart (Python)

import multiprocessing as mp
import time
import queue
from aalap import DialogManager

def main() -> None:
    transcript_q: queue.Queue[str] = queue.Queue()
    status_q: queue.Queue[str] = queue.Queue()

    def on_transcript(text: str) -> None:
        transcript_q.put(text)

    def on_status(status: str) -> None:
        status_q.put(status)

    def my_policy(user_text: str) -> str:
        # Replace with your LLM or rules. Return a reply string.
        return f"You said: {user_text}"

    manager = DialogManager(
        model="base.en",
        device="auto",
        tts_backend="piper",
        wakeword_keywords="hey_jarvis",
        wakeword_model_paths=None,
        wakeword_score_thresh=0.45,
        wakeword_patience_frames=2,
        wakeword_debounce_ms=900,
        wakeword_vad_threshold=0.0,
        on_transcript=on_transcript,
        on_status=on_status,
        external_policy=my_policy,
    )
    manager.start()

    try:
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass
    finally:
        manager.stop()

if __name__ == "__main__":
    try:
        mp.set_start_method("spawn", force=True)
    except RuntimeError:
        pass
    main()

A fuller example is in examples/simple_dialogue.py.
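
The quickstart pushes transcripts and status strings into queues but never reads them. One way to consume them from the main loop is a non-blocking drain, sketched below with the standard library only. The `drain` helper is hypothetical, and the queue is filled by hand here to stand in for the on_transcript callback.

```python
import queue
from typing import List


def drain(q: "queue.Queue[str]") -> List[str]:
    """Return every item currently in the queue without blocking."""
    items: List[str] = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


# Stand-in for the queue that the on_transcript callback fills.
transcript_q: "queue.Queue[str]" = queue.Queue()
transcript_q.put("turn on the lights")
transcript_q.put("what time is it")

for text in drain(transcript_q):
    print(f"[transcript] {text}")
```

In the quickstart, calling `drain(transcript_q)` and `drain(status_q)` just before `time.sleep(0.1)` would keep the main thread responsive while the callbacks fire from the manager's own threads.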

Runtime control

DialogManager (defined in aalap/dialogue_manager.py) exposes these control methods:

  • trigger_wakeword() to start listening programmatically
  • deactivate_wakeword_session() to force the session back to IDLE
  • speak(text) to enqueue TTS output directly

Status callback

When you pass on_status, the callback receives the dialog state string emitted by the state machine in aalap/dialogue_manager.py:

  • IDLE: waiting for wake word or programmatic trigger
  • LISTENING: waiting for user speech to start recording
  • RECORDING: capturing user speech
  • TRANSCRIBING: running ASR on the captured audio
  • THINKING: waiting on the external policy to return a reply
  • SPEAKING: playing back TTS audio
  • WAKEWORD_TRIGGER: wake word fired and session is activating
  • SYSTEM_TRIGGER: programmatic trigger fired and session is activating
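
A status callback can be more than a queue writer. The sketch below assumes the callback receives exactly the uppercase strings listed above; `DialogState`, `BUSY_STATES`, and `StatusTracker` are hypothetical names, not part of aalap.

```python
from enum import Enum
from typing import List, Optional


class DialogState(str, Enum):
    """State strings as listed above."""

    IDLE = "IDLE"
    LISTENING = "LISTENING"
    RECORDING = "RECORDING"
    TRANSCRIBING = "TRANSCRIBING"
    THINKING = "THINKING"
    SPEAKING = "SPEAKING"
    WAKEWORD_TRIGGER = "WAKEWORD_TRIGGER"
    SYSTEM_TRIGGER = "SYSTEM_TRIGGER"


# States in which the assistant is processing or replying, not listening.
BUSY_STATES = {DialogState.TRANSCRIBING, DialogState.THINKING, DialogState.SPEAKING}


class StatusTracker:
    """Callable suitable for on_status: remembers the current state and history."""

    def __init__(self) -> None:
        self.current: Optional[DialogState] = None
        self.history: List[DialogState] = []
        self.unknown: List[str] = []

    def __call__(self, status: str) -> None:
        try:
            state = DialogState(status)
        except ValueError:
            # Keep unrecognized strings instead of raising inside the callback.
            self.unknown.append(status)
            return
        self.current = state
        self.history.append(state)

    @property
    def is_busy(self) -> bool:
        return self.current in BUSY_STATES


tracker = StatusTracker()
for status in ["IDLE", "WAKEWORD_TRIGGER", "LISTENING", "RECORDING", "TRANSCRIBING"]:
    tracker(status)
print(tracker.current, tracker.is_busy)
```

Passing `on_status=tracker` to the constructor would let a UI poll `tracker.is_busy`. Because the state machine is threaded, the callback probably runs off the main thread, so keep the handler short and avoid blocking work inside it.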

Configuration highlights

Most knobs are in aalap/dialogue_manager.py and exposed through the DialogManager constructor.

  • Wake word: wakeword_keywords, wakeword_model_paths (see aalap/wakeword.py)
  • Wake-word trigger policy: wakeword_score_thresh, wakeword_patience_frames, wakeword_debounce_ms
  • Wake-word VAD gate: wakeword_vad_threshold enables openWakeWord's internal Silero VAD gating for wake-word scoring. Set it to 0 to disable the gate.
  • Wake-word debug: wakeword_debug, save_wakeword_debug_audio, wakeword_debug_audio_dir
  • VAD: vad_silero_threshold, vad_silero_window_ms, vad_silero_min_speech_ms, vad_silero_min_silence_ms
  • ASR: model, device (uses faster-whisper)
  • TTS: tts_backend, piper_language, piper_voice, piper_quality
  • Timing: silence_ms_after_speech, no_speech_timeout, post_tts_mute
  • Debug audio capture: save_transcript_audio, transcript_audio_dir
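
With this many constructor arguments, grouping related settings can keep call sites readable. The sketch below is one possible arrangement, not something aalap provides: `WakewordSettings`, `EngineSettings`, and `build_kwargs` are hypothetical, and the values shown are the ones used in the Python quickstart, which are not necessarily the library defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class WakewordSettings:
    wakeword_keywords: str = "hey_jarvis"
    wakeword_score_thresh: float = 0.45
    wakeword_patience_frames: int = 2
    wakeword_debounce_ms: int = 900
    wakeword_vad_threshold: float = 0.0  # 0 disables the VAD gate


@dataclass(frozen=True)
class EngineSettings:
    model: str = "base.en"
    device: str = "auto"
    tts_backend: str = "piper"


def build_kwargs(*groups: Any) -> Dict[str, Any]:
    """Merge setting groups into one kwargs dict, rejecting duplicate keys."""
    kwargs: Dict[str, Any] = {}
    for group in groups:
        for key, value in asdict(group).items():
            if key in kwargs:
                raise ValueError(f"duplicate setting: {key}")
            kwargs[key] = value
    return kwargs


kwargs = build_kwargs(WakewordSettings(), EngineSettings(model="small.en"))
print(sorted(kwargs))
# With aalap installed: manager = DialogManager(**kwargs)
```

Frozen dataclasses make each group immutable, so a settings object can be shared between components without one of them silently changing a threshold.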

Wake word models

By default, the built-in "hey_jarvis" model is downloaded automatically. If you provide custom wake words, you must also supply a model path for each one, and each model file must be named <wakeword>.onnx.
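
The naming rule can be checked before the manager is constructed. The sketch below is a hypothetical pre-check, not aalap's own validation: it assumes keywords are given as a list of strings and only compares file names, without verifying that the files exist.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def match_models(keywords: Sequence[str], model_paths: Sequence[str]) -> Dict[str, Path]:
    """Map each keyword to the path whose file name is <keyword>.onnx."""
    by_name = {Path(p).name: Path(p) for p in model_paths}
    matched: Dict[str, Path] = {}
    missing: List[str] = []
    for keyword in keywords:
        path = by_name.get(f"{keyword}.onnx")
        if path is None:
            missing.append(keyword)
        else:
            matched[keyword] = path
    if missing:
        raise ValueError(f"no <wakeword>.onnx model for: {', '.join(missing)}")
    return matched


# "hey_robot" and the models/ directory are made-up names for illustration.
print(match_models(["hey_robot"], ["models/hey_robot.onnx"]))
```

Failing early with the list of unmatched keywords is usually easier to debug than a model-loading error raised later from inside the wake-word worker.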

Model downloads are cached under ~/.cache/aalap (see aalap/wakeword.py).

Piper voice models

Piper voices are fetched from rhasspy/piper-voices and cached under ~/.cache/aalap/piper (see aalap/tts_piper.py).
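
Downloaded wake-word and voice models can add up over time. The sketch below reports disk usage per top-level entry of a cache directory. `cache_usage` is a hypothetical helper, and the path follows the ~/.cache/aalap location mentioned above, which may differ on some platforms.

```python
from pathlib import Path
from typing import Dict


def cache_usage(root: Path) -> Dict[str, int]:
    """Return total bytes per top-level entry under root (empty if root is absent)."""
    usage: Dict[str, int] = {}
    if not root.is_dir():
        return usage
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            usage[entry.name] = entry.stat().st_size
        else:
            # Sum every regular file below the subdirectory.
            usage[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return usage


if __name__ == "__main__":
    cache_root = Path.home() / ".cache" / "aalap"
    for name, size in cache_usage(cache_root).items():
        print(f"{size / 1_000_000:8.1f} MB  {name}")
```

Deleting an entry from the cache should only cost a re-download on the next run, assuming the download logic re-fetches missing files.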

Audio device selection

List available devices with:

python -m aalap.list_soundDevices

See aalap/list_soundDevices.py.
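
For programmatic access to the same information, sounddevice can be queried directly. The sketch below is not the implementation of aalap/list_soundDevices.py; `format_devices` and `list_devices` are hypothetical helpers. The dictionary keys used are the ones returned by `sounddevice.query_devices()`.

```python
from typing import Any, Dict, Iterable, List


def format_devices(devices: Iterable[Dict[str, Any]]) -> List[str]:
    """Format device dicts as 'index  role  name' lines."""
    lines: List[str] = []
    for index, dev in enumerate(devices):
        roles = []
        if dev["max_input_channels"] > 0:
            roles.append("in")
        if dev["max_output_channels"] > 0:
            roles.append("out")
        role = "/".join(roles) or "-"
        lines.append(f"{index:>2}  {role:<6}  {dev['name']}")
    return lines


def list_devices() -> List[str]:
    """Query the real audio devices; requires sounddevice and PortAudio."""
    import sounddevice as sd  # imported lazily so format_devices works without it

    return format_devices(sd.query_devices())


# Example with made-up devices, so the sketch runs without audio hardware.
sample = [
    {"name": "USB Mic", "max_input_channels": 1, "max_output_channels": 0},
    {"name": "Headset", "max_input_channels": 1, "max_output_channels": 2},
]
print("\n".join(format_devices(sample)))
```

The printed index is the position within the list passed in, which for `query_devices()` corresponds to the device ID that sounddevice uses.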

License

Apache 2.0. See LICENSE.

Collaboration

This is an open-source project and contributions are welcome via pull requests. Please open an issue first for major changes so we can align on scope and approach.
