
Breath Assist - Voice-Activated AI Assistant

A 100% local, voice-activated AI assistant that lets you speak commands to your computer, from building apps to system administration. Think HAL from 2001: A Space Odyssey, but running entirely on local resources.

Features

  • 100% Local Processing: No cloud services required
  • Voice Activation: Listen for wake word and respond to voice commands
  • Voice Activity Detection: Automatically detects when you stop speaking
  • Conversational AI: Natural language interaction with local models
  • System Administration: Execute safe system commands
  • Development Tools: Build applications based on verbal instructions
  • File Operations: Read, write, list, and manage files
  • Git Integration: Perform git operations
  • Network Operations: Safe network commands
  • Tool Creation: Create new tools as needed
  • Web Browsing: Open browsers to specific URLs
  • Time & Date: Provide accurate system time and date
  • Memory System: Remember important information with user confirmation

Architecture

┌────────────┐
│  Microphone│
└─────┬──────┘
      ▼
┌────────────┐
│ pw-record  │  (PipeWire)
└─────┬──────┘
      ▼
┌────────────┐
│ Whisper    │  (STT)
└─────┬──────┘
      ▼
┌───────────────────────────┐
│ Intent / Reasoning Model  │  (Ollama: qwen3 / deepseek / llama)
└─────┬─────────────────────┘
      ▼
┌───────────────────────────┐
│ Tool Router (YOURS)       │  ← THIS IS THE CORE
│ - bash                    │
│ - fs                      │
│ - git                     │
│ - build                   │
│ - network                 │
│ - self-create tools       │
└─────┬─────────────────────┘
      ▼
┌────────────┐
│ Piper TTS  │
└────────────┘
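The tool-router stage marked as the core of the pipeline above can be sketched as a simple dispatch table: the reasoning model's parsed intent names a tool, and the router invokes the matching handler. This is a minimal illustrative sketch, not the project's actual implementation; the function and key names are assumptions.

```python
def tool_bash(args):
    # In the real assistant this would execute a vetted shell command.
    return f"bash: would run {args!r}"

def tool_fs(args):
    # In the real assistant this would read/write/list files.
    return f"fs: would access {args!r}"

# Dispatch table mapping tool names (as in the diagram) to handlers.
TOOLS = {
    "bash": tool_bash,
    "fs": tool_fs,
}

def route(intent):
    """Dispatch a parsed intent like {'tool': 'bash', 'args': 'ls'}."""
    handler = TOOLS.get(intent.get("tool"))
    if handler is None:
        return f"unknown tool: {intent.get('tool')}"
    return handler(intent.get("args"))
```

New tools (including self-created ones) are just new entries in the table, which is what makes this stage easy to extend.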

Prerequisites

You need the following system components installed:

Audio Processing

  • pw-record and arecord for audio capture
  • sox for audio processing
  • Appropriate audio drivers for your microphone

AI Models & Frameworks

  • Ollama with a supported model (e.g., qwen3:latest, deepseek-r1:latest)
  • Whisper (either Python whisper package or whisper.cpp)
  • Piper TTS for text-to-speech

Dependencies

  • Python 3.10+
  • Required Python packages (see requirements.txt)

Installation

1. Clone the Repository

git clone https://github.com/yourusername/breath-assist.git
cd breath-assist

2. Initialize Submodules

This project uses submodules for AI frameworks. Initialize them:

git submodule init
git submodule update

3. Setup Ollama

Make sure Ollama is running:

ollama serve

Pull a supported model:

ollama pull qwen3:latest
# or
ollama pull deepseek-r1:latest
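Once Ollama is serving a model, the assistant can talk to it over the local HTTP API. A minimal sketch using only the standard library is shown below; it targets Ollama's `/api/generate` endpoint with the default URL from the Configuration section. The function names here are illustrative.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://127.0.0.1:11434"  # default from the Configuration section

def build_generate_request(prompt, model="qwen3:latest"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def ask(prompt, model="qwen3:latest"):
    """Send a prompt to the local Ollama server and return its response text."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=build_generate_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to `False` the server returns one JSON object instead of a stream of chunks, which keeps the client simple.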

4. Install Dependencies

pip install -r requirements.txt

5. Configure Audio

Ensure your microphone is accessible via arecord:

arecord -l  # List available audio devices
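Beyond listing devices, it can help to verify that the required audio binaries from the Prerequisites section are on `PATH` before starting the assistant. A small preflight check, as a sketch:

```python
import shutil

# Audio tools named in the Prerequisites section.
REQUIRED_TOOLS = ["pw-record", "arecord", "sox"]

def missing_tools(tools):
    """Return the subset of command names not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing audio tools:", ", ".join(missing))
```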

Usage

Basic Usage

Start the voice assistant:

./scripts/start_assistant.sh

Or with a custom wake word:

./scripts/start_assistant.sh "hal"

The assistant will listen for the wake word followed by commands:

  • "computer, what is the time?" → Gets current time
  • "computer, what is the date?" → Gets current date
  • "computer, list files in current directory" → Lists directory contents
  • "computer, open firefox and go to google.com" → Opens Firefox to Google
  • "computer, build a web app called myblog" → Creates a web application
  • "computer, my name is Andrew" → Remembers your name (with confirmation)
  • "computer, what is my name?" → Retrieves stored name

Memory Features

The assistant can store important information with your confirmation:

  • When you say "my principles are..." it will ask "Should I remember this? Please say yes remember to confirm."
  • If you confirm, it will ask for a key like "user.principles"
  • Then it will preview "Storing user.principles = ... Say confirm to save."
  • Confirm again to store the information permanently
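The confirm-before-save flow above amounts to a small state machine: a proposed key/value pair sits in a pending slot until the user confirms. A minimal sketch, with names that are illustrative rather than the project's actual API:

```python
class MemoryStore:
    """Sketch of the two-step memory flow: propose, then confirm to save."""

    def __init__(self):
        self.saved = {}
        self.pending = None  # (key, value) awaiting confirmation

    def propose(self, key, value):
        """Stage a fact and ask the user to confirm, as in the preview step."""
        self.pending = (key, value)
        return f"Storing {key} = {value}. Say confirm to save."

    def confirm(self):
        """Persist the pending fact; nothing is saved without this step."""
        if self.pending is None:
            return "Nothing to confirm."
        key, value = self.pending
        self.saved[key] = value
        self.pending = None
        return f"Saved {key}."
```

Keeping the write behind an explicit `confirm()` call is what prevents the assistant from memorizing things the user never agreed to store.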

Available Commands

System & File Operations

  • ls, ps, top, df, du, free, uptime, uname, etc.
  • File operations: read, write, list, delete, move
  • Git commands: git status, git commit, etc.

Development

  • Build operations: Python, Node.js, Rust, Go apps
  • App creation: Web applications and more
  • Tool creation: Generate custom shell scripts

Web & Network

  • Browse to websites
  • Network operations: curl, wget, ping, etc.

Configuration

The assistant can be configured via environment variables:

  • OLLAMA_MODEL: Specify which Ollama model to use (default: qwen3:latest)
  • OLLAMA_BASE_URL: Specify Ollama server URL (default: http://127.0.0.1:11434)
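In code, these variables are typically read with the documented defaults as fallbacks:

```python
import os

# Fall back to the documented defaults when the variables are unset.
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "qwen3:latest")
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
```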

Troubleshooting

Audio Issues

  • Make sure your microphone is properly configured
  • Check arecord -l to ensure the device is detected
  • Verify audio permissions

Ollama Issues

  • Ensure Ollama server is running with ollama serve
  • Check that your chosen model is pulled with ollama pull [model]

Performance

  • The system uses whisper.cpp for faster speech-to-text processing
  • Using smaller models (like qwen3:4b instead of 7b) can improve response times

Contributing

Contributions are welcome! Please submit a pull request or open an issue for bugs and feature requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Ollama for local LLM serving
  • Whisper.cpp for fast speech-to-text
  • Piper TTS for local text-to-speech
  • Various Python libraries that make this project possible
