╔══════════════════════════╗
║ ))) AURALIS ((( ║
║ ┌───────────────┐ ║
║ │ ◉ ══ ◉ │ ║
║ │ ─── │ ║
║ └───────────────┘ ║
║ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ║
╚══════════════════════════╝
Voice-controlled household robot assistant with real-time intent recognition
Demo • Features • Quick Start • Architecture • Configuration • Development • Tech Stack • License
Hold the mic button and speak a command. Auralis parses the intent in real time, updates device state, and speaks a response back via TTS.
| Feature | Description |
|---|---|
| Push-to-talk | Hold the mic button to capture voice; release to process |
| Real-time NLU | Rule-based intent engine maps utterances to structured `{intent, slots, confidence}` with no LLM latency |
| Live device dashboard | Six simulated smart-home devices update visually the moment a command is executed |
| WebSocket pipeline | Full-duplex WS stream between browser and backend — interim transcripts, intent events, device deltas, and TTS payloads |
| Waveform visualizer | Animated canvas waveform during recording |
| Intent inspector | Shows parsed intent, extracted slots, and confidence score for every utterance |
| Command history | Persistent SQLite log of every voice command with execution status |
| Skill registry | Six built-in skills (Lighting, Climate, Blinds, Media, Scenes, Timers) listed on the Settings page |
| Browser TTS | Auralis speaks its response back using the Web Speech Synthesis API |
| One-command setup | `docker compose up --build` launches the full stack |
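
The rule-based NLU row above describes a mapping from an utterance to `{intent, slots, confidence}`. A minimal sketch of that idea follows. The rule patterns, the intent names, and the coverage-based confidence score are all illustrative assumptions, not the project's actual engine.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    intent: str
    slots: dict = field(default_factory=dict)
    confidence: float = 0.0


# Hypothetical rules: (intent name, pattern whose named groups become slots).
RULES = [
    ("lights.set_power",
     re.compile(r"\bturn (?P<state>on|off) the (?P<room>[\w ]+?) lights?\b")),
    ("climate.set_temperature",
     re.compile(r"\bset the (?:temperature|thermostat) to (?P<degrees>\d+)\b")),
    ("timer.start",
     re.compile(r"\bset a timer for (?P<minutes>\d+) minutes?\b")),
]


def parse_intent(utterance: str) -> Intent:
    """Return the first matching rule as an Intent, or 'unknown' if none match."""
    text = utterance.lower().strip()
    for name, pattern in RULES:
        match = pattern.search(text)
        if match:
            # Assumed scoring: fraction of the utterance covered by the match.
            coverage = (match.end() - match.start()) / max(len(text), 1)
            return Intent(name, match.groupdict(), round(coverage, 2))
    return Intent("unknown", {}, 0.0)
```

Because each rule is a single compiled regular expression, parsing stays cheap and deterministic, which is the property the "no LLM latency" claim depends on.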
- Docker Desktop or Docker Engine + Compose v2
Clone the repository and start the stack:

    git clone https://github.com/StrikeRobot/auralis.git
    cd auralis
    cp .env.example .env
    docker compose up --build

Open http://localhost:3000, then hold the mic button and speak.

Browser note: Web Speech API requires a secure context (HTTPS) or
localhost. Chrome/Edge recommended.
┌─────────────────────────┐          ┌──────────────────────────────┐
│ frontend                │          │ backend                      │
│ Next.js 14 + TS         │          │ FastAPI + Python 3.12        │
│ Tailwind + Zustand      │◄──WS────►│ WebSocket hub (events.py)    │
│ Framer Motion           │   REST   │ Intent NLU (services/intent) │
│ Web Speech API (STT)    │          │ Device engine (state machine)│
│ SpeechSynthesis (TTS)   │          │ SQLModel + SQLite            │
└─────────────────────────┘          └──────────────────────────────┘
           :3000                                  :8000
Data flow for a voice command:
- User holds the mic button; Web Speech API streams interim transcripts to the backend via WebSocket
- On release, the final transcript is sent as `{type: "transcript", text: "...", interim: false}`
- Backend runs the utterance through the rule-based NLU; emits `{type: "intent", data: {intent, slots, confidence}}`
- Device engine matches affected devices and applies state mutations; each update broadcasts `{type: "device_update"}`
- Response text is generated and returned as `{type: "tts_response"}`; the frontend speaks it via `SpeechSynthesis`
- Command is persisted to SQLite for the history timeline
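
The data flow above can be sketched as a single handler that turns one inbound frame into the outbound events, in order. This is a simplified illustration rather than the code in `events.py`: `toy_nlu`, the `kind`/`power` device fields, and every payload field other than `type` (and the `data` object on `intent`) are assumptions, and persistence is left out.

```python
import json


def toy_nlu(text: str) -> dict:
    """Stand-in for the rule-based NLU: recognizes only light on/off commands."""
    words = text.lower().split()
    if "light" in words or "lights" in words:
        for state in ("on", "off"):
            if state in words:
                return {"intent": "lights.set_power",
                        "slots": {"state": state},
                        "confidence": 0.9}
    return {"intent": "unknown", "slots": {}, "confidence": 0.0}


def handle_message(raw: str, devices: dict) -> list[dict]:
    """Return the events to send back for one inbound WebSocket frame."""
    msg = json.loads(raw)
    # Only final transcripts trigger the pipeline in this sketch.
    if msg.get("type") != "transcript" or msg.get("interim", False):
        return []

    intent = toy_nlu(msg["text"])
    events = [{"type": "intent", "data": intent}]

    if intent["intent"] == "lights.set_power":
        state = intent["slots"]["state"]
        # Apply the mutation to every matching device; emit one update each.
        for device_id, device in devices.items():
            if device["kind"] == "light" and device["power"] != state:
                device["power"] = state
                events.append({"type": "device_update",
                               "data": {"id": device_id, "power": state}})
        reply = f"Turning {state} the lights."
    else:
        reply = "Sorry, I didn't catch that."

    # The 'text' field name is assumed; the source only specifies the type.
    events.append({"type": "tts_response", "text": reply})
    return events
```

Returning a list of events keeps the handler free of I/O, so the same function could be driven by a WebSocket loop or called directly in a unit test.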
| Variable | Default | Description |
|---|---|---|
| DB_PATH | /data/auralis.db | SQLite database path (inside container) |
| ALLOWED_ORIGINS | http://localhost:3000 | CORS allowed origins (comma-separated) |
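
As an illustration of how these two variables might be consumed, the sketch below reads them from the environment with the documented defaults and splits the comma-separated origins. The `Settings` class and `load_settings` function are hypothetical names, not taken from the backend.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    allowed_origins: list[str]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw_origins = env.get("ALLOWED_ORIGINS", "http://localhost:3000")
    # Split on commas, trimming whitespace and dropping empty entries.
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    return Settings(
        db_path=env.get("DB_PATH", "/data/auralis.db"),
        allowed_origins=origins,
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check, for example `load_settings({"ALLOWED_ORIGINS": "http://a.test, http://b.test"})`.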
Backend:

    cd backend
    pip install uv
    uv pip install --system -e ".[dev]"
    cp ../.env.example ../.env
    uvicorn app.main:app --reload --port 8000

Frontend:

    cd frontend
    npm install
    cp .env.local.example .env.local
    npm run dev

Tests:

    cd backend
    pytest -v

| Layer | Technology |
|---|---|
| Frontend framework | Next.js 14 (App Router) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 3 |
| State management | Zustand 4 |
| Animation | Framer Motion 11 |
| Voice input | Web Speech API (browser-native STT) |
| Voice output | Web Speech Synthesis API (browser-native TTS) |
| Backend framework | FastAPI |
| Backend language | Python 3.12 |
| ORM / DB | SQLModel + SQLite |
| Real-time | WebSocket (FastAPI native) |
| Container | Docker + Docker Compose |
MIT © 2025 — see LICENSE for details.