
        ╔══════════════════════════╗
        ║  )))  AURALIS  (((       ║
        ║   ┌───────────────┐      ║
        ║   │  ◉  ══  ◉    │      ║
        ║   │     ───       │      ║
        ║   └───────────────┘      ║
        ║    ▓▓▓▓▓▓▓▓▓▓▓▓▓        ║
        ╚══════════════════════════╝

Auralis

Voice-controlled household robot assistant with real-time intent recognition

License: MIT Python Next.js TypeScript Docker WebSocket

Demo  •  Features  •  Quick Start  •  Architecture  •  Configuration  •  Development  •  Tech Stack  •  License


Demo

Auralis Demo

Hold the mic button and speak a command. Auralis parses the intent in real time, updates device state, and speaks a response back via TTS.

Features

| Feature | Description |
| --- | --- |
| Push-to-talk | Hold the mic button to capture voice; release to process |
| Real-time NLU | Rule-based intent engine maps utterances to structured {intent, slots, confidence} with no LLM latency |
| Live device dashboard | Six simulated smart-home devices update visually the moment a command is executed |
| WebSocket pipeline | Full-duplex WS stream between browser and backend carrying interim transcripts, intent events, device deltas, and TTS payloads |
| Waveform visualizer | Animated canvas waveform during recording |
| Intent inspector | Shows parsed intent, extracted slots, and confidence score for every utterance |
| Command history | Persistent SQLite log of every voice command with execution status |
| Skill registry | Six built-in skills (Lighting, Climate, Blinds, Media, Scenes, Timers) listed on the Settings page |
| Browser TTS | Auralis speaks its response back using the Web Speech Synthesis API |
| One-command setup | `docker compose up --build` launches the full stack |
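
To make the Real-time NLU feature concrete, here is a minimal sketch of how a rule-based engine could map an utterance to {intent, slots, confidence} with regular expressions. The rule set, the intent names (lights.set, climate.set_temperature, timer.start), the slot names, and the confidence values are all assumptions made for this example. They are not taken from the Auralis source.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    intent: str
    slots: dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0


# Hypothetical rules: (intent name, pattern with named slot groups, confidence).
RULES = [
    ("lights.set",
     re.compile(r"\bturn (?P<state>on|off) the (?P<room>[\w ]+?) lights?\b"), 0.9),
    ("climate.set_temperature",
     re.compile(r"\bset the (?:thermostat|temperature) to (?P<temperature>\d+)\b"), 0.9),
    ("timer.start",
     re.compile(r"\bset a timer for (?P<minutes>\d+) minutes?\b"), 0.85),
]


def parse_intent(utterance: str) -> Intent:
    """Return the first matching rule as an Intent, or 'unknown' if none match."""
    text = utterance.lower().strip()
    for name, pattern, confidence in RULES:
        match = pattern.search(text)
        if match:
            # Named groups in the pattern become the extracted slots.
            return Intent(name, match.groupdict(), confidence)
    return Intent("unknown", {}, 0.0)
```

Calling parse_intent("Turn on the kitchen lights") yields the lights.set intent with the slots state and room filled in. Because matching is a linear scan over a handful of compiled patterns, the lookup costs far less than a model call, which is the trade-off the feature table describes.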

Quick Start

Prerequisites

Docker with Docker Compose (used by the Run step below) and Git.

Run

git clone https://github.com/StrikeRobot/auralis.git
cd auralis

cp .env.example .env

docker compose up --build

Open http://localhost:3000, then hold the mic button and speak.

Browser note: the Web Speech API requires a secure context (HTTPS) or localhost. Chrome or Edge is recommended.


Architecture

┌─────────────────────────┐          ┌───────────────────────────────┐
│  frontend               │          │  backend                      │
│  Next.js 14 + TS        │          │  FastAPI + Python 3.12        │
│  Tailwind + Zustand     │◄──WS────►│  WebSocket hub (events.py)    │
│  Framer Motion          │  REST    │  Intent NLU (services/intent) │
│  Web Speech API (STT)   │          │  Device engine (state machine)│
│  SpeechSynthesis (TTS)  │          │  SQLModel + SQLite            │
└─────────────────────────┘          └───────────────────────────────┘
        :3000                                    :8000

Data flow for a voice command:

  1. User holds the mic button; Web Speech API streams interim transcripts to the backend via WebSocket
  2. On release, the final transcript is sent as {type: "transcript", text: "...", interim: false}
  3. Backend runs the utterance through the rule-based NLU; emits {type: "intent", data: {intent, slots, confidence}}
  4. Device engine matches affected devices and applies state mutations; each update broadcasts {type: "device_update"}
  5. Response text is generated and returned as {type: "tts_response"}; the frontend speaks it via SpeechSynthesis
  6. Command is persisted to SQLite for the history timeline
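
Steps 3 to 6 above can be sketched as one function that takes an incoming transcript message and returns the outgoing messages in order. This is an illustration only. The toy_nlu stand-in, the device dictionary layout, the payload field names beyond type, the commands table schema, and the choice to ignore interim transcripts are assumptions. It also uses the standard-library sqlite3 module in place of SQLModel to stay self-contained.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open the command log (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS commands (text TEXT, intent TEXT, executed INTEGER)")
    return db


def toy_nlu(text: str) -> dict:
    """Stand-in for the rule-based NLU: understands only 'lights on' / 'lights off'."""
    words = text.lower().split()
    if "lights" in words and ("on" in words or "off" in words):
        state = "on" if "on" in words else "off"
        return {"intent": "lights.set", "slots": {"state": state}, "confidence": 0.9}
    return {"intent": "unknown", "slots": {}, "confidence": 0.0}


def handle_transcript(message: dict, devices: dict, db: sqlite3.Connection) -> list[dict]:
    """Run steps 3-6 for one final transcript and return the outgoing messages."""
    if message.get("type") != "transcript" or message.get("interim", False):
        return []  # assumption: interim transcripts never trigger commands

    text = message["text"]
    intent = toy_nlu(text)                                 # step 3
    out = [{"type": "intent", "data": intent}]

    executed = False
    if intent["intent"] == "lights.set":                   # step 4
        for device_id, device in devices.items():
            if device["kind"] == "light":
                device["state"] = intent["slots"]["state"]
                out.append({"type": "device_update", "data": {"id": device_id, **device}})
                executed = True

    if executed:                                           # step 5
        reply = f"Lights {intent['slots']['state']}."
    else:
        reply = "Sorry, I didn't catch that."
    out.append({"type": "tts_response", "text": reply})

    db.execute("INSERT INTO commands (text, intent, executed) VALUES (?, ?, ?)",
               (text, intent["intent"], int(executed)))    # step 6
    db.commit()
    return out
```

In a real backend each returned dictionary would be serialized and sent over the WebSocket as it is produced. Returning a list keeps the sketch easy to test without a running server.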

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| DB_PATH | /data/auralis.db | SQLite database path (inside container) |
| ALLOWED_ORIGINS | http://localhost:3000 | CORS allowed origins (comma-separated) |

Development

Backend

cd backend
pip install uv
uv pip install --system -e ".[dev]"
cp ../.env.example ../.env
uvicorn app.main:app --reload --port 8000

Frontend

cd frontend
npm install
cp .env.local.example .env.local
npm run dev

Tests

cd backend
pytest -v

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend framework | Next.js 14 (App Router) |
| Language | TypeScript 5 |
| Styling | Tailwind CSS 3 |
| State management | Zustand 4 |
| Animation | Framer Motion 11 |
| Voice input | Web Speech API (browser-native STT) |
| Voice output | Web Speech Synthesis API (browser-native TTS) |
| Backend framework | FastAPI |
| Backend language | Python 3.12 |
| ORM / DB | SQLModel + SQLite |
| Real-time | WebSocket (FastAPI native) |
| Container | Docker + Docker Compose |

License

MIT © 2025 — see LICENSE for details.
