Aura is a local-first AI companion powered by an ESP32-S3-Box-3 for audio I/O and a Python backend ("Brain") for speech + intelligence.
- Wake‑word based interaction ("Aura", "Hey Aura", etc.)
- Local wake‑word + STT (Whisper small)
- Conversational AI via LLM (Ollama / Llama 3.2)
- Speech output via Microsoft Edge TTS
- Music playback with categories (Rap / Item / Relax / Travel / Random)
- Smart Home triggers via tag system
- Cute UI Face with blinking, lipsync, idle sleep
- Low-latency duplex streaming via WebSockets
```
┌───────────────┐     PCM Audio      ┌───────────────┐
│ ESP32‑S3 BOX  │ ─────────────────▶ │  Aura Brain   │
│ (Microphone)  │                    │ (Python API)  │
└──────┬────────┘     TTS Audio      └──────┬────────┘
       │ ◀──────────────────────────────────┘
       │
       ▼
┌──────────────┐
│ UI + Speaker │
└──────────────┘
```
```
AURA-AI/
├── main/
│   ├── aura_firmware.c
│   ├── CMakeLists.txt
│   └── idf_component.yml
├── managed_components/
├── partitions.csv
├── sdkconfig
│
└── aura_brain/                  # Python backend
    ├── assets/
    │   └── songs/               # MUSIC GOES HERE (IMPORTANT: add music files in these folders)
    │       ├── rap/
    │       ├── item_songs/
    │       ├── relax mixed_genre/
    │       ├── travel/
    │       └── random/
    ├── model/
    ├── server.py
    ├── dashboard.html
    ├── requirements.txt
    └── dependencies.txt
```
To enable music playback, you must place audio files in:
aura_brain/assets/songs/
Supported formats:
`.mp3`, `.wav`, `.m4a`, `.flac`
Categories used by the system:
| Category Tag | Folder Name |
|---|---|
| `PLAY_RAP` | `rap/` |
| `PLAY_ITEM` | `item_songs/` |
| `PLAY_RELAX` | `relax mixed_genre/` |
| `PLAY_TRAVEL` | `travel/` |
| `PLAY_RANDOM` | any folder / mixed |
If folders do not exist, create them manually.
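Under these conventions, song selection can be sketched in a few lines of Python. This is a hypothetical helper, not the actual `server.py` logic; the folder names come from the table above and the extensions from the supported-formats list.

```python
import random
from pathlib import Path

# Folder names mirror the tag-to-folder table above.
TAG_TO_FOLDER = {
    "PLAY_RAP": "rap",
    "PLAY_ITEM": "item_songs",
    "PLAY_RELAX": "relax mixed_genre",
    "PLAY_TRAVEL": "travel",
}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}

def pick_song(tag, songs_root="aura_brain/assets/songs"):
    """Return a random playable file for `tag`; PLAY_RANDOM searches all folders."""
    root = Path(songs_root)
    if tag == "PLAY_RANDOM":
        candidates = [p for p in root.rglob("*") if p.suffix.lower() in AUDIO_EXTS]
    else:
        folder = root / TAG_TO_FOLDER.get(tag, "")
        candidates = [p for p in folder.glob("*") if p.suffix.lower() in AUDIO_EXTS]
    return random.choice(candidates) if candidates else None
```

If a category folder is empty (or missing), the helper simply returns `None` instead of raising.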
You must have Python 3.9+.

```sh
pip install -r requirements.txt
```

Whisper model used: `small.en`
FFmpeg must be on your `PATH`, or placed in the working directory as:

```
ffmpeg.exe
ffprobe.exe
```
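A small helper shows one way the backend could resolve these binaries (illustrative only; `pydub`, for example, expects FFmpeg on `PATH` by default, and the function name here is an assumption):

```python
import shutil
from pathlib import Path

def find_ffmpeg(tool="ffmpeg"):
    """Look for ffmpeg/ffprobe on PATH first, then as <tool>.exe in the working directory."""
    on_path = shutil.which(tool)
    if on_path:
        return on_path
    local = Path.cwd() / f"{tool}.exe"
    return str(local) if local.exists() else None
```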
```sh
curl https://ollama.ai/install.sh | sh
ollama pull llama3.2
```

Then start the server:

```sh
python server.py
```

The server starts at:

```
ws://<your-ip>:8000/ws/audio
```

Your machine's IP can be found by running `ipconfig` in a terminal (use the Wi‑Fi adapter's IPv4 address).
---
# ⚙️ Setup – ESP32 Firmware
### **1. Dependencies (ESP-IDF)**
Install ESP-IDF v5.x
```sh
git clone https://github.com/espressif/esp-idf.git
```

Edit:

```c
#define WIFI_SSID        "<YOUR_WIFI>"
#define WIFI_PASS        "<YOUR_PASSWORD>"
#define BRAIN_SERVER_URI "ws://<IP>:8000/ws/audio"
```

```sh
idf.py build
idf.py flash
idf.py monitor
```

| User Intent | LLM Tag Output |
|---|---|
| "Turn on lights" | `{{LIGHT_ON}}` |
| "Play rap" | `{{PLAY_RAP}}` |
| "Stop music" | `{{STOP_MUSIC}}` |
The LLM is instructed via a system prompt (already included in the code).
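One way the server could act on these tags is to strip them from the reply before TTS and handle them separately. This is a hypothetical parser for the `{{TAG}}` convention shown above, not the actual `server.py` code:

```python
import re

# Matches uppercase tags in double braces, e.g. {{PLAY_RAP}}.
TAG_RE = re.compile(r"\{\{([A-Z_]+)\}\}")

def split_reply(text):
    """Return (spoken_text, tags): the reply with tags removed, plus the tag list."""
    tags = TAG_RE.findall(text)
    spoken = TAG_RE.sub("", text).strip()
    return spoken, tags
```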
Typical entries (already included):

```
fastapi
uvicorn
websockets
pydub
faster-whisper
edge-tts
glob2
ollama
```
Tracks ESP-IDF + LVGL + BSP + codec versions.

Example:

```
ESP-IDF v5.3
LVGL v8.x
esp-box-3-bsp
esp_codec_dev
esp_websocket_client
```
Place this beside your `server.py`.
- Wake-word handled via STT phrase matching
- LLM streaming with sentence-split buffering
- Traffic control prevents full-duplex overlap
- UI state machine drives ears + eyes + mouth
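The sentence-split buffering mentioned above can be sketched like this. It is illustrative only (the regex and function names are assumptions, not the actual `server.py` code): streamed LLM tokens accumulate in a buffer and each complete sentence is flushed as soon as its boundary arrives, so TTS can start speaking before the full reply is generated.

```python
import re

# A sentence ends at . ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"([.!?])\s")

def buffer_sentences(token_stream):
    """Yield complete sentences as tokens arrive; flush any remainder at the end."""
    buf = ""
    for token in token_stream:
        buf += token
        while True:
            m = SENTENCE_END.search(buf)
            if not m:
                break
            yield buf[: m.end(1)].strip()  # emit the finished sentence
            buf = buf[m.end():]            # keep the unfinished tail
    if buf.strip():
        yield buf.strip()
```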
Watch on YouTube: Aura AI Companion
Created by Krishna Chauhan for the 2025 Circuit Digest Competition (Aura submission).
Uses:
- ESP32‑S3‑Box‑3 (Hardware)
- Whisper (STT)
- Edge‑TTS (Speech)
- Llama 3.2 via Ollama (LLM)
- LVGL (UI)