Aria is a Windows-first personal AI assistant and teacher. She listens, sees your screen, researches, and remembers — built to help you learn and work, not to be a black-box agent. Every meaningful action stays observable, testable, and under your control.
Aria proposes. Chan decides.
Two flagship use cases drive the project:
- Meeting transcription for study — record a Teams (or any) meeting; Aria captures the audio, transcribes it locally with Whisper, saves a timestamped transcript, and can summarise it into study notes on demand. Review any meeting later from its saved transcript.
- On-screen study help — ask about whatever is on your screen. "What's wrong with line 6?" while a SQL statement is in front of you → Aria grabs the active window, looks at it, and explains it in plain language. Great for SC-200 material, code, errors, and queries.
Plus the everyday assistant basics: time/date/reminders, web research, weather, and open-ended conversation with memory.
Point Aria at any code on screen and ask her to teach it (not just debug it).
Say: "Aria, explain this code" (or "walk me through this", "break this down").
Aria captures the active window and produces a senior-to-junior explanation,
covering: what the code does, a walkthrough with the reasoning behind each part,
the key concepts to learn, and common mistakes to watch for. A short summary is
spoken aloud, and the full explanation is saved to data/explanations/ as a
timestamped Markdown file so you can build a personal study library and read
explanations back later. Each one is also logged to study_sessions with a
detected language tag (SQL, Python, …).
Teaching phrases ("explain this code") route here; debugging phrases ("what's wrong with line 6") stay with screen-assist.
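The save step described above could look roughly like the sketch below. Only the data/explanations/ directory and the language tag come from the description; the function name `save_explanation`, the filename pattern, and the Markdown header are assumptions made for illustration, not Aria's actual code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_explanation(text: str, language: str,
                     out_dir: str = "data/explanations",
                     now: Optional[datetime] = None) -> Path:
    """Write one explanation to a timestamped Markdown file and return its path.

    Hypothetical helper: the filename pattern and header layout are assumptions.
    """
    now = now or datetime.now()
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the study library on first use
    path = folder / f"{now:%Y-%m-%d_%H%M%S}_{language.lower()}.md"
    header = f"# Code explanation ({language})\n\nSaved: {now:%Y-%m-%d %H:%M:%S}\n\n"
    path.write_text(header + text.rstrip() + "\n", encoding="utf-8")
    return path
```

Putting the timestamp first in the filename keeps the folder sorted chronologically, which suits reading explanations back in the order they were studied.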
Note on history: Aria previously included a finance/Trading212/market research direction. That has been removed — Aria is now focused on being an assistant and teacher. The old code remains in git history if you ever need it.
Aria's skills are self-registering plugins under capabilities/. The core router stays
domain-agnostic and simply dispatches to whichever capability claims a request.
| Capability | What it does |
|---|---|
| vision | Screen-assist: captures the active window and explains/evaluates what's on it (Gemini). Logs every Q&A to study_sessions. |
| code_explanation | Teaching mode: senior-to-junior explanation of on-screen code, saved to data/explanations/ for study. |
| meeting | Start/stop meeting transcription; summarise a saved meeting into study notes (Claude). |
| web | Web search, weather, and current-event follow-ups (Gemini + web scrape). |
| assistant | Time, date, reminders/calendar, queued notifications, analysis-mode toggle. |
| reasoning | The Claude generalist — handles anything no other capability claims, plus the offline Ollama fallback. |
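As a rough illustration of what "self-registering" can mean, the sketch below uses a decorator that adds each handler to a shared registry at import time. The names `REGISTRY`, `capability`, and `dispatch` are hypothetical; Aria's real interface lives in core/capability.py and may look quite different.

```python
from typing import Callable, Dict

# Hypothetical registry: capability name -> handler function.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def capability(name: str):
    """Decorator that registers a handler under `name` when its module is imported."""
    def decorator(handler: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = handler
        return handler
    return decorator


@capability("assistant")
def assistant(request: str) -> str:
    return f"assistant handled: {request}"


@capability("reasoning")
def reasoning(request: str) -> str:
    return f"reasoning handled: {request}"


def dispatch(name: str, request: str) -> str:
    """The core only looks names up; it knows nothing about any one domain."""
    return REGISTRY[name](request)
```

With this shape, adding a skill means adding a module that registers itself; the dispatch code does not change.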
For each request the router runs three layers, cheapest first:
- Keyword claims — each capability offers a deterministic match (with a priority).
- Semantic match — if nothing matched, a small local embedding model (fastembed, BAAI/bge-small-en-v1.5, CPU) matches the phrasing to a capability by meaning, so novel wording still routes correctly.
- Claude — the generalist fallback when nothing else fits.
Side-effecting intents (e.g. start recording the meeting) are keyword-only, so they can't be triggered by a loose semantic match.
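The three layers and the keyword-only rule can be sketched as below. This is a simplified model under stated assumptions: `toy_embed` is a bag-of-words stand-in for the real embedding model, and the keyword table, example phrasings, and threshold are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# (keyword, capability, priority); all entries are illustrative assumptions.
KEYWORD_CLAIMS: List[Tuple[str, str, int]] = [
    ("start transcribing", "meeting", 10),
    ("weather", "web", 5),
    ("what's wrong with", "vision", 5),
]

# Example phrasings for the semantic layer. "meeting" is deliberately absent,
# so a side-effecting intent can only be reached through a keyword.
SEMANTIC_EXAMPLES: Dict[str, str] = {
    "web": "will it rain tomorrow",
    "vision": "look at my screen and explain the error",
}


def toy_embed(text: str) -> Dict[str, float]:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    vec: Dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(request: str, threshold: float = 0.3) -> str:
    text = request.lower()
    # Layer 1: deterministic keyword claims; the highest priority wins.
    claims = [(prio, cap) for kw, cap, prio in KEYWORD_CLAIMS if kw in text]
    if claims:
        return max(claims)[1]
    # Layer 2: semantic match against example phrasings.
    query = toy_embed(text)
    best_cap: Optional[str] = None
    best_score = 0.0
    for cap, example in SEMANTIC_EXAMPLES.items():
        score = cosine(query, toy_embed(example))
        if score > best_score:
            best_cap, best_score = cap, score
    if best_cap is not None and best_score >= threshold:
        return best_cap
    # Layer 3: the generalist fallback.
    return "reasoning"
```

Here "please record this call" falls through to the generalist rather than starting a recording, because no semantic example exists for the meeting capability.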
- Voice — say Aria's name (or use conversation mode) and ask. faster-whisper transcribes; Kokoro ONNX (af_heart) speaks the reply.
- Typed — the terminal dashboard has an input bar; typed prompts go through the exact same router, useful when speech is misheard.
- Screen-assist hotkey — Ctrl+Alt+A grabs the active window, then records one spoken question to answer about it.
Input (voice / typed / hotkey)
│
▼
core/router.py ── keyword claims → semantic match → Claude fallback
│
▼
capabilities/* ── the skill that owns the matched intent runs
vision · meeting · web · assistant · reasoning
│
▼
voice/speaker.py ── Kokoro ONNX synthesis + playback
core/terminal_ui.py ── live dashboard, logs, insight queue
Cross-cutting pieces in core/: capability.py (interface + registry),
semantic.py (embedding router), response_shaping.py (shared reply shaping),
memory.py (SQLite: episodic, semantic, study_sessions, meetings),
screen_capture.py (on-demand + rolling capture), vision_analyzer.py (Gemini),
meeting_recorder.py (loopback + mic capture → transcript), personality.py.
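To make the study_sessions store concrete, here is a minimal SQLite sketch. The column names and the helpers `open_memory` and `log_study_session` are assumptions; the actual schema in memory.py is not shown in this README.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (assumed) study_sessions table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS study_sessions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               language TEXT
           )"""
    )
    return conn


def log_study_session(conn: sqlite3.Connection, question: str, answer: str,
                      language: Optional[str] = None) -> int:
    """Insert one Q&A row and return its id."""
    cur = conn.execute(
        "INSERT INTO study_sessions (created_at, question, answer, language) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), question, answer, language),
    )
    conn.commit()
    return cur.lastrowid
```

Parameterised queries keep on-screen text (which may contain quotes or SQL) from being interpreted as part of the statement.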
A normal microphone only hears you. The other people on a call come out of your speakers,
so Aria also records the system loopback (what you hear) via the soundcard library,
mixes it with your mic, and transcribes ~25-second chunks. While a meeting records, the voice
assistant pauses (they share the microphone). If loopback can't be opened on your machine,
Aria degrades to mic-only and says so in the log.
Transcripts are written to data/meetings/<timestamp>.md; summaries to *-notes.md.
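The mix-then-chunk step can be sketched in plain Python as follows. This models only the arithmetic: the sample rate, the averaging mix, and the function names are assumptions, and the real recorder works on audio buffers from the soundcard library rather than Python lists.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
CHUNK_SECONDS = 25     # matches the ~25-second chunks described above


def mix(mic: Sequence[float], loopback: Sequence[float]) -> List[float]:
    """Average two mono streams sample by sample, padding the shorter with silence."""
    length = max(len(mic), len(loopback))
    mixed: List[float] = []
    for i in range(length):
        a = mic[i] if i < len(mic) else 0.0
        b = loopback[i] if i < len(loopback) else 0.0
        mixed.append((a + b) / 2.0)  # averaging keeps the result within [-1, 1]
    return mixed


def chunks(samples: Sequence[float], sample_rate: int = SAMPLE_RATE,
           seconds: int = CHUNK_SECONDS) -> Iterator[List[float]]:
    """Yield fixed-length chunks; the final chunk may be shorter."""
    size = sample_rate * seconds
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])
```

Each yielded chunk would then be handed to the transcriber and appended to the transcript with its timestamp.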
- Windows 11, Python 3.13, PowerShell 7 recommended
- NVIDIA GPU recommended for Kokoro ONNX CUDA
- Anthropic API key — Claude reasoning + meeting summaries
- Gemini API key — screen-assist / vision + web reasoning
- soundcard (installed via requirements) for meeting loopback capture
git clone https://github.com/chansg/aria.git
cd aria
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
playwright install chromium

GPU note: fastembed pulls the CPU onnxruntime, which can shadow onnxruntime-gpu and drop Kokoro TTS off CUDA. If that happens, restore the GPU runtime: pip uninstall -y onnxruntime; pip install --force-reinstall --no-deps onnxruntime-gpu.

Copy-Item config.example.py config.py

Edit config.py locally (never commit real keys). Key settings:
ANTHROPIC_API_KEY = "..."
GEMINI_API_KEY = "..."
CONVERSATION_MODE_DEFAULT = True
# Screen-assist
SCREEN_ASSIST_ENABLED = True
SCREEN_ASSIST_HOTKEY = "<ctrl>+<alt>+a"
# Meeting transcription
MEETING_TRANSCRIPTION_ENABLED = True
MEETING_CHUNK_SECONDS = 25
MEETING_OUTPUT_DIR = "data/meetings"
# Voice
TTS_PROVIDER = "kokoro"
KOKORO_VOICE = "af_heart"

python main.py

On startup Aria registers her capabilities, warms up the semantic router, asks for a microphone, calibrates ambient noise, initialises memory, and opens the dashboard.
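A small pre-flight check over the settings above can surface a missing key or a bad chunk length before anything else loads. The sketch below is an assumption about how such a check might be written; `check_config` is a hypothetical helper, and a `SimpleNamespace` stands in for the imported config module.

```python
from types import SimpleNamespace
from typing import List


def check_config(cfg) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems: List[str] = []
    for key in ("ANTHROPIC_API_KEY", "GEMINI_API_KEY"):
        value = getattr(cfg, key, "")
        if not value or value == "...":  # "..." is the placeholder from the example config
            problems.append(f"{key} is not set")
    chunk = getattr(cfg, "MEETING_CHUNK_SECONDS", 25)
    if not isinstance(chunk, int) or chunk <= 0:
        problems.append("MEETING_CHUNK_SECONDS must be a positive integer")
    return problems


# Stand-in for `import config`; the values here are illustrative.
example = SimpleNamespace(
    ANTHROPIC_API_KEY="...",
    GEMINI_API_KEY="real-key",
    MEETING_CHUNK_SECONDS=25,
)
```

Returning a list rather than raising on the first problem lets the caller report every issue in one pass.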
- "Aria, what's wrong with line 6?" (with your SQL window focused) — or press Ctrl+Alt+A.
- "Aria, start transcribing the meeting" … then "stop transcribing" … then "summarise the last meeting".
- "Aria, what's the weather today?"
python -m pytest -q # unit + routing + capability tests
python tools\run_validation.py # structured session report

- Small reviewed changes over broad speculative refactors.
- Keep runtime failures visible in logs/aria.log — no silent fallbacks.
- Add a test for every bug found through real use.
- The core stays domain-agnostic; new skills are added as capability plugins at the edge.
- Aria proposes; Chan reviews and approves.
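The "no silent fallbacks" principle can be illustrated with the mic-only degradation mentioned earlier: the fallback still happens, but it is logged with its cause. This is a sketch only; `open_capture` and its two callable parameters are hypothetical, and the choice of `OSError` as the failure type is an assumption.

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("aria")


def open_capture(open_loopback: Callable[[], Any],
                 open_mic: Callable[[], Any]) -> Dict[str, Any]:
    """Try loopback + mic; on failure fall back to mic-only and log why."""
    mic = open_mic()
    try:
        return {"mode": "loopback+mic", "sources": [open_loopback(), mic]}
    except OSError as exc:
        # The degradation is recorded rather than hidden, so it shows up in the log.
        log.warning("Loopback unavailable (%s); degrading to mic-only", exc)
        return {"mode": "mic-only", "sources": [mic]}
```

Catching a specific exception type keeps unrelated bugs from being absorbed into the fallback path.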