Project Aria

Aria is a Windows-first personal AI assistant and teacher. She listens, sees your screen, researches, and remembers — built to help you learn and work, not to be a black-box agent. Every meaningful action stays observable, testable, and under your control.

Aria proposes. Chan decides.

What Aria is for

Two flagship use cases drive the project:

Meeting transcription for study — record a Teams (or any) meeting; Aria captures the audio, transcribes it locally with Whisper, saves a timestamped transcript, and can summarise it into study notes on demand. Review meetings by reference later.
On-screen study help — ask about whatever is on your screen. "What's wrong with line 6?" while a SQL statement is in front of you → Aria grabs the active window, looks at it, and explains it in plain language. Great for SC-200 material, code, errors, and queries.

Plus the everyday assistant basics: time/date/reminders, web research, weather, and open-ended conversation with memory.

Code Explanation Mode

Point Aria at any code on screen and ask her to teach it (not just debug it).

Say: "Aria, explain this code" (or "walk me through this", "break this down").

Aria captures the active window and produces a senior-to-junior explanation covering what the code does, a walkthrough with the reasoning behind each part, the key concepts to learn, and common mistakes to watch for. She speaks a short summary aloud and saves the full explanation to data/explanations/ as a timestamped Markdown file, so you can build a personal study library and read explanations back later. Each explanation is also logged to study_sessions with a detected language tag (SQL, Python, …).
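The save-and-tag step could look roughly like this minimal sketch. It assumes a simple keyword heuristic for language detection and a `<timestamp>-<language>.md` file name; `detect_language`, `save_explanation`, `LANGUAGE_HINTS`, and the naming scheme are all hypothetical, not Aria's actual code.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical keyword hints; a real detector could be far more thorough.
LANGUAGE_HINTS = {
    "SQL": [r"\bselect\b", r"\bfrom\b", r"\bwhere\b", r"\bjoin\b"],
    "Python": [r"\bdef\b", r"\bimport\b", r"\bself\b", r"\belif\b"],
}


def detect_language(code: str) -> str:
    """Return the language with the most matching hint patterns, or 'unknown'."""
    scores = {
        lang: sum(1 for p in patterns if re.search(p, code, re.IGNORECASE))
        for lang, patterns in LANGUAGE_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def save_explanation(markdown: str, language: str, out_dir: Path,
                     now: Optional[datetime] = None) -> Path:
    """Write one explanation as a timestamped Markdown file and return its path."""
    now = now or datetime.now()
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: sortable timestamp plus the language tag.
    path = out_dir / f"{now:%Y-%m-%d_%H-%M-%S}-{language.lower()}.md"
    path.write_text(f"# Code explanation ({language})\n\n{markdown}\n", encoding="utf-8")
    return path
```

Used as `save_explanation(text, detect_language(code), Path("data/explanations"))`, the sortable timestamp keeps the study library in chronological order when listed by name.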

Teaching phrases ("explain this code") route here; debugging phrases ("what's wrong with line 6") stay with screen-assist.
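One way to express the teaching/debugging split is as two ordered sets of keyword patterns. This sketch assumes regex phrase lists and a teaching-first check order; `pick_screen_capability` and the patterns are illustrative only, not Aria's real keyword claims.

```python
import re
from typing import Optional

# Illustrative phrase patterns, not Aria's real keyword claims.
TEACHING_PATTERNS = [
    r"\bexplain (this|the) code\b",
    r"\bwalk me through (this|it)\b",
    r"\bbreak (this|it) down\b",
]
DEBUGGING_PATTERNS = [
    r"\bwhat'?s wrong with\b",
    r"\bwhy (does|is) (this|it) (fail|failing|broken)\b",
]


def pick_screen_capability(utterance: str) -> Optional[str]:
    """Teaching phrases are checked first; debugging phrases go to screen-assist."""
    text = utterance.lower()
    if any(re.search(p, text) for p in TEACHING_PATTERNS):
        return "code_explanation"
    if any(re.search(p, text) for p in DEBUGGING_PATTERNS):
        return "vision"
    return None  # neither: leave the request to the other routing layers
```

Checking teaching phrases first is an assumed tie-break: an utterance that contains both kinds of phrase is treated as a request to learn rather than to fix.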

Note on history: Aria previously included a finance/Trading212/market research direction. That has been removed — Aria is now focused on being an assistant and teacher. The old code remains in git history if you ever need it.

Capabilities

Aria's skills are self-registering plugins under capabilities/. The core router stays domain-agnostic and simply dispatches to whichever capability claims a request.
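Self-registration could work along the lines of this minimal sketch, assuming a decorator-based registry where each claim returns a priority or `None`. The names `Capability`, `capability`, `registered`, and `dispatch` are hypothetical; Aria's actual interface lives in core/capability.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Claim = Callable[[str], Optional[int]]  # returns a priority if claimed, else None
Handler = Callable[[str], str]


@dataclass(frozen=True)
class Capability:
    name: str
    claim: Claim
    handle: Handler


_REGISTRY: Dict[str, Capability] = {}


def capability(name: str, claim: Claim) -> Callable[[Handler], Handler]:
    """Decorator a plugin module applies to its handler; runs at import time."""
    def wrap(handle: Handler) -> Handler:
        if name in _REGISTRY:
            raise ValueError(f"capability {name!r} is already registered")
        _REGISTRY[name] = Capability(name, claim, handle)
        return handle
    return wrap


def registered() -> List[str]:
    return sorted(_REGISTRY)


def dispatch(text: str, fallback: Handler) -> str:
    """Domain-agnostic core: ask every capability, run the highest-priority claimant."""
    best: Optional[Capability] = None
    best_priority = -1
    for cap in _REGISTRY.values():
        priority = cap.claim(text)
        if priority is not None and priority > best_priority:
            best, best_priority = cap, priority
    return best.handle(text) if best else fallback(text)


# What two plugin modules might contain (hypothetical examples).
@capability("assistant", claim=lambda t: 50 if "what time" in t.lower() else None)
def tell_time(text: str) -> str:
    return "assistant handled: " + text


@capability("web", claim=lambda t: 40 if "weather" in t.lower() else None)
def weather(text: str) -> str:
    return "web handled: " + text
```

The core never imports a capability by name; importing the plugin modules is enough to populate the registry, and the fallback handler plays the role of the reasoning generalist.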

| Capability | What it does |
|---|---|
| `vision` | Screen-assist: captures the active window and explains/evaluates what's on it (Gemini). Logs every Q&A to `study_sessions`. |
| `code_explanation` | Teaching mode: senior-to-junior explanation of on-screen code, saved to `data/explanations/` for study. |
| `meeting` | Start/stop meeting transcription; summarise a saved meeting into study notes (Claude). |
| `web` | Web search, weather, and current-event follow-ups (Gemini + web scrape). |
| `assistant` | Time, date, reminders/calendar, queued notifications, analysis-mode toggle. |
| `reasoning` | The Claude generalist: handles anything no other capability claims, plus the offline Ollama fallback. |

How routing works

For each request the router runs three layers, cheapest first:

1. Keyword claims: each capability offers a deterministic match (with a priority).
2. Semantic match: if nothing matched, a small local embedding model (fastembed, BAAI/bge-small-en-v1.5, CPU) matches the phrasing to a capability by meaning, so novel wording still routes correctly.
3. Claude: the generalist fallback when nothing else fits.

Side-effecting intents (e.g. start recording the meeting) are keyword-only, so they can't be triggered by a loose semantic match.
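The three layers, including the keyword-only rule for side-effecting intents, can be sketched as below. Aria's semantic layer uses fastembed with BAAI/bge-small-en-v1.5; to keep this sketch dependency-free, a bag-of-words cosine similarity stands in for embedding similarity. The intent names, the `route` signature, and the 0.5 threshold are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Set

Claim = Callable[[str], Optional[int]]
Similarity = Callable[[str, str], float]


def bag_of_words_similarity(a: str, b: str) -> float:
    """Toy stand-in for embedding cosine similarity: cosine over word counts."""
    va = Counter(re.findall(r"[a-z']+", a.lower()))
    vb = Counter(re.findall(r"[a-z']+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def route(text: str, keyword_claims: Dict[str, Claim], examples: Dict[str, List[str]],
          side_effecting: Set[str], similarity: Similarity = bag_of_words_similarity,
          threshold: float = 0.5) -> str:
    # Layer 1: deterministic keyword claims; highest priority wins.
    best_intent, best_priority = None, -1
    for intent, claim in keyword_claims.items():
        priority = claim(text)
        if priority is not None and priority > best_priority:
            best_intent, best_priority = intent, priority
    if best_intent is not None:
        return best_intent
    # Layer 2: semantic match against example phrasings, skipping side-effecting intents.
    best_intent, best_score = None, 0.0
    for intent, phrases in examples.items():
        if intent in side_effecting:
            continue  # keyword-only: never reachable by a loose semantic match
        for phrase in phrases:
            score = similarity(text, phrase)
            if score > best_score:
                best_intent, best_score = intent, score
    if best_intent is not None and best_score >= threshold:
        return best_intent
    # Layer 3: the Claude generalist.
    return "reasoning"
```

With `meeting.start` marked side-effecting, "please record this meeting now" falls through to `reasoning` even though an example phrase for `meeting.start` is a close match; only the keyword claim can start a recording. In a real setup the example embeddings would be computed once at startup rather than on every request.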

Three ways in

Voice — say Aria's name (or use conversation mode) and ask. faster-whisper transcribes; Kokoro ONNX (af_heart) speaks the reply.
Typed — the terminal dashboard has an input bar; typed prompts go through the exact same router, useful when speech is misheard.
Screen-assist hotkey — Ctrl+Alt+A grabs the active window, then records one spoken question to answer about it.

Architecture

Input (voice / typed / hotkey)
        │
        ▼
core/router.py        ── keyword claims → semantic match → Claude fallback
        │
        ▼
capabilities/*        ── the skill that owns the matched intent runs
  vision · code_explanation · meeting · web · assistant · reasoning
        │
        ▼
voice/speaker.py      ── Kokoro ONNX synthesis + playback
core/terminal_ui.py   ── live dashboard, logs, insight queue

Cross-cutting pieces in core/:

capability.py: interface + registry
semantic.py: embedding router
response_shaping.py: shared reply shaping
memory.py: SQLite (episodic, semantic, study_sessions, meetings)
screen_capture.py: on-demand + rolling capture
vision_analyzer.py: Gemini
meeting_recorder.py: loopback + mic capture → transcript
personality.py
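As a rough illustration of the study_sessions logging that memory.py handles, here is a minimal sketch using Python's built-in sqlite3. The column layout and the helper names `open_memory`, `log_study_session`, and `sessions_by_language` are hypothetical; the real schema (and its episodic, semantic, and meetings tables) may differ.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS study_sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL,
    source     TEXT NOT NULL,   -- capability that produced the answer
    language   TEXT,            -- detected language tag, if any
    question   TEXT NOT NULL,
    answer     TEXT NOT NULL
);
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_study_session(conn: sqlite3.Connection, source: str, question: str,
                      answer: str, language: Optional[str] = None) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO study_sessions (created_at, source, language, question, answer) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), source, language, question, answer),
        )
    return cur.lastrowid


def sessions_by_language(conn: sqlite3.Connection, language: str) -> List[str]:
    rows = conn.execute(
        "SELECT question FROM study_sessions WHERE language = ? ORDER BY id", (language,)
    )
    return [row[0] for row in rows]
```

Parameterised queries (`?` placeholders) keep transcribed or on-screen text from being interpreted as SQL.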

Meeting transcription — how it captures audio

A normal microphone only hears you. The other people on a call come out of your speakers, so Aria also records the system loopback (what you hear) via the soundcard library, mixes it with your mic, and transcribes the result in ~25-second chunks. While a meeting is being recorded, the voice assistant pauses, since both use the microphone. If loopback can't be opened on your machine, Aria falls back to mic-only capture and says so in the log.
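The mix-then-chunk step can be sketched as follows, assuming both streams are already captured as mono float samples at the same sample rate (capturing them through soundcard is out of scope here). `mix` and `chunks` are hypothetical helpers; summing with clipping is one possible mixing choice, not necessarily the one meeting_recorder.py makes.

```python
from typing import Iterator, List, Optional, Sequence


def mix(mic: Sequence[float], loopback: Optional[Sequence[float]]) -> List[float]:
    """Sum two mono float streams sample by sample, clipping to [-1.0, 1.0].

    If loopback is None (it could not be opened), return the mic signal alone.
    The shorter stream is padded with silence.
    """
    if loopback is None:
        return list(mic)  # mic-only fallback
    length = max(len(mic), len(loopback))
    mixed = []
    for i in range(length):
        m = mic[i] if i < len(mic) else 0.0
        lb = loopback[i] if i < len(loopback) else 0.0
        mixed.append(max(-1.0, min(1.0, m + lb)))
    return mixed


def chunks(samples: Sequence[float], sample_rate: int,
           chunk_seconds: int = 25) -> Iterator[List[float]]:
    """Yield consecutive chunk_seconds-long slices; the last one may be shorter."""
    size = sample_rate * chunk_seconds
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])
```

Summing keeps both sides of the call at their original level, at the cost of clipping when both people speak loudly at once; averaging would avoid clipping but halve the volume of each speaker.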

Transcripts are written to data/meetings/<timestamp>.md; summaries to *-notes.md.
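A sketch of how chunk offsets and the notes file name might be derived, assuming each chunk is stamped with its start time and the notes file shares the transcript's stem. `chunk_timestamp`, `transcript_line`, and `notes_path_for` are hypothetical helpers, and the `[hh:mm:ss]` line format is an assumption.

```python
from pathlib import Path


def chunk_timestamp(index: int, chunk_seconds: int = 25) -> str:
    """Start time of chunk `index` as [hh:mm:ss]."""
    start = index * chunk_seconds
    return f"[{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}]"


def transcript_line(index: int, text: str, chunk_seconds: int = 25) -> str:
    return f"{chunk_timestamp(index, chunk_seconds)} {text.strip()}"


def notes_path_for(transcript: Path) -> Path:
    """data/meetings/<timestamp>.md -> data/meetings/<timestamp>-notes.md (assumed)."""
    return transcript.with_name(f"{transcript.stem}-notes.md")
```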

Setup

Requirements

Windows 11, Python 3.13, PowerShell 7 recommended
NVIDIA GPU recommended for Kokoro ONNX CUDA
Anthropic API key — Claude reasoning + meeting summaries
Gemini API key — screen-assist / vision + web reasoning
soundcard (installed via requirements) for meeting loopback capture

Install

git clone https://github.com/chansg/aria.git
cd aria

python -m venv .venv
.\.venv\Scripts\Activate.ps1

python -m pip install -r requirements.txt
playwright install chromium

GPU note: fastembed pulls the CPU onnxruntime, which can shadow onnxruntime-gpu and drop Kokoro TTS off CUDA. If that happens, restore the GPU runtime: pip uninstall -y onnxruntime; pip install --force-reinstall --no-deps onnxruntime-gpu.
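To check whether the GPU runtime is still active, a small script can list onnxruntime's available execution providers via `onnxruntime.get_available_providers()`. This is a sketch: `cuda_provider_available` and `check_onnxruntime` are hypothetical helpers, and a present CUDA provider is necessary but not sufficient for Kokoro actually running on the GPU.

```python
from typing import Iterable


def cuda_provider_available(providers: Iterable[str]) -> bool:
    return "CUDAExecutionProvider" in set(providers)


def check_onnxruntime() -> str:
    try:
        import onnxruntime as ort
    except ImportError:
        return "onnxruntime is not installed"
    providers = ort.get_available_providers()
    if cuda_provider_available(providers):
        return "CUDA provider available"
    return ("CUDA provider missing (found: " + ", ".join(providers) + "); "
            "reinstall onnxruntime-gpu")


if __name__ == "__main__":
    print(check_onnxruntime())
```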

Configure

Copy-Item config.example.py config.py

Edit config.py locally (never commit real keys). Key settings:

ANTHROPIC_API_KEY = "..."
GEMINI_API_KEY = "..."

CONVERSATION_MODE_DEFAULT = True

# Screen-assist
SCREEN_ASSIST_ENABLED = True
SCREEN_ASSIST_HOTKEY  = "<ctrl>+<alt>+a"

# Meeting transcription
MEETING_TRANSCRIPTION_ENABLED = True
MEETING_CHUNK_SECONDS = 25
MEETING_OUTPUT_DIR = "data/meetings"

# Voice
TTS_PROVIDER = "kokoro"
KOKORO_VOICE = "af_heart"
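Before the first run, a quick sanity check can catch placeholder keys left in config.py. This sketch assumes config values are plain module attributes; `config_problems` and `REQUIRED_KEYS` are hypothetical and not part of Aria.

```python
from typing import List

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def config_problems(cfg: object) -> List[str]:
    """Return human-readable problems found in a config module or namespace."""
    problems = []
    for key in REQUIRED_KEYS:
        value = getattr(cfg, key, "")
        if not value or value.strip() in {"", "..."}:
            problems.append(f"{key} is not set")
    chunk = getattr(cfg, "MEETING_CHUNK_SECONDS", 25)
    if not isinstance(chunk, int) or chunk <= 0:
        problems.append("MEETING_CHUNK_SECONDS must be a positive integer")
    return problems
```

Run it as `import config; print(config_problems(config))`; an empty list means the basics are in place.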

Run

python main.py

On startup Aria registers her capabilities, warms up the semantic router, asks for a microphone, calibrates ambient noise, initialises memory, and opens the dashboard.

Try it

"Aria, what's wrong with line 6?" (with your SQL window focused) — or press Ctrl+Alt+A.
"Aria, start transcribing the meeting" … then "stop transcribing" … then "summarise the last meeting".
"Aria, what's the weather today?"

Validation

python -m pytest -q              # unit + routing + capability tests
python tools\run_validation.py   # structured session report

Operating principles

Small reviewed changes over broad speculative refactors.
Keep runtime failures visible in logs/aria.log — no silent fallbacks.
Add a test for every bug found through real use.
The core stays domain-agnostic; new skills are added as capability plugins at the edge.
Aria proposes; Chan reviews and approves.
