Project Aria

Aria is a Windows-first personal AI assistant and teacher. She listens, sees your screen, researches, and remembers — built to help you learn and work, not to be a black-box agent. Every meaningful action stays observable, testable, and under your control.

Aria proposes. Chan decides.

What Aria is for

Two flagship use cases drive the project:

  1. Meeting transcription for study — record a Teams (or any) meeting; Aria captures the audio, transcribes it locally with Whisper, saves a timestamped transcript, and can summarise it into study notes on demand. Review meetings by reference later.
  2. On-screen study help — ask about whatever is on your screen. "What's wrong with line 6?" while a SQL statement is in front of you → Aria grabs the active window, looks at it, and explains it in plain language. Great for SC-200 material, code, errors, and queries.

Plus the everyday assistant basics: time/date/reminders, web research, weather, and open-ended conversation with memory.

Code Explanation Mode

Point Aria at any code on screen and ask her to teach it (not just debug it).

Say: "Aria, explain this code" (or "walk me through this", "break this down").

Aria captures the active window and produces a senior-to-junior explanation covering what the code does, a walkthrough with the reasoning behind each part, the key concepts to learn, and common mistakes to watch for. She speaks a short summary aloud and saves the full explanation to data/explanations/ as a timestamped Markdown file, so you can build a personal study library and reread explanations later. Each one is also logged to study_sessions with a detected language tag (SQL, Python, …).
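The save step described above could look roughly like the sketch below. It is only an illustration: the function name save_explanation, the filename pattern, and the Markdown heading are assumptions, not Aria's actual code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_explanation(text: str, language: str, out_dir: Path,
                     now: Optional[datetime] = None) -> Path:
    """Write one explanation to a timestamped Markdown file and return its path."""
    now = now or datetime.now()
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme, e.g. 2025-01-31_14-05-09-sql.md
    path = out_dir / f"{now:%Y-%m-%d_%H-%M-%S}-{language.lower()}.md"
    path.write_text(f"# Code explanation ({language})\n\n{text}\n", encoding="utf-8")
    return path
```

Calling save_explanation(text, "SQL", Path("data/explanations")) would then add one file per explanation, sorted chronologically by name.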

Teaching phrases ("explain this code") route here; debugging phrases ("what's wrong with line 6") stay with screen-assist.

Note on history: Aria previously included a finance/Trading212/market research direction. That has been removed — Aria is now focused on being an assistant and teacher. The old code remains in git history if you ever need it.

Capabilities

Aria's skills are self-registering plugins under capabilities/. The core router stays domain-agnostic and simply dispatches to whichever capability claims a request.

| Capability | What it does |
| --- | --- |
| vision | Screen-assist: captures the active window and explains/evaluates what's on it (Gemini). Logs every Q&A to study_sessions. |
| code_explanation | Teaching mode: senior-to-junior explanation of on-screen code, saved to data/explanations/ for study. |
| meeting | Start/stop meeting transcription; summarise a saved meeting into study notes (Claude). |
| web | Web search, weather, and current-event follow-ups (Gemini + web scrape). |
| assistant | Time, date, reminders/calendar, queued notifications, analysis-mode toggle. |
| reasoning | The Claude generalist: handles anything no other capability claims, plus the offline Ollama fallback. |
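To make "self-registering" concrete, here is a minimal sketch of one way a plugin registry can work. The REGISTRY dict, the capability decorator, and the two example handlers are hypothetical; the real interface lives in core/capability.py and may differ.

```python
from typing import Callable, Dict

# Hypothetical registry: capability name -> handler function.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def capability(name: str):
    """Decorator that registers a handler under a capability name at import time."""
    def decorator(handler: Callable[[str], str]) -> Callable[[str], str]:
        if name in REGISTRY:
            raise ValueError(f"capability {name!r} is already registered")
        REGISTRY[name] = handler
        return handler
    return decorator


@capability("assistant")
def assistant(request: str) -> str:
    return f"(assistant) {request}"  # placeholder reply


@capability("reasoning")
def reasoning(request: str) -> str:
    return f"(generalist) {request}"  # placeholder reply


def dispatch(name: str, request: str) -> str:
    """Run the named capability, or the generalist if the name is unknown."""
    handler = REGISTRY.get(name, REGISTRY["reasoning"])
    return handler(request)
```

With this shape, adding a skill means adding a module that applies the decorator; the dispatcher itself never needs to know which skills exist.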

How routing works

For each request the router runs three layers, cheapest first:

  1. Keyword claims — each capability offers a deterministic match (with a priority).
  2. Semantic match — if nothing matched, a small local embedding model (fastembed, BAAI/bge-small-en-v1.5, CPU) matches the phrasing to a capability by meaning, so novel wording still routes correctly.
  3. Claude — the generalist fallback when nothing else fits.

Side-effecting intents (e.g. start recording the meeting) are keyword-only, so they can't be triggered by a loose semantic match.
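The three layers and the keyword-only rule can be sketched as follows. This is an assumption-laden illustration, not the code in core/router.py: the Claim dataclass, the intent names, and the semantic_match callable (a stand-in for the embedding model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Claim:
    intent: str
    keywords: Tuple[str, ...]
    priority: int = 0
    side_effecting: bool = False


# Hypothetical claims; real capabilities would supply their own.
CLAIMS = [
    Claim("meeting.start", ("start transcribing", "start recording the meeting"),
          priority=10, side_effecting=True),
    Claim("code_explanation", ("explain this code", "walk me through this"), priority=5),
    Claim("web.weather", ("weather",), priority=1),
]


def keyword_match(text: str, claims: List[Claim]) -> Optional[str]:
    """Layer 1: deterministic substring match; the highest priority wins."""
    lowered = text.lower()
    hits = [c for c in claims if any(k in lowered for k in c.keywords)]
    return max(hits, key=lambda c: c.priority).intent if hits else None


def route(text: str, claims: List[Claim],
          semantic_match: Callable[[str], Optional[str]]) -> str:
    hit = keyword_match(text, claims)
    if hit is not None:
        return hit
    # Layer 2: a semantic guess is accepted only for intents without side effects.
    allowed = {c.intent for c in claims if not c.side_effecting}
    guess = semantic_match(text)
    if guess in allowed:
        return guess
    # Layer 3: the generalist fallback.
    return "reasoning"
```

Note that a semantic guess of "meeting.start" is rejected here and falls through to the generalist, so only the exact keyword phrasing can start a recording.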

Three ways in

  • Voice — say Aria's name (or use conversation mode) and ask. faster-whisper transcribes; Kokoro ONNX (af_heart) speaks the reply.
  • Typed — the terminal dashboard has an input bar; typed prompts go through the exact same router, useful when speech is misheard.
  • Screen-assist hotkey: Ctrl+Alt+A grabs the active window, then records one spoken question to answer about it.

Architecture

Input (voice / typed / hotkey)
        │
        ▼
core/router.py        ── keyword claims → semantic match → Claude fallback
        │
        ▼
capabilities/*        ── the skill that owns the matched intent runs
  vision · code_explanation · meeting · web · assistant · reasoning
        │
        ▼
voice/speaker.py      ── Kokoro ONNX synthesis + playback
core/terminal_ui.py   ── live dashboard, logs, insight queue

Cross-cutting pieces in core/: capability.py (interface + registry), semantic.py (embedding router), response_shaping.py (shared reply shaping), memory.py (SQLite: episodic, semantic, study_sessions, meetings), screen_capture.py (on-demand + rolling capture), vision_analyzer.py (Gemini), meeting_recorder.py (loopback + mic capture → transcript), personality.py.
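As an illustration of the SQLite-backed study log, the sketch below creates a study_sessions table and appends one row. The column names and the helper functions are assumptions; the actual schema in core/memory.py is not shown in this README.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema for the study log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS study_sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL,
    capability TEXT NOT NULL,
    language   TEXT,
    question   TEXT NOT NULL,
    answer     TEXT NOT NULL
)
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_session(conn: sqlite3.Connection, capability: str, question: str,
                answer: str, language: Optional[str] = None) -> int:
    """Insert one Q&A row and return its row id."""
    cur = conn.execute(
        "INSERT INTO study_sessions (created_at, capability, language, question, answer) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), capability, language, question, answer),
    )
    conn.commit()
    return cur.lastrowid
```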

Meeting transcription — how it captures audio

A normal microphone only hears you. The other people on a call come out of your speakers, so Aria also records the system loopback (what you hear) via the soundcard library, mixes it with your mic, and transcribes ~25-second chunks. While a meeting records, the voice assistant pauses (they share the microphone). If loopback can't be opened on your machine, Aria degrades to mic-only and says so in the log.
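The mix-then-chunk idea can be shown with plain lists of samples. This is a simplified sketch, not the code in core/meeting_recorder.py: it assumes mono float samples in the range -1.0 to 1.0, mixes by averaging (one of several reasonable choices), and leaves the actual capture through soundcard out entirely.

```python
from typing import Iterator, List, Sequence


def mix(mic: Sequence[float], loopback: Sequence[float]) -> List[float]:
    """Average two mono signals sample by sample, padding the shorter with silence."""
    n = max(len(mic), len(loopback))
    mixed = []
    for i in range(n):
        a = mic[i] if i < len(mic) else 0.0
        b = loopback[i] if i < len(loopback) else 0.0
        # Clamp so the result always stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, (a + b) / 2)))
    return mixed


def chunks(samples: Sequence[float], sample_rate: int,
           seconds: int = 25) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `seconds` worth of audio; the last may be shorter."""
    size = sample_rate * seconds
    for start in range(0, len(samples), size):
        yield samples[start:start + size]
```

Each yielded chunk would then be handed to the transcriber in turn; in mic-only mode the loopback argument is simply empty.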

Transcripts are written to data/meetings/<timestamp>.md; summaries to *-notes.md.
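A small sketch of how the two filenames might relate. The timestamp format and both helper names are assumptions; only the data/meetings directory and the -notes.md suffix come from this README.

```python
from datetime import datetime
from pathlib import Path


def transcript_path(started: datetime, out_dir: Path = Path("data/meetings")) -> Path:
    """Build the transcript path; the timestamp format here is hypothetical."""
    return out_dir / f"{started:%Y%m%d-%H%M%S}.md"


def notes_path(transcript: Path) -> Path:
    """Place the summary next to its transcript with a -notes suffix."""
    return transcript.with_name(f"{transcript.stem}-notes.md")
```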

Setup

Requirements

  • Windows 11, Python 3.13, PowerShell 7 recommended
  • NVIDIA GPU recommended for Kokoro ONNX CUDA
  • Anthropic API key — Claude reasoning + meeting summaries
  • Gemini API key — screen-assist / vision + web reasoning
  • soundcard (installed via requirements) for meeting loopback capture

Install

git clone https://github.com/chansg/aria.git
cd aria

python -m venv .venv
.\.venv\Scripts\Activate.ps1

python -m pip install -r requirements.txt
playwright install chromium

GPU note: fastembed pulls the CPU onnxruntime, which can shadow onnxruntime-gpu and drop Kokoro TTS off CUDA. If that happens, restore the GPU runtime: pip uninstall -y onnxruntime; pip install --force-reinstall --no-deps onnxruntime-gpu.
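One way to check whether the GPU runtime is the one that actually got imported is to ask ONNX Runtime which execution providers it offers. The helper names below are illustrative; onnxruntime.get_available_providers() and the provider name CUDAExecutionProvider are part of ONNX Runtime itself.

```python
from typing import List


def installed_providers() -> List[str]:
    """Return ONNX Runtime's available execution providers, or [] if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return list(onnxruntime.get_available_providers())


def cuda_available(providers: List[str]) -> bool:
    """True when the CUDA execution provider is offered by the imported runtime."""
    return "CUDAExecutionProvider" in providers
```

If cuda_available(installed_providers()) is False on a machine with an NVIDIA GPU, the CPU package is probably shadowing onnxruntime-gpu and the reinstall above is worth trying.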

Configure

Copy-Item config.example.py config.py

Edit config.py locally (never commit real keys). Key settings:

ANTHROPIC_API_KEY = "..."
GEMINI_API_KEY = "..."

CONVERSATION_MODE_DEFAULT = True

# Screen-assist
SCREEN_ASSIST_ENABLED = True
SCREEN_ASSIST_HOTKEY  = "<ctrl>+<alt>+a"

# Meeting transcription
MEETING_TRANSCRIPTION_ENABLED = True
MEETING_CHUNK_SECONDS = 25
MEETING_OUTPUT_DIR = "data/meetings"

# Voice
TTS_PROVIDER = "kokoro"
KOKORO_VOICE = "af_heart"
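A quick sanity check on these settings can catch a forgotten placeholder before the first API call. The sketch below is a hypothetical helper, not something shipped in the repository; it only reads the setting names listed above.

```python
from typing import List

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def config_problems(cfg: object) -> List[str]:
    """Return human-readable problems found in a config module or object."""
    problems = []
    for name in REQUIRED_KEYS:
        value = getattr(cfg, name, "")
        # "..." is the placeholder left over from config.example.py.
        if not value or value == "...":
            problems.append(f"{name} is not set")
    chunk = getattr(cfg, "MEETING_CHUNK_SECONDS", 25)
    if not isinstance(chunk, int) or chunk <= 0:
        problems.append("MEETING_CHUNK_SECONDS must be a positive integer")
    return problems
```

Running config_problems(config) after import config and logging any returned messages keeps a misconfiguration visible rather than surfacing later as an obscure API error.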

Run

python main.py

On startup Aria registers her capabilities, warms up the semantic router, asks for a microphone, calibrates ambient noise, initialises memory, and opens the dashboard.

Try it

  • "Aria, what's wrong with line 6?" (with your SQL window focused) — or press Ctrl+Alt+A.
  • "Aria, start transcribing the meeting" … then "stop transcribing" … then "summarise the last meeting".
  • "Aria, what's the weather today?"

Validation

python -m pytest -q              # unit + routing + capability tests
python tools\run_validation.py   # structured session report

Operating principles

  • Small reviewed changes over broad speculative refactors.
  • Keep runtime failures visible in logs/aria.log — no silent fallbacks.
  • Add a test for every bug found through real use.
  • The core stays domain-agnostic; new skills are added as capability plugins at the edge.
  • Aria proposes; Chan reviews and approves.
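The "no silent fallbacks" principle can be illustrated with a small wrapper that logs every degradation before taking the fallback path. The helper names and the log format are assumptions; only the logs/aria.log location comes from this README.

```python
import logging
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("aria")


def configure_logging(path: str = "logs/aria.log") -> None:
    """Send log records to the file named in the principles above."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(filename=path, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s")


def with_visible_fallback(primary: Callable[[], T], fallback: Callable[[], T],
                          what: str) -> T:
    """Try primary(); on failure, log the error with its traceback, then use fallback()."""
    try:
        return primary()
    except Exception:
        log.warning("%s failed; falling back", what, exc_info=True)
        return fallback()
```

The mic-only degradation for meeting capture is one place this pattern would fit: the fallback still happens, but the reason ends up in the log.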
