Aria — Voice-First AI Assistant

Aria is an accessibility-focused voice AI designed primarily for blind and visually impaired users. It listens to your voice, reasons about what you need, and acts — browsing the web, controlling your Mac desktop, writing code, or creating notes — all through natural spoken conversation.

What It Does

Aria is a full-duplex voice agent. You speak, it transcribes your speech in real time, routes your intent to the best specialized AI agent, and speaks back a natural response — all with sub-second latency.

Specialized Agents

Agent	What it does
Commerce	Browses the web, searches for products, compares prices and ratings, and can add items to cart and check out on real e-commerce sites
General	Navigates any website, reads and summarizes articles, answers factual questions, searches the web
Coding	Reads, writes, and edits files; runs shell commands; debugs errors; searches the codebase — full developer assistant by voice
Desktop	Controls your Mac via screenshots + mouse/keyboard using Anthropic's computer-use API — opens apps, clicks buttons, navigates UIs
Documentation	Creates notes in Apple Notes by controlling the macOS desktop — dictate, it types

A LangGraph supervisor classifies every voice request and routes it to the right agent automatically.
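
To make the routing contract concrete, here is a minimal sketch of "transcript in, agent name out". It is a keyword-based stand-in, not the LangGraph supervisor, which presumably relies on an LLM to classify intent. The `routeIntent` function, the `AgentName` type, and the keyword lists are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical stand-in for the supervisor's classification step.
// Assumption: the real supervisor is LLM-based; this keyword matcher only
// illustrates the contract (a transcript goes in, one agent name comes out).
type AgentName = "commerce" | "general" | "coding" | "desktop" | "documentation";

// Illustrative keyword lists, not taken from the project.
// Order matters: earlier entries win when several agents match.
const KEYWORDS: Array<[AgentName, string[]]> = [
  ["commerce", ["buy", "price", "cart", "checkout", "order"]],
  ["coding", ["code", "file", "shell", "debug", "function"]],
  ["documentation", ["note", "notes", "dictate"]],
  ["desktop", ["open", "click", "app", "window"]],
];

function routeIntent(transcript: string): AgentName {
  // Split into whole words so "app" does not match "apple".
  const tokens = new Set(
    transcript.toLowerCase().split(/[^a-z]+/).filter(Boolean),
  );
  for (const [agent, words] of KEYWORDS) {
    if (words.some((w) => tokens.has(w))) return agent;
  }
  // Anything unmatched falls through to the General agent.
  return "general";
}
```

For example, `routeIntent("Take a note about tomorrow")` selects the Documentation agent, while a question with no matching keyword falls back to General.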

Architecture

Microphone → Deepgram STT → Socket.io → LangGraph Supervisor
                                               ↓
                          ┌─────────────────────────────────────┐
                          │  Commerce · General · Coding        │
                          │  Desktop · Documentation            │
                          └─────────────────────────────────────┘
                                               ↓
                               Cartesia TTS → Speaker

Frontend: Electron desktop app (React + Vite) with a glassmorphism UI, and a Next.js web client
Backend: Node.js + Express + Socket.io server
STT: Deepgram nova-3 (real-time streaming)
TTS: Cartesia (WebSocket streaming, low latency)
LLM: OpenAI GPT-4o / Anthropic Claude / Ollama (local) — configurable via env vars
Agent orchestration: LangGraph + LangChain
Browser automation: Stagehand
Desktop control: macOS AppleScript + JXA (for the Desktop & Documentation agents)
Memory / Knowledge base: Elasticsearch (user profiles, browsing history, past interactions)
AgentVerse: Python adapter to register Aria as a uAgents-compatible agent
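
Streaming STT typically emits interim guesses that are later replaced by a final result. The sketch below shows one way the server or client could merge the two before handing an utterance to the supervisor. The `SttResult` shape and the `TranscriptBuffer` class are hypothetical; the actual Deepgram payload has a different structure and would need to be mapped into this form.

```typescript
// Hypothetical, simplified shape of one streaming STT result.
// Assumption: the real Deepgram message is mapped into this form upstream.
interface SttResult {
  text: string;
  isFinal: boolean;
}

class TranscriptBuffer {
  private finalized: string[] = [];
  private interim = "";

  // Interim results overwrite each other; a final result is committed
  // and clears the pending interim text.
  push(result: SttResult): void {
    const text = result.text.trim();
    if (result.isFinal) {
      if (text) this.finalized.push(text);
      this.interim = "";
    } else {
      this.interim = text;
    }
  }

  // Text that could be shown right now: committed segments plus the live guess.
  current(): string {
    return [...this.finalized, this.interim].filter(Boolean).join(" ");
  }

  // Return the committed utterance and reset the buffer for the next turn.
  flush(): string {
    const utterance = this.finalized.join(" ");
    this.finalized = [];
    this.interim = "";
    return utterance;
  }
}
```

Only committed segments are returned by `flush()`, so an unfinished interim guess never reaches the agents.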

Project Structure

aria/
├── server/          # Node.js backend (Express, Socket.io, all agents)
├── client/          # Next.js web frontend
├── electron/        # Electron desktop app
├── visualizer/      # Audio visualizer component
├── agentverse/      # Python uAgents adapter
└── docs/            # Architecture documentation

Getting Started

Prerequisites

Node.js 18+
Python 3.10+ (for AgentVerse adapter)
macOS (Desktop and Documentation agents require macOS)

Environment Variables

Create server/.env:

# LLM — at least one required
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
OLLAMA_MODEL=           # optional: local model name (e.g. llama3)

# Voice
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=
CARTESIA_VOICE_ID=

# Knowledge base (optional)
ELASTIC_CLOUD_ID=
ELASTIC_API_KEY=

# Dev
PORT=3001
TEST=false              # set to true to skip Stagehand + Elasticsearch init
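
A minimal sketch of validating these variables at startup follows. The `loadConfig` function and the `AriaConfig` type are hypothetical and not part of the project. The sketch assumes that OLLAMA_MODEL counts toward the "at least one LLM" rule, that the three voice settings are mandatory, and that the knowledge base is enabled only when both Elastic values are set; the source does not confirm any of these.

```typescript
type LlmProvider = "openai" | "anthropic" | "ollama";

// Hypothetical config shape for illustration only.
interface AriaConfig {
  llmProviders: LlmProvider[];
  port: number;
  testMode: boolean;
  elasticEnabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): AriaConfig {
  // A variable counts as set only if it is non-empty after trimming.
  const has = (key: string): boolean => (env[key] ?? "").trim() !== "";

  const llmProviders: LlmProvider[] = [];
  if (has("OPENAI_API_KEY")) llmProviders.push("openai");
  if (has("ANTHROPIC_API_KEY")) llmProviders.push("anthropic");
  if (has("OLLAMA_MODEL")) llmProviders.push("ollama"); // assumption, see above
  if (llmProviders.length === 0) {
    throw new Error("Set at least one of OPENAI_API_KEY, ANTHROPIC_API_KEY, OLLAMA_MODEL");
  }

  // Assumption: voice settings are required because Aria is voice-first.
  for (const key of ["DEEPGRAM_API_KEY", "CARTESIA_API_KEY", "CARTESIA_VOICE_ID"]) {
    if (!has(key)) throw new Error(`Missing required voice setting: ${key}`);
  }

  // Fall back to 3001, the value shown in the sample .env.
  const port = has("PORT") ? Number(env["PORT"]) : 3001;
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  return {
    llmProviders,
    port,
    testMode: (env["TEST"] ?? "false").trim().toLowerCase() === "true",
    elasticEnabled: has("ELASTIC_CLOUD_ID") && has("ELASTIC_API_KEY"),
  };
}
```

In a Node.js server this would be called once at boot as `loadConfig(process.env)`, so that a missing key fails fast instead of surfacing mid-conversation.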

Install & Run

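# Run each command from the repository root, in its own terminal
# (the dev servers keep running until stopped).

# Server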
cd server && npm install && npm run dev

# Web client
cd client && npm install && npm run dev

# Electron desktop app
cd electron && npm install && npm run dev

# Audio visualizer
cd visualizer && npm run dev:electron

# AgentVerse adapter (Python)
cd agentverse && pip install -r requirements.txt && python register.py

Key Features

Real-time voice: Streaming STT + TTS with interim results so the assistant feels instant
Multi-agent routing: LangGraph supervisor classifies intent and picks the right agent — no manual switching
Browser control: Stagehand-powered browser for real shopping, real navigation, real actions
Computer use: Anthropic's computer-use model lets Aria take screenshots and control any macOS app
Persistent memory: Elasticsearch stores user profiles, shopping history, and conversation context
Accessibility-first: Every agent's prompts are tuned for voice output — no markdown, no lists, natural spoken language
Graceful degradation: Works without Elasticsearch or Stagehand if those services are unavailable
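
The graceful degradation described above can be sketched as a small wrapper around optional service startup. This is an illustration under assumptions, not the project's code: `initOptional` and `ServiceState` are hypothetical names, and the `skip` flag stands in for the TEST=true setting that skips Stagehand and Elasticsearch initialization.

```typescript
// Outcome of trying to start one optional service.
type ServiceState<T> =
  | { available: true; client: T }
  | { available: false; reason: string };

// Hypothetical helper: run an init function, but never let its failure
// crash the server. The caller decides what to do without the service.
async function initOptional<T>(
  name: string,
  init: () => Promise<T>,
  skip: boolean,
): Promise<ServiceState<T>> {
  if (skip) {
    return { available: false, reason: `${name} skipped (TEST=true)` };
  }
  try {
    return { available: true, client: await init() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { available: false, reason: `${name} unavailable: ${message}` };
  }
}
```

A caller would check `available` before using `client`, for example by answering without stored user context when the knowledge base is down.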

Tech Stack

Frontend: React, TypeScript, Vite, Electron, Next.js, Tailwind CSS
Backend: Node.js, TypeScript, Express, Socket.io
AI: OpenAI GPT-4o, Anthropic Claude 3, Ollama (local), LangGraph, LangChain
Voice: Deepgram (STT), Cartesia (TTS)
Automation: Stagehand, AppleScript, JXA
Data: Elasticsearch
Other: Python, uAgents (Fetch.ai AgentVerse)
