A local-first AI assistant with multi-worker orchestration, tool execution, and document retrieval. Runs as a desktop application.
HelixAI is a desktop application that runs AI workloads locally. It routes user requests through a FastAPI orchestrator to specialized workers via Redis Streams, persists state in PostgreSQL, and uses Qdrant for vector storage.
The system decomposes requests into task DAGs. Each task specifies a required capability (llm, tool, rag, voice), and the scheduler routes it to the appropriate worker. Workers execute independently and report results back. Because dependencies are explicit and tasks are dispatched by capability, this is workflow orchestration rather than sequential prompt chaining.
┌─────────────────────────────────────────────────────────────┐
│ Electron Desktop App │
│ React + TypeScript + Tailwind CSS │
└─────────────────────────┬───────────────────────────────────┘
│ HTTP / SSE
▼
┌─────────────────────────────────────────────────────────────┐
│ FastAPI Orchestrator │
│ ┌──────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Intent │ │ Workflow │ │ DAG Scheduler │ │
│ │ Classifier │ │ Builder │ │ │ │
│ └──────────────┘ └────────────────┘ └────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Redis │ │ PostgreSQL │ │ Qdrant │
│ Streams │ │ │ │ │
└──────┬──────┘ └─────────────┘ └─────────────┘
│
│ Capability-based routing
▼
┌─────────────────────────────────────────────────────────────┐
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────────┐ │
│ │LLM Worker │ │Tool Worker│ │RAG Worker │ │Voice Worker │ │
│ │ (Ollama) │ │ │ │ (Qdrant) │ │ (Whisper) │ │
│ └───────────┘ └───────────┘ └───────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
Multi-Worker Architecture
Four worker types (LLM, Tool, RAG, Voice) consume from capability-specific Redis Streams. Workers register, send heartbeats, and can be monitored independently.
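As a rough illustration of capability-specific queues, the sketch below maps each capability to its own stream key and uses an in-memory FIFO in place of Redis. The `tasks:<capability>` key format, the `InMemoryStreams` class, and the payload fields are assumptions made for this example, not names taken from the codebase.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Capabilities named in this README; the stream-key format is an assumption.
CAPABILITIES = ("llm", "tool", "rag", "voice")


def stream_for(capability: str) -> str:
    """Map a capability to its (hypothetical) stream key."""
    if capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability!r}")
    return f"tasks:{capability}"


@dataclass
class InMemoryStreams:
    """Stand-in for Redis Streams: one FIFO queue per stream key."""

    queues: dict = field(default_factory=lambda: defaultdict(deque))

    def add(self, capability: str, payload: dict) -> str:
        """Append a task to the stream for its capability; return the key."""
        key = stream_for(capability)
        self.queues[key].append(payload)
        return key

    def read(self, capability: str):
        """Pop the oldest pending task for a capability, or None if empty."""
        queue = self.queues[stream_for(capability)]
        return queue.popleft() if queue else None


streams = InMemoryStreams()
streams.add("rag", {"task_id": "t1", "query": "find the design doc"})
streams.add("llm", {"task_id": "t2", "prompt": "summarize"})
```

Against a real Redis server, `add` would roughly correspond to `XADD` and `read` to `XREADGROUP` within a consumer group, with an `XACK` once the task finishes.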
DAG-Based Task Orchestration
The scheduler resolves task dependencies, propagates parent outputs to child tasks, and marks workflows complete when all tasks finish. Tasks transition through PENDING → QUEUED → RUNNING → COMPLETED states.
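The scheduling loop described above can be sketched in a few lines. This is a single-process approximation under stated assumptions: the `run_workflow` function, the task dictionary shape, and the handler callables are hypothetical, and tasks run inline instead of being dispatched to workers.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"


def run_workflow(tasks, handlers):
    """Run a task DAG to completion.

    tasks:    {task_id: {"capability": str, "depends_on": [task_id, ...]}}
    handlers: {capability: callable(parent_outputs: dict) -> output}
    Returns (states, outputs).
    """
    states = {tid: State.PENDING for tid in tasks}
    outputs = {}
    while any(s is not State.COMPLETED for s in states.values()):
        # A task is ready once every dependency has completed.
        ready = [
            tid
            for tid, task in tasks.items()
            if states[tid] is State.PENDING
            and all(states[dep] is State.COMPLETED for dep in task["depends_on"])
        ]
        if not ready:
            raise ValueError("dependency cycle in task graph")
        for tid in ready:
            states[tid] = State.QUEUED
        for tid in ready:
            task = tasks[tid]
            states[tid] = State.RUNNING
            # Propagate parent outputs to the child task.
            parent_outputs = {dep: outputs[dep] for dep in task["depends_on"]}
            outputs[tid] = handlers[task["capability"]](parent_outputs)
            states[tid] = State.COMPLETED
    return states, outputs


tasks = {
    "search": {"capability": "tool", "depends_on": []},
    "retrieve": {"capability": "rag", "depends_on": []},
    "answer": {"capability": "llm", "depends_on": ["search", "retrieve"]},
}
handlers = {
    "tool": lambda parents: "web results",
    "rag": lambda parents: "doc chunks",
    "llm": lambda parents: "answer from " + ", ".join(sorted(parents)),
}
states, outputs = run_workflow(tasks, handlers)
```

Here `search` and `retrieve` have no dependencies and become ready in the first pass; `answer` is queued only after both complete and receives their outputs.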
LLM Integration (Ollama)
Connects to local Ollama for inference. Supports multiple models with role-based routing (e.g., assign a coding model to code tasks). Streams tokens via Redis Pub/Sub.
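One way role-based routing could look is a lookup table from task role to model tag, with a fallback. The roles, the model tags, and the `build_generate_request` helper are illustrative assumptions; only the `model`, `prompt`, and `stream` fields come from Ollama's `/api/generate` request format.

```python
# Hypothetical role -> model table; the tags are examples, not the
# project's actual configuration.
ROLE_MODELS = {
    "code": "codellama",
    "general": "llama3",
}
DEFAULT_ROLE = "general"


def model_for(role: str) -> str:
    """Pick the model tag for a role, falling back to the default role."""
    return ROLE_MODELS.get(role, ROLE_MODELS[DEFAULT_ROLE])


def build_generate_request(role: str, prompt: str) -> dict:
    """Build a JSON body for Ollama's /api/generate with streaming enabled."""
    return {"model": model_for(role), "prompt": prompt, "stream": True}


request_body = build_generate_request("code", "Write a binary search in Python.")
```

The body would be sent as a POST to the local Ollama server (by default `http://localhost:11434/api/generate`), and each streamed chunk could then be republished to subscribers.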
Tool Execution
Implements web search (DuckDuckGo), web scraping, sandboxed Python execution, shell commands, and file operations (read/write/list/delete). Input validation and SSRF protection included.
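A common form of SSRF protection is to resolve the target host and reject any URL that points at a non-public address. The sketch below shows that idea with the standard library only; the function names and the injectable `resolve` parameter are assumptions for illustration, and the project's actual checks may differ.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_address(ip: str) -> bool:
    """Return True only for addresses that are not internal or special-use."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves solely to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, None)
    except OSError:
        # Unresolvable hosts are rejected rather than fetched.
        return False
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)
```

A check like this runs before the request is made, so a stricter implementation would also connect to the exact address that was validated, to avoid the host resolving differently on the second lookup.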
Document Retrieval (RAG)
Chunks uploaded documents, embeds them with sentence-transformers, and stores the vectors in Qdrant. Retrieves relevant context for queries.
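The chunking step can be illustrated with a simple word-based splitter that overlaps adjacent chunks so context is not cut at a boundary. The `chunk_text` function, the default sizes, and the word-level granularity are assumptions; the README does not state which strategy the RAG worker uses.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
```

Each resulting chunk would then be passed to the embedding model and stored as one vector, typically with the source document and chunk position kept as payload metadata.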
Chat Persistence
Saves conversations to PostgreSQL. Sessions can be resumed from the sidebar.
Prometheus Metrics
Exposes task counts, queue depths, worker health, and API latency. Grafana dashboards included in the repo.
Long-Term Memory
Stores user facts and preferences in Qdrant, then retrieves them and injects them into prompts. Extraction heuristics are basic.
Vision Analysis
Routes images to vision-capable models (LLaVA). Upload and query flow works, but depends on Ollama vision model availability.
Voice Input/Output
Whisper for speech-to-text, Piper TTS or espeak for synthesis. Functional but latency and model loading are unoptimized.
Multi-Model Discussions
Multiple LLMs can debate or build consensus on a topic. Modes: debate, consensus, review, round-robin, expert panel. Feature is implemented but UI integration is minimal.
Workflow Checkpoints
API to save and restore workflow state exists. Resume logic is implemented but not heavily tested.
Plugin System
Plugins can add tools, workers, or integrations. Two example plugins exist (weather, GitHub). Loading and lifecycle management work.
Self-Correction
Detects task failures and can generate recovery plans. Wired into the system but not consistently triggered.
Frontend: Electron 28, React 18, TypeScript, Tailwind CSS, React Query, ReactFlow, Zustand
Backend: FastAPI, SQLAlchemy, PostgreSQL, Redis Streams, Pydantic
AI/ML: Ollama, Sentence Transformers, Qdrant, Whisper, Piper TTS
Observability: Prometheus, Grafana, structured JSON logging
- Tasks route by capability, not by hardcoded worker assignment
- Workers are separate processes with their own health endpoints
- DAG scheduler handles dependency resolution and output propagation
- Redis Streams provide durable, consumer-group-based task queuing
- Offline mode auto-detection adjusts available tools
- Desktop app bundles the full backend stack
Prerequisites: Docker, Python 3.11+, Node.js 18+, Ollama with a pulled model
./start.sh
This starts PostgreSQL, Redis, Qdrant, the orchestrator, workers, and the desktop app.
Manual startup:
docker compose up -d
source venv/bin/activate
uvicorn orchestrator.main:app --reload
# In separate terminals:
python -m workers.llm_worker.llm_worker
python -m workers.tools_worker.tool_worker
python -m workers.rag_worker.rag_worker
python -m workers.voice_worker.voice_worker
cd desktop && npm run electron:dev
Personal project. Actively developed. Focused on local-first AI agent orchestration as a reference implementation.
MIT
