ATLAS is a multimodal desktop AI assistant with a conversational interface. It can understand and process audio, video, and text, and it responds in real time. It has persistent memory and can access the internet for up-to-date information.
- Multimodal Interaction: Communicate with ATLAS using your voice, camera, screen share, or text.
- Real-time Conversation: ATLAS processes information and responds in real time for a natural conversational experience.
- Persistent Memory: ATLAS remembers past conversations, providing context for future interactions.
- Internet Access: ATLAS can search the internet to answer questions and provide current information.
- Desktop Integration: As a desktop application, ATLAS can access the screen and other system resources.
ATLAS is built with a Python backend and an Electron frontend, communicating via WebSockets.
The backend is a Python application that uses FastAPI for real-time communication and a variety of libraries for AI capabilities.
- app.py: The main entry point for the backend. It runs a FastAPI server and provides a WebSocket endpoint for the frontend.
- Brain/RTC.py: The core of the backend, handling real-time communication. It processes audio from the microphone, video from the camera or screen, and text from the user. It uses the Google Gemini API for its multimodal AI capabilities.
- Brain/deepagent.py & Brain/subagents.py: Implement a "deep agent" architecture using the deepagents library. This allows for specialized sub-agents, such as a "researcher" that can access the internet.
- Brain/RAG.py: Implements Retrieval-Augmented Generation (RAG) using ChromaDB. This gives ATLAS a persistent memory by storing and retrieving chat history.
- Tools/: Contains tools that can be used by the agents.
  - tavily.py: A tool for searching the internet using the Tavily API.
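The store-and-retrieve idea behind the chat-history memory can be sketched in plain Python. This is only an illustration under stated assumptions: the `ChatMemory` class, the `embed` and `cosine` helpers, and the word-count "embedding" are all hypothetical stand-ins, and the actual Brain/RAG.py relies on ChromaDB rather than anything shown here.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ChatMemory:
    """Hypothetical in-memory store; ATLAS itself keeps history in ChromaDB."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add(self, message: str) -> None:
        """Store a message together with its vector."""
        self._entries.append((message, embed(message)))

    def query(self, question: str, k: int = 2) -> list[str]:
        """Return up to k stored messages most similar to the question."""
        q = embed(question)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = ChatMemory()
memory.add("User: my favorite editor is Vim")
memory.add("User: I live in Lisbon")
memory.add("User: remind me to water the plants")

# Retrieved history is prepended to the prompt so the model sees past context.
question = "What is my favorite editor?"
context = memory.query(question, k=1)
prompt = "Relevant history:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

A real vector store replaces the word counts with learned embeddings and persists them to disk, but the retrieve-then-prompt flow is the same.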
The frontend is an Electron application that provides the user interface for ATLAS.
- frontend/main.js: The main process for the Electron application. It creates the application window and handles system-level interactions like screen capture. It also runs a local HTTP server to receive state updates from the backend.
- frontend/renderer.js: The user interface logic. It handles user input from the microphone, camera, and text input. It communicates with the backend via the GeminiClient.
- frontend/gemini-client.js: A WebSocket client that handles the real-time communication with the Python backend.
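Since the Electron main process runs a local HTTP server for state updates, the backend side of that exchange might look roughly like the sketch below. The URL, port, path, JSON shape, and function names are all assumptions made for illustration; the state names are only the examples this README gives for the state indicator, and the real set may differ.

```python
import json
import urllib.request

# Assumed state names, taken from this README's examples; not a confirmed list.
KNOWN_STATES = {"listening", "thinking", "speaking"}


def build_state_update(state: str) -> bytes:
    """Serialize a state change as a JSON body (hypothetical payload shape)."""
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown state: {state!r}")
    return json.dumps({"state": state}).encode("utf-8")


def post_state(state: str, url: str = "http://127.0.0.1:3000/state") -> int:
    """POST the update to the frontend's local server; the URL is a placeholder."""
    request = urllib.request.Request(
        url,
        data=build_state_update(state),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return response.status
```

Validating the state before sending keeps a typo in the backend from silently putting the UI into an undefined state.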
- Backend: Python, FastAPI, WebSockets, Google Generative AI (Gemini), LangChain, deepagents (for agent-based architecture), ChromaDB, Tavily API
- Frontend: Electron, JavaScript, HTML, CSS
- Database: ChromaDB (for vector storage)
- Python
- uv (Python package manager)
- Node.js 24+
- An .env file with the following keys:
  - GEMINI_API_KEY
  - TAVILY_API_KEY
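A quick startup check for the two keys can catch a missing or incomplete .env file early. The sketch below assumes the .env values have already been loaded into the process environment (how ATLAS loads them is not stated here); `missing_keys` and `require_keys` are hypothetical helper names.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required keys that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def require_keys(env=None) -> None:
    """Raise a clear error if any required key is missing."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_keys()` once before starting the server turns a confusing API failure later on into an immediate, readable error.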
- Backend:
uv sync
- Frontend:
cd frontend
npm install
- Start the backend:
uv run app.py
- Text Input: Type a message in the input box and press Enter to send it.
- Microphone: Click the microphone icon to start and stop recording your voice.
- Camera: Click the camera icon to turn your camera on and off.
- Screen Share: Click the screen share icon to start and stop sharing your screen.
- State Indicator: The sphere in the middle of the screen indicates the application's current state (e.g., listening, thinking, speaking).
- Connection Status: The pill in the top right corner shows the connection status to the backend.