AmanatAliPanhwer/ATLAS


ATLAS

ATLAS is a multimodal desktop AI assistant with a conversational interface. It processes audio, video, and text, and responds in real time. It also has persistent memory and can access the internet for up-to-date information.

Features

  • Multimodal Interaction: Communicate with ATLAS using your voice, camera, screen share, or text.
  • Real-time Conversation: ATLAS processes input and responds in real time for a natural conversational experience.
  • Persistent Memory: ATLAS remembers past conversations, providing context for future interactions.
  • Internet Access: ATLAS can search the internet to answer questions and provide current information.
  • Desktop Integration: As a desktop application, ATLAS can access the screen and other system resources.

Architecture

ATLAS is built with a Python backend and an Electron frontend, communicating via WebSockets.
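
As a rough illustration of what travels over that WebSocket link, the sketch below wraps text or media bytes in a small JSON envelope. The message kinds, the field names `kind` and `data`, and the `Envelope` class are all hypothetical; the actual ATLAS wire format is not described here and may differ.

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical message kinds; the real ATLAS wire format may differ.
KINDS = {"text", "audio", "video"}


@dataclass
class Envelope:
    kind: str       # one of KINDS
    payload: bytes  # UTF-8 text or raw media bytes


def encode(msg: Envelope) -> str:
    """Serialize an envelope to a JSON string that fits in a WebSocket text frame."""
    if msg.kind not in KINDS:
        raise ValueError(f"unknown kind: {msg.kind!r}")
    return json.dumps({
        "kind": msg.kind,
        # Base64 keeps binary audio/video safe inside JSON.
        "data": base64.b64encode(msg.payload).decode("ascii"),
    })


def decode(frame: str) -> Envelope:
    """Parse a JSON frame back into an envelope, rejecting unknown kinds."""
    obj = json.loads(frame)
    if obj.get("kind") not in KINDS:
        raise ValueError(f"unknown kind: {obj.get('kind')!r}")
    return Envelope(obj["kind"], base64.b64decode(obj["data"]))
```

Either side of the connection could call `encode` before sending and `decode` on receipt. Base64 adds roughly a third to the payload size, so a real-time implementation might prefer binary frames for audio and video.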

Backend

The backend is a Python application that uses FastAPI for real-time communication and a variety of libraries for AI capabilities.

  • app.py: The main entry point for the backend, running a FastAPI server and providing a WebSocket endpoint for the frontend.
  • Brain/RTC.py: The core of the backend, handling real-time communication. It processes audio from the microphone, video from the camera or screen, and text from the user. It uses the Google Gemini API for its multimodal AI capabilities.
  • Brain/deepagent.py & Brain/subagents.py: Implement a "deep agent" architecture using the deepagents library. This allows for specialized sub-agents, such as a "researcher" that can access the internet.
  • Brain/RAG.py: Implements Retrieval-Augmented Generation (RAG) using ChromaDB. This gives ATLAS a persistent memory by storing and retrieving chat history.
  • Tools/: Contains tools that can be used by the agents.
    • tavily.py: A tool for searching the internet using the Tavily API.
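
To make the retrieval idea behind Brain/RAG.py concrete, here is a toy sketch of "store past turns, recall the most similar ones, prepend them to the prompt". It deliberately swaps ChromaDB's embedding search for a plain bag-of-words cosine similarity so it runs with the standard library alone. `ToyMemory`, `recall`, and `build_prompt` are hypothetical names, not the project's API.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory chat history with similarity-based recall."""

    def __init__(self) -> None:
        self._turns: list[str] = []

    def add(self, turn: str) -> None:
        self._turns.append(turn)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored turns, most similar first, skipping zero-score turns."""
        q = _vector(query)
        scored = [(_cosine(q, _vector(t)), t) for t in self._turns]
        ranked = sorted(scored, key=lambda s: s[0], reverse=True)
        return [turn for score, turn in ranked[:k] if score > 0]


def build_prompt(memory: ToyMemory, question: str) -> str:
    """Prepend recalled history to the user's question."""
    context = "\n".join(memory.recall(question))
    return f"Relevant history:\n{context}\n\nUser: {question}"
```

A vector database replaces the word-count vectors with learned embeddings and persists them to disk, which is what lets the memory survive across sessions.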

Frontend

The frontend is an Electron application that provides the user interface for ATLAS.

  • frontend/main.js: The main process for the Electron application. It creates the application window and handles system-level interactions like screen capture. It also runs a local HTTP server to receive state updates from the backend.
  • frontend/renderer.js: The user interface logic. It captures input from the microphone, camera, and text field, and communicates with the backend via the GeminiClient.
  • frontend/gemini-client.js: A WebSocket client that handles the real-time communication with the Python backend.
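
The state updates that main.js receives over its local HTTP server could be sent from the Python side roughly as follows. This is a sketch under assumptions: the URL, the JSON body shape `{"state": ...}`, and the helper names are hypothetical, and the state names are only the examples given in the Usage section.

```python
import json
import urllib.request

# Example states from the Usage section; the real set may be larger.
STATES = {"listening", "thinking", "speaking"}


def build_state_request(state: str, url: str) -> urllib.request.Request:
    """Build a JSON POST request announcing the assistant's current state."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    body = json.dumps({"state": state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_state(state: str, url: str, timeout: float = 2.0) -> int:
    """Send the state update and return the HTTP status code."""
    request = build_state_request(state, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A call such as `post_state("thinking", "http://127.0.0.1:3000/state")` would then notify the frontend, where the port and path are placeholders for whatever main.js actually listens on.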

Key Technologies

  • Backend: Python, FastAPI, WebSockets, Google Generative AI (Gemini), LangChain, deepagents (for agent-based architecture), ChromaDB, Tavily API
  • Frontend: Electron, JavaScript, HTML, CSS
  • Database: ChromaDB (for vector storage)

Getting Started

Prerequisites

  • Python
  • uv (Python package manager)
  • Node.js 24+
  • An .env file with the following keys:
    • GEMINI_API_KEY
    • TAVILY_API_KEY
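
A quick way to confirm the .env file is complete before starting the backend is sketched below. It uses a minimal hand-written parser rather than whatever loader the project itself relies on, and `parse_env` and `missing_keys` are hypothetical helper names.

```python
REQUIRED_KEYS = ("GEMINI_API_KEY", "TAVILY_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when both keys are set.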

Installation

  1. Backend:
    uv sync
  2. Frontend:
    cd frontend
    npm install

Running the Application

  1. Start the backend:
    uv run app.py

Usage

  • Text Input: Type a message in the input box and press Enter to send it.
  • Microphone: Click the microphone icon to start and stop recording your voice.
  • Camera: Click the camera icon to turn your camera on and off.
  • Screen Share: Click the screen share icon to start and stop sharing your screen.
  • State Indicator: The sphere in the middle of the screen indicates the application's current state (e.g., listening, thinking, speaking).
  • Connection Status: The pill in the top right corner shows the connection status to the backend.

About

A multimodal desktop AI assistant, built with Electron and Python.
