A privacy-first, fully local document question-answering system built with FastAPI and Ollama.

Local AI Agent is a local-first Retrieval-Augmented Generation (RAG) system that ingests documents at runtime and lets you ask questions against them, with answers generated by a locally hosted LLM.
- Documents are uploaded at runtime and split into chunks for indexing
- Text is embedded using a local embedding model
- Relevant context is retrieved using vector similarity
- Answers are generated using a locally hosted LLM
- No external APIs or cloud services are required
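The pipeline above (chunk, embed, retrieve, generate) can be sketched in a few lines of plain Python. This is a toy illustration only: a hashed bag-of-words stands in for the real nomic-embed-text embedding model, an in-memory list stands in for Chroma, and the final LLM call is omitted, so it runs without any services.

```python
# Toy sketch of the chunk -> embed -> retrieve flow.
# Illustrative stand-ins: hashed bag-of-words instead of a real
# embedding model, a Python list instead of the Chroma store.
import math
import zlib
from collections import Counter

def chunk(text: str, size: int = 80) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Hash each word into a fixed-size normalized vector."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# "Ingest": chunk the document and embed each chunk.
doc = ("Ollama runs large language models locally. "
       "Chroma stores embeddings for vector similarity search. "
       "FastAPI serves the REST endpoints of the backend.")
index = [(c, embed(c)) for c in chunk(doc)]

# "Query": embed the question, retrieve the most similar chunk;
# the real system would pass that chunk to the local LLM as context.
question = "Which component stores embeddings for similarity search?"
q_vec = embed(question)
best = max(index, key=lambda item: cosine(q_vec, item[1]))
print(best[0])
```

The real stack swaps each stand-in for a production component (nomic-embed-text for `embed`, Chroma for `index`, llama3.1 for answer generation) while keeping the same overall flow.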
This version (v1.0) provides a stable backend API and a lightweight HTML frontend.
- Python 3.10+
- Ollama installed and running
- Recommended: 8 GB RAM or more
- Pull the required Ollama models

```bash
ollama pull llama3.1
ollama pull nomic-embed-text
```

- Create and activate a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate
```

- Install Python dependencies

```bash
pip install -r requirements.txt
```

- Start the backend

```bash
python -m uvicorn main:app --reload
```

- Serve the frontend

```bash
python3 -m http.server 3000
```

- Open the app at http://localhost:3000/universal_frontend.html

Note: The backend must be running in the activated virtual environment.
- Documents are uploaded at runtime via the web interface
- The backend exposes a FastAPI-based REST API
- All processing is performed locally
- Python
- FastAPI (backend REST API)
- LangChain (retrieval and orchestration)
- Chroma (vector storage)
- Ollama (local LLM runtime)