A Retrieval-Augmented Generation (RAG) application that uses local language models via Ollama. This application allows you to:
- Chat with documents by uploading PDF files
- Get answers derived only from the content of your documents
- Configure retrieval parameters for better results

Table of Contents
- Requirements
- Setup
- Docker Setup
- Launch
- Common Issues and Troubleshooting
- Examples
- Multilingual Support

Requirements
- Ollama (for local LLM usage)
- Python 3.9+
- PyMuPDF (for PDF document loading)
- FAISS (for vector storage)
- Docker and Docker Compose (for containerized setup)
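
The requirements above can be sanity-checked from Python before going further. This is only a sketch and not part of the repository: the function name `check_requirements` is hypothetical, and it assumes PyMuPDF is importable as `fitz` and FAISS as `faiss`, which is how those packages are usually exposed. Docker is only needed for the containerized setup.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a dict mapping each requirement to True (found) or False."""
    return {
        # The README asks for Python 3.9 or newer.
        "python>=3.9": sys.version_info >= (3, 9),
        # Command-line tools are looked up on PATH.
        "ollama": shutil.which("ollama") is not None,
        "docker": shutil.which("docker") is not None,
        # Python packages are detected without importing them.
        "pymupdf": importlib.util.find_spec("fitz") is not None,
        "faiss": importlib.util.find_spec("faiss") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```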

Setup

Install Ollama
- Download and install Ollama for your OS
- Verify installation with:

    ollama help

Download a model
- Run (with deepseek-r1:7b or another compatible model):

    ollama pull deepseek-r1:7b

- Wait for the model to download
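
To confirm from code that the model was downloaded, one option is to ask the running Ollama server for its local models. The sketch below assumes Ollama is listening on its default address `http://localhost:11434` and uses its `/api/tags` endpoint; the helper names `has_model` and `fetch_tags` are made up for this example.

```python
import json
import urllib.request


def has_model(tags_payload, model_name):
    """Return True if model_name appears in a parsed /api/tags response."""
    models = tags_payload.get("models", [])
    return any(entry.get("name") == model_name for entry in models)


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


# Example usage (requires a running Ollama server):
#     if not has_model(fetch_tags(), "deepseek-r1:7b"):
#         print("Model missing, run: ollama pull deepseek-r1:7b")
```

Keeping the parsing in `has_model` separate from the network call makes the check easy to try without a server.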

Set up the Python environment

    # Clone the repository
    git clone https://github.com/yourusername/local-rag-ollama.git
    cd local-rag-ollama

    # Create and activate virtual environment
    python -m venv venv

    # For Linux/Mac
    source venv/bin/activate

    # For Windows
    .\venv\Scripts\activate

    # Install dependencies
    pip install -r requirements.txt

Docker Setup

This application can be easily deployed using Docker and Docker Compose:

Clone the repository

    git clone https://github.com/yourusername/local-rag-ollama.git
    cd local-rag-ollama

Using helper scripts (recommended)

For Linux/Mac:

    # Make the script executable
    chmod +x docker-start.sh

    # Run the helper script to build, start and pull the model
    ./docker-start.sh

For Windows (PowerShell):

    # Run the PowerShell helper script
    .\docker-start.ps1

These scripts will:
- Check if Docker is installed and running
- Build and start the containers
- Pull the necessary model if it doesn't exist
- Provide instructions for accessing the application
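
The four steps listed above could be approximated in Python as follows. This is a rough sketch of what such a helper might do, not the contents of `docker-start.sh` or `docker-start.ps1`; the function names are hypothetical, and the `-T` flag is added here only because `docker-compose exec` otherwise expects a terminal.

```python
import subprocess

MODEL = "deepseek-r1:7b"


def default_runner(command):
    """Run a command and return (exit_code, stdout)."""
    try:
        done = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return 127, ""
    return done.returncode, done.stdout


def start_stack(run=default_runner, model=MODEL):
    """Mirror the helper-script steps; return the commands that were run."""
    executed = []

    def step(command):
        executed.append(command)
        return run(command)

    # 1. Check that Docker is installed and the daemon is running.
    code, _ = step(["docker", "info"])
    if code != 0:
        raise RuntimeError("Docker is not installed or not running")

    # 2. Build and start the containers.
    code, _ = step(["docker-compose", "up", "-d"])
    if code != 0:
        raise RuntimeError("docker-compose up failed")

    # 3. Pull the model only if it is not listed yet.
    exec_prefix = ["docker-compose", "exec", "-T", "ollama", "ollama"]
    code, listing = step(exec_prefix + ["list"])
    if code != 0 or model not in listing:
        step(exec_prefix + ["pull", model])

    # 4. Tell the user where to find the application.
    print("Open http://localhost:8000 in your browser")
    return executed
```

Passing a different `run` callable makes it possible to dry-run the sequence without touching Docker.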

Manual setup
- Build and start containers:

    docker-compose up -d

- Pull the model:

    docker-compose exec ollama ollama pull deepseek-r1:7b

Access the application
- Open your browser and navigate to http://localhost:8000

If you're using Windows, here are some specific tips:
- Ensure Docker Desktop for Windows is installed and running
- You may need to enable WSL2 (Windows Subsystem for Linux) during Docker Desktop installation
- If using the default CMD or PowerShell terminal, commands should work the same as shown above
- For file paths in volumes, you may need to use Windows-style paths with Docker Desktop

The docker-compose.yml includes:
- An Ollama service that runs the language model
- The RAG application service connected to Ollama
- GPU support for Ollama if available
- Health checks for both services
- Persistent volume for Ollama models

For GPU support:
- Ensure NVIDIA Container Toolkit is installed
- For Windows, use NVIDIA Container Runtime with Docker Desktop
- The Docker Compose configuration automatically detects and uses available GPUs

Environment variables:
- OLLAMA_HOST: Set to "ollama" (the service name) for inter-container communication
- PORT: Application port (default is 8000)
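
As an illustration of how these two variables might be consumed, the sketch below turns `OLLAMA_HOST` into a base URL and reads `PORT` with its documented default. How app.py actually reads them is not shown in this README, so the function names and the fallback to `localhost` are assumptions; 11434 is Ollama's default port.

```python
import os

DEFAULT_OLLAMA_PORT = 11434  # Ollama's default listening port


def ollama_base_url(env=None):
    """Build the Ollama base URL from OLLAMA_HOST, defaulting to localhost."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "localhost")
    # Accept either a bare host name ("ollama") or a full URL.
    if host.startswith(("http://", "https://")):
        return host.rstrip("/")
    if ":" not in host:
        host = f"{host}:{DEFAULT_OLLAMA_PORT}"
    return f"http://{host}"


def app_port(env=None):
    """Read the application port from PORT, defaulting to 8000."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8000"))
```

Inside the Compose network, `OLLAMA_HOST=ollama` would resolve to `http://ollama:11434` under these assumptions.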

Launch

Start Ollama service

    # In a separate terminal
    ollama serve

Start the Chainlit application

    # Basic usage
    chainlit run app.py

    # With custom port
    chainlit run app.py --port 8080

Access the application
- Open your browser and navigate to: http://localhost:8000 (or your custom port)
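
If the page does not load right away, the application may still be starting. A small readiness check such as the one below can tell whether anything is listening on the chosen port yet. It is a generic TCP probe, not something shipped with this project, and `wait_for_port` is a hypothetical name.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For a custom port, call it as `wait_for_port(port=8080)`.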

Common Issues and Troubleshooting

If you encounter errors related to PDF loading:

    # Install PyMuPDF separately
    pip install pymupdf==1.23.21

If you see errors about dimension mismatch:
- This happens when the FAISS vector store was created with a different embedding model than currently used
- Select "Empty db" when starting the application to create a fresh database
- Adjust the "How similar should the pieces be?" slider to a lower value (around 0.1) to handle potential negative scores
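
The mismatch described above can be detected up front by comparing the dimension the store was built with against the length of a vector from the current embedding model. The helper below is a minimal sketch with a hypothetical name; for a FAISS index object, the stored dimension is available as its `d` attribute.

```python
def needs_fresh_db(index_dim, sample_embedding):
    """Return True when the stored index cannot be searched with this embedding.

    index_dim is the dimension the vector store was built with;
    sample_embedding is any vector produced by the embedding model in use.
    """
    return len(sample_embedding) != index_dim


# An index built from 768-dimensional vectors cannot serve a model that
# produces 1024-dimensional vectors, so a fresh database is needed.
print(needs_fresh_db(768, [0.0] * 1024))
```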
Ollama embeddings can sometimes produce negative similarity scores. The application has been updated to handle this by:
- Setting a much lower score threshold (0.1 instead of 0.5)
- Using proper error handling to catch and recover from issues
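
The effect of the lower threshold and the error handling can be sketched as follows. This is an illustration rather than the code in app.py: the function names are hypothetical, `search` stands for whatever retrieval call the application uses, and higher scores are assumed to mean more similar.

```python
def filter_by_score(scored_chunks, threshold=0.1):
    """Keep (chunk, score) pairs scoring at least threshold, best first."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def safe_retrieve(search, query, threshold=0.1):
    """Call search(query) and filter the result; return [] if retrieval fails."""
    try:
        return filter_by_score(search(query), threshold)
    except Exception as exc:  # broad on purpose: recover from any retrieval error
        print(f"Retrieval failed, continuing without context: {exc}")
        return []
```

With a threshold of 0.5, a chunk scoring 0.42 would be discarded; with 0.1 it is kept, while negative scores are still filtered out.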

If Ollama fails to load the model:
- Ensure Ollama is running (ollama serve, or the Docker container is up)
- Verify the model is downloaded (ollama list, or via Docker: docker-compose exec ollama ollama list)
- Try a different model if needed (adjust in app.py)

Cannot connect to Ollama from app container:
- Check if the Ollama container is healthy: docker-compose ps
- Verify the model is downloaded: docker-compose exec ollama ollama list
- Check logs: docker-compose logs ollama

Application container fails to start:
- Check logs: docker-compose logs rag-app
- Ensure Ollama container is running first
- Verify network connectivity between containers

Multilingual Support

This application includes a novel approach to handling non-English queries:
- When a non-English query is received, the LLM generates a potential answer (hallucination)
- This hallucination is then used to search the vector database for relevant document sections
- If relevant sections are found, they are used to generate a proper response
This technique allows the application to work with queries in languages other than English, even when the embedding model only supports English.
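
The three steps above can be sketched with plain callables standing in for the LLM, the vector search, and the language check. None of these names come from app.py; they are placeholders for illustration. Asking the model to draft its answer in English is also an assumption, made so that an English-only embedding model can embed the draft.

```python
def answer_query(query, generate, search, is_english):
    """Hallucinate-then-retrieve flow for non-English queries.

    generate(prompt) -> str   : calls the LLM
    search(text) -> list[str] : returns relevant document sections
    is_english(text) -> bool  : language check
    All three are hypothetical callables supplied by the caller.
    """
    search_text = query
    if not is_english(query):
        # Step 1: let the LLM draft a possible answer without any context.
        search_text = generate(f"Answer briefly in English: {query}")

    # Step 2: use the draft (or the original English query) to search.
    sections = search(search_text)
    if not sections:
        return "No relevant sections were found in the documents."

    # Step 3: generate the final response grounded in the retrieved sections.
    context = "\n\n".join(sections)
    return generate(
        f"Using only this context:\n{context}\n\nAnswer the question: {query}"
    )
```

The draft answer is used only as a search key and is never shown to the user; the final response is generated from the retrieved sections.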

