- Install uv (requires Python 3.10+):

  ```bash
  pipx install uv
  ```

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd advanced-rag
  ```

- Create and activate a virtual environment using uv:

  ```bash
  uv venv
  .venv\Scripts\activate
  ```

- Install dependencies using `uv sync` (reads from `pyproject.toml`):

  ```bash
  uv sync
  ```

- Install and start Ollama:
  - Download from Ollama's website
  - Pull the Llama3.2 model:

    ```bash
    ollama pull llama3.2
    ```

Note: `uv sync` ensures exact dependency resolution from `pyproject.toml`, providing faster and more reliable package installation.
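Once Ollama is running, it serves a local REST API (by default at `http://localhost:11434`). As a quick sanity check that the pulled model responds, a request to the `/api/generate` endpoint can be sketched as below; the snippet only builds the payload, and the commented-out call assumes the server is running locally:

```python
import json
from urllib import request

# Payload for Ollama's /api/generate endpoint; "llama3.2" matches
# the model pulled above. stream=False asks for one JSON response.
payload = {
    "model": "llama3.2",
    "prompt": "Reply with the single word: ready",
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(
    "http://localhost:11434/api/generate",  # default Ollama address
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once Ollama is running locally:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```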
**Basic RAG** (`pdf_rag.py`):

- Simple PDF document loading and parsing
- Basic text chunking
- Vector store using FAISS
- Question-answering with Llama3.2 via Ollama
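The basic chunking step can be sketched in plain Python: fixed-size windows with a small overlap, so text straddling a boundary appears in both neighboring chunks (the sizes below are illustrative, not the script's actual values):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlapping windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

sample = "word " * 300            # ~1500 characters of dummy text
chunks = chunk_text(sample)
print(len(chunks), len(chunks[0]))  # → 4 500
```

The overlap means the last 50 characters of each chunk repeat at the start of the next, which keeps sentences split at a boundary retrievable from either side.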
**Advanced RAG** (`advanced_pdf_rag.py`):

- Enhanced document chunking with semantic boundaries
- BAAI/bge-large-en-v1.5 embeddings
- Contextual compression
- Source attribution and metadata tracking
- Similarity score filtering
- Custom prompt templates
- Multi-document context handling
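Two of these features are easy to illustrate without the full stack. Semantic-boundary chunking splits on paragraph breaks instead of fixed offsets, and similarity score filtering drops retrieved chunks whose cosine similarity to the query embedding falls below a threshold. A pure-Python sketch (the threshold and the tiny vectors are illustrative; the real script works with BAAI/bge-large-en-v1.5 embeddings):

```python
import math

def split_on_paragraphs(text: str) -> list[str]:
    """Chunk on blank lines so each chunk respects a semantic boundary."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filter_by_score(query_vec, chunk_vecs, threshold=0.5):
    """Keep only chunk indices whose similarity clears the threshold."""
    return [i for i, v in enumerate(chunk_vecs)
            if cosine(query_vec, v) >= threshold]

paras = split_on_paragraphs("First paragraph.\n\nSecond paragraph.")
kept = filter_by_score([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(paras, kept)  # → ['First paragraph.', 'Second paragraph.'] [0]
```

Filtering by score trades recall for precision: off-topic chunks never reach the prompt, at the cost of occasionally dropping a borderline-relevant one.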
- Place your PDFs in the `data` folder:

  ```
  data/
  ├── document1.pdf
  ├── document2.pdf
  └── document3.pdf
  ```
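A hypothetical helper for discovering the PDFs in that folder might use `pathlib` (the `data` directory name matches the tree above; the helper itself is illustrative, not taken from the scripts):

```python
import tempfile
from pathlib import Path

def find_pdfs(folder: str) -> list[Path]:
    """Return all PDF files in the folder, sorted for stable ordering."""
    return sorted(Path(folder).glob("*.pdf"))

# Demonstration against a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    for name in ("document2.pdf", "document1.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    print([p.name for p in find_pdfs(tmp)])
    # → ['document1.pdf', 'document2.pdf']  (only PDFs, sorted)
```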
- Run the basic PDF RAG script. It answers the query "What is the main topic of the PDF documents?":

  ```bash
  python .\pdf_rag.py
  ```
- Run the advanced PDF RAG script. It answers the query "Who is Marcus Aurelius?":

  ```bash
  python .\advanced_pdf_rag.py
  ```
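Both queries are hard-coded, so asking a different question means editing the query string in the script. The custom prompt templates mentioned above follow the usual RAG pattern of placing retrieved context ahead of the question; a generic sketch (the template wording is illustrative, not copied from the script):

```python
RAG_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context_chunks: list[str], question: str) -> str:
    """Join retrieved chunks and fill the template."""
    return RAG_TEMPLATE.format(
        context="\n\n".join(context_chunks),
        question=question,
    )

prompt = build_prompt(
    ["Marcus Aurelius was a Roman emperor."],
    "Who is Marcus Aurelius?",
)
print(prompt.splitlines()[0])  # → Answer the question using only the context below.
```

Instructing the model to admit ignorance when the context is insufficient is a common guard against hallucinated answers in RAG prompts.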