RoT RAG Project

Generative AI systems, like large language models (LLMs), are very effective at producing text responses. However, their outputs are often limited to the knowledge present in their training data, which can be weeks, months, or even years out of date. This may result in outdated or incorrect information, especially in a corporate context where specific knowledge about products or services is required.

Retrieval-Augmented Generation (RAG) addresses this limitation by combining an LLM with targeted external data. RAG allows the AI to generate responses based not only on its pre-trained knowledge but also on up-to-date, domain-specific information. This ensures that answers are more accurate, contextually relevant, and grounded in the latest available data.

About RoT RAG Project

The RoT RAG Project implements a Python-based RAG pipeline. Documents are ingested and split into chunks, then embedded using Hugging Face models and indexed with FAISS for fast semantic search.
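As a rough illustration of the chunking step, the sketch below splits text into overlapping word windows. It is not the code in ingest.py: the function name `split_into_chunks` and the `chunk_size` and `overlap` defaults are assumptions chosen for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of a slightly larger index.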

A FastAPI backend exposes an API for querying the indexed documents using a local Hugging Face LLM for generating responses. The project includes example scripts for index building and query testing, and is fully Docker-ready for deployment. By integrating RAG, the system ensures that LLM-generated responses are enhanced with the most relevant and up-to-date information from the provided documents.
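The way retrieved passages reach the LLM can be pictured as simple prompt assembly. The sketch below is an assumption about the general pattern, not the project's actual prompt: `build_prompt` and the instruction wording are hypothetical.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    # Number the passages so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How do I adjust the straps?", ["Passage one.", "Passage two."]))
```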

Project Structure

ROT-RAG-PROJECT/
├── data/                          # Documents and generated artifacts
│   ├── chunks.json                # Generated text chunks
│   ├── faiss_index.bin            # FAISS index file
│   ├── user_manual.pdf            # Main source document
│   ├── query_result.json          # Sample query results
│   ├── test_cases.json            # Questions and expected answers
│   ├── sample.pdf                 # Example PDF document
│   └── test.txt                   # Example TXT document
│
├── rag/                           # FastAPI application code (API layer)
│   ├── app.py                     # FastAPI app (serves RAG pipeline via API)
│   ├── llm_wrapper.py             # LLM interface
│   ├── query_faiss.py             # Query FAISS index
│   └── test_llm_query.py          # Example script to test LLM with retrieved passages
│
├── src/                           # Scripts for ingestion, embedding, evaluation
│   ├── embed_faiss.py             # Build FAISS index from chunks
│   ├── ingest.py                  # Process documents into chunks
│   └── eval_rag.py                # Evaluate retrieval quality with test cases
│
├── tests/                         # Unit / integration tests
│   ├── performance/
│   │   ├── test_faiss_speed.py    # Performance testing for FAISS
│   │   └── test_chunks.json       # Test chunks for benchmarking
│   │
│   └── quick_test.py              # Minimal test for the RAG pipeline
│
├── test_rag.py                    # Root-level API test via HTTP (requests)
│
├── .gitignore
├── .dockerignore
├── Dockerfile                     # Containerization support
├── LICENSE
├── pyproject.toml
├── README.md                      # Project documentation
└── requirements.txt               # Python dependencies

Setup Instructions

To run this project, make sure you have the following installed:

- Python 3.13.7
- CUDA 12.6 (optional; needed only for GPU acceleration)

1. Clone the repository

git clone https://github.com/zbilgeozkan/rot-rag-project.git
cd rot-rag-project

2. (Optional) Create a virtual environment

python -m venv .venv
source .venv/bin/activate   # On Linux/Mac
.venv\Scripts\activate      # On Windows

3. Install dependencies

pip install -r requirements.txt

4. Add your documents

Place your .pdf or .txt files inside the data/ directory.

(Optional) Configure GPU

If you have a CUDA-compatible GPU, the pipeline will automatically use it for embeddings and LLM inference. No API key is needed for local Hugging Face models.
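Automatic device selection of this kind usually comes down to one PyTorch check. The sketch below shows that check; the helper name `pick_device` is an assumption, and the project's own code may select the device differently.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

If this prints cpu on a machine with a GPU, the installed PyTorch build is most likely CPU-only or the CUDA driver is not visible to it.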

(Optional) Add OpenAI API Key

Create a .env file with your API key:

echo "OPENAI_API_KEY=your_api_key_here" > .env
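For reference, a .env file is just KEY=VALUE lines. The sketch below reads the key with the standard library only; `parse_env` and `get_openai_key` are hypothetical helpers, and the project itself may load the file another way.

```python
import os
from typing import Optional


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(env_text: str = "") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env contents."""
    return os.environ.get("OPENAI_API_KEY") or parse_env(env_text).get("OPENAI_API_KEY")


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            print("key found" if get_openai_key(handle.read()) else "no key set")
```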

Usage

Before running the pipeline, specify the document(s) you want to process inside ingest.py:

files = ["data/user_manual.pdf"]  # Also add TXT files as comma-separated values
all_chunks = ingest_all(files, file_type="pdf", debug=False)  # "pdf", "txt" or "both"

Replace "data/user_manual.pdf" with the path to your own PDF or TXT document.

You can add multiple documents, e.g.:

files = ["data/manual1.pdf", "data/notes.txt"]

Adjust the file_type parameter depending on your input ("pdf", "txt", or "both").
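One way to picture what file_type does is as an extension filter over the files list. The sketch below is an assumption about that behaviour, not the body of ingest_all; `select_files` and `EXTENSIONS` are hypothetical names.

```python
from pathlib import Path

# Which file extensions each file_type value would accept (assumed mapping).
EXTENSIONS = {"pdf": {".pdf"}, "txt": {".txt"}, "both": {".pdf", ".txt"}}


def select_files(files: list[str], file_type: str = "both") -> list[str]:
    """Keep only the paths whose extension matches file_type."""
    if file_type not in EXTENSIONS:
        raise ValueError(f"file_type must be one of {sorted(EXTENSIONS)}")
    allowed = EXTENSIONS[file_type]
    return [f for f in files if Path(f).suffix.lower() in allowed]


if __name__ == "__main__":
    files = ["data/manual1.pdf", "data/notes.txt"]
    print(select_files(files, file_type="pdf"))
```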

Step 1: Ingest documents

python src/ingest.py

Splits documents into chunks and saves them to data/chunks.json.

Step 2: Build FAISS index

python src/embed_faiss.py

Creates the FAISS index at data/faiss_index.bin and writes its metadata to data/faiss_metadata.json.
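Conceptually, an exact (flat) index compares the query embedding with every stored vector and returns the closest ones. The sketch below shows that idea in plain Python with squared L2 distance. It is an illustration rather than the code in embed_faiss.py: the toy vectors, the `search` helper, and the choice of metric are all assumptions.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index_vectors: list[list[float]], query: list[float], k: int = 3):
    """Return (distance, position) pairs for the k vectors closest to query."""
    scored = ((squared_l2(vec, query), pos) for pos, vec in enumerate(index_vectors))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]  # stand-ins for chunk embeddings
    for distance, position in search(vectors, [1.0, 0.0], k=2):
        print(position, distance)
```

A real index avoids this full scan for large collections, which is why the position returned by the search is kept alongside metadata that maps it back to a chunk.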

Step 3: (Optional) Query the index (FAISS only)

python rag/query_faiss.py

This script queries the FAISS index and saves results in data/query_result.json.

You can modify the query string inside query_faiss.py:

results = faiss_query.query("Your question here")

Example output:

[
  {
    "text": "Example content from document...",
    "source": "sample.pdf",
    "page": 2,
    "title": "Example Title",
    "distance": 0.12345
  }
]
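Results in this shape are easy to post-process. The sketch below sorts them by distance and prints a one-line summary per hit; it relies only on the fields shown in the example output, while the `summarize` helper and the `max_distance` cut-off are assumptions.

```python
import json


def summarize(results_json: str, max_distance: float = 1.0) -> list[str]:
    """Return one summary line per result, closest first."""
    results = json.loads(results_json)
    kept = sorted(
        (r for r in results if r["distance"] <= max_distance),
        key=lambda r: r["distance"],
    )
    return [
        f'{r["source"]} p.{r["page"]} ({r["distance"]:.3f}): {r["text"][:60]}'
        for r in kept
    ]


if __name__ == "__main__":
    with open("data/query_result.json", encoding="utf-8") as handle:
        for line in summarize(handle.read()):
            print(line)
```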

Step 4: Test the LLM wrapper

python rag/llm_wrapper.py

Step 5: Test LLM answers with retrieved passages

python rag/test_llm_query.py

This script:

- Pulls the top-k passages from FAISS for your question.
- Generates a detailed answer using the local Hugging Face LLM.
- Prints the answer in the console.

Example output:

Question: How do I wear the Gear VR headset?

Answer:
1. Align your face and the foam cushion, and put on the Gear VR, being cautious not to walk or drive while wearing it.
2. Secure the Gear VR to your head with the straps and place it comfortably over your face.
3. Adjust the length of the top head strap and the main strap to ensure the headset is properly adjusted for your comfort.
4. Check for any discomfort or screen tilt by adjusting the Gear VR if needed, and be aware of your surroundings to avoid injury to yourself or others.
5. If you need to remove the Gear VR for any reason, wait 5-7 seconds before using it again to prevent damage to the headset.

Tips:
- Do not place objects on the proximity sensor while the Gear VR is not in use, as this may drain the battery.
- Always read and follow the set up and operating instructions provided with the Gear VR.
- Adjust the Gear VR for each individual user and calibrate it using the configuration software (if available) before starting a virtual reality experience.

Step 6: Run the API

uvicorn rag.app:app --reload

Then open http://127.0.0.1:8000/docs to test the API endpoints.
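FastAPI also serves the machine-readable schema behind that page at /openapi.json by default, which is handy for checking which routes exist without guessing their names. The sketch below lists them; `list_endpoints` and `fetch_schema` are hypothetical helpers, and it assumes the app keeps FastAPI's default schema URL.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    pairs = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                pairs.append((method.upper(), path))
    return sorted(pairs)


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI schema from a running server."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)


if __name__ == "__main__":
    for method, path in list_endpoints(fetch_schema()):
        print(method, path)
```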

Step 7: (Optional) Run in Docker

docker build -t rot-rag-project .
docker run -p 8000:8000 --env-file .env rot-rag-project

Notes

- Replace sample.pdf and test.txt with your own content for meaningful results.
- Default embedding model: all-mpnet-base-v2 (or any other HF model you prefer).
- LLM answers are generated locally using Hugging Face models.
  - To use an external API (such as OpenAI) instead, add your API key to the .env file.
- GPU acceleration is used if available. For large datasets, consider using faiss-gpu.
- You can also run this project in a container using the provided Dockerfile.

Dependencies and Licenses

This project is released under the MIT License.
It uses several third-party frameworks, libraries, and AI models.
Full license texts for these dependencies are available in the /THIRD_PARTY_LICENSES folder.

Library / Component	License	Source
Python Standard Libraries (os, re, json, sys, pathlib, collections)	Python Software Foundation License	Python
NumPy	BSD-3-Clause	NumPy
SciPy	BSD-3-Clause	SciPy
Scikit-learn	BSD-3-Clause	scikit-learn
PyTorch (torch)	BSD-3-Clause	PyTorch
sentence-transformers	Apache-2.0	sentence-transformers
transformers	Apache-2.0	HuggingFace Transformers
tokenizers	Apache-2.0	HuggingFace Tokenizers
huggingface-hub	Apache-2.0	HuggingFace Hub
faiss	MIT	FAISS
PyPDF2	BSD-3-Clause	PyPDF2
fastapi	MIT	FastAPI
uvicorn	BSD-3-Clause	uvicorn
pydantic	MIT	pydantic
requests	Apache-2.0	requests
tqdm	MPL-2.0	tqdm
safetensors	Apache-2.0	safetensors
google/flan-t5-base (LLM model)	Apache-2.0	HuggingFace Model
all-MiniLM-L6-v2 (embedding model)	Apache-2.0	HuggingFace Model
