Generative AI systems, like large language models (LLMs), are very effective at producing text responses. However, their outputs are often limited to the knowledge present in their training data, which can be weeks, months, or even years out of date. This may result in outdated or incorrect information, especially in a corporate context where specific knowledge about products or services is required.
Retrieval-Augmented Generation (RAG) addresses this limitation by combining an LLM with targeted external data. RAG allows the AI to generate responses based not only on its pre-trained knowledge but also on up-to-date, domain-specific information. This ensures that answers are more accurate, contextually relevant, and grounded in the latest available data.
The RoT RAG Project implements a Python-based RAG pipeline. Documents are ingested and split into chunks, then embedded using Hugging Face models and indexed with FAISS for fast semantic search.
A FastAPI backend exposes an API for querying the indexed documents using a local Hugging Face LLM for generating responses. The project includes example scripts for index building and query testing, and is fully Docker-ready for deployment. By integrating RAG, the system ensures that LLM-generated responses are enhanced with the most relevant and up-to-date information from the provided documents.
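As a rough illustration of how retrieved passages can ground an LLM answer, the sketch below assembles a prompt from a question and a list of passages. The function name `build_prompt`, the prompt wording, and the `max_passages` parameter are assumptions made for this example. The passage fields (`text`, `source`, `page`, `distance`) mirror the query results this project writes to data/query_result.json. The project's own prompt logic in rag/ may differ.

```python
from typing import Dict, List


def build_prompt(question: str, passages: List[Dict], max_passages: int = 3) -> str:
    """Combine retrieved passages and the user question into one prompt string."""
    # Keep only the closest passages (a smaller distance means more similar).
    ranked = sorted(passages, key=lambda p: p["distance"])[:max_passages]
    context_lines = []
    for i, p in enumerate(ranked, start=1):
        # Tag each passage with its source so the answer can be traced back.
        context_lines.append(f"[{i}] ({p['source']}, page {p['page']}) {p['text']}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up passages, for demonstration only.
    demo = [
        {"text": "Tighten the top strap first.", "source": "sample.pdf", "page": 2, "distance": 0.31},
        {"text": "Charge the device before use.", "source": "sample.pdf", "page": 5, "distance": 0.12},
    ]
    print(build_prompt("How do I adjust the strap?", demo, max_passages=2))
```

Sorting by distance before truncating means that lowering `max_passages` drops the least similar passages first.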
ROT-RAG-PROJECT/
├── data/ # Documents and generated artifacts
│ ├── chunks.json # Generated text chunks
│ ├── faiss_index.bin # FAISS index file
│ ├── user_manual.pdf # Main source
│ ├── query_result.json # Sample query results
│ ├── test_cases.json # Questions and expected answers
│ ├── sample.pdf # Example PDF document
│ └── test.txt # Example TXT document
│
├── rag/ # FastAPI application code (API layer)
│ ├── app.py # FastAPI app (serves RAG pipeline via API)
│ ├── llm_wrapper.py # LLM interface
│ ├── query_faiss.py # Query FAISS index
│ └── test_llm_query.py # Example script to test LLM with retrieved passages
│
├── src/ # Scripts for ingestion, embedding, evaluation
│ ├── embed_faiss.py # Build FAISS index from chunks
│ ├── ingest.py # Process documents into chunks
│ └── eval_rag.py # Evaluate retrieval quality with test cases
│
├── tests/ # Unit / integration tests
│ ├── performance/
│ │ ├── test_faiss_speed.py # Performance testing for FAISS
│ │ └── test_chunks.json # Test chunks for benchmarking
│ │
│ └── quick_test.py # Minimal test for the RAG pipeline
│
├── test_rag.py # Root-level API test via HTTP (requests)
│
├── .gitignore
├── .dockerignore
├── Dockerfile # Containerization support
├── LICENSE
├── pyproject.toml
├── README.md # Project documentation
└── requirements.txt # Python dependencies
To run this project, make sure you have the following installed:
- Python 3.13.7
- CUDA 12.6 (optional; only needed for GPU acceleration)
Clone the repository:

    git clone https://github.com/zbilgeozkan/rot-rag-project.git
    cd rot-rag-project

Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate   # On Linux/Mac
    .venv\Scripts\activate      # On Windows

Install the dependencies:

    pip install -r requirements.txt

Place your .pdf or .txt files inside the data/ directory.
If you have a CUDA-compatible GPU, the pipeline will automatically use it for embeddings and LLM inference. No API key is needed for local Hugging Face models.
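A common way to implement this kind of automatic device selection with PyTorch is sketched below. The helper name `pick_device` is an assumption for illustration, and the project's own detection code may be written differently.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

Running this snippet before building the index is a quick way to check whether the GPU is actually visible to PyTorch.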
Optional: to use an external API such as OpenAI, create a .env file with your API key:

    echo "OPENAI_API_KEY=your_api_key_here" > .env

Before running the pipeline, specify the document(s) you want to process inside ingest.py:

    files = ["data/user_manual.pdf"]  # Also add TXT files as comma-separated values
    all_chunks = ingest_all(files, file_type="pdf", debug=False)  # "pdf", "txt" or "both"

Replace "data/user_manual.pdf" with the path to your own PDF or TXT document.
You can add multiple documents, e.g.:

    files = ["data/manual1.pdf", "data/notes.txt"]

Adjust the file_type parameter depending on your input ("pdf", "txt", or "both").
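To make the effect of file_type concrete, here is a minimal sketch of how a file list could be filtered by extension. The helper `select_files` and the `EXTENSIONS` mapping are hypothetical and exist only for this example. How `ingest_all` handles the parameter internally is not shown in this README.

```python
from pathlib import Path
from typing import List

# Hypothetical mapping from the file_type argument to accepted extensions.
EXTENSIONS = {"pdf": {".pdf"}, "txt": {".txt"}, "both": {".pdf", ".txt"}}


def select_files(files: List[str], file_type: str = "pdf") -> List[str]:
    """Return the paths whose extension matches file_type, preserving order."""
    if file_type not in EXTENSIONS:
        raise ValueError(f"file_type must be one of {sorted(EXTENSIONS)}")
    allowed = EXTENSIONS[file_type]
    # Compare lower-cased suffixes so "MANUAL.PDF" is treated like "manual.pdf".
    return [f for f in files if Path(f).suffix.lower() in allowed]


if __name__ == "__main__":
    files = ["data/manual1.pdf", "data/notes.txt"]
    print(select_files(files, file_type="both"))
    print(select_files(files, file_type="pdf"))
```

With a mixed list like the one above, "both" keeps every file while "pdf" or "txt" keeps only the matching ones.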
Run the ingestion script:

    python src/ingest.py

This splits documents into chunks and saves them in data/chunks.json.
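The chunking strategy used by ingest.py is not described here. As a general illustration of the idea, a word-based splitter with overlapping windows might look like the sketch below. The function name `chunk_words` and the `chunk_size` and `overlap` defaults are assumptions, not the project's settings.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of chunk_size words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", chunk_size=4, overlap=1):
        print(chunk)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.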
Build the FAISS index:

    python src/embed_faiss.py

This creates the FAISS index data/faiss_index.bin and its metadata in data/faiss_metadata.json.
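Conceptually, a vector index answers one question: which stored vectors are closest to the query vector? The pure-Python sketch below shows that idea with an exhaustive search. It is a stand-in for illustration only and does not use the FAISS API. The function names and the choice of Euclidean (L2) distance are assumptions, since the index type and metric used by embed_faiss.py are not stated here.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(
    index: List[Sequence[float]], query: Sequence[float], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (position, distance) for the k stored vectors closest to query."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]


if __name__ == "__main__":
    # Tiny 2-D vectors standing in for real embeddings.
    vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
    print(search(vectors, [0.0, 0.0], k=2))
```

The returned positions play the role of lookups into the chunk metadata, which is how a hit can be mapped back to its text, source file, and page.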
Query the index:

    python rag/query_faiss.py

This script queries the FAISS index and saves results in data/query_result.json.
You can modify the query string inside query_faiss.py:

    results = faiss_query.query("Your question here")

Example output:

    [
      {
        "text": "Example content from document...",
        "source": "sample.pdf",
        "page": 2,
        "title": "Example Title",
        "distance": 0.12345
      }
    ]

Then run the LLM wrapper and the LLM query test:

    python rag/llm_wrapper.py
    python rag/test_llm_query.py

The test_llm_query.py script:
- Pulls the top-k passages from FAISS for your question.
- Generates a detailed answer using the local Hugging Face LLM.
- Prints the answer in the console.
- Example output:
Question: How do I wear the Gear VR headset?
Answer:
1. Align your face and the foam cushion, and put on the Gear VR, being cautious not to walk or drive while wearing it.
2. Secure the Gear VR to your head with the straps and place it comfortably over your face.
3. Adjust the length of the top head strap and the main strap to ensure the headset is properly adjusted for your comfort.
4. Check for any discomfort or screen tilt by adjusting the Gear VR if needed, and be aware of your surroundings to avoid injury to yourself or others.
5. If you need to remove the Gear VR for any reason, wait 5-7 seconds before using it again to prevent damage to the headset.
Tips:
- Do not place objects on the proximity sensor while the Gear VR is not in use, as this may drain the battery.
- Always read and follow the set up and operating instructions provided with the Gear VR.
- Adjust the Gear VR for each individual user and calibrate it using the configuration software (if available) before starting a virtual reality experience.

Start the API server:

    uvicorn rag.app:app --reload

Then open http://127.0.0.1:8000/docs to test the API endpoints.
To build and run the Docker image:

    docker build -t rot-rag-project .
    docker run -p 8000:8000 --env-file .env rot-rag-project

- Replace sample.pdf and test.txt with your own content for meaningful results.
- Default embedding model: all-mpnet-base-v2 (or any other HF model you prefer).
- LLM answers are generated locally using Hugging Face models.
  - To use an external API (like OpenAI) instead, add your API key to the .env file.
- Supports GPU acceleration if available. For large datasets, consider using faiss-gpu.
- You can also run this project in a container using the provided Dockerfile.
This project is released under the MIT License.
It uses several third-party frameworks, libraries, and AI models.
Full license texts for these dependencies are available in the /THIRD_PARTY_LICENSES folder.
| Library / Component | License | Source |
|---|---|---|
| Python Standard Libraries (os, re, json, sys, pathlib, collections) | Python Software Foundation License | Python |
| NumPy | BSD-3-Clause | NumPy |
| SciPy | BSD-3-Clause | SciPy |
| Scikit-learn | BSD-3-Clause | scikit-learn |
| PyTorch (torch) | BSD-3-Clause | PyTorch |
| sentence-transformers | Apache-2.0 | sentence-transformers |
| transformers | Apache-2.0 | HuggingFace Transformers |
| tokenizers | Apache-2.0 | HuggingFace Tokenizers |
| huggingface-hub | Apache-2.0 | HuggingFace Hub |
| faiss | MIT | FAISS |
| PyPDF2 | BSD-3-Clause | PyPDF2 |
| fastapi | MIT | FastAPI |
| uvicorn | BSD-3-Clause | uvicorn |
| pydantic | MIT | pydantic |
| requests | Apache-2.0 | requests |
| tqdm | MPL-2.0 | tqdm |
| safetensors | Apache-2.0 | safetensors |
| google/flan-t5-base (LLM model) | Apache-2.0 | HuggingFace Model |
| all-MiniLM-L6-v2 (embedding model) | Apache-2.0 | HuggingFace Model |

