This project implements a Retrieval-Augmented Generation (RAG) conversational agent that answers user questions strictly grounded in a provided PDF document.
The system retrieves relevant context from the document and generates answers only from that context, with clear citations.
Features:
- PDF ingestion and page-wise text extraction
- Text chunking with page and chunk-level metadata
- Vector-based retrieval using FAISS
- Conversational (multi-turn) Q&A
- Grounded answers with citations (page / chunk references)
- Explicit refusal when information is not present in the document
- Retrieval visibility for debugging (top-k chunks + scores)
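The chunking step listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `chunk_pages` and the 500-character size with 50-character overlap are assumptions.

```python
def chunk_pages(pages, chunk_size=500, overlap=50):
    """Split page texts into overlapping chunks with page/chunk metadata.

    `pages` is a list of strings, one per PDF page. The size and overlap
    defaults here are illustrative, not the project's real settings.
    """
    chunks = []
    for page_num, text in enumerate(pages, start=1):
        start = 0
        chunk_id = 0
        while start < len(text):
            piece = text[start:start + chunk_size]
            chunks.append({
                "text": piece,
                "page": page_num,   # 1-based page number, used for citations
                "chunk": chunk_id,  # chunk index within the page
            })
            chunk_id += 1
            start += chunk_size - overlap
    return chunks
```

Each chunk carries the page and chunk indices so that generated answers can cite exactly where their supporting text came from.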
Tech Stack:
- Python 3.10+
- Gemini API (for answer generation)
- FAISS (vector index)
- LangChain utilities (chunking & retrieval)
- PyPDF (PDF text extraction)
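A `requirements.txt` matching this stack might look like the following; the exact package names (in particular for the Gemini client) and the unpinned versions are assumptions and should be checked against the project's actual file.

```
google-generativeai
faiss-cpu
langchain
pypdf
python-dotenv
```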
Project Structure:
```
├── main.py
├── ingest.py
├── rag.py
├── llm.py
├── requirements.txt
├── README.md
└── .env
```
Install dependencies:
```
pip install -r requirements.txt
```
Set the API key: create a `.env` file or set the `GEMINI_API_KEY` environment variable directly.
Windows:
```
setx GEMINI_API_KEY your_api_key_here
```
Mac / Linux:
```
export GEMINI_API_KEY=your_api_key_here
```
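Inside the code, the key can then be read from the environment. The helper below is a hedged sketch (the project's actual loading logic, e.g. via `python-dotenv`, may differ):

```python
import os

def get_gemini_api_key():
    """Return the Gemini API key from the environment, failing loudly if unset."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; create a .env file or export it first."
        )
    return key
```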
Run the Application:
```
python main.py <path_to_pdf>
```
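A minimal sketch of how `main.py` might validate its single PDF-path argument; the function name and error messages are illustrative assumptions, not the project's actual code:

```python
import sys
from pathlib import Path

def parse_pdf_path(argv):
    """Validate the command line: expect exactly one .pdf path argument."""
    if len(argv) != 2:
        raise SystemExit("usage: python main.py <path_to_pdf>")
    path = Path(argv[1])
    if path.suffix.lower() != ".pdf":
        raise SystemExit(f"not a PDF file: {path}")
    return path

if __name__ == "__main__":
    pdf_path = parse_pdf_path(sys.argv)
    print(f"Loading document: {pdf_path}")
```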