A Retrieval-Augmented Generation (RAG) chatbot built using LangChain, LangGraph, and FAISS for intelligent document search and question answering.
This project enables users to query documents (PDFs, URLs, etc.) and receive context-aware, accurate responses powered by LLMs.
- 🔍 Semantic document search using FAISS
- 🧠 Context-aware responses with RAG pipeline
- 🔗 Graph-based orchestration using LangGraph
- 📄 Supports PDF and URL ingestion
- ⚡ Modular and scalable architecture
- 🧩 Easy to extend with new data sources
.
├── data/
│ ├── attention.pdf
│ └── url.txt
├── src/
│ ├── config/ # Configuration files
│ ├── document_ingestion/ # Data loading & preprocessing
│ ├── graph_builder/ # LangGraph workflow setup
│ ├── node/ # Graph nodes (LLM, retrieval, etc.)
│ ├── state/ # State management
│ ├── vectorstore/ # FAISS vector DB logic
│ └── __init__.py
├── .python-version
└── README.md
- LLM Framework: LangChain
- Workflow Orchestration: LangGraph
- Vector Database: FAISS
- Language: Python
- Load PDFs or URLs
- Split into chunks
- Generate embeddings
- Store embeddings in FAISS
- Enable fast similarity search
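The ingestion steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `split_into_chunks`, `embed`, and `build_index` are hypothetical names, the word-count "embedding" stands in for a real embedding model, and a plain list stands in for the FAISS index.

```python
import math
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalised word counts (assumption: stand-in for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def build_index(chunks: list[str]) -> list[tuple[dict[str, float], str]]:
    """Store (embedding, chunk) pairs; a plain list standing in for FAISS."""
    return [(embed(chunk), chunk) for chunk in chunks]


text = "Attention lets a model weigh tokens. FAISS stores vectors for search."
index = build_index(split_into_chunks(text, chunk_size=40, overlap=10))
print(len(index), "chunks indexed")
```

The overlap between consecutive chunks keeps a sentence that straddles a boundary at least partly visible in both chunks, which is the usual reason for overlapping windows.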
- The user query is embedded
- Relevant chunks are retrieved from FAISS
- The retrieved context and query are sent to the LLM
- The LLM generates a grounded response
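As a rough sketch of the query path just listed, the example below ranks chunks by cosine similarity and assembles the prompt that would go to the LLM. It is an assumption-laden illustration: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, word counts replace real embeddings, the sorted list replaces a FAISS search, and the LLM call itself is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: raw word counts (assumption: stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


chunks = [
    "attention lets a model weigh input tokens",
    "faiss stores dense vectors for similarity search",
    "bananas are yellow",
]
query = "how does attention weigh tokens"
prompt = build_prompt(query, retrieve(query, chunks, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

Restricting the prompt to retrieved chunks is what makes the response "grounded": the model is asked to answer from the supplied context rather than from memory alone.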
- Nodes handle each step (retrieval, generation, etc.)
- State flows through graph
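The node-and-state idea can be illustrated without importing LangGraph. The sketch below is a simplified stand-in, assuming a linear graph: `RAGState`, `retrieve_node`, `generate_node`, and `run_graph` are hypothetical names, and the node bodies are placeholders for real retrieval and LLM calls. Each node reads the shared state and returns a partial update that is merged back in.

```python
from typing import Callable, TypedDict


class RAGState(TypedDict, total=False):
    """Shared state passed from node to node."""
    question: str
    context: list[str]
    answer: str


def retrieve_node(state: RAGState) -> RAGState:
    # Placeholder retrieval: a real node would query the vector store.
    return {"context": [f"(chunk relevant to: {state['question']})"]}


def generate_node(state: RAGState) -> RAGState:
    # Placeholder generation: a real node would call the LLM.
    return {"answer": f"Answer based on {len(state['context'])} chunk(s)."}


def run_graph(nodes: list[Callable[[RAGState], RAGState]], state: RAGState) -> RAGState:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_graph([retrieve_node, generate_node], {"question": "What is attention?"})
print(final["answer"])
```

Keeping each step in its own node means a step can be swapped (for example, a different retriever) without touching the others, as long as it reads and writes the same state keys.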
# Clone the repository
git clone https://github.com/namansinghal111/DocumentSearch_RAG_Chatboat.git
# Navigate to project
cd DocumentSearch_RAG_Chatboat
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

python main.py
- Add your documents to the data/ folder
- Configure settings in src/config
- Run the chatbot and start querying
Set your environment variables:
OPENAI_API_KEY=your_api_key

You can also modify:
- Chunk size
- Embedding model
- Retrieval strategy
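One way to read the API key and the tunable settings listed above is a small settings loader. This is only a sketch under stated assumptions: the `Settings` class, `load_settings`, the `CHUNK_SIZE`, `EMBEDDING_MODEL`, and `TOP_K` variable names, and all default values are hypothetical, and the project's actual configuration in src/config may look different. Only `OPENAI_API_KEY` comes from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chunk_size: int = 500                               # hypothetical default
    embedding_model: str = "example-embedding-model"    # placeholder name
    top_k: int = 4                                      # chunks retrieved per query


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing early if the key is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=key,
        chunk_size=int(env.get("CHUNK_SIZE", "500")),
        embedding_model=env.get("EMBEDDING_MODEL", "example-embedding-model"),
        top_k=int(env.get("TOP_K", "4")),
    )


# A dict is passed here so the example runs without touching the real environment.
settings = load_settings({"OPENAI_API_KEY": "your_api_key", "CHUNK_SIZE": "256"})
print(settings.chunk_size, settings.top_k)
```

Failing at startup when the key is missing tends to be easier to debug than a failed API call in the middle of a query.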
Q: What is the attention mechanism?
A: The attention mechanism allows models to focus on relevant parts...
- 🌐 Web UI (Streamlit / React)
- 📊 Better evaluation metrics
- 🔐 Authentication & user sessions
- 🗂️ Multi-document collections
- 🧩 Plug-and-play LLM support