# DocuMind-RAG

DocuMind-RAG is an AI-powered PDF chatbot that lets you upload multiple documents and chat directly with their content. It uses Retrieval-Augmented Generation (RAG) with LangChain, FAISS, and Groq's LLaMA 3.1 to deliver accurate, context-aware answers from your files. Whether you're analyzing research papers, study material, or business reports, DocuMind turns your PDFs into interactive conversations.
## Architecture

```mermaid
graph TB
    subgraph Setup["📄 Document Processing (One-Time Setup)"]
        A[Upload PDFs] --> B[PyPDF2: Extract Text]
        B --> C[TextSplitter: Split into Chunks]
        C --> D[HuggingFace: Generate Embeddings]
        D --> E[FAISS: Store Vector DB]
    end

    subgraph Chat["💬 Conversational Interface"]
        F[User Asks Question]
        F --> G{Has Chat History?}
    end

    subgraph RAG["🔍 RAG Workflow"]
        G -->|Yes| H[Contextualize Question with History]
        G -->|No| I[Use Question As-Is]
        H --> J[FAISS: Retrieve Relevant Chunks]
        I --> J
        J --> K[LLaMA 3.1 via Groq: Generate Answer]
        K --> L[Update Chat History]
    end

    L --> M[Display Response to User]
    M --> F
    E -.->|Vector Store Ready| F

    style A fill:#9D00FF,color:#fff
    style E fill:#4CAF50,color:#fff
    style F fill:#2196F3,color:#fff
    style K fill:#FF9800,color:#fff
    style M fill:#2196F3,color:#fff
```
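The "Split into Chunks" step in the diagram can be sketched in plain Python. This is a simplified stand-in for LangChain's text splitters, not the actual implementation; the `chunk_size` and `overlap` values are illustrative defaults:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    step = chunk_size - overlap  # each new chunk repeats the last `overlap` characters
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Overlapping chunks help retrieval: a sentence that straddles a boundary still appears whole in at least one chunk.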
## Features

✅ Upload and process multiple PDF files
✅ Extract, chunk, and embed text using HuggingFace transformers
✅ Store vector embeddings efficiently with FAISS
✅ Query your PDFs conversationally using LLaMA 3.1 8B (Groq API)
✅ Memory-aware responses with contextual follow-ups
✅ Clean Streamlit UI for real-time chatting
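The memory-aware behavior hinges on the branch shown in the diagram: a follow-up question is rewritten into a standalone one before retrieval. A toy sketch of that branching logic (in the real app the rewrite itself is performed by the LLM via LangChain's history-aware retriever; the prompt wording here is illustrative):

```python
def contextualize(question: str, history: list[tuple[str, str]]) -> str:
    """Return the retrieval query, folding chat history in if any exists."""
    if not history:
        # "No" branch: use the question as-is
        return question
    # "Yes" branch: build a prompt asking the LLM for a standalone question
    transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Given the chat history below, rewrite the follow-up question "
        f"as a standalone question.\n\n{transcript}\n\nFollow-up: {question}"
    )
```

Without this step, a follow-up like "And its limits?" would retrieve nothing useful, because the vector search never sees what "its" refers to.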
## Tech Stack

- Frontend: Streamlit
- Backend & RAG: LangChain
- LLM: Groq (Llama 3.1 8B Instant)
- Embeddings: Hugging Face (`all-MiniLM-L6-v2`)
- Vector Store: FAISS (from Meta)
- PDF Parsing: PyPDF2
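FAISS's role in this stack boils down to nearest-neighbor search over embedding vectors. A dependency-free sketch of the idea using cosine similarity (the real app would call the FAISS index through LangChain's wrapper; the two-dimensional vectors here are toy values, real `all-MiniLM-L6-v2` embeddings have 384 dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]],
             k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

FAISS does the same ranking, but with index structures that stay fast at millions of vectors instead of this O(n log n) sort.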
## Getting Started

Follow these steps to run DocuMind on your local machine.

### 1. Clone the Repository

```bash
git clone https://github.com/your-username/DocuMind.git
cd DocuMind
```

### 2. Create a Virtual Environment

It's highly recommended to use a virtual environment to manage dependencies.
```bash
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
.\venv\Scripts\activate
```

### 3. Install Dependencies

You can install all Python packages using the provided requirements.txt file.
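For reference, a requirements.txt for this stack would typically list packages along these lines (these are the usual PyPI distribution names, not a copy of the repo's file — treat the repo's own requirements.txt as authoritative and pin versions as needed):

```text
streamlit
langchain
langchain-community
langchain-groq
faiss-cpu
sentence-transformers
PyPDF2
python-dotenv
```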
```bash
pip install -r requirements.txt
```

### 4. Set Up the Groq API Key

The application uses an API key for the Groq LLM.
1. Create a file named `.env` in the root of your project directory.
2. Add your Groq API key to this file:

```
GROQ_API_KEY="your-api-key-here"
```
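Inside the app, a `.env` file is typically loaded with python-dotenv, after which the key is available through the environment. A minimal stdlib-only sketch of reading it defensively (the helper name `get_groq_key` is illustrative, not from the project):

```python
import os

def get_groq_key(env=os.environ) -> str:
    """Fetch the Groq API key, failing fast with a clear message if it is missing."""
    # python-dotenv's load_dotenv() would have populated os.environ from .env by now
    key = env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set - check your .env file")
    return key
```

Failing fast here gives a readable error at startup instead of an opaque authentication failure on the first chat request.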
### 5. Run the App

Once everything is installed, launch the app from your terminal:

```bash
streamlit run frontend.py
```

Your browser should automatically open to the application.
## How to Use

1. Launch the application using the command above.
2. Use the sidebar to upload one or more PDF files you wish to chat with.
3. Click the "Process" button and wait for the "Documents processed!" success message.
4. The chat input box at the bottom of the page will become active.
5. Start asking your questions!
## Author

**Aryan Gupta**
📍 Bhilai, Chhattisgarh
🔗 GitHub Profile
If you like this project, leave a ⭐ and share it with others!