prudhivinath/RAG_pdf_chatbot

Offline RAG-Powered Chatbot for PDF Knowledge Extraction

📂 Project Files (in order)

  1. code_3.py – Main Python script for running the RAG chatbot.
  2. lsb.pdf – Sample PDF document used for testing question-answering.
  3. .gitignore – Configuration file to exclude large model and installer files.

🚀 Project Overview

This project implements an offline Retrieval-Augmented Generation (RAG) chatbot that can answer questions based on the contents of PDF documents.
The chatbot runs entirely on the local machine: documents and queries are never sent to an external API, which keeps data private and removes any dependence on third-party services.

Workflow:

  1. Extract text from PDFs using pdfplumber.
  2. Split text into chunks and generate embeddings using SentenceTransformers.
  3. Store embeddings in DuckDB for efficient vector search.
  4. Retrieve the most relevant chunks for a given query.
  5. Use LangChain to pass the retrieved context into a locally run LLM (LLaMA 3).
  6. Generate a response grounded in the retrieved context.
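
Steps 2, 4, and 5 of the workflow can be sketched with the standard library alone. This is a minimal illustration, not the code in code_3.py: the function names, the chunk sizes, and the prompt wording are all hypothetical, and `toy_embed` is a bag-of-words stand-in for the SentenceTransformers embeddings the project uses. In the real pipeline the vectors would be stored in DuckDB and the prompt passed to the local LLM through LangChain.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """Stand-in embedding (assumption): word counts, NOT a SentenceTransformers vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(query, context_chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The overlap between consecutive chunks is a common way to avoid cutting a sentence's context at a chunk boundary. The sizes used here are placeholders rather than values taken from the project.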

🛠️ Technologies & Libraries Used

  • Python 3.10+
  • LangChain – for building RAG pipelines
  • pdfplumber – for extracting text from PDFs
  • DuckDB – lightweight embedded database
  • SentenceTransformers – for semantic vector embeddings
  • Torch (PyTorch) – deep learning backend used by SentenceTransformers
  • Ollama (or other LLaMA runtime) – for running the LLaMA model locally

📦 Requirements

Install the dependencies with pip by pasting the block below into your terminal:

pip install langchain
pip install pdfplumber
pip install duckdb
pip install sentence-transformers
pip install torch
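
After installing, a short check like the one below can confirm that each package is importable. It is a hypothetical helper, not part of the repository. The module names are the import names of the packages listed above; note that sentence-transformers is imported as `sentence_transformers`.

```python
import importlib.util

# Import names for the packages in the pip block above.
REQUIRED_MODULES = ["langchain", "pdfplumber", "duckdb", "sentence_transformers", "torch"]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only covers the Python packages. The local LLaMA runtime (for example Ollama) is set up separately.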
