
🧠 Local RAG Pipeline (Groq + FAISS)

A fast, local-first Retrieval-Augmented Generation (RAG) system built from scratch. It lets you chat with your private documents by pairing local vector embeddings with the low-latency Groq API for inference.

Because embeddings are computed locally and the FAISS index lives on disk, the pipeline avoids cloud vector-database costs; your documents stay on your machine, and only the retrieved snippets are sent to Groq at answer time.

🚀 Features

  • Multi-Format Ingestion: Natively reads and parses .pdf, .txt, .csv, .xlsx, .docx, and .json files.
  • Local Embeddings: Uses HuggingFace's sentence-transformers (all-MiniLM-L6-v2) locally to bypass cloud embedding costs.
  • Offline Vector Database: Employs Meta's FAISS (Facebook AI Similarity Search) to organize your data into a locally queried index.
  • Groq LLaMA / Gemma Integration: Connects to Groq's API via LangChain to generate answers with models such as LLaMA-3.3-70B or Gemma-2-9B.
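At its core, the FAISS retrieval step is nearest-neighbor search over embedding vectors. The toy sketch below shows that idea in miniature with plain Python and cosine similarity; the names and 3-dimensional "embeddings" are illustrative only (the real pipeline uses 384-dimensional all-MiniLM-L6-v2 vectors and FAISS via LangChain):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    # index: list of (chunk_id, vector) pairs; return the k most similar chunk_ids.
    scored = sorted(index, key=lambda item: cosine_sim(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

# Toy stand-ins for document-chunk embeddings.
index = [
    ("chunk_a", [0.9, 0.1, 0.0]),
    ("chunk_b", [0.0, 1.0, 0.1]),
    ("chunk_c", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # → ['chunk_a', 'chunk_c']
```

FAISS does the same ranking, but with optimized (and optionally approximate) index structures that scale to millions of vectors.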

🛠️ Tech Stack

  • Framework: LangChain
  • Vector DB: Meta FAISS
  • Embeddings: SentenceTransformers
  • Language Models: LLaMA-3.3-70B / Gemma-2-9B (via Groq API)
  • Environment: Python 3.11+, uv Package Manager

📂 Project Structure

RAG/
│
├── data/                  # Drop all your files (PDFs, Excel, Word) here!
│   ├── pdf/
│   └── text_files/
├── src/                   # Core pipeline logic
│   ├── data_loader.py     # Parses various doc formats via LangChain loaders
│   ├── embedding.py       # Chunks text and processes dense vector embeddings
│   ├── vectorstore.py     # Handles FAISS index creation and metadata persistence
│   └── search.py          # Queries FAISS and synthesizes the final LLM response
├── notebook/              # Jupyter notebooks for testing models and evaluation
├── .env                   # Store your Groq API Keys here
├── pyproject.toml         # Dependencies managed via tools like uv
├── requirements.txt       # Base pip requirements
└── main.py                # Entry point
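A loader like src/data_loader.py typically dispatches on file extension. The sketch below is a hypothetical stdlib-only version of that pattern; the actual project uses LangChain's document loaders, and binary formats like .pdf, .docx, and .xlsx need dedicated parsers:

```python
from pathlib import Path

# Plain-text formats we can read directly; binary formats (.pdf, .docx, .xlsx)
# would map to parser-backed loaders instead (e.g. LangChain's PyPDFLoader).
LOADERS = {
    ".txt": lambda p: p.read_text(encoding="utf-8"),
    ".csv": lambda p: p.read_text(encoding="utf-8"),
    ".json": lambda p: p.read_text(encoding="utf-8"),
}

def load_documents(data_dir):
    # Walk data/ recursively and load every file we have a loader for.
    docs = []
    for path in Path(data_dir).rglob("*"):
        loader = LOADERS.get(path.suffix.lower())
        if loader is not None:
            docs.append({"source": str(path), "text": loader(path)})
    return docs
```

Files with no registered loader are silently skipped, so dropping an unsupported format into data/ is harmless.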

⚙️ Setup & Installation

1. Clone the repository:

git clone https://github.com/Aaditthesmart/RAG.git
cd RAG

2. Install dependencies: The project includes a uv.lock file for fast installation, or you can use pip:

# Using uv (recommended)
uv sync

# OR using pip
pip install -r requirements.txt

3. Configure Environment Variables: Create a .env file in the root directory and add your Groq API key:

GROQ_API_KEY="your_groq_api_key_here"
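At runtime the key is read from the environment. A minimal sketch of that lookup (the project presumably loads .env via python-dotenv's load_dotenv(); the helper name here is hypothetical):

```python
import os

def get_groq_key():
    # Read the key that .env (or the shell environment) provides.
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```

Failing fast with a clear message beats letting the first Groq API call error out with an opaque 401.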

💡 How to Use

  1. Add your files: Drop any documents you want to analyze (.pdf, .csv, .txt, etc.) into the data/ folder.
  2. Build the Index & Query: You can run directly via the search.py script:
    python src/search.py
    (Note: The first run will automatically trigger data_loader.py to parse your data/ folder and vectorstore.py to build and save your faiss_store locally.)
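The last step of a query like this is assembling the retrieved chunks into a prompt for the Groq-hosted model. The template and names below are illustrative, not the actual code in src/search.py:

```python
def build_rag_prompt(question, retrieved_chunks):
    # Number each chunk so the model (and the reader) can see what grounded the answer.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Invoices are due in 30 days.", "Late fees are 2% per month."]
prompt = build_rag_prompt("When are invoices due?", chunks)
print(prompt)
```

Restricting the model to the supplied context is what keeps answers tied to your documents rather than the model's general knowledge.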

Built as a custom document retrieval intelligence tool.
