A lightning-fast, highly contextual Retrieval-Augmented Generation (RAG) system built from scratch. It allows you to chat securely with your private documents by leveraging local vector embeddings and the blazing-fast Groq API for inference.
By utilizing local embeddings and the FAISS vector database, this engine avoids cloud database costs while securely analyzing your offline documents.
- Multi-Format Ingestion: Natively reads and parses `.pdf`, `.txt`, `.csv`, `.xlsx`, `.docx`, and `.json` files.
- Local Embeddings: Uses HuggingFace's `sentence-transformers` (`all-MiniLM-L6-v2`) locally to bypass cloud embedding costs.
- Offline Vector Database: Employs Meta's FAISS (Facebook AI Similarity Search) to organize your data into a locally queried index.
- Groq LLaMA / Gemma Integration: Connects to Groq's API via LangChain to generate high-quality text leveraging top-tier models like LLaMA-3.3-70B or Gemma-2-9B.
- Framework: LangChain
- Vector DB: Meta FAISS
- Embeddings: SentenceTransformers
- Language Models: LLaMA-3.3-70B / Gemma-2-9B (via Groq API)
- Environment: Python 3.11+, `uv` package manager
RAG/
│
├── data/ # Drop all your files (PDFs, Excel, Word) here!
│ ├── pdf/
│ └── text_files/
├── src/ # Core pipeline logic
│ ├── data_loader.py # Parses various doc formats via LangChain loaders
│ ├── embedding.py # Chunks text and processes dense vector embeddings
│ ├── vectorstore.py # Handles FAISS index creation and metadata persistence
│ └── search.py # Queries FAISS and synthesizes the final LLM response
├── notebook/ # Jupyter notebooks for testing models and evaluation
├── .env # Store your Groq API Keys here
├── pyproject.toml # Dependencies managed via tools like uv
├── requirements.txt # Base pip requirements
└── main.py # Entry point
1. Clone the repository:
```bash
git clone https://github.com/Aaditthesmart/RAG.git
cd RAG
```
2. Install dependencies:
The project includes a uv.lock file for fast installation, or you can use pip:
```bash
# Using uv (recommended)
uv sync

# OR using pip
pip install -r requirements.txt
```
3. Configure Environment Variables:
Create a .env file in the root directory and add your Groq API key:
```
GROQ_API_KEY="your_groq_api_key_here"
```
- Add your files: Drop any documents you want to analyze (`.pdf`, `.csv`, `.txt`, etc.) into the `data/` folder.
- Build the Index & Query:
You can run the pipeline directly via the `search.py` script:
```bash
python src/search.py
```
(Note: the first run will automatically trigger `data_loader.py` to parse your `data/` folder and `vectorstore.py` to build and save your `faiss_store` locally.)
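The retrieve-then-generate step that `search.py` performs can be sketched as a small helper (a hypothetical function, not the project's actual code; `store` stands for any LangChain vector store such as FAISS, and `llm` for any chat model such as `ChatGroq`):

```python
# Hypothetical helper sketching the retrieve-then-generate flow.
# `store`, `llm`, and the prompt wording are illustrative assumptions.
def answer_question(store, llm, question: str, k: int = 4) -> str:
    # Retrieve the k chunks most similar to the question
    docs = store.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    # Ground the model's answer in the retrieved context only
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.invoke(prompt).content
```

Keeping retrieval and generation behind one function like this makes it easy to swap the vector store or the Groq model without touching the rest of the pipeline.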
Built as a custom document retrieval intelligence tool.