Yashajmeri/LLM_Based_Agentic_RAG

Agentic RAG System

A production-style Corrective RAG (CRAG) pipeline built with LangGraph, FAISS, Groq LLaMA-3.3-70b, and Streamlit. Unlike a basic retrieve-then-answer loop, the system uses an LLM to grade every retrieval for relevance, automatically rewrites and retries the query when the first pass falls short, and only then synthesises a grounded answer — all orchestrated as a compiled LangGraph state machine.


Architecture

  ┌──────────────┐
  │  User Query  │
  └──────┬───────┘
         │
         ▼
  ┌──────────────────┐
  │  retrieve_docs   │  FAISS semantic search over indexed documents
  └──────┬───────────┘
         │
         ▼
  ┌──────────────────┐
  │   grade_docs     │  LLM binary relevance check on retrieved passages
  └──────┬───────────┘
         │
    ─────┴──────────────────────────
    │                              │
 relevant                     not relevant
    │                              │
    │                  ┌───────────────────────┐
    │                  │  rewrite_and_retrieve  │  LLM rewrites query,
    │                  │  (max 1 retry)         │  re-fetches from FAISS
    │                  └───────────┬───────────┘
    │                              │
    └──────────────┬───────────────┘
                   │
                   ▼
  ┌──────────────────┐
  │ generate_answer  │  Direct LLM call with graded context
  └──────┬───────────┘
         │
         ▼
  ┌───────────────┐
  │  Final Answer │
  └───────────────┘

State (RAGState) flows through each node carrying question, retrieved_docs, grade, and answer.
The 1-retry ceiling is enforced structurally — there is no edge from rewrite_and_retrieve back to grade_docs.
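
The routing above can be sketched in plain Python to show why the retry ceiling holds without a counter. This is a minimal illustration, not the project's LangGraph code: `run_pipeline` and the four callables passed into it are hypothetical stand-ins for the real nodes, the `RAGState` dataclass only mirrors the four fields listed above, and generating from the original question rather than the rewritten one is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RAGState:
    """Illustrative stand-in for the project's RAGState model."""
    question: str
    retrieved_docs: List[str] = field(default_factory=list)
    grade: str = ""
    answer: str = ""


def run_pipeline(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], str],
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
) -> RAGState:
    """Run retrieve -> grade -> (optional rewrite + re-retrieve) -> generate."""
    state = RAGState(question=question)

    # retrieve_docs: first retrieval pass
    state.retrieved_docs = retrieve(state.question)

    # grade_docs: skip the grader call entirely when nothing was retrieved
    if state.retrieved_docs:
        state.grade = grade(state.question, state.retrieved_docs)
    else:
        state.grade = "not_relevant"

    # rewrite_and_retrieve: runs at most once, because control never
    # returns to the grading step after this branch
    if state.grade == "not_relevant":
        rewritten = rewrite(state.question)
        state.retrieved_docs = retrieve(rewritten)

    # generate_answer: assumption, the original question is used here
    state.answer = generate(state.question, state.retrieved_docs)
    return state


# Toy stubs: the first query misses, the rewritten query hits.
corpus = {"types of agent memory": ["Agents use short-term and long-term memory."]}
result = run_pipeline(
    "memory?",
    retrieve=lambda q: corpus.get(q, []),
    grade=lambda q, docs: "relevant",
    rewrite=lambda q: "types of agent memory",
    generate=lambda q, docs: " ".join(docs) if docs else "No context found.",
)
print(result.grade, "|", result.answer)
```

Because the correction branch falls straight through to generation, a second failed retrieval still produces an answer instead of looping.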


Tech Stack

| Layer              | Technology                                                       |
|--------------------|------------------------------------------------------------------|
| LLM                | Groq (llama-3.3-70b-versatile)                                   |
| Orchestration      | LangGraph StateGraph with conditional routing                    |
| RAG pattern        | Corrective RAG: LLM-based relevance grading + query rewriting    |
| Vector store       | FAISS (in-memory)                                                |
| Embeddings         | sentence-transformers/all-MiniLM-L6-v2 (local, no API cost)      |
| Document loaders   | LangChain (Web, PDF, TXT)                                        |
| UI                 | Streamlit                                                        |
| Evaluation         | Cosine similarity via sentence-transformers                      |
| Package management | pip / uv                                                         |

Setup

1. Clone the repository

git clone https://github.com/your-username/agentic-rag.git
cd agentic-rag

2. Create a virtual environment

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Add your Groq API key

Get a free key at console.groq.com → API Keys, then:

# Windows
copy .env.example .env

# macOS / Linux
cp .env.example .env

Open .env and replace the placeholder:

GROQ_API_KEY="gsk_your_actual_key_here"

5. Run the Streamlit app

streamlit run streamlit_app.py

The app fetches and indexes the default source documents on first launch (~30 seconds), then the chat interface is ready.


Example Questions

Questions drawn from the indexed sources (Lilian Weng's blog posts on LLM agents and video diffusion):

  • What are the three main components of an LLM-powered autonomous agent?
  • How does chain-of-thought prompting help agents solve complex tasks?
  • What types of memory does an LLM agent use, and how do they differ?
  • What are the main challenges of applying diffusion models to video generation?
  • What role does classifier-free guidance play in diffusion models?

Running the Evaluator

eval.py scores the system against 6 reference Q&A pairs using cosine similarity between expected and actual answers:

python eval.py

Results are printed to the terminal and saved to eval_results.json.
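
For reference, the metric itself is small enough to sketch by hand. The snippet below is an illustration of cosine similarity on toy vectors, not the code in eval.py: the real harness embeds the answers with sentence-transformers first, and the `cosine_similarity` helper and the three example vectors here are made up for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "embeddings" standing in for an expected answer, a close answer,
# and an unrelated answer.
expected = [0.2, 0.7, 0.1]
close = [0.25, 0.65, 0.05]
unrelated = [0.9, -0.3, 0.4]

print("close:    ", round(cosine_similarity(expected, close), 3))
print("unrelated:", round(cosine_similarity(expected, unrelated), 3))
```

Scores near 1 mean the two embeddings point in nearly the same direction. The metric measures semantic closeness of the answers, not factual correctness.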


Project Structure

agentic-rag/
│
├── src/
│   ├── config/
│   │   └── config.py              # Groq API key, model name, chunk settings
│   ├── document_ingestion/
│   │   └── document_processor.py  # Web / PDF / TXT loaders + text splitter
│   ├── graph_builder/
│   │   └── graph_builder.py       # LangGraph StateGraph definition
│   ├── node/
│   │   └── nodes.py               # retrieve_docs, grade_docs,
│   │                              # rewrite_and_retrieve, generate_answer
│   ├── state/
│   │   └── rag_state.py           # RAGState Pydantic model
│   └── vectorstore/
│       └── vectorstore.py         # FAISS + HuggingFace embeddings
│
├── data/
│   ├── attention.pdf              # Optional local PDF source
│   └── url.txt                    # Default source URLs
│
├── streamlit_app.py               # Streamlit UI entry point
├── main.py                        # CLI entry point with interactive mode
├── eval.py                        # Evaluation harness (cosine similarity)
├── requirements.txt               # Pip dependencies
├── pyproject.toml                 # Project metadata and pinned deps
├── .env.example                   # Environment variable template
└── README.md

How It Works

  1. Ingestion: URLs and local files are loaded, split into 500-token chunks, and embedded with all-MiniLM-L6-v2 into an in-memory FAISS index.
  2. Retrieval: retrieve_docs runs a cosine-similarity search and returns the top-k passages.
  3. Grading: grade_docs sends the passages to the LLM with a binary relevance prompt (relevant / not relevant). It short-circuits without an LLM call when no docs are returned.
  4. Correction: if the passages are graded not relevant, rewrite_and_retrieve asks the LLM to reformulate the query, then re-fetches. At most one retry, enforced by graph topology.
  5. Generation: generate_answer calls the LLM with the graded context and returns a concise, grounded answer.
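
The chunking in step 1 can be pictured with a small standard-library sketch. It is not the splitter used in document_processor.py: whitespace-separated words stand in for tokens, and both the `chunk_tokens` helper and the 50-token overlap are assumptions made for illustration (the actual chunk settings live in config.py).

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most chunk_size whitespace tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(tokens):
            break
    return chunks


# A synthetic 1200-token document
document = " ".join(f"t{i}" for i in range(1200))
pieces = chunk_tokens(document)
print(len(pieces), [len(p.split()) for p in pieces])
```

Each chunk would then be embedded and added to the index. Smaller chunks tend to make retrieval more precise but give the grader and generator less surrounding context per passage.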
