RAG Case Study - Local Document Q&A System

A privacy-first, fully local Retrieval-Augmented Generation (RAG) system that enables intelligent Q&A over your documents using Ollama and ChromaDB.


What This Project Does

Transform your document collection into an interactive knowledge base.

Architecture

(Architecture diagram)

Key Features:

  • 🔒 100% Local: No data leaves your machine
  • 📄 Multi-format: PDF, DOCX, TXT, XLSX, CSV, PPTX, HTML, Markdown
  • 🔍 Hybrid Search: Semantic + BM25 for best results
  • 💬 Streaming Chat: Real-time responses with source citations
  • 🎨 Modern UI: React/Next.js with dark mode
  • 🧪 Evaluated: LLM-as-Judge scoring (avg 7.2/10)
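The README says hybrid search combines semantic and BM25 results but does not specify the fusion method. A common, simple option is reciprocal rank fusion (RRF); the sketch below is illustrative only and is not taken from `rag_engine.py`:

```python
def rrf_merge(semantic_ranking, bm25_ranking, k=60):
    """Merge two ranked lists of doc IDs with reciprocal rank fusion.

    Each list is ordered best-first; a document appearing high in either
    list gets a larger fused score. k=60 is the value from the original
    RRF paper and damps the influence of top ranks.
    """
    scores = {}
    for ranking in (semantic_ranking, bm25_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["d1", "d2", "d3"], ["d3", "d1", "d4"])
```

A document ranked well by both retrievers ("d1" here) ends up first even if neither retriever placed it at the very top of the other's list.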

🚀 Installation & Setup

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.10+ | Required |
| Node.js | 18+ | For frontend |
| Ollama | Latest | Download from https://ollama.com |

Step 1: Clone the Repository

git clone https://github.com/YsK-dev/Rag-case-study-.git 
cd Rag-case-study-

Step 2: Install Ollama Models

# Install Ollama (macOS)
brew install ollama
# Install Ollama (Windows)
winget install Ollama.Ollama
# Linux (Ubuntu / Debian / Arch / Fedora etc.)
curl -fsSL https://ollama.com/install.sh | sh


# Start Ollama service
ollama serve

# Pull required models (in another terminal)
ollama pull qwen3:1.7b                           # Fast model (⚡ 1.7B params)
ollama pull pielee/qwen3-4b-thinking-2507_q8     # Smart model (🧠 4B params, reasoning)

Available Models:

| Model ID | Label | Tier | Parameters | Best For |
|---|---|---|---|---|
| qwen3-1.7b | Qwen3 1.7B | ⚡ Fast | 1.7B | Quick responses, low latency |
| qwen3-4b-thinking | Qwen3 4B Thinking | 🧠 Smart | 4B | Complex reasoning, chain-of-thought |

Step 3: Setup Backend (Python)

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

Backend Dependencies (requirements.txt):

fastapi          # Web framework
uvicorn          # ASGI server
chromadb         # Vector database
sentence-transformers  # Embedding model
pypdf            # PDF parsing
pymupdf          # PDF rendering
python-docx      # DOCX support
openpyxl         # Excel support
python-pptx      # PowerPoint support
beautifulsoup4   # HTML parsing
rank-bm25        # BM25 scoring
ollama           # LLM client
pytest           # Testing

Step 4: Setup Frontend (Next.js)

cd frontedRag/rag-app

# Install Node.js dependencies
npm install

Frontend Dependencies (package.json):

next             # React framework (v16)
react            # UI library (v19)
tailwindcss      # Styling (v4)
framer-motion    # Animations
lucide-react     # Icons
react-pdf        # PDF preview

πŸƒ Running the Application

Terminal 1: Start Ollama

ollama serve
# Runs on http://localhost:11434

Terminal 2: Start Backend

uvicorn main:app --reload

# Runs on http://localhost:8000

Terminal 3: Start Frontend

cd frontedRag/rag-app
npm run dev
# Runs on http://localhost:3000

Verify Everything Works

# Check backend health
curl http://localhost:8000/api/health

# Check available models
curl http://localhost:8000/api/models

📖 Usage

Web Interface

  1. Open http://localhost:3000 in your browser
  2. Upload documents via drag-and-drop
  3. Select a model (Fast ⚡ or Smart 🧠)
  4. Ask questions in natural language
  5. View answers with source citations

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/chat | POST | Ask questions (supports streaming) |
| /api/upload | POST | Upload documents |
| /api/documents | GET | List indexed documents |
| /api/documents/{file} | DELETE | Remove a document |
| /api/models | GET | List available LLM models |
| /api/health | GET | Health check |

Example API Request

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is RAG?",
    "model": "qwen3-1.7b",
    "stream": true,
    "top_k": 5
  }'
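The curl call above can also be made from Python. The sketch below uses only the standard library; the exact framing of the streamed response body (SSE lines vs. raw JSON chunks) is not documented here, so the chunk handling is an assumption the reader may need to adapt:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/chat"  # backend address from this README

def build_chat_payload(question, model="qwen3-1.7b", stream=True, top_k=5):
    """Build the JSON body shown in the curl example above."""
    body = {"question": question, "model": model, "stream": stream, "top_k": top_k}
    return json.dumps(body).encode("utf-8")

def ask(question, **kwargs):
    """POST a question; with stream=True, print response chunks as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=build_chat_payload(question, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for chunk in resp:  # iterate over streamed lines from the server
            print(chunk.decode("utf-8", errors="replace"), end="")
```

Call `ask("What is RAG?")` with the backend running to stream an answer into the terminal.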

Tech Stack:

  • Backend: FastAPI, Uvicorn, Pydantic
  • Vector Store: ChromaDB (default), FAISS (optional)
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • LLM: Ollama with qwen3 models
  • Frontend: Next.js 16, React 19, TailwindCSS 4

🧪 Testing & Evaluation

Unit Tests

RAG_TESTING=1 pytest tests/test_app.py -v
# 24 tests covering all endpoints

LLM-as-Judge Evaluation

# Set Groq API key (get free at console.groq.com)
export GROQ_API_KEY=gsk_...

# Run evaluation
python tests/eval_judge.py --input data/eval_cases.json --output data/eval_results.jsonl

Latest Evaluation Results:

| Metric | Score |
|---|---|
| Overall | 7.19/10 |
| qwen3-32b judge | 7.98/10 |
| gpt-oss-120b judge | 7.14/10 |
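The evaluation writes one JSON object per test case to `data/eval_results.jsonl`. A minimal way to summarize such a file is to average a numeric score field; note the `"score"` key below is an assumption, not the script's documented schema:

```python
import json
from statistics import mean

def summarize_scores(jsonl_text, score_key="score"):
    """Average per-case judge scores from a JSONL dump.

    The `score_key` field name is an assumption; adjust it to match the
    actual schema written by tests/eval_judge.py.
    """
    scores = [
        json.loads(line)[score_key]
        for line in jsonl_text.splitlines()
        if line.strip()
    ]
    return round(mean(scores), 2)

# Two hypothetical per-judge records
sample = '{"score": 7.98}\n{"score": 7.14}\n'
```

`summarize_scores(sample)` averages the two sample records to 7.56.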

βš™οΈ Configuration

| Setting | Location | Default |
|---|---|---|
| LLM Models | app.py → MODEL_REGISTRY | qwen3-1.7b, qwen3-4b-thinking |
| Embedding Model | rag_engine.py | all-MiniLM-L6-v2 |
| Chunk Size | rag_engine.py | 500 tokens |
| Chunk Overlap | rag_engine.py | 100 tokens |
| Backend Port | app.py | 8000 |
| Frontend Port | next.config.js | 3000 |
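The chunk size (500 tokens) and overlap (100 tokens) describe a sliding window over the document's token stream. The real chunker lives in `rag_engine.py`; this is an illustrative sketch of what those two parameters mean:

```python
def chunk_tokens(tokens, size=500, overlap=100):
    """Split a token list into overlapping windows.

    Each chunk is at most `size` tokens and shares its first `overlap`
    tokens with the tail of the previous chunk, so context spanning a
    chunk boundary is never lost entirely.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

# 1200 dummy tokens -> chunks starting at offsets 0, 400, 800
chunks = chunk_tokens(list(range(1200)))
```

With the defaults, a 1200-token document yields three chunks, and the last 100 tokens of each chunk reappear at the start of the next.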

Adding New Ollama Models

  1. Pull the model:

    ollama pull <model-name>
  2. Add to MODEL_REGISTRY in app.py:

    MODEL_REGISTRY = {
        "new-model": {
            "ollama_name": "<model-name>",
            "label": "Display Name",
            "tier": "fast",  # or "smart"
            "description": "Model description",
            "params": "7B",
        },
    }
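Once registered, the backend presumably maps the registry key from an API request to the underlying Ollama model name. The helper below is a hypothetical sketch of that lookup; the fallback-to-fast-model behavior is an assumption, not the documented behavior of app.py:

```python
# Two entries mirroring the models this README installs
MODEL_REGISTRY = {
    "qwen3-1.7b": {
        "ollama_name": "qwen3:1.7b",
        "label": "Qwen3 1.7B",
        "tier": "fast",
    },
    "qwen3-4b-thinking": {
        "ollama_name": "pielee/qwen3-4b-thinking-2507_q8",
        "label": "Qwen3 4B Thinking",
        "tier": "smart",
    },
}

def resolve_model(model_id, default="qwen3-1.7b"):
    """Return the Ollama model name for a registry key.

    Unknown keys fall back to the fast model (an assumption; the real
    app may instead reject the request with a 400).
    """
    entry = MODEL_REGISTRY.get(model_id, MODEL_REGISTRY[default])
    return entry["ollama_name"]
```

The `/api/chat` payload's `"model": "qwen3-1.7b"` field would resolve to `qwen3:1.7b` before the Ollama client is called.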

📚 Documentation

Detailed documentation is available in docs/:

| Document | Topic |
|---|---|
| 01_problem_definition.md | User needs & solution overview |
| 02_llm_layer.md | LLM integration & guardrails |
| 03_vector_layer.md | Chunking, embeddings, hybrid search |
| 04_api_layer.md | API design & endpoints |
| 05_architecture.md | System architecture diagrams |
| 06_literature_review.md | Technology comparisons |

📄 License

MIT License - see LICENSE for details.
