A privacy-first, fully local Retrieval-Augmented Generation (RAG) system that enables intelligent Q&A over your documents using Ollama and ChromaDB.
Transform your document collection into an interactive knowledge base:
Key Features:
- 100% Local: No data leaves your machine
- Multi-format: PDF, DOCX, TXT, XLSX, CSV, PPTX, HTML, Markdown
- Hybrid Search: Semantic + BM25 for best results
- Streaming Chat: Real-time responses with source citations
- Modern UI: React/Next.js with dark mode
- Evaluated: LLM-as-Judge scoring (avg 7.2/10)
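Hybrid search works by merging two ranked lists: embedding similarity and BM25. As a minimal sketch of the idea (not this repository's actual implementation), min-max-normalize each scorer's output and combine with a weighted sum; the `alpha` weight and the toy document IDs are illustrative:

```python
def minmax(scores):
    """Scale a {doc: score} dict to [0, 1]; a constant list maps to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}

def hybrid_rank(semantic, bm25, alpha=0.5):
    """Fuse cosine-similarity and BM25 scores with a weighted sum."""
    sem, lex = minmax(semantic), minmax(bm25)
    docs = set(sem) | set(lex)
    fused = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Toy scores: doc "a" wins semantically, doc "b" wins lexically.
semantic = {"a": 0.9, "b": 0.4, "c": 0.1}
bm25 = {"a": 1.2, "b": 7.5, "c": 0.3}
print(hybrid_rank(semantic, bm25))  # alpha=0.5 balances both signals
```

Normalization matters because BM25 scores are unbounded while cosine similarity lives in a fixed range; fusing raw values would let one signal dominate.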
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.10+ | Required |
| Node.js | 18+ | For frontend |
| Ollama | Latest | Download here |
```bash
git clone https://github.com/YsK-dev/Rag-case-study-.git
cd Rag-case-study-

# Install Ollama (macOS)
brew install ollama

# On Windows
winget install Ollama.Ollama

# On Linux (Ubuntu / Debian / Arch / Fedora, etc.)
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve

# Pull required models (in another terminal)
ollama pull qwen3:1.7b                        # Fast model (1.7B params)
ollama pull pielee/qwen3-4b-thinking-2507_q8  # Smart model (4B params, reasoning)
```

Available Models:
| Model ID | Label | Tier | Parameters | Best For |
|---|---|---|---|---|
| `qwen3-1.7b` | Qwen3 1.7B | Fast | 1.7B | Quick responses, low latency |
| `qwen3-4b-thinking` | Qwen3 4B Thinking | Smart | 4B | Complex reasoning, chain-of-thought |
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt
```

Backend Dependencies (requirements.txt):
```text
fastapi                # Web framework
uvicorn                # ASGI server
chromadb               # Vector database
sentence-transformers  # Embedding model
pypdf                  # PDF parsing
pymupdf                # PDF rendering
python-docx            # DOCX support
openpyxl               # Excel support
python-pptx            # PowerPoint support
beautifulsoup4         # HTML parsing
rank-bm25              # BM25 scoring
ollama                 # LLM client
pytest                 # Testing
```
```bash
cd frontedRag/rag-app

# Install Node.js dependencies
npm install
```

Frontend Dependencies (package.json):

```text
next           # React framework (v16)
react          # UI library (v19)
tailwindcss    # Styling (v4)
framer-motion  # Animations
lucide-react   # Icons
react-pdf      # PDF preview
```
```bash
# Terminal 1: Ollama
ollama serve
# Runs on http://localhost:11434
```

```bash
# Terminal 2: backend
uvicorn main:app --reload
# Runs on http://localhost:8000
```

```bash
# Terminal 3: frontend
cd frontedRag/rag-app
npm run dev
# Runs on http://localhost:3000
```

```bash
# Check backend health
curl http://localhost:8000/api/health

# Check available models
curl http://localhost:8000/api/models
```

- Open http://localhost:3000 in your browser
- Upload documents via drag-and-drop
- Select a model (Fast or Smart)
- Ask questions in natural language
- View answers with source citations
| Endpoint | Method | Description |
|---|---|---|
| `/api/chat` | POST | Ask questions (supports streaming) |
| `/api/upload` | POST | Upload documents |
| `/api/documents` | GET | List indexed documents |
| `/api/documents/{file}` | DELETE | Remove document |
| `/api/models` | GET | List available LLM models |
| `/api/health` | GET | Health check |
```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is RAG?",
    "model": "qwen3-1.7b",
    "stream": true,
    "top_k": 5
  }'
```

Tech Stack:
- Backend: FastAPI, Uvicorn, Pydantic
- Vector Store: ChromaDB (default), FAISS (optional)
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- LLM: Ollama with qwen3 models
- Frontend: Next.js 16, React 19, TailwindCSS 4
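When a request sets `stream: true`, the answer arrives incrementally. The exact wire format is backend-specific and not documented here; assuming SSE-style `data:` lines carrying JSON chunks with a `token` field and a `[DONE]` terminator (all three are illustrative conventions, not confirmed behavior of this API), a client could reassemble the answer like this:

```python
import json

def collect_tokens(lines):
    """Reassemble an answer from SSE-style 'data: {...}' lines.

    Assumes each event is a JSON object with a 'token' field and that
    the stream ends with 'data: [DONE]' -- illustrative conventions,
    not documented behavior of this API.
    """
    answer = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        answer.append(json.loads(payload)["token"])
    return "".join(answer)

sample = [
    'data: {"token": "RAG "}',
    'data: {"token": "combines retrieval "}',
    'data: {"token": "with generation."}',
    "data: [DONE]",
]
print(collect_tokens(sample))  # RAG combines retrieval with generation.
```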
```bash
RAG_TESTING=1 pytest tests/test_app.py -v
# 24 tests covering all endpoints
```

```bash
# Set Groq API key (get a free one at console.groq.com)
export GROQ_API_KEY=gsk_...

# Run evaluation
python tests/eval_judge.py --input data/eval_cases.json --output data/eval_results.jsonl
```

Latest Evaluation Results:
| Metric | Score |
|---|---|
| Overall | 7.19/10 |
| qwen3-32b judge | 7.98/10 |
| gpt-oss-120b judge | 7.14/10 |
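The per-judge figures above are averages over the evaluation cases. As a minimal sketch of how such averages could be computed from a JSONL results file (the `judge` and `score` field names are assumptions about the output schema, not taken from this repository):

```python
import json
from collections import defaultdict
from statistics import mean

def judge_averages(jsonl_text):
    """Average scores per judge from eval_results.jsonl-style records."""
    scores = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)  # one JSON object per line
        scores[rec["judge"]].append(rec["score"])
    return {judge: round(mean(vals), 2) for judge, vals in scores.items()}

sample = "\n".join([
    '{"judge": "qwen3-32b", "score": 8.0}',
    '{"judge": "qwen3-32b", "score": 7.5}',
    '{"judge": "gpt-oss-120b", "score": 7.0}',
])
print(judge_averages(sample))  # {'qwen3-32b': 7.75, 'gpt-oss-120b': 7.0}
```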
| Setting | Location | Default |
|---|---|---|
| LLM Models | `app.py` → `MODEL_REGISTRY` | `qwen3-1.7b`, `qwen3-4b-thinking` |
| Embedding Model | `rag_engine.py` | `all-MiniLM-L6-v2` |
| Chunk Size | `rag_engine.py` | 500 tokens |
| Chunk Overlap | `rag_engine.py` | 100 tokens |
| Backend Port | `app.py` | 8000 |
| Frontend Port | `next.config.js` | 3000 |
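The chunk size and overlap defaults describe a sliding window over the token stream: each chunk starts 400 tokens after the previous one, so consecutive chunks share 100 tokens of context. A minimal sketch of that windowing, using a plain token list as a stand-in for the engine's real tokenizer (an assumption):

```python
def chunk_tokens(tokens, size=500, overlap=100):
    """Split a token list into windows of `size` tokens; each window
    starts size - overlap tokens after the previous one, so adjacent
    chunks share `overlap` tokens of context."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print(len(chunks))                  # 3
print(chunks[1][0], chunks[1][-1])  # t400 t899 (shares t400..t499 with chunk 0)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk in full.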
- Pull the model:

  ```bash
  ollama pull <model-name>
  ```

- Add to `MODEL_REGISTRY` in `app.py`:

  ```python
  MODEL_REGISTRY = {
      "new-model": {
          "ollama_name": "<model-name>",
          "label": "Display Name",
          "tier": "fast",  # or "smart"
          "description": "Model description",
          "params": "7B",
      },
  }
  ```
Detailed documentation is available in docs/:
| Document | Topic |
|---|---|
| 01_problem_definition.md | User needs & solution overview |
| 02_llm_layer.md | LLM integration & guardrails |
| 03_vector_layer.md | Chunking, embeddings, hybrid search |
| 04_api_layer.md | API design & endpoints |
| 05_architecture.md | System architecture diagrams |
| 06_literature_review.md | Technology comparisons |
MIT License - see LICENSE for details.
