Home
Rana Faraz edited this page Jun 23, 2026
Production-grade, agentic RAG over your own documents — hybrid retrieval (BM25 + dense), cross-encoder reranking, grounded answers with citations, RAGAS-style evaluation, and safety guardrails. Runs fully offline and free by default (no API key, no GPU), and swaps to real local or hosted models with a single environment variable.
```bash
git clone https://github.com/ranafaraz/InsightRAG.git
cd InsightRAG
pip install -e ".[dev]"
python -m rag.cli ask "What does InsightRAG use to rerank candidates?" --path docs/
```

```mermaid
flowchart LR
    subgraph Ingest
        A[PDF / Markdown / TXT] --> B[Chunker]
        B --> C[Embedder]
        C --> D[(Vector store)]
    end
    Q[User query] --> G1{Injection guard}
    G1 -- blocked --> X[Refuse + flag]
    G1 -- ok --> R[Hybrid retrieval\nBM25 + dense]
    D --> R
    R --> RR[Cross-encoder rerank]
    RR --> GEN{Enough context?}
    GEN -- no --> RF[Refuse honestly]
    GEN -- yes --> L[LLM: grounded answer + citations]
    L --> G2[PII redaction]
    G2 --> ANS[Answer + citations + latency]
```
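The decision points in the diagram can be read as a short control flow. The sketch below is illustrative only: the names `looks_like_injection`, `redact_pii`, `answer`, and `Result`, the regex-based guard, the email pattern, and the `min_score` threshold are all hypothetical stand-ins, not InsightRAG's actual API. The retriever, reranker, and generator are passed in as plain callables so the example stays self-contained.

```python
import re
from dataclasses import dataclass, field

# Hypothetical guard patterns; a real guardrail would be far more thorough.
INJECTION_PATTERNS = [r"ignore (all )?previous instructions", r"reveal .*system prompt"]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Result:
    status: str                      # "blocked", "refused", or "answered"
    text: str = ""
    citations: list = field(default_factory=list)


def looks_like_injection(query: str) -> bool:
    """Flag queries that match any known injection pattern."""
    return any(re.search(p, query, re.IGNORECASE) for p in INJECTION_PATTERNS)


def redact_pii(text: str) -> str:
    """Mask email addresses; other PII types are omitted from this sketch."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def answer(query, retrieve, rerank, generate, min_score=0.5):
    """Run guard -> retrieve -> rerank -> context check -> generate -> redact.

    Assumed interfaces: retrieve(query) returns candidates, rerank(query,
    candidates) returns (chunk_id, text, score) tuples with the best first,
    and generate(query, context) returns the draft answer as a string.
    """
    if looks_like_injection(query):
        return Result("blocked", "Request refused and flagged.")
    ranked = rerank(query, retrieve(query))
    context = [c for c in ranked if c[2] >= min_score]
    if not context:
        return Result("refused", "Not enough context to answer.")
    draft = generate(query, context)
    return Result("answered", redact_pii(draft), [c[0] for c in context])
```

Both refusal paths return before the generator is called, which matches the diagram: a blocked or under-supported query never reaches the LLM.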
The pipeline has four stages: ingest (chunk → embed → store), retrieve (hybrid BM25+dense fusion), rerank (cross-encoder re-scoring), and generate (grounded LLM with citation verification + safety guardrails). Every heavy component runs offline by default; swap to real models with env vars.
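One common way to fuse a BM25 ranking with a dense ranking is reciprocal rank fusion (RRF). This page does not say which fusion rule InsightRAG uses, so the sketch below is an assumption chosen for illustration; the function name, the chunk IDs, and the constant `k=60` (a conventional default) are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


bm25_ranking = ["c3", "c1", "c7"]    # hypothetical lexical search results
dense_ranking = ["c1", "c5", "c3"]   # hypothetical vector search results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

RRF works on ranks instead of raw scores, so BM25 and cosine-similarity values never need to be normalized onto a common scale. In the toy data, `c1` comes out on top because it ranks highly in both lists.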
| Metric | Value |
|---|---|
| Recall@5 | 0.917 |
| MRR (hybrid) | 0.931 |
| Faithfulness | 0.907 |
| Prompt tokens saved by reranker | −58% (≈933 → ≈389) |
| Injection detection F1 | 1.000 |
| PII redaction accuracy | 1.000 |
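For readers unfamiliar with the retrieval metrics in the table, the following is a minimal sketch of the standard definitions of Recall@k and MRR. The function names and toy data are hypothetical, and the project's own evaluation code may differ in details such as how multiple relevant chunks per query are handled.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Toy evaluation set: each entry is (retrieved IDs in order, relevant IDs).
runs = [
    (["a", "b", "c"], ["a"]),        # first relevant hit at rank 1
    (["x", "y", "z"], ["y", "q"]),   # first relevant hit at rank 2
]
mrr = mean_reciprocal_rank(runs)
```

Recall@k rewards finding every relevant chunk somewhere in the top k, while MRR only looks at how early the first relevant chunk appears, which is why the two numbers can diverge.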
- Architecture: pipeline design, offline backends, retrieval and reranking strategy, guardrail design
- Evaluation: recall@5, MRR, faithfulness, guardrail metrics, ablation table, how to reproduce
- Configuration: all env vars, backend options, `.env.example`, CLI flags
- Development: local setup, project structure, running tests, adding a new backend