Rana Faraz edited this page Jun 23, 2026 · 1 revision

InsightRAG

License: MIT

Production-grade, agentic RAG over your own documents — hybrid retrieval (BM25 + dense), cross-encoder reranking, grounded answers with citations, RAGAS-style evaluation, and safety guardrails. Runs fully offline and free by default (no API key, no GPU), and swaps to real local or hosted models with a single environment variable.

Quick start

git clone https://github.com/ranafaraz/InsightRAG.git
cd InsightRAG
pip install -e ".[dev]"
python -m rag.cli ask "What does InsightRAG use to rerank candidates?" --path docs/

Architecture overview

flowchart LR
    subgraph Ingest
        A[PDF / Markdown / TXT] --> B[Chunker]
        B --> C[Embedder]
        C --> D[(Vector store)]
    end
    Q[User query] --> G1{Injection guard}
    G1 -- blocked --> X[Refuse + flag]
    G1 -- ok --> R[Hybrid retrieval\nBM25 + dense]
    D --> R
    R --> RR[Cross-encoder rerank]
    RR --> GEN{Enough context?}
    GEN -- no --> RF[Refuse honestly]
    GEN -- yes --> L[LLM: grounded answer + citations]
    L --> G2[PII redaction]
    G2 --> ANS[Answer + citations + latency]

The pipeline has four stages: ingest (chunk → embed → store), retrieve (hybrid BM25 + dense fusion), rerank (cross-encoder re-scoring), and generate (a grounded LLM answer with citation verification and safety guardrails). Every heavy component runs offline by default; real models can be swapped in through environment variables.
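To make the retrieve stage concrete, here is a minimal sketch of one common way to fuse a BM25 ranking with a dense ranking: reciprocal rank fusion (RRF). This page does not say which fusion method InsightRAG uses, so RRF, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the chunk ids are all assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. This is a hypothetical helper, not
    InsightRAG's confirmed fusion method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Toy rankings (made-up chunk ids), best match first.
bm25_ranking = ["c3", "c1", "c7"]
dense_ranking = ["c1", "c4", "c3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Chunks that appear near the top of both lists rise to the front, which is the behavior a hybrid retriever wants before handing candidates to the reranker. Rank-based fusion also avoids having to normalize BM25 scores against cosine similarities, which live on different scales.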

Key results (offline backends, 12-question benchmark)

| Metric | Value |
| --- | --- |
| Recall@5 | 0.917 |
| MRR (hybrid) | 0.931 |
| Faithfulness | 0.907 |
| Prompt tokens saved by reranker | −58% (≈933 → ≈389) |
| Injection detection F1 | 1.000 |
| PII redaction accuracy | 1.000 |
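For readers unfamiliar with the retrieval metrics above, the following sketch shows the standard definitions of recall@k and mean reciprocal rank (MRR), plus the arithmetic behind the token-saving figure. The function names and the toy data are hypothetical; the project's actual evaluation code may differ in how it defines relevance and averages over questions.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant chunk ids that appear in the top-k results."""
    if not relevant:
        raise ValueError("each question needs at least one relevant chunk")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy benchmark (made-up ids): (retrieved ranking, set of relevant ids).
runs = [
    (["a", "b", "c"], {"a"}),  # hit at rank 1
    (["x", "y", "z"], {"y"}),  # hit at rank 2
    (["p", "q"], {"r"}),       # miss
]

mean_recall = mean(recall_at_k(ret, rel, k=5) for ret, rel in runs)
mrr = mean(reciprocal_rank(ret, rel) for ret, rel in runs)

# Token reduction from the table: roughly 933 prompt tokens down to 389.
token_reduction = 1 - 389 / 933

print(f"recall@5={mean_recall:.3f}  MRR={mrr:.3f}  saved={token_reduction:.1%}")
```

The last line reproduces the table's reduction: 1 − 389/933 ≈ 0.583, which matches the reported 58% drop in prompt tokens.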

Wiki pages

  • Architecture — pipeline design, offline backends, retrieval and reranking strategy, guardrail design
  • Evaluation — recall@5, MRR, faithfulness, guardrail metrics, ablation table, how to reproduce
  • Configuration — all env vars, backend options, .env.example, CLI flags
  • Development — local setup, project structure, running tests, adding a new backend
