Home

InsightRAG

Production-grade, agentic RAG over your own documents — hybrid retrieval (BM25 + dense), cross-encoder reranking, grounded answers with citations, RAGAS-style evaluation, and safety guardrails. Runs fully offline and free by default (no API key, no GPU), and swaps to real local or hosted models with a single environment variable.

Quick start

git clone https://github.com/ranafaraz/InsightRAG.git
cd InsightRAG
pip install -e ".[dev]"
python -m rag.cli ask "What does InsightRAG use to rerank candidates?" --path docs/

Architecture overview

flowchart LR
    subgraph Ingest
        A[PDF / Markdown / TXT] --> B[Chunker]
        B --> C[Embedder]
        C --> D[(Vector store)]
    end
    Q[User query] --> G1{Injection guard}
    G1 -- blocked --> X[Refuse + flag]
    G1 -- ok --> R[Hybrid retrieval\nBM25 + dense]
    D --> R
    R --> RR[Cross-encoder rerank]
    RR --> GEN{Enough context?}
    GEN -- no --> RF[Refuse honestly]
    GEN -- yes --> L[LLM: grounded answer + citations]
    L --> G2[PII redaction]
    G2 --> ANS[Answer + citations + latency]
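The control flow in the diagram can be sketched in a few lines of Python. This is only an illustration of the branching order (guard, retrieve, rerank, context check, generate, redact). Every name here (`answer_query`, `Result`, the two regexes, the `min_chunks` threshold) is a hypothetical stand-in rather than InsightRAG's actual API, and the single-pattern guard and email-only redaction are deliberately simplistic placeholders.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny stand-ins for the real guardrails.
INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Result:
    answer: str
    citations: list = field(default_factory=list)
    flagged: bool = False
    latency_s: float = 0.0


def answer_query(query, retrieve, rerank, generate, min_chunks=1):
    """Follow the diagram: guard -> retrieve -> rerank -> gate -> LLM -> redact."""
    start = time.perf_counter()

    def done(answer, citations=None, flagged=False):
        return Result(answer, citations or [], flagged, time.perf_counter() - start)

    # G1: block suspected prompt injection before any retrieval happens.
    if INJECTION.search(query):
        return done("Request refused.", flagged=True)

    # Hybrid retrieval followed by cross-encoder reranking (both injected).
    chunks = rerank(query, retrieve(query))

    # "Enough context?" gate: refuse honestly instead of guessing.
    if len(chunks) < min_chunks:
        return done("I don't have enough context to answer that.")

    # Grounded generation, then G2: PII redaction on the way out.
    text, citations = generate(query, chunks)
    return done(EMAIL.sub("[REDACTED]", text), citations)
```

Passing `retrieve`, `rerank`, and `generate` as callables keeps the sketch independent of any particular backend, which mirrors the offline-by-default, swappable-model idea described on this page.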

The pipeline has four stages: ingest (chunk → embed → store), retrieve (hybrid BM25+dense fusion), rerank (cross-encoder re-scoring), and generate (grounded LLM with citation verification + safety guardrails). Every heavy component runs offline by default; swap to real models with env vars.
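To make the "hybrid BM25+dense fusion" step concrete, here is a minimal sketch using reciprocal rank fusion (RRF). This page does not say which fusion method InsightRAG uses, so RRF is an assumption chosen for illustration. The function name, the chunk IDs, and the constant `k=60` (a commonly used default) are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so chunks ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Illustrative rankings only, not real InsightRAG output.
bm25_ranking = ["c3", "c1", "c7"]
dense_ranking = ["c1", "c4", "c3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['c1', 'c3', 'c4', 'c7']
```

RRF only needs ranks, not raw scores, so it avoids having to normalize BM25 scores against cosine similarities. A weighted score sum would be an equally plausible design.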

Key results (offline backends, 12-question benchmark)

Metric	Value
Recall@5	0.917
MRR (hybrid)	0.931
Faithfulness	0.907
Prompt tokens saved by reranker	58% (≈933 → ≈389)
Injection detection F1	1.000
PII redaction accuracy	1.000
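For readers unfamiliar with the retrieval metrics in the table, the sketch below shows one common way to compute Recall@k and MRR, and checks the token-reduction arithmetic. The exact definitions InsightRAG uses are documented on the Evaluation page and may differ. The function names and the three toy queries are assumptions for illustration, not benchmark data.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant chunk IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


# Toy (retrieved, relevant) pairs, one per query.
runs = [
    (["a", "b", "c"], {"a"}),  # hit at rank 1
    (["x", "y", "z"], {"y"}),  # hit at rank 2
    (["p", "q", "r"], {"s"}),  # miss
]
recall = mean([recall_at_k(r, rel) for r, rel in runs])  # 2/3
mrr = mean([reciprocal_rank(r, rel) for r, rel in runs])  # (1 + 0.5 + 0) / 3

# Token reduction from the table: roughly 933 prompt tokens down to 389.
token_reduction = 1 - 389 / 933
print(round(recall, 3), round(mrr, 3), round(token_reduction, 2))  # 0.667 0.5 0.58
```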

Wiki pages

Architecture — pipeline design, offline backends, retrieval and reranking strategy, guardrail design
Evaluation — recall@5, MRR, faithfulness, guardrail metrics, ablation table, how to reproduce
Configuration — all env vars, backend options, .env.example, CLI flags
Development — local setup, project structure, running tests, adding a new backend
