Configuration

Rana Faraz edited this page Jun 23, 2026 · 1 revision

InsightRAG is configured entirely through environment variables, which the pydantic-based Settings class in rag/config.py loads from .env. Every setting has an offline default, so the system runs with no configuration at all.
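To illustrate the "environment variable with an offline default" pattern, here is a minimal sketch that uses only the standard library. It is not the actual rag/config.py: the real Settings class is pydantic-based, and the names `SketchSettings` and `load_settings` are hypothetical. The sketch also reads from a plain mapping and does not parse the .env file itself.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical stand-in for Settings; covers only a few of the documented vars."""
    embedding_backend: str = "hash"
    rerank_backend: str = "lexical"
    llm_backend: str = "stub"
    vector_store: str = "memory"
    hybrid_alpha: float = 0.5
    retrieval_top_k: int = 10


def load_settings(env=None):
    """Build settings from a mapping of env vars, falling back to offline defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(SketchSettings):
        raw = env.get(field.name.upper())  # e.g. hybrid_alpha -> HYBRID_ALPHA
        if raw is not None:
            # Coerce the string from the environment to the type of the default.
            overrides[field.name] = type(field.default)(raw)
    return SketchSettings(**overrides)


if __name__ == "__main__":
    print(load_settings({}))                       # all offline defaults
    print(load_settings({"HYBRID_ALPHA": "0.8"}))  # one override
```

With an empty environment every field keeps its offline default, which is the behaviour the paragraph above describes.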

Backend env vars

| Env var | Offline default | Options | Description |
|---|---|---|---|
| EMBEDDING_BACKEND | hash | hash, sentence-transformers | Embedding model for dense retrieval |
| RERANK_BACKEND | lexical | lexical, cross-encoder | Reranker for candidate re-scoring |
| LLM_BACKEND | stub | stub, ollama, openai | Language model for answer generation |
| VECTOR_STORE | memory | memory, chroma | Vector store persistence |

Full env var reference

| Env var | Default | Description |
|---|---|---|
| EMBEDDING_BACKEND | hash | Embedder backend |
| RERANK_BACKEND | lexical | Reranker backend |
| LLM_BACKEND | stub | LLM backend |
| VECTOR_STORE | memory | Vector store backend |
| HYBRID_ALPHA | 0.5 | Blend ratio: 0 = pure BM25, 1 = pure dense |
| RETRIEVAL_TOP_K | 10 | Number of candidates from retriever |
| RERANK_TOP_N | 3 | Candidates kept after reranking |
| MIN_RERANK_SCORE | 0.0 | Score floor; below this the system refuses |
| OPENAI_API_KEY | | Required only when LLM_BACKEND=openai |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | llama3.1:8b | Ollama model to use |
| CHROMA_PATH | ./chroma_db | Persistent Chroma directory |
| ST_MODEL | BAAI/bge-small-en-v1.5 | sentence-transformers model |
| CE_MODEL | cross-encoder/ms-marco-MiniLM-L-6-v2 | Cross-encoder model |
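The retrieval tuning values are easier to reason about with a small sketch. The code below assumes HYBRID_ALPHA is a plain linear interpolation between a BM25 score and a dense score on comparable scales, and that the system refuses when no reranked candidate reaches MIN_RERANK_SCORE. Both are assumptions about InsightRAG's internals, and the function names `blend` and `select_context` are hypothetical.

```python
def blend(bm25_score, dense_score, alpha=0.5):
    """Assumed hybrid score: alpha=0 is pure BM25, alpha=1 is pure dense."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1.0 - alpha) * bm25_score


def select_context(reranked, top_n=3, min_score=0.0):
    """Keep the best top_n (doc_id, score) pairs that reach min_score.

    Returns None when nothing reaches the floor, which the caller
    can treat as a signal to refuse (assumed refusal behaviour).
    """
    ordered = sorted(reranked, key=lambda pair: pair[1], reverse=True)
    kept = [(doc_id, score) for doc_id, score in ordered if score >= min_score]
    return kept[:top_n] or None


if __name__ == "__main__":
    print(blend(bm25_score=0.2, dense_score=0.8, alpha=0.5))
    print(select_context([("a", 0.1), ("b", 0.9), ("c", 0.5)], top_n=2, min_score=0.2))
```

Under these assumptions, raising MIN_RERANK_SCORE makes refusals more frequent, and RETRIEVAL_TOP_K should stay at or above RERANK_TOP_N so the reranker has enough candidates to choose from.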

.env.example

# Backends — all offline by default; uncomment to switch
# EMBEDDING_BACKEND=sentence-transformers
# RERANK_BACKEND=cross-encoder
# LLM_BACKEND=ollama
# VECTOR_STORE=chroma

# Retrieval tuning
# HYBRID_ALPHA=0.5
# RETRIEVAL_TOP_K=10
# RERANK_TOP_N=3
# MIN_RERANK_SCORE=0.0

# Real model settings (only needed for non-stub backends)
# OPENAI_API_KEY=sk-...
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3.1:8b
# CHROMA_PATH=./chroma_db
# ST_MODEL=BAAI/bge-small-en-v1.5
# CE_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

Copy to .env:

cp .env.example .env
# Edit .env to enable the backends you want

Backend upgrade paths

From offline to local models (still free)

pip install -e ".[local]"    # installs sentence-transformers + chromadb

In .env:

EMBEDDING_BACKEND=sentence-transformers
RERANK_BACKEND=cross-encoder
LLM_BACKEND=ollama
VECTOR_STORE=chroma

Requires Ollama:

ollama serve
ollama pull llama3.1:8b

From local to OpenAI

LLM_BACKEND=openai
OPENAI_API_KEY=sk-...

API keys are never committed, because .gitignore excludes .env.
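Because OPENAI_API_KEY is required only when LLM_BACKEND=openai, a startup check along the following lines can catch a missing key early. This is a sketch of the rule stated in the reference table, not code from InsightRAG; the function name `check_llm_config` is hypothetical.

```python
import os

ALLOWED_LLM_BACKENDS = {"stub", "ollama", "openai"}


def check_llm_config(env=None):
    """Return the selected LLM backend, or raise if the configuration is inconsistent."""
    env = os.environ if env is None else env
    backend = env.get("LLM_BACKEND", "stub")  # offline default
    if backend not in ALLOWED_LLM_BACKENDS:
        raise ValueError(f"Unknown LLM_BACKEND: {backend!r}")
    # The key is only mandatory for the OpenAI backend.
    if backend == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY must be set when LLM_BACKEND=openai")
    return backend


if __name__ == "__main__":
    print(check_llm_config({"LLM_BACKEND": "ollama"}))
```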

CLI flags

The CLI (python -m rag.cli) accepts:

python -m rag.cli ask "Your question" [--path PATH] [--top-k K] [--alpha A]

| Flag | Default | Description |
|---|---|---|
| --path | | File or directory to ingest before answering |
| --top-k | from env | Override RETRIEVAL_TOP_K for this query |
| --alpha | from env | Override HYBRID_ALPHA for this query |
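The "from env" default means a flag wins when it is given and the environment value applies otherwise. A minimal argparse sketch of that precedence follows; it mirrors the documented flags but is not the actual rag/cli.py, and `build_parser` and `resolve` are hypothetical names.

```python
import argparse
import os


def build_parser():
    """Parser with the documented `ask` subcommand and its three flags."""
    parser = argparse.ArgumentParser(prog="rag.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    ask = sub.add_parser("ask")
    ask.add_argument("question")
    ask.add_argument("--path", default=None)
    ask.add_argument("--top-k", type=int, default=None)   # None means "use env"
    ask.add_argument("--alpha", type=float, default=None)  # None means "use env"
    return parser


def resolve(args, env=None):
    """Return (top_k, alpha): CLI flag first, then env var, then offline default."""
    env = os.environ if env is None else env
    top_k = args.top_k if args.top_k is not None else int(env.get("RETRIEVAL_TOP_K", "10"))
    alpha = args.alpha if args.alpha is not None else float(env.get("HYBRID_ALPHA", "0.5"))
    return top_k, alpha


if __name__ == "__main__":
    parsed = build_parser().parse_args(["ask", "Your question", "--alpha", "0.9"])
    print(resolve(parsed))
```

Using None as the argparse default keeps "flag not given" distinguishable from "flag given with the default value", so the override applies only to the query that passes it.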

FastAPI service settings

When running with uvicorn app.main:app, the same env vars apply. The service exposes:

  • GET /health — liveness check
  • POST /ingest/text — ingest text chunks
  • POST /ingest/file — ingest a file path
  • POST /chat — answer a query (hybrid retrieve → rerank → generate)
  • GET /docs — OpenAPI interactive docs

Eval harness settings

The eval harness always uses offline defaults regardless of .env, to ensure CI reproducibility. Backends used by the harness are hardcoded in eval_harness/harness.py.
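One way to picture "offline defaults regardless of .env" is a harness that overwrites the backend variables before building its pipeline. The sketch below is an illustration only: the actual mechanism in eval_harness/harness.py is not shown on this page, `harness_env` is a hypothetical name, and whether the harness also pins the tuning values (HYBRID_ALPHA and so on) is not stated, so only the four backends are pinned here.

```python
# Offline backend values taken from the "Backend env vars" table.
OFFLINE_BACKENDS = {
    "EMBEDDING_BACKEND": "hash",
    "RERANK_BACKEND": "lexical",
    "LLM_BACKEND": "stub",
    "VECTOR_STORE": "memory",
}


def harness_env(env):
    """Return a copy of env with the backend choices forced to offline values."""
    pinned = dict(env)               # leave the caller's mapping untouched
    pinned.update(OFFLINE_BACKENDS)  # pinned values win over anything from .env
    return pinned


if __name__ == "__main__":
    user_env = {"LLM_BACKEND": "openai", "VECTOR_STORE": "chroma"}
    print(harness_env(user_env))
```

Pinning the backends this way means a developer's local .env cannot change eval results, which is the reproducibility property the section describes.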
