Configuration

InsightRAG is configured entirely through environment variables, loaded from `.env` by `rag/config.py::Settings` (pydantic-based). Every backend defaults to an offline implementation, so the system runs with no configuration at all.
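The loading behaviour can be sketched with the standard library alone. This is not the real `rag/config.py::Settings`, which is pydantic-based. `load_settings` is a hypothetical helper that covers only the backend and retrieval-tuning variables. It also matches env var names case-sensitively, which pydantic-settings does not do by default.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


# Hypothetical stand-in for rag/config.py::Settings (the real class uses pydantic).
# Field names are the env var names lower-cased; defaults are the offline ones.
@dataclass(frozen=True)
class Settings:
    embedding_backend: str = "hash"
    rerank_backend: str = "lexical"
    llm_backend: str = "stub"
    vector_store: str = "memory"
    hybrid_alpha: float = 0.5
    retrieval_top_k: int = 10
    rerank_top_n: int = 3
    min_rerank_score: float = 0.0


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read each field from its upper-case env var, else keep the default."""
    env = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        raw = env.get(f.name.upper())
        if raw is not None:
            # Coerce the string using the type of the field's default (str, int or float).
            values[f.name] = type(f.default)(raw)
    return Settings(**values)
```

Passing an explicit mapping, for example `load_settings({"LLM_BACKEND": "ollama"})`, makes the defaults easy to test without touching the real environment.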

Backend env vars

| Env var | Offline default | Options | Description |
|---|---|---|---|
| `EMBEDDING_BACKEND` | `hash` | `hash`, `sentence-transformers` | Embedding model for dense retrieval |
| `RERANK_BACKEND` | `lexical` | `lexical`, `cross-encoder` | Reranker for candidate re-scoring |
| `LLM_BACKEND` | `stub` | `stub`, `ollama`, `openai` | Language model for answer generation |
| `VECTOR_STORE` | `memory` | `memory`, `chroma` | Vector store: in-memory, or Chroma persisted to `CHROMA_PATH` |
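One way to read this table is as a closed set of choices per variable. Below is a minimal validation sketch. It assumes unset variables fall back to the offline default and unknown values should fail at startup. `resolve_backends` and `BACKEND_OPTIONS` are hypothetical names, not InsightRAG's API.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical table of allowed values, copied from the backend table;
# the first option of each tuple is the offline default.
BACKEND_OPTIONS: Dict[str, Tuple[str, ...]] = {
    "EMBEDDING_BACKEND": ("hash", "sentence-transformers"),
    "RERANK_BACKEND": ("lexical", "cross-encoder"),
    "LLM_BACKEND": ("stub", "ollama", "openai"),
    "VECTOR_STORE": ("memory", "chroma"),
}


def resolve_backends(env: Mapping[str, str]) -> Dict[str, str]:
    """Return the selected value per backend var; raise on unknown options."""
    chosen: Dict[str, str] = {}
    for var, options in BACKEND_OPTIONS.items():
        value = env.get(var, options[0])  # unset -> offline default
        if value not in options:
            raise ValueError(f"{var}={value!r} is not one of {', '.join(options)}")
        chosen[var] = value
    return chosen
```

`resolve_backends({"LLM_BACKEND": "ollama"})` selects Ollama and keeps the other three offline defaults. `{"VECTOR_STORE": "faiss"}` raises `ValueError`.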

Full env var reference

| Env var | Default | Description |
|---|---|---|
| `EMBEDDING_BACKEND` | `hash` | Embedder backend |
| `RERANK_BACKEND` | `lexical` | Reranker backend |
| `LLM_BACKEND` | `stub` | LLM backend |
| `VECTOR_STORE` | `memory` | Vector store backend |
| `HYBRID_ALPHA` | `0.5` | Blend ratio between sparse and dense scores: `0` = pure BM25, `1` = pure dense |
| `RETRIEVAL_TOP_K` | `10` | Number of candidates returned by the retriever |
| `RERANK_TOP_N` | `3` | Number of candidates kept after reranking |
| `MIN_RERANK_SCORE` | `0.0` | Score floor; when reranked candidates score below it, the system refuses to answer |
| `OPENAI_API_KEY` | none | Required only when `LLM_BACKEND=openai` |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3.1:8b` | Ollama model to use |
| `CHROMA_PATH` | `./chroma_db` | Persistent Chroma directory |
| `ST_MODEL` | `BAAI/bge-small-en-v1.5` | sentence-transformers embedding model |
| `CE_MODEL` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder reranking model |
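The four retrieval-tuning variables act in sequence:

1. Retrieve `RETRIEVAL_TOP_K` candidates by blended score.
2. Rerank them.
3. Keep `RERANK_TOP_N`.
4. Refuse if the scores fall below `MIN_RERANK_SCORE`.

The sketch below illustrates that flow under two assumptions. Scores are min-max normalized before blending, and the floor is applied per candidate. Both are illustrative choices; InsightRAG's actual normalization and refusal rule may differ. All function names are hypothetical, and the reranking step itself is represented only by its output `(doc, score)` pairs.

```python
from typing import Dict, List, Optional, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] so BM25 and dense scores are comparable (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_top_k(bm25: Dict[str, float], dense: Dict[str, float],
                 alpha: float, top_k: int) -> List[str]:
    """Blend normalized scores: alpha=0 -> pure BM25, alpha=1 -> pure dense."""
    b, d = min_max(bm25), min_max(dense)
    blended = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
               for doc in set(b) | set(d)}
    # Highest blended score first; ties broken by doc id for determinism.
    return sorted(blended, key=lambda doc: (-blended[doc], doc))[:top_k]


def select_context(reranked: List[Tuple[str, float]], top_n: int,
                   min_score: float) -> Optional[List[str]]:
    """Keep up to top_n reranked docs at or above the floor; None means refuse."""
    ranked = sorted(reranked, key=lambda pair: pair[1], reverse=True)
    kept = [doc for doc, score in ranked if score >= min_score][:top_n]
    return kept or None
```

With `alpha=0.0` the ranking follows BM25 alone, and with `alpha=1.0` it follows the dense scores alone, matching the `HYBRID_ALPHA` description.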

`.env.example`

```
# Backends: all offline by default; uncomment to switch
# EMBEDDING_BACKEND=sentence-transformers
# RERANK_BACKEND=cross-encoder
# LLM_BACKEND=ollama
# VECTOR_STORE=chroma

# Retrieval tuning
# HYBRID_ALPHA=0.5
# RETRIEVAL_TOP_K=10
# RERANK_TOP_N=3
# MIN_RERANK_SCORE=0.0

# Real model settings (only needed for non-stub backends)
# OPENAI_API_KEY=sk-...
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3.1:8b
# CHROMA_PATH=./chroma_db
# ST_MODEL=BAAI/bge-small-en-v1.5
# CE_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
```

Copy it to `.env`:

```bash
cp .env.example .env
# Edit .env to enable the backends you want
```
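Commented-out lines in `.env` are not read, which is why the defaults stay in effect until you uncomment a setting. The simplified parser below illustrates this, assuming plain `KEY=VALUE` lines. `parse_dotenv` is hypothetical. The real loader handles more syntax, for example inline comments after a value, which this sketch does not strip.

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Hypothetical, simplified .env parser: one KEY=VALUE per line."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented out: the setting stays unset, so its default applies
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignored in this sketch
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```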

Backend upgrade paths

From offline to local models (still free)

```bash
pip install -e ".[local]"    # installs sentence-transformers + chromadb
```

Then set in `.env`:

```
EMBEDDING_BACKEND=sentence-transformers
RERANK_BACKEND=cross-encoder
LLM_BACKEND=ollama
VECTOR_STORE=chroma
```
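If the `[local]` extra was not installed, switching these backends will likely fail at import time. The pre-flight check below assumes `sentence-transformers` provides both the embedder and the cross-encoder, and that `chromadb` provides the Chroma store. Both the mapping and `missing_modules` are hypothetical.

```python
import importlib.util
from typing import Dict, Mapping, Tuple

# Hypothetical map: (env var, local backend value) -> Python module it is assumed to import.
REQUIRED_MODULES: Dict[Tuple[str, str], str] = {
    ("EMBEDDING_BACKEND", "sentence-transformers"): "sentence_transformers",
    ("RERANK_BACKEND", "cross-encoder"): "sentence_transformers",
    ("VECTOR_STORE", "chroma"): "chromadb",
}


def missing_modules(env: Mapping[str, str]) -> Dict[str, str]:
    """Return {env var: module} for selected local backends whose module is not importable."""
    missing: Dict[str, str] = {}
    for (var, value), module in REQUIRED_MODULES.items():
        # find_spec returns None when the top-level module cannot be found.
        if env.get(var) == value and importlib.util.find_spec(module) is None:
            missing[var] = module
    return missing
```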

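The `ollama` backend also needs a running Ollama server with the configured model pulled:

```bash
ollama serve              # runs in the foreground; use a second terminal for the next command
ollama pull llama3.1:8b
```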
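Before switching to `LLM_BACKEND=ollama`, it can help to confirm the server is up and the model is pulled. This sketch queries Ollama's `GET /api/tags` endpoint, which lists locally available models. `check_ollama` and `model_pulled` are hypothetical helpers, and their defaults mirror `OLLAMA_BASE_URL` and `OLLAMA_MODEL` from the reference table.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def model_pulled(tags: Dict[str, Any], model: str) -> bool:
    """True if `model` is listed in an Ollama /api/tags response body."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def check_ollama(
    base_url: str = "http://localhost:11434",
    model: str = "llama3.1:8b",
    timeout: float = 2.0,
) -> Optional[bool]:
    """Return None if the server is unreachable, else whether `model` is pulled."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            tags = json.load(resp)
    except OSError:  # URLError, connection refused and timeouts are all OSError subclasses
        return None
    return model_pulled(tags, model)
```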

From local to OpenAI

Set in `.env`:

```
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
```

API keys stay out of version control because `.gitignore` excludes `.env`.
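Because `OPENAI_API_KEY` has no default, a missing key may only surface when the first answer is generated. Below is a fail-fast sketch, plus a masking helper for printing the active configuration without leaking the key. Both functions are hypothetical, not part of InsightRAG.

```python
from typing import Mapping


def require_openai_key(env: Mapping[str, str]) -> None:
    """Hypothetical startup check: raise if the openai backend is selected without a key."""
    if env.get("LLM_BACKEND", "stub") == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("LLM_BACKEND=openai requires OPENAI_API_KEY in .env")


def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*' (fully masked if short)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```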

CLI flags

The CLI (`python -m rag.cli`) accepts:

```bash
python -m rag.cli ask "Your question" [--path PATH] [--top-k K] [--alpha A]
```

| Flag | Default | Description |
|---|---|---|
| `--path` | none | File or directory to ingest before answering |
| `--top-k` | from env | Override `RETRIEVAL_TOP_K` for this query |
| `--alpha` | from env | Override `HYBRID_ALPHA` for this query |
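"From env" means a flag overrides its env var only when the flag is given. The sketch below shows that precedence using `argparse`, assuming the flags default to `None` so that "not given" can be told apart from an explicit value. The parser layout and `effective_retrieval` are hypothetical, not a copy of `rag.cli`.

```python
import argparse
import os
from typing import List, Mapping, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `ask` command and flags."""
    parser = argparse.ArgumentParser(prog="python -m rag.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    ask = sub.add_parser("ask", help="Answer a question")
    ask.add_argument("question")
    ask.add_argument("--path", default=None, help="File or directory to ingest first")
    # None means "not given on the command line", so the env value is used instead.
    ask.add_argument("--top-k", type=int, default=None)
    ask.add_argument("--alpha", type=float, default=None)
    return parser


def effective_retrieval(argv: List[str],
                        env: Optional[Mapping[str, str]] = None) -> Tuple[int, float]:
    """Resolve (top_k, alpha): CLI flag if given, else env var, else offline default."""
    env = os.environ if env is None else env
    args = build_parser().parse_args(argv)
    top_k = args.top_k if args.top_k is not None else int(env.get("RETRIEVAL_TOP_K", "10"))
    alpha = args.alpha if args.alpha is not None else float(env.get("HYBRID_ALPHA", "0.5"))
    return top_k, alpha
```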

FastAPI service settings

When running with `uvicorn app.main:app`, the service reads the same env vars. It exposes:

- `GET /health`: liveness check
- `POST /ingest/text`: ingest text chunks
- `POST /ingest/file`: ingest a file path
- `POST /chat`: answer a query (hybrid retrieve → rerank → generate)
- `GET /docs`: OpenAPI interactive docs
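Below is a minimal stdlib client for `POST /chat`, assuming uvicorn's default port 8000 and a JSON body of the form `{"query": ...}`. That field name is a hypothetical placeholder, so check the request schema shown at `GET /docs` before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict


def chat_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /chat request; the {"query": ...} body shape is an assumption."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Send the request to a running service and decode its JSON response."""
    with urllib.request.urlopen(chat_request(question, base_url), timeout=30) as resp:
        return json.load(resp)
```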

Eval harness settings

The eval harness always uses offline defaults regardless of `.env`, to ensure CI reproducibility. The backends it uses are hardcoded in `eval_harness/harness.py`.
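The pinning can be pictured as an override applied on top of whatever the environment says. This sketch assumes only the four backend variables are forced, since the source says only that the backends are hardcoded. `OFFLINE_BACKENDS` and `harness_env` are hypothetical names, not the contents of `eval_harness/harness.py`.

```python
from typing import Dict, Mapping

# Hypothetical constant: offline defaults from the backend table.
OFFLINE_BACKENDS: Dict[str, str] = {
    "EMBEDDING_BACKEND": "hash",
    "RERANK_BACKEND": "lexical",
    "LLM_BACKEND": "stub",
    "VECTOR_STORE": "memory",
}


def harness_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Copy env, then force every backend to its offline default (other vars pass through)."""
    pinned = dict(env)
    pinned.update(OFFLINE_BACKENDS)
    return pinned
```

Overriding after copying, rather than reading `.env` selectively, means a developer's local `LLM_BACKEND=openai` cannot leak into CI runs.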
