Configuration
Rana Faraz edited this page Jun 23, 2026 · 1 revision
InsightRAG is configured entirely through environment variables, which are loaded from `.env` by the pydantic-based `Settings` class in `rag/config.py`. Every setting has an offline default, so the system runs without any configuration.
| Env var | Offline default | Options | Description |
|---|---|---|---|
| `EMBEDDING_BACKEND` | `hash` | `hash`, `sentence-transformers` | Embedding model for dense retrieval |
| `RERANK_BACKEND` | `lexical` | `lexical`, `cross-encoder` | Reranker for candidate re-scoring |
| `LLM_BACKEND` | `stub` | `stub`, `ollama`, `openai` | Language model for answer generation |
| `VECTOR_STORE` | `memory` | `memory`, `chroma` | Vector store persistence |
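The backend options listed above lend themselves to a simple startup check. The following is a minimal sketch, not the project's actual `Settings` code: `BACKEND_OPTIONS` and `resolve_backends` are hypothetical names, and only the variable names, defaults, and options come from this page.

```python
import os

# Offline default and allowed options per backend variable, as listed on this page.
BACKEND_OPTIONS = {
    "EMBEDDING_BACKEND": ("hash", ("hash", "sentence-transformers")),
    "RERANK_BACKEND": ("lexical", ("lexical", "cross-encoder")),
    "LLM_BACKEND": ("stub", ("stub", "ollama", "openai")),
    "VECTOR_STORE": ("memory", ("memory", "chroma")),
}


def resolve_backends(env=None):
    """Return the chosen backend per variable, rejecting unknown values.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    resolved = {}
    for var, (default, options) in BACKEND_OPTIONS.items():
        value = env.get(var, default)  # fall back to the offline default
        if value not in options:
            raise ValueError(f"{var}={value!r} is not one of {list(options)}")
        resolved[var] = value
    return resolved
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to exercise in tests.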
| Env var | Default | Description |
|---|---|---|
| `EMBEDDING_BACKEND` | `hash` | Embedder backend |
| `RERANK_BACKEND` | `lexical` | Reranker backend |
| `LLM_BACKEND` | `stub` | LLM backend |
| `VECTOR_STORE` | `memory` | Vector store backend |
| `HYBRID_ALPHA` | `0.5` | Blend ratio: 0 = pure BM25, 1 = pure dense |
| `RETRIEVAL_TOP_K` | `10` | Number of candidates from retriever |
| `RERANK_TOP_N` | `3` | Candidates kept after reranking |
| `MIN_RERANK_SCORE` | `0.0` | Score floor; below this the system refuses |
| `OPENAI_API_KEY` | (none) | Required only when `LLM_BACKEND=openai` |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3.1:8b` | Ollama model to use |
| `CHROMA_PATH` | `./chroma_db` | Persistent Chroma directory |
| `ST_MODEL` | `BAAI/bge-small-en-v1.5` | sentence-transformers model |
| `CE_MODEL` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Cross-encoder model |
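To make the retrieval tuning knobs concrete, here is a rough sketch of how `HYBRID_ALPHA`, `RETRIEVAL_TOP_K`, `RERANK_TOP_N`, and `MIN_RERANK_SCORE` could interact. It is an illustration under assumptions, not InsightRAG's implementation: the function names are hypothetical, the BM25 and dense scores are assumed to be already normalized to comparable ranges, and the refusal rule (refuse when the best reranked score is below the floor) is one plausible reading of the table.

```python
def hybrid_scores(bm25, dense, alpha=0.5):
    """Blend per-document scores: alpha=0 is pure BM25, alpha=1 is pure dense."""
    docs = set(bm25) | set(dense)
    return {
        doc: alpha * dense.get(doc, 0.0) + (1.0 - alpha) * bm25.get(doc, 0.0)
        for doc in docs
    }


def select(bm25, dense, rerank, alpha=0.5, top_k=10, top_n=3, min_score=0.0):
    """Retrieve top_k by blended score, rerank, keep top_n, or return None to refuse.

    `rerank` is any callable mapping a document id to a relevance score.
    """
    blended = hybrid_scores(bm25, dense, alpha)
    # RETRIEVAL_TOP_K: candidates handed to the reranker.
    candidates = sorted(blended, key=blended.get, reverse=True)[:top_k]
    # RERANK_TOP_N: candidates kept after re-scoring.
    reranked = sorted(((rerank(doc), doc) for doc in candidates), reverse=True)[:top_n]
    # MIN_RERANK_SCORE (assumed semantics): refuse if even the best hit is too weak.
    if not reranked or reranked[0][0] < min_score:
        return None
    return reranked
```

With the default floor of `0.0`, a reranker that only produces non-negative scores never triggers a refusal in this sketch, so raising the floor is what turns refusals on.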
# Backends — all offline by default; uncomment to switch
# EMBEDDING_BACKEND=sentence-transformers
# RERANK_BACKEND=cross-encoder
# LLM_BACKEND=ollama
# VECTOR_STORE=chroma
# Retrieval tuning
# HYBRID_ALPHA=0.5
# RETRIEVAL_TOP_K=10
# RERANK_TOP_N=3
# MIN_RERANK_SCORE=0.0
# Real model settings (only needed for non-stub backends)
# OPENAI_API_KEY=sk-...
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3.1:8b
# CHROMA_PATH=./chroma_db
# ST_MODEL=BAAI/bge-small-en-v1.5
# CE_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

Copy to .env:

cp .env.example .env
# Edit .env to enable the backends you want

pip install -e ".[local]" # installs sentence-transformers + chromadb

In .env:

EMBEDDING_BACKEND=sentence-transformers
RERANK_BACKEND=cross-encoder
LLM_BACKEND=ollama
VECTOR_STORE=chroma

Requires Ollama:

ollama serve
ollama pull llama3.1:8b

LLM_BACKEND=openai
OPENAI_API_KEY=sk-...

API keys are never committed; .gitignore excludes .env.
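Because `OPENAI_API_KEY` is required only when `LLM_BACKEND=openai`, a quick pre-flight check on the `.env` contents can catch a missing key early. The sketch below is an assumption-laden illustration: `parse_env` and `missing_requirements` are hypothetical helpers, and the parser deliberately ignores quoting and inline comments, which the real pydantic-based loader may treat differently.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and commented-out lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_requirements(values):
    """Return human-readable problems for settings that depend on each other."""
    problems = []
    # LLM_BACKEND defaults to "stub", which needs no credentials.
    if values.get("LLM_BACKEND", "stub") == "openai" and not values.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is required when LLM_BACKEND=openai")
    return problems
```

For example, `missing_requirements(parse_env(open(".env").read()))` returns an empty list for a file in which every line is still commented out.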
The CLI (python -m rag.cli) accepts:
python -m rag.cli ask "Your question" [--path PATH] [--top-k K] [--alpha A]
| Flag | Default | Description |
|---|---|---|
| `--path` | (none) | File or directory to ingest before answering |
| `--top-k` | from env | Override `RETRIEVAL_TOP_K` for this query |
| `--alpha` | from env | Override `HYBRID_ALPHA` for this query |
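The "from env" defaults mean a flag wins when given and the environment variable is used otherwise. A minimal `argparse` sketch of that precedence follows; it is not the contents of `rag/cli.py`, and `build_parser` and `effective_params` are hypothetical names. Only the flags, the `ask` subcommand, and the env defaults come from this page.

```python
import argparse
import os


def build_parser():
    """Mirror the documented `ask` subcommand and its three flags."""
    parser = argparse.ArgumentParser(prog="python -m rag.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    ask = sub.add_parser("ask", help="answer a question")
    ask.add_argument("question")
    ask.add_argument("--path", default=None, help="file or directory to ingest first")
    # None means "not given on the command line", so the env value applies.
    ask.add_argument("--top-k", type=int, default=None)
    ask.add_argument("--alpha", type=float, default=None)
    return parser


def effective_params(args, env=None):
    """Resolve (top_k, alpha): CLI flag first, then env var, then offline default."""
    env = os.environ if env is None else env
    top_k = args.top_k if args.top_k is not None else int(env.get("RETRIEVAL_TOP_K", "10"))
    alpha = args.alpha if args.alpha is not None else float(env.get("HYBRID_ALPHA", "0.5"))
    return top_k, alpha
```

Using `None` as the flag default, rather than baking the env value into `argparse`, keeps "not specified" distinguishable from "explicitly set to the default".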
When running with uvicorn app.main:app, the same env vars apply. The service exposes:
- `GET /health`: liveness check
- `POST /ingest/text`: ingest text chunks
- `POST /ingest/file`: ingest a file path
- `POST /chat`: answer a query (hybrid retrieve → rerank → generate)
- `GET /docs`: OpenAPI interactive docs
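A small standard-library client can exercise these endpoints. This sketch rests on two assumptions that this page does not confirm: the service listens on uvicorn's default `127.0.0.1:8000`, and `POST /chat` accepts a JSON body with a `query` field. The real request schema is shown by `GET /docs`, so treat the field name as hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port


def is_healthy(base_url=BASE_URL, timeout=2.0):
    """Return True if GET /health answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def build_chat_request(question, base_url=BASE_URL):
    """Build (but do not send) a POST /chat request.

    The {"query": ...} payload shape is a guess; check GET /docs for the real schema.
    """
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then `urllib.request.urlopen(build_chat_request("..."))`, once the payload shape has been confirmed against the OpenAPI docs.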
The eval harness always uses offline defaults regardless of .env, to ensure CI reproducibility. Backends used by the harness are hardcoded in eval_harness/harness.py.