From installation to your first query in under 5 minutes.
```bash
pip install quantumrag
```

Or install with all extras:

```bash
pip install quantumrag[all]
```

This installs all optional dependencies: OpenAI, Anthropic, LanceDB, Tantivy, Kiwi, FastAPI, etc.
```bash
# Korean language support
pip install quantumrag[korean]

# API server only
pip install quantumrag[api]

# Gemini provider
pip install quantumrag[gemini]

# Reranking models
pip install quantumrag[rerank]
```

To install from source for development:

```bash
git clone https://github.com/quantumaikr/quantumrag.git
cd quantumrag
pip install -e ".[dev,all]"
```

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.11 or 3.12 |
| RAM | 2 GB | 4 GB+ |
| GPU | Not required | Not required |
| Storage | SQLite + LanceDB + Tantivy (local) | Same |
| OS | Linux, macOS, Windows (WSL2) | Any |
QuantumRAG needs an LLM provider API key. Set it via environment variable:
```bash
# OpenAI (default provider)
export OPENAI_API_KEY=sk-...

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Or Google Gemini
export GOOGLE_API_KEY=AIza...
```

For local models via Ollama, no API key is needed.
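Before your first query, it can help to confirm that one of these variables is actually set; a small stdlib-only helper (illustrative, not a QuantumRAG API):

```python
import os

PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")

def detect_provider_key(env=None):
    """Return the name of the first configured provider key, or None."""
    env = os.environ if env is None else env
    for name in PROVIDER_KEYS:
        if env.get(name):  # present and non-empty
            return name
    return None
```

A `None` result is fine if you plan to use local models via Ollama.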
```bash
quantumrag init
```

This creates a `quantumrag.yaml` config file with sensible defaults.
CLI:

```bash
quantumrag ingest ./docs --recursive
```

Python:

```python
from quantumrag import Engine

engine = Engine()
result = engine.ingest("./docs")
print(f"Indexed {result.documents} documents, {result.chunks} chunks")
```

CLI:
```bash
quantumrag query "What chunking strategies are available?"
```

Python:
```python
result = engine.query("What reranking providers are supported?")
print(result.answer)      # Answer with inline citations [1], [2]
print(result.confidence)  # STRONGLY_SUPPORTED / PARTIALLY_SUPPORTED / INSUFFICIENT_EVIDENCE
print(result.sources)     # List of source references
```

Start the HTTP API server:

```bash
quantumrag serve --port 8000
```

Then query via HTTP:
```bash
curl -X POST http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What reranking providers are supported?"}'
```

Install Ollama and pull models:

```bash
ollama pull nomic-embed-text
ollama pull llama3.2
```

Then configure QuantumRAG:
```python
from quantumrag import Engine

engine = Engine(
    embedding_model="nomic-embed-text",
    generation_model="llama3.2",
)
engine.ingest("./docs")
result = engine.query("Summarize the documents")
```

Or via YAML config:
```yaml
# quantumrag.yaml
models:
  embedding:
    provider: "ollama"
    model: "nomic-embed-text"
  generation:
    simple:
      provider: "ollama"
      model: "llama3.2"
    medium:
      provider: "ollama"
      model: "llama3.2"
    complex:
      provider: "ollama"
      model: "llama3.2"
```

For Korean documents without an API key:
```yaml
models:
  embedding:
    provider: "local"
    model: "BAAI/bge-m3"
    dimensions: 1024
```

This downloads and runs the BGE-M3 model locally (CPU-based, multilingual).
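Conceptually, dense retrieval over these 1024-dimensional embeddings reduces to nearest-neighbor search by cosine similarity (handled by the vector store in practice); a toy stdlib-only sketch of the idea:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, k=3):
    """Indices of the k chunk vectors most similar to the query vector."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```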
QuantumRAG uses a layered configuration system:
defaults ← quantumrag.yaml ← environment variables ← code arguments
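The precedence chain means each later layer overrides earlier ones key by key. A minimal sketch of how such layering could work (illustrative only, not QuantumRAG's actual implementation; note it keeps environment values as strings and skips type coercion):

```python
import os

def deep_merge(base, override):
    """Recursively merge `override` into `base`; later layers win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

def env_layer(env=None, prefix="QUANTUMRAG_"):
    """Turn QUANTUMRAG_RETRIEVAL__TOP_K=10 into {'retrieval': {'top_k': '10'}}."""
    env = os.environ if env is None else env
    layer = {}
    for name, raw in env.items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        node = layer
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = raw
    return layer

def resolve(defaults, yaml_cfg, env_cfg, code_args):
    """Apply the four layers left to right, as in the chain above."""
    config = defaults
    for layer in (yaml_cfg, env_cfg, code_args):
        config = deep_merge(config, layer)
    return config
```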
```bash
quantumrag init  # Generates quantumrag.yaml with defaults
```

All config keys can be overridden with the `QUANTUMRAG_` prefix:
```bash
export QUANTUMRAG_LANGUAGE=ko
export QUANTUMRAG_RETRIEVAL__TOP_K=10
export QUANTUMRAG_MODELS__EMBEDDING__PROVIDER=gemini
```

In Python:

```python
from quantumrag import Engine
from quantumrag.core.config import QuantumRAGConfig

# From YAML
engine = Engine(config="./quantumrag.yaml")

# With overrides
config = QuantumRAGConfig.from_yaml("./quantumrag.yaml", language="en")
engine = Engine(config=config)

# Quick overrides
engine = Engine(embedding_model="text-embedding-3-large", data_dir="./my_data")
```
Check the engine status:

```python
from quantumrag import Engine

engine = Engine()
status = engine.status()
print(status)
# {'documents': 0, 'chunks': 0, 'config': {...}, 'data_dir': './quantumrag_data'}
```

Next steps:

- Configuration Guide — Full config reference
- Architecture — How the engine works internally
- API Reference — HTTP API endpoints
- Korean Guide — Korean language optimization