release: v0.4.2 — performance optimization documentation + debug tooling

unamedkr · claude · unamedkr · commit ec2d1b0c8dce · 2026-03-29T23:15:35.000+09:00
v0.4 series summary (29% → 75.2% Combined QA):

Retrieval precision:
- BM25 min-max score normalization (+36.7%p vs 1.0 cap)
- Document coherence boost (+5% per extra chunk from the same document)
- Reranker score blending (0.7 reranker + 0.3 fusion signal)
- Full ingest mode with HyPE enabled (+9.5%p)
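
A minimal sketch of the three score adjustments listed above, for illustration only. The function names and signatures are hypothetical and are not taken from the quantumrag codebase; the tie-handling rule and the exact form of the coherence multiplier are assumptions.

```python
from typing import Sequence


def min_max_normalize(scores: Sequence[float]) -> list[float]:
    """Rescale raw scores (e.g. BM25) to [0, 1] relative to the result set."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when every candidate ties, treat all as equally relevant.
        return [1.0] * len(scores)
    span = hi - lo
    return [(s - lo) / span for s in scores]


def coherence_boost(score: float, same_doc_chunks: int,
                    per_extra: float = 0.05) -> float:
    """Boost a chunk by 5% for each additional retrieved chunk of its document.

    Assumption: the boost is multiplicative and linear in the extra-chunk count.
    """
    extra = max(same_doc_chunks - 1, 0)
    return score * (1.0 + per_extra * extra)


def blend(reranker_score: float, fusion_score: float,
          reranker_weight: float = 0.7) -> float:
    """Mix the reranker score with the fusion signal (0.7 / 0.3 by default)."""
    return reranker_weight * reranker_score + (1.0 - reranker_weight) * fusion_score
```

Min-max scaling keeps the relative spread between candidates, whereas clipping at 1.0 flattens every high-scoring chunk to the same value; that is one plausible reading of why the normalization helped.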

Generation quality:
- Citation mapping handles [Source N] format + range validation
- Sentence-boundary-aware context truncation (Korean + English)
- Finance metric cross-verification in fact_verifier
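
The first two items can be sketched roughly as follows. This is an illustrative approximation, not the project's implementation: the helper names, the 1-based source numbering, and the set of sentence terminators are all assumptions.

```python
import re

# Matches citations such as "[Source 3]".
SOURCE_TAG = re.compile(r"\[Source\s+(\d+)\]")

# Sentence terminators followed by whitespace or end of text.
# Korean sentences usually end with an ASCII period, so one pattern covers both.
SENTENCE_END = re.compile(r"[.!?。](?=\s|$)")


def extract_valid_citations(answer: str, num_sources: int) -> list[int]:
    """Return cited source numbers that are in range, in order, without repeats.

    Assumption: sources are numbered from 1 to num_sources.
    """
    cited: list[int] = []
    for match in SOURCE_TAG.finditer(answer):
        idx = int(match.group(1))
        if 1 <= idx <= num_sources and idx not in cited:
            cited.append(idx)
    return cited


def truncate_at_sentence(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars, preferring the last full sentence."""
    if len(text) <= max_chars:
        return text
    ends = [m.end() for m in SENTENCE_END.finditer(text) if m.end() <= max_chars]
    # Fall back to a hard cut when no sentence boundary fits in the budget.
    return text[:ends[-1]] if ends else text[:max_chars]
```

Out-of-range citations (for example `[Source 7]` when only three sources were retrieved) are dropped rather than mapped, so a hallucinated reference never points at the wrong document.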

Engine optimization:
- Adaptive post-correction time budget (80s total target)
- Query deadline gate (70s) skips expensive late-stage steps
- Auto-skip correction for simple confident queries
- Sub-query cap (3→2) to reduce parallel retrieval cost
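
One way to picture the deadline gate and the adaptive correction budget is a small timer object consulted before each late-stage step. The class below is a hypothetical sketch; its name, methods, and injectable clock are assumptions, and only the 70 s and 80 s figures come from the notes above.

```python
import time
from typing import Callable


class QueryDeadline:
    """Tracks elapsed time for one query against a gate and a total budget."""

    def __init__(self, gate_seconds: float = 70.0, total_seconds: float = 80.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.gate_seconds = gate_seconds
        self.total_seconds = total_seconds

    def elapsed(self) -> float:
        return self._clock() - self._start

    def past_gate(self) -> bool:
        """True once expensive late-stage steps should be skipped."""
        return self.elapsed() >= self.gate_seconds

    def correction_budget(self) -> float:
        """Seconds left for post-correction before the total target is hit."""
        return max(0.0, self.total_seconds - self.elapsed())


deadline = QueryDeadline()
if not deadline.past_gate():
    # A late-stage step would run here, capped at deadline.correction_budget().
    pass
```

Passing the clock in as a parameter keeps the gate testable without sleeping; `time.monotonic` is used by default because it is unaffected by system clock adjustments.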

Playground:
- Pipeline trace visualization with Retrieve/Generate/Other breakdown
- Source excerpts visible by default with document titles
- Code block rendering fix (placeholder-based extraction)
- Query options panel (top_k, rerank, trace, stream toggles)
- `quantumrag demo` command for instant one-line experience

Infrastructure:
- `serve` auto-detects quantumrag.yaml in current directory
- `from_yaml()` loads .env for API keys
- Server startup prints provider/model/embedding info
- FAISS upsert stale reference bug fix (850 tests, 0 failures)

Optimization lessons documented in CLAUDE.md:
- Fusion weight tuning exhausted (4 attempts, all negative)
- Current 40/35/25 weights are near-optimal
- Next breakthrough requires embedding model change or noise reduction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -44,11 +44,15 @@ Index-Heavy, Query-Light RAG engine. Python 3.10+, Apache 2.0.
 ### Current performance status
 - **Individual QA** (4 datasets, 105 questions): 77~100% pass rate → all graduated
 - **Combined QA** (73 sources + 50 noise, 436 chunks): **75% pass rate** (full mode), 2 timeouts, 30 s avg
-- **Improvement history**: 29% → 65% → **75%** (BM25 normalization + coherence boost + reranker blending + HyPE)
+- **Improvement history**: 29% → 65% → **75%** (6 measure-and-improve loops)
 - **Remaining failures**: 26 total: 23 retrieval FAIL, 2 timeout, 1 generation FAIL
-- **Ceiling analysis**: fusion weight tuning is exhausted. The next breakthrough requires replacing the embedding model or reducing noise
 - **Default LLM**: gemini-3.1-flash-lite-preview (free tier, cost-efficient)
 
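+### Performance optimization lessons (verified)
+- **Effective**: BM25 min-max normalization (+36.7%p), Document Coherence Boost, Reranker blending (0.7/0.3), Full ingest HyPE (+9.5%p)
+- **Ineffective (do not retry)**: fusion weight tuning (worse in all 4 attempts), dictionary expansion (-5%p), timeout optimization (0%p), query classifier change (-2%p)
+- **Ceiling analysis**: the current weights (40/35/25) are at the optimum. The next breakthrough requires replacing the embedding model or reducing noise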
+
 ## Key file locations
 - Engine entry point: `quantumrag/core/engine.py`
 - RAG config: `quantumrag/core/config.py`
diff --git a/datasets/debug_retrieval.py b/datasets/debug_retrieval.py
@@ -33,7 +33,11 @@
 
 
 async def debug_query(query: str, data_dir: str) -> None:
-    cfg = QuantumRAGConfig.auto(storage={"data_dir": data_dir})
+    cfg = QuantumRAGConfig.default(storage={"data_dir": data_dir})
+    # Match Combined QA runner: use local embeddings (1024d)
+    cfg.models.embedding.provider = "local"
+    cfg.models.embedding.model = "BAAI/bge-m3"
+    cfg.models.embedding.dimensions = 1024
     engine = Engine(config=cfg)
     engine._ensure_initialized()
 
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "quantumrag"
-version = "0.4.1"
+version = "0.4.2"
 description = "Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works."
 readme = "README.md"
 license = "Apache-2.0"
diff --git a/quantumrag/_version.py b/quantumrag/_version.py
@@ -1 +1 @@
-__version__ = "0.4.1"
+__version__ = "0.4.2"
