Commit 49aba41

SonAIengine and claude committed
fix: improve FTS hybrid scoring (full query match + coverage bonus + adaptive weight)
- Restore the "full query in title" bonus (len(terms) × 3.0)
- Add a query-term coverage bonus: extra weight when 80%+ of the query terms match (a smaller bonus at 50%+)
- Retune the BM25 weight curve: N≤500 → 10% BM25, N=5000+ → 80% BM25
- Give substring matching more weight on small corpora so ranking stays stable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1eb084e commit 49aba41
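To make the retuned curve concrete, here is a minimal Python sketch that evaluates the old and new `bm25_weight` formulas from the diff in `src/synaptic/backends/memory.py` and blends the two scores the same way the final `score = ...` line does. The names `old_bm25_weight`, `new_bm25_weight`, and `hybrid_score` are hypothetical helpers for illustration only; in the actual code the formula is inlined in `search_fts`, and `N` is taken to be the corpus size, as the diff's comment suggests. Working through the new formula, the 10% floor actually holds until N=1000 (where (N - 500)/5000 = 0.1), and the 80% cap is reached at N=4500, so "N≤500" and "N=5000+" in the message are conservative descriptions of the curve.

```python
def old_bm25_weight(n: int) -> float:
    """Curve before this commit: 0.3 at N=0, rising linearly to the 0.8 cap at N=1000."""
    return min(0.8, 0.3 + 0.5 * min(1.0, n / 1000))


def new_bm25_weight(n: int) -> float:
    """Curve after this commit: 0.1 floor through N=1000, 0.8 cap from N=4500."""
    return min(0.8, max(0.1, (n - 500) / 5000))


def hybrid_score(bm25_score: float, substr_score: float, n: int) -> float:
    """Hypothetical helper: blend both signals using the new corpus-size weight."""
    w = new_bm25_weight(n)
    return bm25_score * w + substr_score * (1 - w)


if __name__ == "__main__":
    # Compare how much weight BM25 receives at a few corpus sizes.
    for n in (100, 500, 1000, 2500, 4500, 10000):
        print(f"N={n:>5}: old={old_bm25_weight(n):.2f}  new={new_bm25_weight(n):.2f}")
```

For these sizes the old curve gives 0.35, 0.55, 0.80, 0.80, 0.80, 0.80, while the new curve gives 0.10, 0.10, 0.10, 0.40, 0.80, 0.80. On a small corpus, a node's substring score therefore dominates its final score.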

1 file changed: +17 -2 lines changed

src/synaptic/backends/memory.py

Lines changed: 17 additions & 2 deletions
@@ -152,6 +152,11 @@ async def search_fts(self, query: str, *, limit: int = 20) -> list[Node]:
 
             bm25_score = 0.0
             substr_score = 0.0
+            matched_terms = 0  # for computing query-term coverage
+
+            # Full query in title: strong signal (valid at every corpus size)
+            if query_lower in title_lower:
+                substr_score += len(terms) * 3.0
 
             for t in terms:
                 tf_content = content_lower.count(t)
@@ -177,6 +182,7 @@ async def search_fts(self, query: str, *, limit: int = 20) -> list[Node]:
                     substr_score += 2.0
                 if tf_content > 0:
                     substr_score += 1.0
+                    matched_terms += 1
 
             # Bigram bonus
             for bg in bigrams:
@@ -199,9 +205,18 @@ async def search_fts(self, query: str, *, limit: int = 20) -> list[Node]:
                 if t in search_kw:
                     substr_score += 1.5
 
+            # Query term coverage bonus: bonus when most query terms match
+            # coverage 80%+ → bonus; improves precision on large corpora
+            if len(terms) >= 2 and matched_terms > 0:
+                coverage = matched_terms / len(terms)
+                if coverage >= 0.8:
+                    substr_score += len(terms) * 1.5  # reward high coverage
+                elif coverage >= 0.5:
+                    substr_score += len(terms) * 0.5
+
             # Hybrid: BM25 weight increases with corpus size
-            # N=100: 30% BM25 + 70% substr, N=1000+: 80% BM25 + 20% substr
-            bm25_weight = min(0.8, 0.3 + 0.5 * min(1.0, N / 1000))
+            # N≤500: mostly substring, N=5000+: mostly BM25
+            bm25_weight = min(0.8, max(0.1, (N - 500) / 5000))
             score = bm25_score * bm25_weight + substr_score * (1 - bm25_weight)
 
             if score > 0:
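The substring-side bonuses this commit touches can be isolated in a small standalone function. This is a sketch under stated assumptions, not the project's implementation: `visible_substring_bonuses` is a hypothetical name, the query is assumed to be tokenized by whitespace (the diff does not show how `terms` is built), and the per-term title, bigram, and keyword bonuses hidden between the hunks are left out. As the placement in the diff suggests, `matched_terms` is assumed to be incremented only when a term occurs in the content.

```python
def visible_substring_bonuses(query: str, title: str, content: str) -> float:
    """Hypothetical helper: the full-query, per-content-term, and coverage bonuses.

    Only scoring visible in the diff is reproduced; other bonuses are omitted.
    """
    query_lower = query.lower()
    title_lower = title.lower()
    content_lower = content.lower()
    terms = query_lower.split()  # assumption: whitespace tokenization

    substr_score = 0.0
    matched_terms = 0

    # Full query in title: strong signal, scaled by query length
    if query_lower in title_lower:
        substr_score += len(terms) * 3.0

    for t in terms:
        # str.count is a substring match, so "search" would also hit "research"
        if content_lower.count(t) > 0:
            substr_score += 1.0
            matched_terms += 1

    # Coverage bonus only applies to multi-term queries
    if len(terms) >= 2 and matched_terms > 0:
        coverage = matched_terms / len(terms)
        if coverage >= 0.8:
            substr_score += len(terms) * 1.5
        elif coverage >= 0.5:
            substr_score += len(terms) * 0.5

    return substr_score


if __name__ == "__main__":
    # 2 terms, full query in title (+6.0), both terms in content (+2.0), coverage 100% (+3.0)
    print(visible_substring_bonuses("hybrid search", "Hybrid search in memory",
                                    "covers hybrid ranking and search"))  # 11.0
```

Because the coverage tiers scale with `len(terms)`, a five-term query with four matching terms (80% coverage) earns 7.5 points on top of the per-term hits, while three of five (60%) earns only 2.5. This is how the bonus favors nodes that match most of a long query.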
