Commit 309af4e
Use AND strategy with lowercase embeddings for thought dedup (jaredlockhart#884)
The OR strategy produced false positives: common short words ("2026",
"AI", "agent") matched via TCR on short titles. Switched to AND
(both TCR >= 0.6 AND embedding similarity >= 0.6 required), which
eliminates all false positives while still catching real duplicates.
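
The difference between the two strategies can be sketched as below. This is a minimal illustration, not the code from this commit: the function names and signatures are hypothetical, the TCR and embedding scores are assumed to be precomputed floats, and only the 0.6 thresholds come from the message above.

```python
# Hypothetical sketch: both scores are assumed to be precomputed floats in [0, 1].
TCR_THRESHOLD = 0.6        # from the commit message
EMBEDDING_THRESHOLD = 0.6  # new default from the commit message


def is_duplicate_or(tcr: float, embedding_sim: float) -> bool:
    """Old behaviour as described: either signal alone flags a duplicate."""
    return tcr >= TCR_THRESHOLD or embedding_sim >= EMBEDDING_THRESHOLD


def is_duplicate_and(tcr: float, embedding_sim: float) -> bool:
    """New behaviour as described: both signals must agree."""
    return tcr >= TCR_THRESHOLD and embedding_sim >= EMBEDDING_THRESHOLD


# Illustrative scores (made up): a short title sharing one common word can
# reach a high TCR while the embeddings are clearly dissimilar.
short_title_pair = (1.0, 0.2)
print(is_duplicate_or(*short_title_pair))   # flagged under OR
print(is_duplicate_and(*short_title_pair))  # not flagged under AND
```

Requiring both signals trades some recall for precision, which is why the embedding threshold is lowered alongside the strategy change.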
Also lowercase titles before embedding so casing doesn't affect
similarity (e.g., "THE GHOST IN THE SHELL" vs "Ghost in the Shell"
was 0.381, now 0.652 after lowercasing).
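
A rough sketch of why lowercasing helps, assuming a case-sensitive embedder. The real embedding model is not shown in this commit, so `toy_embed` below is a stand-in (character trigram counts) and `embed_title` is a hypothetical wrapper; the scores it produces are not the 0.381 and 0.652 quoted above.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in, case-sensitive embedding: counts of character trigrams."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_title(title: str, embed_fn=toy_embed) -> Counter:
    """Lowercase the title before embedding so casing cannot affect similarity."""
    return embed_fn(title.lower())


a, b = "THE GHOST IN THE SHELL", "Ghost in the Shell"
raw_sim = cosine(toy_embed(a), toy_embed(b))
lowered_sim = cosine(embed_title(a), embed_title(b))
print(raw_sim, lowered_sim)  # the lowercased pair scores higher
```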
Lowered THOUGHT_DEDUP_EMBEDDING_THRESHOLD default from 0.80 to 0.60
since title embeddings score lower than full-content embeddings.
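
For illustration only, a default like this might be read as follows. That the setting is read from an environment variable is an assumption; the commit message only names `THOUGHT_DEDUP_EMBEDDING_THRESHOLD` and its old and new defaults, and `embedding_threshold` is a hypothetical helper.

```python
import os

# New default from the commit message; the previous default was 0.80.
DEFAULT_EMBEDDING_THRESHOLD = 0.60


def embedding_threshold(env=None) -> float:
    """Return the configured threshold, falling back to the default.

    Assumption: the value comes from an environment-style mapping.
    """
    env = os.environ if env is None else env
    raw = env.get("THOUGHT_DEDUP_EMBEDDING_THRESHOLD")
    return float(raw) if raw is not None else DEFAULT_EMBEDDING_THRESHOLD
```

An explicit override still wins, so a deployment that relied on the stricter 0.80 value could pin it.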
Co-authored-by: Jared Lockhart <119884+jaredlockhart@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0e9c98f commit 309af4e
2 files changed
Lines changed: 5 additions & 5 deletions
| File | Changed lines (one deletion and one addition each) |
|---|---|
| First file (name not shown) | 189, 276, 287 |
| Second file (name not shown) | 235, 237 |