feat(indexer): semantic search result cache with LRU eviction #48
Merged
New module indexer/search_cache.py with in-memory LRU cache that matches queries by cosine similarity (threshold 0.85). Cached results for similar queries satisfy subsequent requests without hitting ChromaDB. Tracks DB exhaustion — if a search returned fewer results than requested, the cached set IS complete. Integrate into VaultStore.search() for unfiltered queries. Add invalidate(note_path) for cache-busting on note modifications. Add cache_enabled, cache_max_entries, cache_similarity_threshold to SearchConfig.
18 tests across 7 classes: cosine similarity (4), cache get/put with exact and semantic matching (5), LRU eviction and access refresh (2), invalidation by note path (2), DB exhaustion detection (2), clear and hit/miss stats (2), SearchConfig defaults (1).
Summary
Add an in-memory LRU search result cache with semantic similarity matching, inspired by memora-lab/memory-service-public's three-tier cache hierarchy with namespace separation and DB exhaustion detection. Repeated or semantically similar queries are served from cache without hitting ChromaDB, reducing latency in thinking sessions where the same topic generates multiple similar queries across turns.
How It Works
Key Features
Semantic similarity matching: A cached result for "kubernetes deployment" satisfies a subsequent query for "kubernetes scaling" if cosine similarity > threshold (default 0.85). No additional embedding API calls — uses the query embedding already computed for ChromaDB.
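A minimal sketch of the similarity check described above, assuming embeddings are plain lists of floats. The names `cosine_similarity` and `best_semantic_match` are illustrative and are not taken from `search_cache.py`, whose actual helper is the private `_cosine_similarity()`.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_semantic_match(
    query_embedding: Sequence[float],
    cached_embeddings: Mapping[str, Sequence[float]],  # hypothetical: cached query text -> embedding
    threshold: float = 0.85,
) -> Optional[str]:
    """Return the cached query most similar to the new one, or None if none exceeds the threshold."""
    best_key, best_score = None, threshold
    for key, embedding in cached_embeddings.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:  # strictly greater than the threshold, as in the description
            best_key, best_score = key, score
    return best_key
```

The lookup is a linear scan over cached embeddings, which seems reasonable at the default capacity of 50 entries. It reuses the embedding the caller already has, so no extra embedding call is needed.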
DB exhaustion detection: If a search returned fewer results than requested (e.g., 3 results when 10 were asked), the cache knows the DB was exhausted. Future requests for fewer results (e.g., 5) are served from cache since there are only 3 total results anyway.
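The exhaustion rule can be sketched as follows. `CachedSearch` and `serve_from_cache` are hypothetical names, results are represented as bare identifiers, and the second branch (serving any request the cached list is long enough to cover) is an assumption of this sketch rather than behavior stated in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedSearch:
    results: list[str]  # hypothetical: result identifiers, best match first
    n_requested: int    # how many results the original search asked for

    @property
    def db_exhausted(self) -> bool:
        # Fewer results than requested means the store had nothing more to return.
        return len(self.results) < self.n_requested


def serve_from_cache(entry: CachedSearch, n_results: int) -> Optional[list[str]]:
    """Return cached results if they can satisfy the request, else None (cache miss)."""
    if entry.db_exhausted:
        # The cached list is everything the store has for this query.
        return entry.results[:n_results]
    if n_results <= len(entry.results):
        # Assumption: a full (non-exhausted) entry can still cover smaller requests.
        return entry.results[:n_results]
    return None
```

With the example from the text, an entry holding 3 results from a request for 10 is marked exhausted, so a later request for 5 gets those same 3 results without a database query.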
LRU eviction: OrderedDict-based LRU with configurable max entries (default 50). Access refreshes position. Oldest entries evicted at capacity.
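A minimal sketch of OrderedDict-based LRU behavior, assuming string keys. `LRUCache` is an illustrative stand-in and not the PR's `SearchResultCache`.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Illustrative LRU container: most recently used entries sit at the end."""

    def __init__(self, max_entries: int = 50) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: OrderedDict[str, Any] = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # access refreshes position
        return self._entries[key]

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def keys(self) -> list[str]:
        """Keys ordered from least to most recently used."""
        return list(self._entries)
```

Both `move_to_end` and `popitem(last=False)` are constant-time operations on `OrderedDict`, so refresh and eviction stay cheap regardless of capacity.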
Note-level invalidation: invalidate(note_path) removes all cache entries containing results from a modified note. Ready for event bus integration on NoteModifiedEvent.
Filtered queries bypass cache: Cache only operates on unfiltered queries (where=None). Metadata-filtered searches always hit ChromaDB to ensure filter accuracy.
Changes
New Files
src/vaultmind/indexer/search_cache.py (176 lines): SearchResultCache class with get() (exact + semantic matching), put() (with n_requested tracking), invalidate(note_path), clear(), and a stats property. A private _CacheEntry dataclass stores the query, embedding, results, and request/return counts. A _cosine_similarity() helper handles vector comparison.
tests/test_search_cache.py (187 lines): 18 tests across 7 classes.
Modified Files
src/vaultmind/indexer/store.py: VaultStore.__init__() accepts an optional search_cache parameter. search() checks the cache before the ChromaDB query (unfiltered only) and stores results after the query.
src/vaultmind/config.py: Added cache_enabled: bool = True, cache_max_entries: int = 50, cache_similarity_threshold: float = 0.85 to SearchConfig.
config/default.toml: Added cache settings to the [search] section.
Backward Compatibility
search_cache defaults to None in VaultStore.__init__(), so existing callers are unaffected.
Without a cache, search() behaves identically to before (no conditional overhead).
To enable caching, create a SearchResultCache and pass it to VaultStore.
Test plan
18 tests in test_search_cache.py across 7 classes
ruff check: clean
mypy --ignore-missing-imports: clean
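The note-level invalidation listed under Key Features can be sketched as below. This assumes each cached result is a dict carrying a `note_path` key and that the cache maps query text to a result list. Both are assumptions made for illustration, and the function name `invalidate_note` is hypothetical.

```python
from collections import OrderedDict
from typing import Any


def invalidate_note(
    cache: "OrderedDict[str, list[dict[str, Any]]]",  # hypothetical: query text -> results
    note_path: str,
) -> int:
    """Drop every cached query whose results include the given note; return how many were dropped."""
    stale = [
        query
        for query, results in cache.items()
        if any(result.get("note_path") == note_path for result in results)
    ]
    # Collect first, delete afterwards, so the dict is not mutated while iterating.
    for query in stale:
        del cache[query]
    return len(stale)
```

Entries that never returned the modified note are left in place, so an edit to one note does not flush unrelated cached searches.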