
feat(indexer): semantic search result cache with LRU eviction #48

Merged
Mathews-Tom merged 2 commits into main from feat/search-cache
Mar 25, 2026

@Mathews-Tom

Owner

Summary

Add an in-memory LRU search result cache with semantic similarity matching, inspired by memora-lab/memory-service-public's three-tier cache hierarchy with namespace separation and DB exhaustion detection. Repeated or semantically similar queries are served from cache without hitting ChromaDB, reducing latency in thinking sessions where the same topic generates multiple similar queries across turns.

How It Works

VaultStore.search(query)
  ├─ Compute query embedding
  ├─ Check cache:
  │    ├─ Exact key match? → return cached results
  │    ├─ Semantic match (cosine sim > 0.85)? → return cached results
  │    └─ Miss → proceed to ChromaDB
  ├─ ChromaDB query
  ├─ Store results in cache (with embedding + n_requested metadata)
  └─ Return results

Key Features

Semantic similarity matching: A cached result for "kubernetes deployment" satisfies a subsequent query for "kubernetes scaling" if cosine similarity > threshold (default 0.85). No additional embedding API calls — uses the query embedding already computed for ChromaDB.
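
A minimal sketch of the similarity check described above, in plain Python. The function names and the zero-vector handling are assumptions for illustration; the PR's own `_cosine_similarity()` helper may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector counts as "no similarity" instead of raising.
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantic_hit(
    query_vec: Sequence[float],
    cached_vec: Sequence[float],
    threshold: float = 0.85,
) -> bool:
    """Strictly-greater comparison, following the "> threshold" wording."""
    return cosine_similarity(query_vec, cached_vec) > threshold
```

Because the comparison is strict, a similarity of exactly 0.85 would count as a miss in this sketch.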

DB exhaustion detection: If a search returned fewer results than requested (e.g., 3 results when 10 were asked for), the cache knows the DB was exhausted. Later requests for fewer results than originally requested (e.g., 5 instead of 10) are served from cache, since the DB holds only 3 matching results anyway.
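
The exhaustion rule can be sketched as a small decision function. `CacheEntry` and `serve_from_entry` are hypothetical names, and this sketch serves any request size once the DB is known to be exhausted, which goes slightly beyond the smaller-request case the PR describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    results: list[str]  # result identifiers in ranked order
    n_requested: int    # how many results the original search asked for

    @property
    def exhausted(self) -> bool:
        # Fewer results than requested means the DB had nothing more to return.
        return len(self.results) < self.n_requested


def serve_from_entry(entry: CacheEntry, n: int) -> Optional[list[str]]:
    """Return cached results for a request of size n, or None on a miss."""
    if n <= len(entry.results):
        return entry.results[:n]    # enough cached results: slice to size
    if entry.exhausted:
        return list(entry.results)  # cached set is complete: return all of it
    return None                     # lower-ranked results may be missing
```

With the numbers from the description, an entry holding 3 results for a request of 10 answers a later request for 5 with those same 3 results.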

LRU eviction: OrderedDict-based LRU with configurable max entries (default 50). Access refreshes position. Oldest entries evicted at capacity.
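
For readers unfamiliar with the pattern, here is a rough sketch of OrderedDict-based LRU eviction. `LRUCache` is an illustrative class, not the PR's `SearchResultCache`; only the default of 50 entries is taken from the description.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    def __init__(self, max_entries: int = 50) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries = OrderedDict()  # least recently used entry comes first

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # access refreshes position
        return self._entries[key]

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)  # a new or updated key is most recent
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used

    def __len__(self) -> int:
        return len(self._entries)
```

`move_to_end` and `popitem(last=False)` are both constant-time, so neither lookup nor eviction has to scan the cache.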

Note-level invalidation: invalidate(note_path) removes all cache entries containing results from a modified note. Ready for event bus integration on NoteModifiedEvent.
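
One way note-level invalidation could work is sketched below. The `Result` type, its `note_path` field, and the free-standing `invalidate` function are assumptions for illustration; the PR exposes this as a method on the cache.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    note_path: str  # hypothetical field: the note this result came from
    text: str


def invalidate(cache: "OrderedDict[str, list[Result]]", note_path: str) -> int:
    """Drop every cached query whose results include note_path.

    Returns the number of entries removed.
    """
    # Collect keys first so the dict is not mutated while iterating over it.
    stale = [
        query
        for query, results in cache.items()
        if any(r.note_path == note_path for r in results)
    ]
    for query in stale:
        del cache[query]
    return len(stale)
```

An event handler for NoteModifiedEvent would then only need to call this with the modified note's path.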

Filtered queries bypass cache: Cache only operates on unfiltered queries (where=None). Metadata-filtered searches always hit ChromaDB to ensure filter accuracy.
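
The flow under How It Works, combined with the bypass rule for filtered queries, might look roughly like this. `embed`, `query_db`, and the `ResultCache` protocol are hypothetical stand-ins, not the real VaultStore or ChromaDB API.

```python
from typing import Any, Callable, Mapping, Optional, Protocol, Sequence

Vector = Sequence[float]
Results = list[str]
Where = Optional[Mapping[str, Any]]


class ResultCache(Protocol):
    def get(self, query: str, embedding: Vector, n: int) -> Optional[Results]: ...
    def put(self, query: str, embedding: Vector, results: Results, n: int) -> None: ...


def search(
    query: str,
    n: int,
    embed: Callable[[str], Vector],
    query_db: Callable[[Vector, int, Where], Results],
    cache: Optional[ResultCache] = None,
    where: Where = None,
) -> Results:
    embedding = embed(query)  # computed once, reused for the cache and the DB
    # Filtered queries skip the cache entirely, on both the read and write side.
    active = cache if where is None else None
    if active is not None:
        cached = active.get(query, embedding, n)
        if cached is not None:
            return cached
    results = query_db(embedding, n, where)
    if active is not None:
        active.put(query, embedding, results, n)
    return results
```

Skipping the write as well as the read keeps filtered result sets from ever being served to a later unfiltered query.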

Changes

New Files

  • src/vaultmind/indexer/search_cache.py (176 lines) — SearchResultCache class with get() (exact + semantic matching), put() (with n_requested tracking), invalidate(note_path), clear(), stats property. Private _CacheEntry dataclass stores query, embedding, results, and request/return counts. _cosine_similarity() helper for vector comparison
  • tests/test_search_cache.py (187 lines) — 18 tests across 7 classes

Modified Files

  • src/vaultmind/indexer/store.py — VaultStore.__init__() accepts optional search_cache parameter. search() checks cache before ChromaDB query (unfiltered only) and stores results after query
  • src/vaultmind/config.py — Added cache_enabled: bool = True, cache_max_entries: int = 50, cache_similarity_threshold: float = 0.85 to SearchConfig
  • config/default.toml — Added cache settings to [search] section
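
As an illustration of the three settings and their defaults, here is a standalone sketch. `SearchCacheSettings`, `settings_from_section`, and the validation rules are assumptions; the real fields live on SearchConfig, whose base class and other fields are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SearchCacheSettings:
    cache_enabled: bool = True
    cache_max_entries: int = 50
    cache_similarity_threshold: float = 0.85

    def __post_init__(self) -> None:
        # Validation is an addition in this sketch, not something the PR states.
        if self.cache_max_entries < 1:
            raise ValueError("cache_max_entries must be at least 1")
        if not -1.0 <= self.cache_similarity_threshold <= 1.0:
            raise ValueError("cache_similarity_threshold must be within [-1, 1]")


_FIELDS = ("cache_enabled", "cache_max_entries", "cache_similarity_threshold")


def settings_from_section(section: Mapping[str, Any]) -> SearchCacheSettings:
    """Build settings from an already-parsed [search] table, ignoring other keys."""
    known = {name: section[name] for name in _FIELDS if name in section}
    return SearchCacheSettings(**known)
```

Keys missing from the parsed section fall back to the defaults listed above.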

Backward Compatibility

  • search_cache defaults to None in VaultStore.__init__() — existing callers unaffected
  • When cache is None, search() behaves identically to before (no conditional overhead)
  • All existing store/ranking tests pass unchanged
  • Cache is opt-in: callers must construct SearchResultCache and pass it to VaultStore

Test plan

  • 18 new tests in test_search_cache.py across 7 classes:
    • Cosine similarity: identical, orthogonal, zero, near-identical vectors (4)
    • Cache get/put: exact match, miss, semantic match, dissimilar miss, result slicing (5)
    • LRU eviction: evicts oldest, access refreshes position (2)
    • Invalidation: removes matching entries, no-match keeps all (2)
    • DB exhaustion: exhausted DB serves smaller request, non-exhausted requires enough (2)
    • Clear and stats: clear empties, hit/miss tracking (2)
    • SearchConfig: cache defaults (1)
  • Full suite: 1014/1014 tests pass, 0 regressions
  • ruff check — clean
  • mypy --ignore-missing-imports — clean
  • Integration: construct cache at bot startup, pass to VaultStore, measure hit rate
  • Integration: subscribe to NoteModifiedEvent, call cache.invalidate(note_path)
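
For the hit-rate measurement in the integration items, a counter along these lines would be enough. `CacheStats` is a hypothetical shape; the PR does not say what the cache's `stats` property actually returns.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        """Count one cache lookup as a hit or a miss."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups served from cache; 0.0 before any lookup."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```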

New module indexer/search_cache.py with in-memory LRU cache that
matches queries by cosine similarity (threshold 0.85). Cached results
for similar queries satisfy subsequent requests without hitting
ChromaDB. Tracks DB exhaustion — if a search returned fewer results
than requested, the cached set IS complete.

Integrate into VaultStore.search() for unfiltered queries. Add
invalidate(note_path) for cache-busting on note modifications.

Add cache_enabled, cache_max_entries, cache_similarity_threshold
to SearchConfig.

18 tests across 7 classes: cosine similarity (4), cache get/put with
exact and semantic matching (5), LRU eviction and access refresh (2),
invalidation by note path (2), DB exhaustion detection (2), clear
and hit/miss stats (2), SearchConfig defaults (1).
@Mathews-Tom Mathews-Tom merged commit 0665db3 into main Mar 25, 2026
3 checks passed
@Mathews-Tom Mathews-Tom deleted the feat/search-cache branch March 25, 2026 18:57