feat(indexer): semantic search result cache with LRU eviction by Mathews-Tom · Pull Request #48 · Mathews-Tom/VaultMind

Mathews-Tom · 2026-03-25T18:56:57Z

Summary

Add an in-memory LRU search result cache with semantic similarity matching, inspired by memora-lab/memory-service-public's three-tier cache hierarchy with namespace separation and DB exhaustion detection. Repeated or semantically similar queries are served from cache without hitting ChromaDB, reducing latency in thinking sessions where the same topic generates multiple similar queries across turns.

How It Works

VaultStore.search(query)
  ├─ Compute query embedding
  ├─ Check cache:
  │    ├─ Exact key match? → return cached results
  │    ├─ Semantic match (cosine sim > 0.85)? → return cached results
  │    └─ Miss → proceed to ChromaDB
  ├─ ChromaDB query
  ├─ Store results in cache (with embedding + n_requested metadata)
  └─ Return results
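
The flow above can be sketched as a single function. This is only an illustration of the control flow, not the code in store.py: `cached_search` and its callable parameters (`embed`, `cache_get`, `cache_put`, `db_query`) are hypothetical stand-ins for the embedding step, the cache, and the ChromaDB query.

```python
from typing import Callable, Dict, List, Optional, Sequence

Embedding = Sequence[float]
Results = List[Dict]  # the real result type is not shown in this PR


def cached_search(
    query: str,
    n_results: int,
    embed: Callable[[str], Embedding],
    cache_get: Callable[[str, Embedding, int], Optional[Results]],
    cache_put: Callable[[str, Embedding, Results, int], None],
    db_query: Callable[[Embedding, int, Optional[dict]], Results],
    where: Optional[dict] = None,
) -> Results:
    """Hypothetical stand-in for the control flow of VaultStore.search()."""
    # The embedding is computed once and reused for both cache and DB lookups.
    embedding = embed(query)

    # Only unfiltered queries (where=None) are allowed to touch the cache.
    use_cache = where is None
    if use_cache:
        cached = cache_get(query, embedding, n_results)
        if cached is not None:
            return cached  # exact or semantic hit: the DB is skipped

    # Miss (or filtered query): fall through to the database.
    results = db_query(embedding, n_results, where)
    if use_cache:
        cache_put(query, embedding, results, n_results)
    return results
```

Passing the cache and database as callables keeps the sketch independent of the real class signatures, which this description does not spell out.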

Key Features

Semantic similarity matching: A cached result for "kubernetes deployment" satisfies a subsequent query for "kubernetes scaling" if cosine similarity > threshold (default 0.85). No additional embedding API calls — uses the query embedding already computed for ChromaDB.
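
As a rough illustration of the similarity test, here is a minimal pure-Python sketch. The names `cosine_similarity` and `is_semantic_match` are illustrative (the PR's helper is the private `_cosine_similarity()`), and returning 0.0 for a zero vector is an assumption about how that edge case is handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so treat it as dissimilar.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantic_match(
    query_embedding: Sequence[float],
    cached_embedding: Sequence[float],
    threshold: float = 0.85,
) -> bool:
    """True when similarity is strictly above the threshold, as in the PR text."""
    return cosine_similarity(query_embedding, cached_embedding) > threshold
```

Because the comparison uses the embedding already computed for the ChromaDB query, a check like this adds only arithmetic over the cached vectors.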

DB exhaustion detection: If a search returned fewer results than requested (e.g., 3 results when 10 were requested), the cache knows the DB was exhausted. Later requests for fewer results than the original request (e.g., 5) are served from cache, since only 3 results exist in total.
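
The exhaustion rule can be sketched as follows. `Entry`, `can_serve`, and `serve` are hypothetical names, not the PR's API. One assumption goes beyond the text: this sketch lets an exhausted entry answer a request of any size, since the DB has nothing more to return, whereas the description only mentions smaller requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical cache entry: stored results plus the size originally requested."""
    results: List[str]
    n_requested: int

    @property
    def exhausted(self) -> bool:
        # Fewer results than requested means the DB had nothing more to give.
        return len(self.results) < self.n_requested


def can_serve(entry: Entry, n_wanted: int) -> bool:
    # Exhausted entries are complete; otherwise enough results must be cached.
    return entry.exhausted or len(entry.results) >= n_wanted


def serve(entry: Entry, n_wanted: int) -> Optional[List[str]]:
    """Return up to n_wanted cached results, or None if the DB must be queried."""
    if not can_serve(entry, n_wanted):
        return None
    return entry.results[:n_wanted]
```

With the example from the text, an entry holding 3 results for a request of 10 answers a request for 5 with those same 3 results, while an entry that returned exactly what was requested cannot answer a larger request.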

LRU eviction: OrderedDict-based LRU with configurable max entries (default 50). Access refreshes position. Oldest entries evicted at capacity.
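
A minimal sketch of OrderedDict-based LRU behaviour, assuming string keys and list values. `LRUCache` is an illustrative class, not SearchResultCache itself.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Illustrative LRU container; the oldest entry sits at the front."""

    def __init__(self, max_entries: int = 50) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[list]:
        if key not in self._data:
            return None
        # Access refreshes position: move the key to the most-recent end.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: str, value: list) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        # Evict from the least-recent end until back under capacity.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def keys(self) -> List[str]:
        """Keys ordered from least to most recently used."""
        return list(self._data)
```

`move_to_end` and `popitem(last=False)` are both constant-time, which is the usual reason to pick OrderedDict for a small in-memory LRU.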

Note-level invalidation: invalidate(note_path) removes all cache entries containing results from a modified note. Ready for event bus integration on NoteModifiedEvent.
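
Note-level invalidation could look roughly like this. The sketch assumes each cached result is a dict carrying a `note_path` key and that entries live in a plain mapping from query to results. Both are assumptions for illustration, as is the integer return value.

```python
from typing import Dict, List


def invalidate(entries: Dict[str, List[dict]], note_path: str) -> int:
    """Drop every cached query whose results include the given note.

    Returns the number of entries removed (an illustrative choice).
    """
    # Collect keys first so the mapping is not mutated while iterating.
    stale = [
        key
        for key, results in entries.items()
        if any(result.get("note_path") == note_path for result in results)
    ]
    for key in stale:
        del entries[key]
    return len(stale)
```

Removing the whole entry, rather than filtering out the stale result, keeps the cached result count consistent with what the DB originally returned for that query.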

Filtered queries bypass cache: Cache only operates on unfiltered queries (where=None). Metadata-filtered searches always hit ChromaDB to ensure filter accuracy.

Changes

New Files

src/vaultmind/indexer/search_cache.py (176 lines) — SearchResultCache class with get() (exact + semantic matching), put() (with n_requested tracking), invalidate(note_path), clear(), stats property. Private _CacheEntry dataclass stores query, embedding, results, and request/return counts. _cosine_similarity() helper for vector comparison
tests/test_search_cache.py (187 lines) — 18 tests across 7 classes
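
To make the entry layout and the stats property more concrete, here is a hedged sketch. The field names, the `CountingCache` class, and the keys of the stats dictionary are all assumptions, and only exact-key lookup is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    """Illustrative counterpart of the private _CacheEntry dataclass."""
    query: str
    embedding: List[float]
    results: List[dict]
    n_requested: int

    @property
    def n_returned(self) -> int:
        return len(self.results)


@dataclass
class CountingCache:
    """Exact-match cache that tracks hits and misses (hypothetical)."""
    entries: Dict[str, CacheEntry] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def put(self, query: str, embedding: List[float],
            results: List[dict], n_requested: int) -> None:
        self.entries[query] = CacheEntry(query, embedding, results, n_requested)

    def get(self, query: str) -> Optional[List[dict]]:
        entry = self.entries.get(query)
        if entry is None:
            self.misses += 1
            return None
        self.hits += 1
        return entry.results

    @property
    def stats(self) -> Dict[str, float]:
        total = self.hits + self.misses
        return {
            "hits": self.hits,
            "misses": self.misses,
            "size": len(self.entries),
            # Guard against division by zero before any lookup has happened.
            "hit_rate": self.hits / total if total else 0.0,
        }
```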

Modified Files

src/vaultmind/indexer/store.py — VaultStore.__init__() accepts optional search_cache parameter. search() checks cache before ChromaDB query (unfiltered only) and stores results after query
src/vaultmind/config.py — Added cache_enabled: bool = True, cache_max_entries: int = 50, cache_similarity_threshold: float = 0.85 to SearchConfig
config/default.toml — Added cache settings to [search] section
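
The three settings and their defaults can be pictured as below. `SearchCacheSettings` and `settings_from_mapping` are hypothetical: the real SearchConfig may be built differently (for example with a validation library), and the sketch assumes the keys in the [search] section of default.toml mirror the field names.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class SearchCacheSettings:
    """Stand-in holding only the cache fields added to SearchConfig."""
    cache_enabled: bool = True
    cache_max_entries: int = 50
    cache_similarity_threshold: float = 0.85


def settings_from_mapping(section: Mapping[str, Any]) -> SearchCacheSettings:
    """Build settings from an already-parsed [search] section.

    Keys that are not cache settings are ignored; missing keys keep defaults.
    """
    known = {f.name for f in fields(SearchCacheSettings)}
    overrides = {key: value for key, value in section.items() if key in known}
    return SearchCacheSettings(**overrides)
```

For example, `settings_from_mapping({"cache_max_entries": 200})` keeps the default threshold of 0.85 while raising the capacity.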

Backward Compatibility

search_cache defaults to None in VaultStore.__init__() — existing callers unaffected
When search_cache is None, search() behaves exactly as before: no cache lookup and no cache write is performed
All existing store/ranking tests pass unchanged
Cache is opt-in: callers must construct SearchResultCache and pass it to VaultStore

Test plan

18 new tests in test_search_cache.py across 7 classes:
- Cosine similarity: identical, orthogonal, zero, near-identical vectors (4)
- Cache get/put: exact match, miss, semantic match, dissimilar miss, result slicing (5)
- LRU eviction: evicts oldest, access refreshes position (2)
- Invalidation: removes matching entries, no-match keeps all (2)
- DB exhaustion: exhausted DB serves smaller request, non-exhausted requires enough (2)
- Clear and stats: clear empties, hit/miss tracking (2)
- SearchConfig: cache defaults (1)
Full suite: 1014/1014 tests pass, 0 regressions
ruff check — clean
mypy --ignore-missing-imports — clean
Integration: construct cache at bot startup, pass to VaultStore, measure hit rate
Integration: subscribe to NoteModifiedEvent, call cache.invalidate(note_path)

New module indexer/search_cache.py with an in-memory LRU cache that matches queries by cosine similarity (threshold 0.85). Cached results for similar queries satisfy subsequent requests without hitting ChromaDB. Tracks DB exhaustion: if a search returned fewer results than requested, the cached set is known to be complete. Integrate into VaultStore.search() for unfiltered queries. Add invalidate(note_path) for cache-busting on note modifications. Add cache_enabled, cache_max_entries, cache_similarity_threshold to SearchConfig.

18 tests across 7 classes: cosine similarity (4), cache get/put with exact and semantic matching (5), LRU eviction and access refresh (2), invalidation by note path (2), DB exhaustion detection (2), clear and hit/miss stats (2), SearchConfig defaults (1).

Mathews-Tom added 2 commits March 26, 2026 00:19

Mathews-Tom merged commit 0665db3 into main Mar 25, 2026
3 checks passed

Mathews-Tom deleted the feat/search-cache branch March 25, 2026 18:57

Mathews-Tom merged 2 commits into main from feat/search-cache
