feat: cross-session memory persistence with hybrid search (closes #133) #151
Merged
Wire up the 3-tier memory system so conversation context survives across REPL sessions:

- Tier 2: ConversationEngine._maybe_compact() now calls compact_and_save() when a summaries directory is configured, writing a Markdown summary to disk and auto-indexing it into Tier 3.
- Tier 3: MemorySearcher gains an optional embedding_fn callable. When provided, summaries are embedded and stored in a new session_embeddings table; search() fuses BM25 keyword hits with cosine-similarity vector hits via reciprocal rank fusion. Keyword-only and vector-only search paths still work on their own.
- Cross-session loading: qracer repl instantiates a file-backed MemorySearcher at ~/.qracer/memory_index.duckdb, indexes every Markdown file in ~/.qracer/summaries/ on startup, and reports how many past contexts were loaded.

Side improvements driven by the tests:

- FTS extension loading is now lazy (deferred to first keyword search) and cached per process, so pure-vector and offline workflows aren't blocked by a missing fts extension.
- _keyword_search() degrades gracefully to an empty result set when the FTS extension can't be loaded, allowing vector-only hybrid search to proceed.

Updated docs/memory-system.md to reflect the new wiring.
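For readers unfamiliar with reciprocal rank fusion, the fusion step can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `reciprocal_rank_fusion` and the sample session IDs are hypothetical, and the real `_merge_results()` may differ in signature and tie-breaking. The default `k=60` mirrors the value quoted in the PR description.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first rankings of session IDs into one ranking.

    Hypothetical helper for illustration only.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, session_id in enumerate(ranking, start=1):
            # Only the position in each list matters, so BM25 and cosine
            # scores never have to be put on a common scale.
            scores[session_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by session ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["s3", "s1", "s7"]  # hypothetical BM25 ranking, best first
vector_hits = ["s1", "s9", "s3"]   # hypothetical cosine ranking, best first
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([session_id for session_id, _ in fused])
```

A session that appears in both lists accumulates two reciprocal-rank terms, so it tends to outrank a session that only one retriever found, even when that retriever ranked it highly.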
Summary
Closes #133.
Wires up the 3-tier memory system so conversation context survives across REPL sessions.
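As a rough picture of the loop being closed, the sketch below writes summaries to disk in one "session" and re-indexes them at the start of the next. It is an assumption-laden illustration: `StubSearcher`, `save_summary`, and `load_past_summaries` are hypothetical names, the stub keeps everything in a dict instead of DuckDB, and only the `index_summary(session_id, summary)` call shape is taken from this PR.

```python
import tempfile
from pathlib import Path


class StubSearcher:
    """Hypothetical stand-in for MemorySearcher; stores summaries in a dict."""

    def __init__(self) -> None:
        self.indexed: dict[str, str] = {}

    def index_summary(self, session_id: str, summary: str) -> None:
        self.indexed[session_id] = summary


def save_summary(summaries_dir: Path, session_id: str, summary: str) -> Path:
    """Write <session_id>.md, roughly what a compaction step might do."""
    summaries_dir.mkdir(parents=True, exist_ok=True)
    path = summaries_dir / f"{session_id}.md"
    path.write_text(summary, encoding="utf-8")
    return path


def load_past_summaries(summaries_dir: Path, searcher: StubSearcher) -> int:
    """Index every Markdown summary on startup; return how many were loaded."""
    count = 0
    for path in sorted(summaries_dir.glob("*.md")):
        # The file stem doubles as the session ID in this sketch.
        searcher.index_summary(path.stem, path.read_text(encoding="utf-8"))
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    summaries = Path(tmp) / "summaries"
    # "Session one" compacts and persists two summaries.
    save_summary(summaries, "session-a", "# Session A\nNotes on topic A.")
    save_summary(summaries, "session-b", "# Session B\nNotes on topic B.")
    # "Session two" starts with an empty index and reloads from disk.
    searcher = StubSearcher()
    loaded = load_past_summaries(summaries, searcher)
    print(f"Loaded {loaded} past session summaries from {summaries}")
```

Keeping the Markdown files as the source of truth means the index can be rebuilt from scratch on every startup, at the cost of re-reading each file.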
Tier 2: disk persistence (`qracer/conversation/engine.py`)

- `ConversationEngine` now takes an optional `summaries_dir` kwarg.
- `_maybe_compact()` calls `SessionCompactor.compact_and_save()` instead of the in-memory `compact()`, writing `~/.qracer/summaries/<session_id>.md` after the turn log crosses the 8,000-token threshold.
- When `memory_searcher` is also set, the summary is auto-indexed into Tier 3 via `MemorySearcher.index_summary(session_id, summary)` so the very next session can find it.
- Without `summaries_dir`, behaviour is unchanged (calls `compact()`).

Tier 3: hybrid search (`qracer/memory/memory_searcher.py`)

- New `embedding_fn: Callable[[str], list[float]] | None` constructor parameter. Leave it `None` for keyword-only search (existing behaviour); pass any callable (Claude API, `sentence-transformers`, a stub, …) to get hybrid search.
- New `session_embeddings (session_id VARCHAR PK, embedding FLOAT[], indexed_at TIMESTAMP)` table stores vectors alongside the existing FTS `session_index`.
- `_vector_search()` runs cosine similarity via DuckDB's native `list_cosine_similarity`, joining back to `session_index` for the summary text.
- `_merge_results()` fuses keyword + vector hits with reciprocal rank fusion (`k=60`), so BM25 and cosine scores combine without normalisation.
- `search()` stays backward compatible: it falls back to keyword-only search when no `embedding_fn` is configured.
- `_keyword_search()` degrades to an empty result set (with a warning) when FTS can't be loaded; the vector side of hybrid search still proceeds.

Cross-session loading (`qracer/cli.py`)

- `qracer repl` now creates `~/.qracer/summaries/`, instantiates a file-backed `MemorySearcher` at `~/.qracer/memory_index.duckdb`, indexes every Markdown summary under `summaries/`, and prints `✓ Loaded N past session summaries from …` so returning users immediately see how much prior memory is in scope.
- `memory_searcher` and `summaries_dir` are passed into the `ConversationEngine`, closing the loop.

Tests

- `tests/memory/test_memory_searcher.py`: new `TestHybridSearch` class covering the `has_embeddings` flag, embedding row storage, semantic vector search, hybrid `search()`, `_merge_results()` RRF ordering, removal from both tables, the empty vector branch when no `embedding_fn` is set, and graceful handling when the embedding function raises.
- `tests/conversation/test_engine.py`: new `TestCompactionPersistence` class with two tests: (1) `_maybe_compact()` writes the Markdown file and auto-indexes the row when `summaries_dir` and `memory_searcher` are set; (2) without `summaries_dir`, it still calls the in-memory `compact()` path for backward compatibility.
- `tests/conversation/test_topic_resolver.py`: updated the existing FTS-availability skip pattern to pre-warm FTS now that `MemorySearcher()` construction no longer triggers it.
- `docs/memory-system.md`: removed the `구현 예정` ("to be implemented") markers and described the new wiring.

Results

- `uv run pytest` → 636 passed, 13 skipped (skipped tests require DuckDB's `fts` extension, which this sandbox can't download; they run on CI with internet access).
- `uv run ruff check .` → clean.
- `uv run ruff format --check` → clean.
- `uv run pyright` → 0 errors.

How to test
End-to-end smoke test: