refactor(memory): append-only writes + retrieval-driven consolidation (fixes #34)#41
Draft
DeerGoat wants to merge 1 commit into
Draft
refactor(memory): append-only writes + retrieval-driven consolidation (fixes #34)#41DeerGoat wants to merge 1 commit into
DeerGoat wants to merge 1 commit into
Conversation
…onsolidation architecture This is a planning commit for discussion. No code changes yet. Closes #34
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reworks Suzent's memory write/dedup path. Closes #34 and supersedes #36 (which patches the symptom).
This is a design PR for discussion. The branch currently carries only a planning commit — no code yet. Please comment inline on the plan below; once we converge I'll implement it file-by-file.
1. Root cause of #34
MemoryManager._deduplicate_and_store_factsuses a fixed cosine-similarity threshold (0.85) to decide whether a newly extracted fact is a duplicate of an existing memory. This is a category error: cosine similarity measures topical proximity, not factual identity."I work at Google"vs"I work at Microsoft"→ cosine ≈ 0.92 → dropped as a duplicate (a real update, silently lost — this is [FEAT] Memory deduplication: fixed cosine threshold causes silent data loss #34)"I enjoy hiking"vs"I love spending time outdoors"→ cosine ≈ 0.78 → stored as two facts (a real duplicate, kept)No threshold value fixes this, because the metric cannot distinguish "same fact, different phrasing" from "same topic, different fact." That distinction requires language understanding, not geometry.
There is also an architectural conflict:
_deduplicate_and_store_factswrites facts directly to LanceDB, whileCoreMemoryFileIndexerindependently re-syncs markdown → LanceDB on a 300s timer. Both write the same table, racing each other. The markdown store's own docstring says "LanceDB serves as the search index over this markdown content" — but the code does the opposite.2. Why every threshold-style fix is a bandaid
We also rejected, for the same reason, two later attempts to reintroduce rigid structure: a per-source importance scalar (
0.3/0.7/0.75) and a category partition (consolidated/{category}.md). All three — threshold, scalar, partition — impose discrete/numeric structure on semantically fuzzy data. A fact can belong to two "categories" ("deploy to AWS on Fridays" is preference + technical + scheduling); a single bucket can't represent that.3. The principle this PR adopts
This is validated against three reference systems we studied:
autoDream): a forked LLM agent periodically rewrites memory files, resolving contradictions by editing. Gated by time AND volume AND lock, not a blind timer.dreaming): durability is driven by recall frequency (usage), not a write-time importance score; unused memory decays.background_review/curator): never auto-deletes — only archives (recoverable); consolidation is continuous because the agent owns explicit fact IDs.4. Architecture: three tiers
This maps onto Suzent's existing
get_core_memory()(which already injectsMEMORY.mdas the always-visible block) and mirrors Claude Code'slogs/→ topic files →MEMORY.mdlayout.persona.md/user.mdare untouched.Consolidated entry format — code owns the metadata, the LLM owns the content text:
5. The two procedures
Per-turn (fast, every message)
No similarity check, no dedup, no manager→LanceDB write. Today's log is
> watermark, so it's searchable immediately. This alone fixes #34.Consolidation (gated; background loop + POST /memory/consolidate)
Gate (Claude Code style):
not lock_held AND hours_since_last ≥ min_hours AND new_facts_since_watermark ≥ min_facts.Op semantics (default = keep every neighbor; the only way to lose a consolidated fact is an explicit
REMOVE, restricted to genuine duplicates/merges):6. History preservation (the most-scrutinized question)
"Moved from Google to Microsoft" is a timeline, not a contradiction — both were true. Four independent safety layers ensure nothing is lost:
REPLACEwrites "previously X", never a blind overwrite.MEMORY.mdstays inconsolidated/memory.md(still searchable).7. Usage-driven promotion (OpenClaw), no scoring weights
Suzent already stamps
access_count/accessed_aton every retrieval (_record_memory_accesses). We add an append-only.recall_log.jsonl(one line per retrieved fact). Consolidation hands the recall summary to the promotion LLM as evidence for what belongs in always-visibleMEMORY.md. No weighted formula, no tuned thresholds — the LLM interprets the signal.8. Deletion correctness (tombstones)
Raw logs are immutable, so a user delete can't edit them — and today
delete_archival_memorydeletes from LanceDB only, so reindex resurrects the memory. Fix: delete removes the entry fromconsolidated/memory.mdand appends to.tombstones.jsonl; consolidation skips tombstoned facts. Honors both immutability and deletion.9. Ranking change
Importance is removed as a lever (every indexed chunk gets a constant
0.5), sohybrid_searchranks on relevance + recency — like OpenClaw/Claude Code. Category is not a retrieval filter (it never was).10. File-by-file changes
Delete (no legacy fallback):
_deduplicate_and_store_facts,_add_memory_internal,process_message_for_memories,_extract_facts_simple,refresh_core_memory_facts;MarkdownIndexer+ its dead regex; constantsDEDUPLICATION_*/DEFAULT_IMPORTANCE; per-source importance values;MemoryExtractionResult.memories_created/updated.memory/manager.py— append-onlyprocess_conversation_turn_for_memories; shared_core_indexer+_consolidation_lock;_log_recalls;consolidate_memories(force),_consolidation_gate_open, neighbor/cluster/apply helpers, state read/write.memory/indexer.py— dropMarkdownIndexer;CoreMemoryFileIndexergets a lock, watermark-aware archive handling (index>W, drop≤W), per-entry indexing ofconsolidated/memory.md, constant importance,reindex_file_now,clear_and_full_reindex.memory/markdown_store.py—consolidated/dir + entry read/write/parse; recall-log + tombstone helpers.memory/memory_context.py—CONSOLIDATION_PROMPT,PROMOTION_PROMPT.memory/models.py— slimMemoryExtractionResult; structuredConsolidationResponseschema.memory/lifecycle.py— share_core_indexer; add_consolidation_loop; start/stop it.routes/session_routes.py—reindex_memoriesdelegates toCoreMemoryFileIndexer.routes/memory_routes.py— addPOST /memory/consolidate; fixdelete_archival_memory(truth + tombstone).core/context_compressor.py— drop thecreated_countlog line.11. Config knobs (all operational, none decide fact identity)
12. Migration
One-time
POST /memory/reindex {"clear_existing": true}→clear_and_full_reindex(wipe + rebuild from files). Seed.consolidation_state.jsonwith awatermark_datebefore the oldest log so the first run folds full history once. Nothing at risk — raw logs are truth.13. Test plan
forceoverrides. 6. Cluster failure → watermark unchanged, retried. 7. Deleted fact does not reappear after reindex. 8. clear+reindex reproduces the index from files.14. Open questions for reviewers
consolidated/memory.mdvs. sharding once entries reach thousands (rewrite cost)?k=8/ cluster cap25defaults reasonable for personal-assistant scale?Closes #34
🤖 Generated with Claude Code