refactor(memory): append-only writes + retrieval-driven consolidation (fixes #34) #41

Draft
DeerGoat wants to merge 1 commit into main from refactor/memory-architecture-append-only

Summary

Reworks Suzent's memory write/dedup path. Closes #34 and supersedes #36 (which patches the symptom).

This is a design PR for discussion. The branch currently carries only a planning commit — no code yet. Please comment inline on the plan below; once we converge I'll implement it file-by-file.


1. Root cause of #34

MemoryManager._deduplicate_and_store_facts uses a fixed cosine-similarity threshold (0.85) to decide whether a newly extracted fact is a duplicate of an existing memory. This is a category error: cosine similarity measures topical proximity, not factual identity.

No threshold value fixes this, because the metric cannot distinguish "same fact, different phrasing" from "same topic, different fact." That distinction requires language understanding, not geometry.
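A toy illustration of the failure mode, using bag-of-words count vectors as a stand-in for real embeddings. This is an assumption for the sake of a self-contained example: Suzent's actual embedding model will produce different numbers, and `bow_cosine` and the three sentences are hypothetical. Only the 0.85 threshold comes from the code described above.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (toy stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


THRESHOLD = 0.85  # the fixed value used by _deduplicate_and_store_facts

stored = "the user currently works at google as an engineer"
different_fact = "the user currently works at microsoft as an engineer"
same_fact = "the user is employed by google"

# Same topic, different fact: lands above the threshold, so the new fact is treated as a duplicate.
print(bow_cosine(stored, different_fact) >= THRESHOLD)
# Same fact, different phrasing: lands below the threshold, so a second copy would be stored.
print(bow_cosine(stored, same_fact) >= THRESHOLD)
```

Moving the threshold only shifts which of the two errors occurs; neither comparison consults what the sentences assert.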

There is also an architectural conflict: _deduplicate_and_store_facts writes facts directly to LanceDB, while CoreMemoryFileIndexer independently re-syncs markdown → LanceDB on a 300s timer. Both write the same table, racing each other. The markdown store's own docstring says "LanceDB serves as the search index over this markdown content" — but the code does the opposite.

2. Why every threshold-style fix is a bandaid

| Approach | Why it fails |
|---|---|
| Lower / raise the threshold | Still geometry; trades false-drops for false-dups. Value is embedding-model- and fact-type-dependent. |
| Two thresholds | Two wrong numbers instead of one. |
| LLM-judge after a cosine pre-filter | The pre-filter gates what the LLM sees; if cosine misses the related memory, the LLM never gets to resolve it. |
| Update-on-near-duplicate (#36) | The "same fact" decision is still made by cosine. |

We also rejected, for the same reason, two later attempts to reintroduce rigid structure: a per-source importance scalar (0.3/0.7/0.75) and a category partition (consolidated/{category}.md). All three — threshold, scalar, partition — impose discrete/numeric structure on semantically fuzzy data. A fact can belong to two "categories" ("deploy to AWS on Fridays" is preference + technical + scheduling); a single bucket can't represent that.

3. The principle this PR adopts

Cosine retrieves candidates (its correct use); the LLM makes the decision. Markdown is the source of truth; LanceDB is a derived index. The raw stream is immutable. No threshold, no importance scalar, no fixed partition — the only numbers are operational (hours, counts, size caps).

This is validated against three reference systems we studied:

  • Claude Code (autoDream): a forked LLM agent periodically rewrites memory files, resolving contradictions by editing. Gated by time AND volume AND lock, not a blind timer.
  • OpenClaw (dreaming): durability is driven by recall frequency (usage), not a write-time importance score; unused memory decays.
  • Hermes (background_review/curator): never auto-deletes — only archives (recoverable); consolidation is continuous because the agent owns explicit fact IDs.

4. Architecture: three tiers

TIER          FILE                       ROLE                          WHERE SEEN
1 stream      archive/YYYY-MM-DD.md      append-only daily log         on disk; recent ones
              (IMMUTABLE)                = source of truth             indexed for search
2 durable     consolidated/memory.md     deduped truth, atomic         search index (on demand)
              (LLM-maintained)           entries; history preserved
3 always-on   MEMORY.md                  few highest-value, most-      always in the prompt
              (LLM-rewritten, capped)    recalled facts                (core "facts" block)

This maps onto Suzent's existing get_core_memory() (which already injects MEMORY.md as the always-visible block) and mirrors Claude Code's logs/ → topic files → MEMORY.md layout. persona.md / user.md are untouched.

Consolidated entry format — code owns the metadata, the LLM owns the content text:

<!--m {"first_seen":"2026-05-27","updated":"2026-05-30","sources":["2026-05-27","2026-05-29"]}-->
Currently works at Microsoft (since 2026-05). Previously worked at Google.
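A minimal sketch of how this entry format could be parsed and written back, assuming each entry is one `<!--m {...}-->` metadata line followed by its content lines. `Entry`, `parse_entries`, and `render_entry` are hypothetical names, not part of the current codebase.

```python
import json
import re
from dataclasses import dataclass

# Assumed layout: a metadata comment line, then the content belonging to that entry.
ENTRY_RE = re.compile(r"<!--m (\{.*?\})-->\n(.*?)(?=\n<!--m |\Z)", re.DOTALL)


@dataclass
class Entry:
    meta: dict     # owned by code: first_seen, updated, sources
    content: str   # owned by the LLM


def parse_entries(text: str) -> list[Entry]:
    """Split a consolidated file into (metadata, content) entries."""
    return [Entry(json.loads(m.group(1)), m.group(2).strip()) for m in ENTRY_RE.finditer(text)]


def render_entry(entry: Entry) -> str:
    """Serialize one entry; compact separators match the format shown above."""
    meta = json.dumps(entry.meta, separators=(",", ":"))
    return f"<!--m {meta}-->\n{entry.content}\n"
```

Keeping the metadata in an HTML comment means the file still renders as plain markdown, while code can update `updated` and `sources` without the LLM ever touching them.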

5. The two procedures

Per-turn (fast, every message)

extract facts (LLM) → append to archive/today.md → reindex_file_now(today.md)

No similarity check, no dedup, no manager→LanceDB write. Today's log is newer than the watermark, so the indexer keeps it in the search index and the new fact is searchable immediately. This alone fixes #34.
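A minimal sketch of the append step, assuming facts are stored one per line as `- fact` bullets in `archive/YYYY-MM-DD.md`. The bullet format and the `append_facts` helper are assumptions for illustration; the planned code would call `reindex_file_now` on the returned path afterwards.

```python
from datetime import date
from pathlib import Path


def append_facts(archive_dir: Path, facts: list[str], today: date) -> Path:
    """Append extracted facts to today's log; existing lines are never rewritten."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    log_path = archive_dir / f"{today.isoformat()}.md"
    # Mode "a" only ever adds to the end of the file, which keeps the log append-only.
    with log_path.open("a", encoding="utf-8") as fh:
        for fact in facts:
            fh.write(f"- {fact.strip()}\n")
    return log_path
```

Because nothing is compared against existing memories at write time, "works at Google" and a later "works at Microsoft" both land in the log and both stay retrievable until consolidation reconciles them.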

Consolidation (gated; background loop + POST /memory/consolidate)

Gate (Claude Code style): not lock_held AND hours_since_last ≥ min_hours AND new_facts_since_watermark ≥ min_facts.
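The gate can be sketched as a pure function, which makes it easy to unit-test. `GateInputs` is a hypothetical container; the defaults mirror the config knobs proposed in section 11. Whether `force` should bypass the lock is not specified in this plan, so the sketch assumes it does not.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    lock_held: bool
    hours_since_last: float
    new_facts_since_watermark: int


def consolidation_gate_open(
    s: GateInputs, min_hours: float = 24.0, min_facts: int = 20, force: bool = False
) -> bool:
    """Return True when a consolidation run may start."""
    if s.lock_held:
        return False  # assumption: a run in progress is never interrupted, even with force
    if force:
        return True   # manual trigger skips the time and volume conditions
    return s.hours_since_last >= min_hours and s.new_facts_since_watermark >= min_facts
```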

entries   = parse(consolidated/memory.md)
new_facts = parse(archive logs where date > watermark)  minus tombstoned
embed(entries) and embed(new_facts)                       # reused for reindex

for f in new_facts:                                       # cosine = CANDIDATE retrieval only
    f.neighbors = top_k(cosine(f, entries), k)
clusters = connected_components(new_facts, edge = shared neighbor)  # each entry touched ≤ once

for cluster in clusters:                                  # LLM = the DECISION
    ops = LLM(CONSOLIDATION_PROMPT, neighbors, candidates)
    apply(ops)         # ADD / REPLACE(target) / REMOVE(target); default = keep; dup candidates dropped

write consolidated/memory.md                              # provenance set in code, not by LLM
MEMORY.md = LLM(PROMOTION_PROMPT, entries, recall_summary, max_lines)   # recall-driven promotion
advance watermark; reindex (drop archives ≤ watermark, reindex consolidated + MEMORY.md)
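The clustering step above can be sketched with a small union-find: two new facts join the same cluster whenever their neighbor sets share an existing entry, so each entry is handed to the LLM in at most one cluster. The function name and the use of string IDs are assumptions, and the sketch does not apply the `cluster_max` cap.

```python
from collections import defaultdict


def cluster_by_shared_neighbor(neighbors: dict[str, set[str]]) -> list[list[str]]:
    """Group new-fact IDs so that facts sharing any neighbor entry land in one cluster."""
    parent = {fact: fact for fact in neighbors}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner: dict[str, str] = {}  # entry ID -> first fact that listed it as a neighbor
    for fact, entry_ids in neighbors.items():
        for entry_id in entry_ids:
            if entry_id in first_owner:
                parent[find(fact)] = find(first_owner[entry_id])  # union the two facts
            else:
                first_owner[entry_id] = fact

    clusters: dict[str, list[str]] = defaultdict(list)
    for fact in neighbors:
        clusters[find(fact)].append(fact)
    return list(clusters.values())
```

A fact with no neighbors forms its own single-element cluster, which the LLM would typically resolve as a plain ADD.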

Op semantics (default = keep every neighbor; the only way to lose a consolidated fact is an explicit REMOVE, restricted to genuine duplicates/merges):

  • ADD: genuinely new fact.
  • REPLACE(target): correction (old content discarded) or state change over time (content becomes "now X; previously Y" → history preserved).
  • REMOVE(target): two neighbors merged into one.
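A minimal sketch of applying these ops with "default = keep" semantics, assuming entries are addressed by an ID and that the LLM returns a list of structured ops. `Op`, `apply_ops`, and the `new_id` callback are hypothetical; the planned `ConsolidationResponse` schema may look different.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional


@dataclass
class Op:
    kind: Literal["ADD", "REPLACE", "REMOVE"]
    content: Optional[str] = None  # required for ADD / REPLACE
    target: Optional[str] = None   # entry ID, required for REPLACE / REMOVE


def apply_ops(entries: dict[str, str], ops: list[Op], new_id: Callable[[], str]) -> dict[str, str]:
    """Return a new entry map; entries not named by any op are kept unchanged."""
    result = dict(entries)
    for op in ops:
        if op.kind == "ADD" and op.content:
            result[new_id()] = op.content
        elif op.kind == "REPLACE" and op.target in result and op.content:
            result[op.target] = op.content
        elif op.kind == "REMOVE" and op.target in result:
            del result[op.target]
        # Malformed ops (unknown target, missing content) are ignored: default = keep.
    return result
```

Ignoring malformed ops rather than raising means a bad LLM response can at worst fail to add something; it cannot delete an entry it did not name explicitly.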

6. History preservation (the most-scrutinized question)

"Moved from Google to Microsoft" is a timeline, not a contradiction — both were true. Four independent safety layers ensure nothing is lost:

  1. Raw logs are immutable: the original fact lives on disk forever.
  2. State changes are kept as history: REPLACE writes "previously X", never a blind overwrite.
  3. Demotion ≠ deletion: a fact dropped from always-visible MEMORY.md stays in consolidated/memory.md (still searchable).
  4. Full rebuild: clearing the watermark and re-consolidating reconstructs everything from raw logs.

7. Usage-driven promotion (OpenClaw), no scoring weights

Suzent already stamps access_count/accessed_at on every retrieval (_record_memory_accesses). We add an append-only .recall_log.jsonl (one line per retrieved fact). Consolidation hands the recall summary to the promotion LLM as evidence for what belongs in always-visible MEMORY.md. No weighted formula, no tuned thresholds — the LLM interprets the signal.
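A minimal sketch of the recall log, assuming one JSON object per line with a `fact_id` and an `accessed_at` timestamp. The field names and both helper functions are assumptions. The summary is deliberately just raw counts, since the plan leaves interpretation to the promotion LLM.

```python
import json
from collections import Counter
from pathlib import Path


def log_recalls(log_path: Path, fact_ids: list[str], accessed_at: str) -> None:
    """Append one JSONL line per retrieved fact."""
    with log_path.open("a", encoding="utf-8") as fh:
        for fact_id in fact_ids:
            fh.write(json.dumps({"fact_id": fact_id, "accessed_at": accessed_at}) + "\n")


def recall_summary(log_path: Path) -> dict[str, int]:
    """Count recalls per fact; this is evidence for the promotion prompt, not a score."""
    if not log_path.exists():
        return {}
    counts: Counter = Counter()
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)["fact_id"]] += 1
    return dict(counts)
```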

8. Deletion correctness (tombstones)

Raw logs are immutable, so a user delete can't edit them — and today delete_archival_memory deletes from LanceDB only, so reindex resurrects the memory. Fix: delete removes the entry from consolidated/memory.md and appends to .tombstones.jsonl; consolidation skips tombstoned facts. Honors both immutability and deletion.
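A minimal sketch of the tombstone mechanism, assuming a raw fact is identified by a hash of its normalized text. That identity scheme is an assumption made for illustration (the plan does not say how tombstones reference facts), and all four helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fact_key(text: str) -> str:
    """Stable key for a fact (assumption: case- and whitespace-normalized text hash)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def add_tombstone(path: Path, text: str) -> None:
    """Record a user delete without touching the immutable raw logs."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"key": fact_key(text)}) + "\n")


def load_tombstones(path: Path) -> set[str]:
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        return {json.loads(line)["key"] for line in fh if line.strip()}


def drop_tombstoned(facts: list[str], tombstones: set[str]) -> list[str]:
    """Filter applied when reading archive logs during consolidation."""
    return [f for f in facts if fact_key(f) not in tombstones]
```

An exact-text hash only suppresses the literal deleted fact; a rephrased version extracted later would pass the filter, which may or may not be the desired behavior.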

9. Ranking change

Importance is removed as a lever (every indexed chunk gets a constant 0.5), so hybrid_search ranks on relevance + recency — like OpenClaw/Claude Code. Category is not a retrieval filter (it never was).

10. File-by-file changes

Delete (no legacy fallback): _deduplicate_and_store_facts, _add_memory_internal, process_message_for_memories, _extract_facts_simple, refresh_core_memory_facts; MarkdownIndexer + its dead regex; constants DEDUPLICATION_* / DEFAULT_IMPORTANCE; per-source importance values; MemoryExtractionResult.memories_created/updated.

  • memory/manager.py: append-only process_conversation_turn_for_memories; shared _core_indexer + _consolidation_lock; _log_recalls; consolidate_memories(force), _consolidation_gate_open, neighbor/cluster/apply helpers, state read/write.
  • memory/indexer.py: drop MarkdownIndexer; CoreMemoryFileIndexer gets a lock, watermark-aware archive handling (index >W, drop ≤W), per-entry indexing of consolidated/memory.md, constant importance, reindex_file_now, clear_and_full_reindex.
  • memory/markdown_store.py: consolidated/ dir + entry read/write/parse; recall-log + tombstone helpers.
  • memory/memory_context.py: CONSOLIDATION_PROMPT, PROMOTION_PROMPT.
  • memory/models.py: slim MemoryExtractionResult; structured ConsolidationResponse schema.
  • memory/lifecycle.py: share _core_indexer; add _consolidation_loop; start/stop it.
  • routes/session_routes.py: reindex_memories delegates to CoreMemoryFileIndexer.
  • routes/memory_routes.py: add POST /memory/consolidate; fix delete_archival_memory (truth + tombstone).
  • core/context_compressor.py: drop the created_count log line.

11. Config knobs (all operational, none decide fact identity)

memory_consolidation_enabled: bool = True
memory_consolidation_min_hours: float = 24.0
memory_consolidation_min_facts: int = 20
memory_consolidation_interval_seconds: int = 1800
memory_consolidation_memory_max_lines: int = 200
memory_consolidation_neighbor_k: int = 8
memory_consolidation_cluster_max: int = 25
memory_consolidation_model: Optional[str] = None

12. Migration

One-time POST /memory/reindex {"clear_existing": true} → clear_and_full_reindex (wipe + rebuild from files). Seed .consolidation_state.json with a watermark_date before the oldest log so the first run folds full history once. Nothing is at risk, because the raw logs are the source of truth.
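A minimal sketch of the seeding step, assuming the state file sits next to the `archive/` directory and holds a single `watermark_date` key. The directory layout and the `seed_consolidation_state` helper are assumptions; only the file name and key come from the plan.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def seed_consolidation_state(memory_dir: Path) -> dict:
    """Write .consolidation_state.json with a watermark one day before the oldest log."""
    log_dates = []
    for path in (memory_dir / "archive").glob("*.md"):
        try:
            log_dates.append(date.fromisoformat(path.stem))
        except ValueError:
            continue  # ignore files that are not named YYYY-MM-DD.md
    oldest = min(log_dates) if log_dates else date.today()
    state = {"watermark_date": (oldest - timedelta(days=1)).isoformat()}
    (memory_dir / ".consolidation_state.json").write_text(json.dumps(state), encoding="utf-8")
    return state
```

With the watermark set before every existing log, the first consolidation run treats the entire archive as new facts.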

13. Test plan

  1. #34 regression ([FEAT] Memory deduplication: fixed cosine threshold causes silent data loss): "Google" then "Microsoft" → both retrievable pre-consolidation; after → "currently Microsoft; previously Google".
  2. Duplicate phrasings → one entry.
  3. Correction ("Jon"→"John") → no stale value.
  4. Cross-topic fact → no forced bucket.
  5. Gate respects time/volume; force overrides.
  6. Cluster failure → watermark unchanged, retried.
  7. Deleted fact does not reappear after reindex.
  8. clear+reindex reproduces the index from files.

14. Open questions for reviewers

  1. Single consolidated/memory.md vs. sharding once entries reach thousands (rewrite cost)?
  2. All-or-nothing run vs. per-date watermark — how to stop one bad cluster blocking progress?
  3. Neighbor k=8 / cluster cap 25 defaults reasonable for personal-assistant scale?
  4. Recall-log retention: truncate each run vs. rolling N-day window?

Closes #34

🤖 Generated with Claude Code

…onsolidation architecture

This is a planning commit for discussion. No code changes yet.

Closes #34