refactor(memory): append-only writes + retrieval-driven consolidation (fixes #34) by DeerGoat · Pull Request #41 · cyzus/suzent

DeerGoat · 2026-05-29T17:03:17Z

Summary

Reworks Suzent's memory write/dedup path. Closes #34 and supersedes #36 (which patches the symptom).

This is a design PR for discussion. The branch currently carries only a planning commit — no code yet. Please comment inline on the plan below; once we converge I'll implement it file-by-file.

1. Root cause of #34

MemoryManager._deduplicate_and_store_facts uses a fixed cosine-similarity threshold (0.85) to decide whether a newly extracted fact is a duplicate of an existing memory. This is a category error: cosine similarity measures topical proximity, not factual identity.

"I work at Google" vs "I work at Microsoft" → cosine ≈ 0.92 → dropped as a duplicate (a real update, silently lost; this is #34, "Memory deduplication: fixed cosine threshold causes silent data loss")
"I enjoy hiking" vs "I love spending time outdoors" → cosine ≈ 0.78 → stored as two facts (a real duplicate, kept)

No threshold value fixes this, because the metric cannot distinguish "same fact, different phrasing" from "same topic, different fact." That distinction requires language understanding, not geometry.
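To make the failure mode concrete, here is a minimal sketch that applies a fixed-threshold check to the two similarity values quoted above. The name `is_duplicate_by_threshold` is hypothetical and is not the actual body of `_deduplicate_and_store_facts`; the scores are the approximate values from the two examples, not fresh measurements.

```python
DEDUP_THRESHOLD = 0.85  # the fixed threshold described above


def is_duplicate_by_threshold(similarity: float, threshold: float = DEDUP_THRESHOLD) -> bool:
    """Hypothetical stand-in for the threshold check: geometry only, no meaning."""
    return similarity >= threshold


# (new fact, existing memory, approximate cosine, is it really the same fact?)
CASES = [
    ("I work at Microsoft", "I work at Google", 0.92, False),
    ("I love spending time outdoors", "I enjoy hiking", 0.78, True),
]


def misclassified(cases=CASES, threshold: float = DEDUP_THRESHOLD):
    """Return the cases where the threshold verdict disagrees with the truth."""
    return [c for c in cases if is_duplicate_by_threshold(c[2], threshold) != c[3]]
```

At 0.85 both cases are misclassified. Moving the threshold never clears the list: accepting the hiking pair requires a threshold at or below 0.78, which still drops the Microsoft update, and keeping the Microsoft update requires a threshold above 0.92, which still stores the hiking duplicate.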

There is also an architectural conflict: _deduplicate_and_store_facts writes facts directly to LanceDB, while CoreMemoryFileIndexer independently re-syncs markdown → LanceDB on a 300s timer. Both write the same table, racing each other. The markdown store's own docstring says "LanceDB serves as the search index over this markdown content" — but the code does the opposite.

2. Why every threshold-style fix is a bandaid

Approach	Why it fails
Lower / raise the threshold	Still geometry; trades false-drops for false-dups. Value is embedding-model- and fact-type-dependent.
Two thresholds	Two wrong numbers instead of one.
LLM-judge after a cosine pre-filter	The pre-filter gates what the LLM sees; if cosine misses the related memory, the LLM never gets to resolve it.
Update-on-near-duplicate (#36)	The "same fact" decision is still made by cosine.

We also rejected, for the same reason, two later attempts to reintroduce rigid structure: a per-source importance scalar (0.3/0.7/0.75) and a category partition (consolidated/{category}.md). All three — threshold, scalar, partition — impose discrete/numeric structure on semantically fuzzy data. A fact can belong to two "categories" ("deploy to AWS on Fridays" is preference + technical + scheduling); a single bucket can't represent that.

3. The principle this PR adopts

Cosine retrieves candidates (its correct use); the LLM makes the decision. Markdown is the source of truth; LanceDB is a derived index. The raw stream is immutable. No threshold, no importance scalar, no fixed partition — the only numbers are operational (hours, counts, size caps).

This is validated against three reference systems we studied:

Claude Code (autoDream): a forked LLM agent periodically rewrites memory files, resolving contradictions by editing. Gated by time AND volume AND lock, not a blind timer.
OpenClaw (dreaming): durability is driven by recall frequency (usage), not a write-time importance score; unused memory decays.
Hermes (background_review/curator): never auto-deletes — only archives (recoverable); consolidation is continuous because the agent owns explicit fact IDs.

4. Architecture: three tiers

TIER          FILE                       ROLE                          WHERE SEEN
1 stream      archive/YYYY-MM-DD.md      append-only daily log         on disk; recent ones
              (IMMUTABLE)                = source of truth             indexed for search
2 durable     consolidated/memory.md     deduped truth, atomic         search index (on demand)
              (LLM-maintained)           entries; history preserved
3 always-on   MEMORY.md                  few highest-value, most-      always in the prompt
              (LLM-rewritten, capped)    recalled facts                (core "facts" block)

This maps onto Suzent's existing get_core_memory() (which already injects MEMORY.md as the always-visible block) and mirrors Claude Code's logs/ → topic files → MEMORY.md layout. persona.md / user.md are untouched.

Consolidated entry format — code owns the metadata, the LLM owns the content text:

<!--m {"first_seen":"2026-05-27","updated":"2026-05-30","sources":["2026-05-27","2026-05-29"]}-->
Currently works at Microsoft (since 2026-05). Previously worked at Google.
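A minimal sketch of reading and writing this entry format, assuming one metadata comment per entry followed by its content lines. `Entry`, `parse_entries`, and `render_entry` are hypothetical names for illustration; the real helpers in memory/markdown_store.py are not written yet.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches the metadata comment shown above: <!--m {...json...}-->
_META_RE = re.compile(r"^<!--m (\{.*\})-->$")


@dataclass
class Entry:
    meta: dict    # owned by code (first_seen, updated, sources)
    content: str  # owned by the LLM


def parse_entries(text: str) -> List[Entry]:
    """Parse comment + content blocks; content runs until the next metadata comment."""
    entries: List[Entry] = []
    current: Optional[Entry] = None
    for line in text.splitlines():
        match = _META_RE.match(line.strip())
        if match:
            current = Entry(meta=json.loads(match.group(1)), content="")
            entries.append(current)
        elif current is not None and line.strip():
            current.content = (current.content + "\n" + line).strip()
    return entries


def render_entry(entry: Entry) -> str:
    """Serialize an entry back to the on-disk form (compact JSON, then content)."""
    meta = json.dumps(entry.meta, separators=(",", ":"))
    return f"<!--m {meta}-->\n{entry.content}"
```

Keeping the metadata in an HTML comment means the file still renders as plain markdown, and the LLM can be handed only the content text while code stamps `updated` and `sources`.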

5. The two procedures

Per-turn (fast, every message)

extract facts (LLM) → append to archive/today.md → reindex_file_now(today.md)

No similarity check, no dedup, no manager→LanceDB write. Today's log is dated after the watermark, so it is indexed and searchable immediately. This alone fixes #34.
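The write half of the per-turn path could look roughly like this. It is a sketch only: `append_facts_to_archive` is a hypothetical helper, and the one-bullet-per-fact line format is an assumption, since the plan does not fix the log line layout. Fact extraction and the `reindex_file_now` call are left to the caller.

```python
from datetime import date
from pathlib import Path
from typing import List


def append_facts_to_archive(archive_dir: Path, facts: List[str], today: date) -> Path:
    """Append extracted facts to archive/YYYY-MM-DD.md; existing lines are never rewritten."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    log_path = archive_dir / f"{today.isoformat()}.md"
    # Mode "a" only ever adds to the end of the file, which keeps the stream append-only.
    with log_path.open("a", encoding="utf-8") as fh:
        for fact in facts:
            fh.write(f"- {fact.strip()}\n")  # assumed line format: one bullet per fact
    return log_path
```

Because nothing is compared against existing memories here, "I work at Google" and a later "I work at Microsoft" both land in the log, and resolving them is deferred to consolidation.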

Consolidation (gated; background loop + POST /memory/consolidate)

Gate (Claude Code style): not lock_held AND hours_since_last ≥ min_hours AND new_facts_since_watermark ≥ min_facts.
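The gate condition is a plain conjunction, sketched below. `GateConfig` and `consolidation_gate_open` are illustrative names (the planned method is `_consolidation_gate_open`), the defaults mirror the proposed config knobs, and the choice that `force` bypasses time and volume but still respects the lock is an assumption open to review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    min_hours: float = 24.0  # memory_consolidation_min_hours
    min_facts: int = 20      # memory_consolidation_min_facts


def consolidation_gate_open(
    lock_held: bool,
    hours_since_last: float,
    new_facts_since_watermark: int,
    cfg: GateConfig = GateConfig(),
    force: bool = False,
) -> bool:
    """Return True only when a consolidation run should start."""
    if lock_held:
        return False  # assumption: never run two consolidations at once, even when forced
    if force:
        return True   # manual trigger skips the time and volume checks
    return (
        hours_since_last >= cfg.min_hours
        and new_facts_since_watermark >= cfg.min_facts
    )
```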

entries   = parse(consolidated/memory.md)
new_facts = parse(archive logs where date > watermark)  minus tombstoned
embed(entries) and embed(new_facts)                       # reused for reindex

for f in new_facts:                                       # cosine = CANDIDATE retrieval only
    f.neighbors = top_k(cosine(f, entries), k)
clusters = connected_components(new_facts, edge = shared neighbor)  # each entry touched ≤ once

for cluster in clusters:                                  # LLM = the DECISION
    ops = LLM(CONSOLIDATION_PROMPT, neighbors, candidates)
    apply(ops)         # ADD / REPLACE(target) / REMOVE(target); default = keep; dup candidates dropped

write consolidated/memory.md                              # provenance set in code, not by LLM
MEMORY.md = LLM(PROMOTION_PROMPT, entries, recall_summary, max_lines)   # recall-driven promotion
advance watermark; reindex (drop archives ≤ watermark, reindex consolidated + MEMORY.md)
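The neighbor and cluster steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the planned helpers: embeddings are plain float lists, entries are keyed by hypothetical string ids, and the cluster size cap (memory_consolidation_cluster_max) is left out. Note that cosine is used only to rank candidates; no similarity value is ever compared against a cutoff.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_neighbors(fact_vec: List[float], entry_vecs: Dict[str, List[float]], k: int) -> List[str]:
    """Rank existing entries by cosine and keep the k closest (candidate retrieval only)."""
    ranked = sorted(entry_vecs, key=lambda eid: cosine(fact_vec, entry_vecs[eid]), reverse=True)
    return ranked[:k]


def cluster_by_shared_neighbors(neighbors: Dict[str, List[str]]) -> List[Set[str]]:
    """Connected components: facts that share any neighbor entry end up in one cluster."""
    parent = {fact: fact for fact in neighbors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner: Dict[str, str] = {}  # entry id -> first fact that retrieved it
    for fact, entry_ids in neighbors.items():
        for eid in entry_ids:
            if eid in first_owner:
                parent[find(fact)] = find(first_owner[eid])  # union the two facts
            else:
                first_owner[eid] = fact

    groups: Dict[str, Set[str]] = defaultdict(set)
    for fact in neighbors:
        groups[find(fact)].add(fact)
    return list(groups.values())
```

Since every entry belongs to exactly one component, each consolidated entry is shown to the LLM in at most one cluster, which is what keeps two LLM calls from issuing conflicting ops on the same target.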

Op semantics (default = keep every neighbor; the only way to lose a consolidated fact is an explicit REMOVE, restricted to genuine duplicates/merges):

ADD — genuinely new fact.
REPLACE(target) — correction (old content discarded) or state change over time (content becomes "now X; previously Y" → history preserved).
REMOVE(target) — two neighbors merged into one.
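A sketch of how the three ops might be applied with "keep" as the default. `Op`, `apply_ops`, and the `e<N>` entry ids are hypothetical, and restricting targets to the neighbors that were actually shown to the LLM is an assumption added here as a safety check; the planned `ConsolidationResponse` schema may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Op:
    kind: str                      # "ADD", "REPLACE", or "REMOVE"
    content: Optional[str] = None  # new text for ADD / REPLACE
    target: Optional[str] = None   # entry id for REPLACE / REMOVE


def apply_ops(entries: Dict[str, str], ops: List[Op], valid_targets: Set[str]) -> Dict[str, str]:
    """Apply LLM-proposed ops; anything malformed or out of scope is ignored (default = keep)."""
    result = dict(entries)  # work on a copy so a failed cluster leaves the input intact
    next_id = len(result)
    for op in ops:
        in_scope = op.target in valid_targets and op.target in result
        if op.kind == "ADD" and op.content:
            while f"e{next_id}" in result:  # find an unused hypothetical id
                next_id += 1
            result[f"e{next_id}"] = op.content
        elif op.kind == "REPLACE" and in_scope and op.content:
            result[op.target] = op.content
        elif op.kind == "REMOVE" and in_scope:
            del result[op.target]
    return result
```

For example, a REPLACE on the Google entry with "Currently works at Microsoft; previously worked at Google" rewrites that entry in place, while a REMOVE aimed at an entry outside the cluster is silently dropped.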

6. History preservation (the most-scrutinized question)

"Moved from Google to Microsoft" is a timeline, not a contradiction — both were true. Four independent safety layers ensure nothing is lost:

Raw logs are immutable — the original fact lives on disk forever.
State changes are kept as history — REPLACE writes "previously X", never a blind overwrite.
Demotion ≠ deletion — a fact dropped from always-visible MEMORY.md stays in consolidated/memory.md (still searchable).
Full rebuild — clearing the watermark and re-consolidating reconstructs everything from raw logs.

7. Usage-driven promotion (OpenClaw), no scoring weights

Suzent already stamps access_count/accessed_at on every retrieval (_record_memory_accesses). We add an append-only .recall_log.jsonl (one line per retrieved fact). Consolidation hands the recall summary to the promotion LLM as evidence for what belongs in always-visible MEMORY.md. No weighted formula, no tuned thresholds — the LLM interprets the signal.
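A minimal sketch of the recall log and the summary handed to the promotion LLM. The record fields (`fact_id`, `accessed_at`) and the summary shape are assumptions for illustration; the plan only fixes that the log is append-only JSONL with one line per retrieved fact.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def log_recall(log_path: Path, fact_id: str, accessed_at: str) -> None:
    """Append one JSON line per retrieved fact; earlier lines are never modified."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"fact_id": fact_id, "accessed_at": accessed_at}) + "\n")


def recall_summary(log_path: Path) -> Dict[str, dict]:
    """Aggregate raw counts and last-recall time; no weights and no cutoffs are applied."""
    if not log_path.exists():
        return {}
    counts: Counter = Counter()
    last_seen: Dict[str, str] = {}
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        fact_id = record["fact_id"]
        counts[fact_id] += 1
        # ISO-8601 timestamps in one consistent format compare correctly as strings.
        last_seen[fact_id] = max(last_seen.get(fact_id, ""), record["accessed_at"])
    return {fid: {"recalls": n, "last_recalled": last_seen[fid]} for fid, n in counts.items()}
```

The summary is deliberately raw evidence: the promotion prompt receives the counts and dates as-is and decides what belongs in MEMORY.md.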

8. Deletion correctness (tombstones)

Raw logs are immutable, so a user delete can't edit them — and today delete_archival_memory deletes from LanceDB only, so reindex resurrects the memory. Fix: delete removes the entry from consolidated/memory.md and appends to .tombstones.jsonl; consolidation skips tombstoned facts. Honors both immutability and deletion.
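One way the tombstone filter could work is sketched below. How a tombstone identifies a fact is not settled in this plan; matching on a hash of the normalized text (`fact_key`) is purely an assumption for illustration, and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Set


def fact_key(text: str) -> str:
    """Assumed identity key: hash of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def add_tombstone(tombstone_path: Path, fact_text: str) -> None:
    """Record a user delete without touching the immutable raw logs."""
    with tombstone_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"key": fact_key(fact_text)}) + "\n")


def load_tombstones(tombstone_path: Path) -> Set[str]:
    """Read all tombstone keys; a missing file means nothing was deleted."""
    if not tombstone_path.exists():
        return set()
    lines = tombstone_path.read_text(encoding="utf-8").splitlines()
    return {json.loads(line)["key"] for line in lines if line.strip()}


def drop_tombstoned(facts: List[str], tombstones: Set[str]) -> List[str]:
    """Filter applied when consolidation parses raw logs (the 'minus tombstoned' step)."""
    return [fact for fact in facts if fact_key(fact) not in tombstones]
```

An exact-text key will not catch a rephrased copy of a deleted fact, so reviewers may prefer a different matching rule; the sketch only shows where the filter sits.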

9. Ranking change

Importance is removed as a lever (every indexed chunk gets a constant 0.5), so hybrid_search ranks on relevance + recency — like OpenClaw/Claude Code. Category is not a retrieval filter (it never was).

10. File-by-file changes

Delete (no legacy fallback): _deduplicate_and_store_facts, _add_memory_internal, process_message_for_memories, _extract_facts_simple, refresh_core_memory_facts; MarkdownIndexer + its dead regex; constants DEDUPLICATION_* / DEFAULT_IMPORTANCE; per-source importance values; MemoryExtractionResult.memories_created/updated.

memory/manager.py — append-only process_conversation_turn_for_memories; shared _core_indexer + _consolidation_lock; _log_recalls; consolidate_memories(force), _consolidation_gate_open, neighbor/cluster/apply helpers, state read/write.
memory/indexer.py — drop MarkdownIndexer; CoreMemoryFileIndexer gets a lock, watermark-aware archive handling (index >W, drop ≤W), per-entry indexing of consolidated/memory.md, constant importance, reindex_file_now, clear_and_full_reindex.
memory/markdown_store.py — consolidated/ dir + entry read/write/parse; recall-log + tombstone helpers.
memory/memory_context.py — CONSOLIDATION_PROMPT, PROMOTION_PROMPT.
memory/models.py — slim MemoryExtractionResult; structured ConsolidationResponse schema.
memory/lifecycle.py — share _core_indexer; add _consolidation_loop; start/stop it.
routes/session_routes.py — reindex_memories delegates to CoreMemoryFileIndexer.
routes/memory_routes.py — add POST /memory/consolidate; fix delete_archival_memory (truth + tombstone).
core/context_compressor.py — drop the created_count log line.

11. Config knobs (all operational, none decide fact identity)

memory_consolidation_enabled: bool = True
memory_consolidation_min_hours: float = 24.0
memory_consolidation_min_facts: int = 20
memory_consolidation_interval_seconds: int = 1800
memory_consolidation_memory_max_lines: int = 200
memory_consolidation_neighbor_k: int = 8
memory_consolidation_cluster_max: int = 25
memory_consolidation_model: Optional[str] = None

12. Migration

One-time POST /memory/reindex {"clear_existing": true} → clear_and_full_reindex (wipe + rebuild from files). Seed .consolidation_state.json with a watermark_date before the oldest log so the first run folds full history once. Nothing at risk — raw logs are truth.
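Seeding the state file could be done with a short one-off script like the sketch below. The directory layout (an `archive/` folder and `.consolidation_state.json` under one memory directory) and the helper name `seed_consolidation_state` are assumptions; only the `watermark_date` key comes from the plan.

```python
import json
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def seed_consolidation_state(memory_dir: Path) -> Optional[str]:
    """Write a watermark one day before the oldest daily log; return it, or None if no logs."""
    log_dates = []
    for path in (memory_dir / "archive").glob("*.md"):
        try:
            log_dates.append(date.fromisoformat(path.stem))  # expects YYYY-MM-DD.md
        except ValueError:
            continue  # skip files that are not daily logs
    if not log_dates:
        return None
    watermark = (min(log_dates) - timedelta(days=1)).isoformat()
    state_path = memory_dir / ".consolidation_state.json"
    state_path.write_text(json.dumps({"watermark_date": watermark}), encoding="utf-8")
    return watermark
```

With the watermark set before the oldest log, every archive file counts as "new" on the first consolidation run, so the full history is folded in once.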

13. Test plan

1. #34 regression: "Google" then "Microsoft" → both retrievable pre-consolidation; after → "currently Microsoft; previously Google".
2. Duplicate phrasings → one entry.
3. Correction ("Jon"→"John") → no stale value.
4. Cross-topic fact → no forced bucket.
5. Gate respects time/volume; force overrides.
6. Cluster failure → watermark unchanged, retried.
7. Deleted fact does not reappear after reindex.
8. clear+reindex reproduces the index from files.

14. Open questions for reviewers

Single consolidated/memory.md vs. sharding once entries reach thousands (rewrite cost)?
All-or-nothing run vs. per-date watermark — how to stop one bad cluster blocking progress?
Neighbor k=8 / cluster cap 25 defaults reasonable for personal-assistant scale?
Recall-log retention: truncate each run vs. rolling N-day window?

Closes #34

🤖 Generated with Claude Code

refactor(memory): replace cosine-threshold dedup with append-only + c…

32847cf

…onsolidation architecture This is a planning commit for discussion. No code changes yet. Closes #34

DeerGoat wants to merge 1 commit into main from refactor/memory-architecture-append-only
