Collected research on memory systems for AI agents. Informs architecture decisions.
Paper: https://arxiv.org/abs/2310.08560 GitHub: https://github.com/letta-ai/letta
Core Insight: Treat context like an OS manages RAM. The LLM controls memory via function calls.
- Core Memory (~2-4KB) — Always in context, editable
  - Persona block (who the agent is)
  - Human block (user info, preferences)
  - Agent edits via `core_memory_append` / `core_memory_replace`
- Archival Memory (unlimited) — Vector DB for long-term storage
  - `archival_memory_insert(content)`
  - `archival_memory_search(query, k=10)`
- Recall Memory — Complete conversation history
  - Searchable past interactions
  - Used for context retrieval
Agent deliberately manages what it remembers. Memory operations are explicit tool calls, not hidden magic.
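As a rough illustration of the explicit-tool-call model, here is a hypothetical TypeScript sketch of these four operations, with an in-memory array standing in for the vector store; it is not Letta's actual implementation.

```typescript
// Hypothetical sketch of MemGPT-style memory tools; the in-memory store
// stands in for Letta's real vector DB and is not the real API.

type CoreMemory = { persona: string; human: string };

const core: CoreMemory = { persona: "Helpful coding agent", human: "" };
const archival: string[] = [];

// core_memory_append: add a line to an always-in-context block
function coreMemoryAppend(block: keyof CoreMemory, content: string): void {
  core[block] = `${core[block]}\n${content}`.trim();
}

// core_memory_replace: rewrite part of a block in place
function coreMemoryReplace(block: keyof CoreMemory, oldText: string, newText: string): void {
  core[block] = core[block].replace(oldText, newText);
}

// archival_memory_insert: unbounded long-term storage
function archivalMemoryInsert(content: string): void {
  archival.push(content);
}

// archival_memory_search: keyword overlap stands in for embedding similarity
function archivalMemorySearch(query: string, k = 10): string[] {
  const terms = query.toLowerCase().split(/\s+/);
  return archival
    .map((text) => ({
      text,
      score: terms.filter((t) => text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.text);
}

// The agent calls these as tools, e.g. after the user states a preference:
coreMemoryAppend("human", "Prefers TypeScript examples");
archivalMemoryInsert("User's project uses Convex for the backend");
console.log(archivalMemorySearch("convex backend", 3));
```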
Core Insight: A separate memory agent runs alongside the main agent and handles memory operations asynchronously.
┌─────────────┐ async ┌─────────────┐
│ Main Agent │ ←───────────── │ Sidecar │
│ (Expensive) │ │ (Cheap) │
└─────────────┘ └─────────────┘
│ │
└──────── Observations ────────┘
│
┌─────┴─────┐
│ Memory DB │
└───────────┘
- Non-blocking (main agent never waits)
- Uses cheaper models for memory processing
- Memory injected transparently into context
- Better for high-throughput systems
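A hedged sketch of the sidecar pattern under these assumptions: the main agent enqueues observations without awaiting them, and a background worker using a cheaper model drains the queue into the memory store. All names are illustrative.

```typescript
// Hypothetical sidecar sketch: the main agent enqueues observations and never
// waits; a background worker (cheap model) processes them and writes to a
// memory store. Names and shapes are illustrative.

type Observation = { role: "user" | "assistant"; text: string; ts: number };

const queue: Observation[] = [];
const memoryDb: { summary: string; ts: number }[] = [];

// Called inline by the main agent loop: O(1), non-blocking.
function observe(role: Observation["role"], text: string): void {
  queue.push({ role, text, ts: Date.now() });
}

// Stand-in for a call to a cheaper model that distills an observation.
async function cheapSummarize(obs: Observation): Promise<string> {
  return `${obs.role} said: ${obs.text.slice(0, 80)}`;
}

// Sidecar worker: drains the queue on an interval, independently of the
// main agent's request/response cycle.
async function sidecarTick(): Promise<void> {
  while (queue.length > 0) {
    const obs = queue.shift()!;
    memoryDb.push({ summary: await cheapSummarize(obs), ts: obs.ts });
  }
}

// Main agent keeps responding while the sidecar catches up.
observe("user", "The staging deploy failed again with the same TLS error");
observe("assistant", "Pinned the cert bundle; deploy succeeded");
setInterval(() => void sidecarTick(), 5_000);
```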
Source: Mastra blog / Ryan Carson bookmark (Feb 2026)
Core Insight: Agent passively observes and stores without being explicitly asked. Memory is a side-effect of interaction, not a deliberate action.
- MemGPT: Agent decides what to remember (active)
- Sidecar: External process decides (passive, external)
- Observational: Every interaction generates observations automatically (passive, integrated)
The memory_observe tool implements this — fire-and-forget fact storage. Convex actions handle enrichment (entity extraction, embeddings, importance scoring) asynchronously.
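One way the fire-and-forget write plus async enrichment could be wired in Convex is sketched below; the `facts` table, the `enrich` function, and the field names are assumptions, not the actual plugin code.

```typescript
// convex/memory.ts — hedged sketch of memory_observe's fast write plus
// scheduled enrichment. Table and function names are assumptions.
import { mutation, internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Fast path: store the raw fact, schedule enrichment, return immediately.
// The calling agent never waits on entity extraction or embeddings.
export const observe = mutation({
  args: { content: v.string(), source: v.string() },
  handler: async (ctx, args) => {
    const factId = await ctx.db.insert("facts", {
      content: args.content,
      source: args.source,
      enriched: false,
    });
    // runAfter(0, ...) queues the slow work outside the request path.
    await ctx.scheduler.runAfter(0, internal.memory.enrich, { factId });
    return factId;
  },
});

// Slow path: entity extraction, embeddings, and importance scoring would
// happen here (stubbed); in practice this would be an action calling an LLM.
export const enrich = internalMutation({
  args: { factId: v.id("facts") },
  handler: async (ctx, { factId }) => {
    const fact = await ctx.db.get(factId);
    if (!fact) return;
    await ctx.db.patch(factId, { enriched: true, importance: 0.5 });
  },
});
```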
5-part system for zero knowledge loss across sessions:
- Hourly Memory Summaries — Granular session snapshots
- Full Conversation History (.JSONL) — Raw conversation replay
- Bi-Hourly Learnings Extraction — LLM extracts patterns, corrections, preferences
- User Prompt Submit Hook — Embed prompt → retrieve relevant learnings → inject before processing (<300ms)
- Scratch Pad — Running list of mistakes, corrections, patterns
Sub-300ms overhead per prompt for memory retrieval + injection.
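A minimal sketch of what the prompt-submit hook (step 4) could look like, assuming an in-memory learnings list and a stubbed `embed()` call; the actual system and its embedding model are not described in the source.

```typescript
// Hypothetical prompt-submit hook: embed the prompt, pull the most relevant
// stored learnings, and inject them ahead of processing. embed() is stubbed;
// a real system would call a local or hosted embedding model here.

type Learning = { text: string; embedding: number[] };

const learnings: Learning[] = []; // loaded from the bi-hourly extraction step

async function embed(text: string): Promise<number[]> {
  // Stub standing in for an embedding model call.
  return Array.from({ length: 8 }, (_, i) => (text.charCodeAt(i % text.length) % 13) / 13);
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.hypot(...a);
  const nb = Math.hypot(...b);
  return na && nb ? dot / (na * nb) : 0;
}

// The sub-300ms budget covers one embedding call plus a similarity search.
async function onPromptSubmit(prompt: string, k = 5): Promise<string> {
  const q = await embed(prompt);
  const relevant = learnings
    .map((l) => ({ l, score: cosine(q, l.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => `- ${r.l.text}`);
  return relevant.length
    ? `Relevant learnings:\n${relevant.join("\n")}\n\n${prompt}`
    : prompt;
}
```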
Source: Team9 blog (Feb 2026), bookmarked by community
Problem: Long-running agent sessions accumulate stale context, leading to:
- Repeated mistakes
- Forgotten corrections
- Degrading response quality over time
Solution patterns:
- Active memory management (prune stale facts)
- Importance-weighted context injection (not FIFO)
- Session summarization at boundaries
- Decay functions on stored knowledge
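As a sketch (not from the source) of combining importance-weighted injection with decay, facts can compete for context on importance times a decayed recency term rather than insertion order; the decay rate and floor reuse the values from the importance formula below.

```typescript
// Sketch of importance-weighted selection with decay: facts compete on score,
// not recency, and old facts fade toward a floor instead of disappearing,
// so corrections outrank stale chatter.

type StoredFact = { text: string; importance: number; lastAccessed: number };

const DECAY_PER_DAY = 0.99;
const DECAY_FLOOR = 0.1;

function effectiveScore(f: StoredFact, now = Date.now()): number {
  const days = (now - f.lastAccessed) / 86_400_000;
  const decay = Math.max(Math.pow(DECAY_PER_DAY, days), DECAY_FLOOR);
  return f.importance * decay;
}

// Pick the facts to inject, bounded by a rough character budget rather than
// "most recent N".
function selectContext(facts: StoredFact[], budgetChars: number): StoredFact[] {
  const ranked = [...facts].sort((a, b) => effectiveScore(b) - effectiveScore(a));
  const chosen: StoredFact[] = [];
  let used = 0;
  for (const f of ranked) {
    if (used + f.text.length > budgetChars) continue;
    chosen.push(f);
    used += f.text.length;
  }
  return chosen;
}
```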
Problem: When agents hand off work, knowledge is lost. Agent A discovers a bug pattern, Agent B hits the same bug an hour later.
Existing approaches:
| System | How It Shares |
|---|---|
| Agent Mail | Mailbox messages (text, not structured memory) |
| Claude Code Teams | Shared task list + mailbox (no persistent memory) |
| NTM | File-based state (fragile, no search) |
| Letta | Multi-agent via shared archival memory |
Engram's approach:
- Memory scopes: `private`, `team`, `project`, `global`
- Any agent can write to a scope it has access to
- Any agent can search across permitted scopes
- Conversation handoffs preserve full context chain
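A hedged sketch of how scoped writes and cross-scope search might work; the scope names come from the notes above, while the agent and access-check shapes are illustrative, not Engram's real API.

```typescript
// Illustrative sketch of scoped memory: writes require holding the scope,
// reads fan out across every permitted scope, and "private" facts stay
// visible only to their owner.

type Scope = "private" | "team" | "project" | "global";

interface ScopedFact { scope: Scope; ownerAgent: string; text: string }

interface Agent { id: string; scopes: Set<Scope> }

const store: ScopedFact[] = [];

function write(agent: Agent, scope: Scope, text: string): void {
  if (!agent.scopes.has(scope)) {
    throw new Error(`${agent.id} has no write access to scope "${scope}"`);
  }
  store.push({ scope, ownerAgent: agent.id, text });
}

// Agent B can see the bug pattern Agent A recorded in the shared "project"
// scope, but not Agent A's private facts.
function search(agent: Agent, query: string): ScopedFact[] {
  const q = query.toLowerCase();
  return store.filter(
    (f) =>
      agent.scopes.has(f.scope) &&
      (f.scope !== "private" || f.ownerAgent === agent.id) &&
      f.text.toLowerCase().includes(q),
  );
}

const agentA: Agent = { id: "agent-a", scopes: new Set<Scope>(["private", "project"]) };
const agentB: Agent = { id: "agent-b", scopes: new Set<Scope>(["private", "project"]) };
write(agentA, "project", "Bug pattern: retry loop masks the real TLS failure");
console.log(search(agentB, "tls"));
```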
From our existing ImportanceCalculator (Swift):
importance = (content_score × 0.4) + (centrality × 0.3) + (temporal × 0.2) + (access × 0.1)
Content Analysis (40%):
- High: decision, error, critical, breaking, failed → 0.9
- Medium: fix, implement, create, build, update → 0.6
- Low: note, observation, maybe, consider → 0.3
- Default: 0.5
Entity Centrality (30%):
- 0 entities → 0.1
- 1 entity → 0.4
- 2-3 entities → 0.7
- 4+ entities → 1.0
Temporal Relevance (20%):
- < 7 days → 1.0
- Exponential decay: 0.99^days
- Floor: 0.1 (old facts stay searchable)
Access Frequency (10%):
- +0.05 per access, max 0.5
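For reference, the same breakdown re-expressed in TypeScript (the original calculator is Swift); keywords, thresholds, and weights are exactly as listed above.

```typescript
// TypeScript re-expression of the ImportanceCalculator breakdown above.

const HIGH = ["decision", "error", "critical", "breaking", "failed"];
const MEDIUM = ["fix", "implement", "create", "build", "update"];
const LOW = ["note", "observation", "maybe", "consider"];

// Content analysis (40%): keyword tiers, default 0.5
function contentScore(text: string): number {
  const t = text.toLowerCase();
  if (HIGH.some((k) => t.includes(k))) return 0.9;
  if (MEDIUM.some((k) => t.includes(k))) return 0.6;
  if (LOW.some((k) => t.includes(k))) return 0.3;
  return 0.5;
}

// Entity centrality (30%)
function centralityScore(entityCount: number): number {
  if (entityCount === 0) return 0.1;
  if (entityCount === 1) return 0.4;
  if (entityCount <= 3) return 0.7;
  return 1.0;
}

// Temporal relevance (20%): 0.99^days with a 0.1 floor
function temporalScore(ageDays: number): number {
  if (ageDays < 7) return 1.0;
  return Math.max(Math.pow(0.99, ageDays), 0.1);
}

// Access frequency (10%): +0.05 per access, capped at 0.5
function accessScore(accessCount: number): number {
  return Math.min(accessCount * 0.05, 0.5);
}

function importance(text: string, entityCount: number, ageDays: number, accessCount: number): number {
  return (
    contentScore(text) * 0.4 +
    centralityScore(entityCount) * 0.3 +
    temporalScore(ageDays) * 0.2 +
    accessScore(accessCount) * 0.1
  );
}

// e.g. a fresh, well-linked error report scores high:
console.log(importance("Build failed: breaking change in auth module", 3, 1, 2)); // ≈ 0.78
```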
- Keyword-based content scoring is rough but effective for v1
- Should upgrade to LLM-based classification in v2
- Decay floor of 0.1 is important — don't delete old knowledge, just deprioritize
- Access boosting creates a natural "frequently needed" signal
- Extended schemas with migration defaults (backward compat)
- Access control per fact (`sharedWith`)
- Conversation threading (facts → conversations → sessions)
- Agent attribution (`createdBy`)
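A hedged sketch of what the extended Convex schema might look like: `sharedWith` and `createdBy` come from the list above, while the remaining fields and the `optional()` wrappers (serving as migration defaults for backward compatibility) are assumptions.

```typescript
// convex/schema.ts — hedged sketch of the extended facts schema.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  facts: defineTable({
    content: v.string(),
    importance: v.optional(v.number()),
    // Access control per fact: which agents or scopes may read it.
    sharedWith: v.optional(v.array(v.string())),
    // Agent attribution.
    createdBy: v.optional(v.string()),
    // Conversation threading: facts → conversations → sessions.
    conversationId: v.optional(v.id("conversations")),
  }),
  conversations: defineTable({
    sessionId: v.optional(v.string()),
    startedAt: v.number(),
  }),
});
```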
- Colon-delimited relationships → structured objects
- Local-only → cloud sync
- Script-based queries → MCP tools
- Manual entity linking → auto-extraction
- No multi-device → Convex realtime sync
- Export existing entities.json → Convex entities table
- Export facts_schema.json → Convex facts table
- Import daily logs → conversation records
- Rebuild LanceDB indices from Convex data
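A rough, heavily hedged one-shot migration sketch for the first two steps: it assumes flat JSON arrays in the export files, a reachable Convex deployment, and `importEntity` / `importFact` mutations that do not yet exist.

```typescript
// migrate.ts — hypothetical one-shot import into Convex. File shapes,
// mutation names, and the deployment URL are all assumptions.
import { readFileSync } from "node:fs";
import { ConvexHttpClient } from "convex/browser";
import { api } from "./convex/_generated/api";

const client = new ConvexHttpClient(process.env.CONVEX_URL!);

async function migrate(): Promise<void> {
  const entities = JSON.parse(readFileSync("entities.json", "utf8")) as Record<string, unknown>[];
  const facts = JSON.parse(readFileSync("facts_schema.json", "utf8")) as Record<string, unknown>[];

  for (const entity of entities) {
    await client.mutation(api.entities.importEntity, { entity });
  }
  for (const fact of facts) {
    await client.mutation(api.facts.importFact, { fact });
  }
  // Daily logs → conversation records and LanceDB re-indexing would follow
  // the same pattern once those mutations and indices exist.
}

migrate().catch(console.error);
```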
- MemGPT paper: https://arxiv.org/abs/2310.08560
- Letta GitHub: https://github.com/letta-ai/letta
- Mastra observational memory: https://mastra.ai/blog
- Team9 context rot: referenced in community bookmarks
- Zac's 5-part system: @PerceptualPeak Twitter thread
- Our previous research: ~/clawd/memory/archive/2026-01-29-memory-systems.md
- Context system plan: ~/clawd/memory/context-system-plan.md
- Memory plugin integration: ~/clawd/memory/resources/memory-plugin-integration-2026-02-01.md
- MemoryLance source: ~/clawd/memory/resources/memory-system-dev/