Problem
CodeWhale currently has a 1M-token context window, but beyond that there is no real memory system. The current mechanisms are:
/compact: manual compression of early turns into a natural-language summary
note tool: agents can persist key-value facts
- Session persistence (SQLite): stores raw transcripts on disk
Ctrl+R session picker: manually switch between sessions
These are not a memory system. They are a flat storage — no indexing, no cross-session retrieval, no consolidation, no active forgetting. The user starts a new session and the AI remembers nothing unless explicitly told.
A True Hippocampal Memory System for AI
Biological hippocampal memory does four things that current AI context management does not:
1. Binding / Indexing
When an AI performs related actions (edits dispatch.rs, adds started_at to subagent error, opens PR #2933), these facts should be cross-indexed as a graph, not stored as independent text fragments:
edit dispatch.rs ── partOf ── PR #2933 ── fixes ── issue #2657
│ │
│ └─ alsoContains ── yolo.md edit
│
└─ reason ── format_tool_error generic suffixes mislead the agent
This enables pattern completion: given the fragment "tool error message issue", the system reconstructs the full graph — format_tool_error → dispatch.rs → PR #2933 → linked yolo.md and subagent changes.
2. Pattern Completion
A true memory system doesn't do literal full-text search. It takes a partial cue and reconstructs the full context:
- Cue: "那个工具错误消息的问题..."
- Completion: format_tool_error → dispatch.rs → the generic suffix removal fix → plus related changes in the same PR
This is fundamentally different from keyword search or vector similarity. It requires a structured index that models relationships, not just proximity.
3. Consolidation (Offline Processing)
The hippocampus replays experiences during idle periods (sleep) and transfers important patterns to the cortex. For AI:
- During idle time (between user messages, or a background task), scan recent conversation turns
- Extract structured decisions: what files were changed, what architecture decisions were made, what approaches were tried and discarded
- Discard ephemeral noise: specific error messages that were resolved, intermediate debug output
- Commit the extracted structure to long-term storage
4. Active Forgetting
Not "ran out of space." The system actively judges what is worth keeping:
- Yesterday's lunch → not important → discarded
- "The stove is hot, don't touch it" → important → consolidated
- A specific compiler error that was fixed → transitional → discarded after fix is applied
- The architecture decision that led to the fix approach → valuable → kept
This judgment should be model-driven (the AI decides what matters), not rule-based.
Proposed Architecture
┌─────────────────────────────────────────┐
│ Working Memory │
│ (Current 1M context window) │
│ Active conversation + loaded memories │
└────────────────┬────────────────────────┘
↕ real-time binding
┌─────────────────────────────────────────┐
│ Hippocampal Index Layer │
│ Entity graph (files, issues, PRs, │
│ decisions, relationships) │
│ Episodic records (timeline of events) │
│ Pattern completion engine │
└────────────────┬────────────────────────┘
↕ consolidation (idle time)
┌─────────────────────────────────────────┐
│ Cortex (Long-term Storage) │
│ Semantic knowledge (extracted rules, │
│ user preferences, project conventions) │
│ Independent of raw transcripts │
└─────────────────────────────────────────┘
Open Questions for Discussion
-
Storage: Should the index use SQLite with structured relations, or is a graph database needed?
-
Model-driven decisions: How much context budget should be allocated for the AI to decide what to consolidate/forget?
-
Trigger: Should consolidation run on a timer, at context pressure thresholds, or explicitly via a tool call?
-
Pattern completion granularity: When a user references "the tool issue from earlier," how much context should the system retrieve and inject?
-
Forgetting policy: Who decides what is ephemeral — the AI, the user, a config threshold, or a combination?
-
Cross-session retrieval: When the user starts a new session, what (if anything) should be pre-loaded into the working memory?
-
Relationship to existing code: The note tool, session persistence, and compaction infrastructure already exist. How should a hippocampal system build on or replace these?
Desired Outcome
A design discussion. Not an implementation ticket. This should produce a documented architecture that the community can review, critique, and eventually implement in slices.
Problem
CodeWhale currently has a 1M-token context window, but beyond that there is no real memory system. The current mechanisms are:
/compact: manual compression of early turns into a natural-language summarynotetool: agents can persist key-value factsCtrl+Rsession picker: manually switch between sessionsThese are not a memory system. They are a flat storage — no indexing, no cross-session retrieval, no consolidation, no active forgetting. The user starts a new session and the AI remembers nothing unless explicitly told.
A True Hippocampal Memory System for AI
Biological hippocampal memory does four things that current AI context management does not:
1. Binding / Indexing
When an AI performs related actions (edits
dispatch.rs, addsstarted_atto subagent error, opens PR #2933), these facts should be cross-indexed as a graph, not stored as independent text fragments:This enables pattern completion: given the fragment "tool error message issue", the system reconstructs the full graph —
format_tool_error→dispatch.rs→ PR #2933 → linked yolo.md and subagent changes.2. Pattern Completion
A true memory system doesn't do literal full-text search. It takes a partial cue and reconstructs the full context:
This is fundamentally different from keyword search or vector similarity. It requires a structured index that models relationships, not just proximity.
3. Consolidation (Offline Processing)
The hippocampus replays experiences during idle periods (sleep) and transfers important patterns to the cortex. For AI:
4. Active Forgetting
Not "ran out of space." The system actively judges what is worth keeping:
This judgment should be model-driven (the AI decides what matters), not rule-based.
Proposed Architecture
Open Questions for Discussion
Storage: Should the index use SQLite with structured relations, or is a graph database needed?
Model-driven decisions: How much context budget should be allocated for the AI to decide what to consolidate/forget?
Trigger: Should consolidation run on a timer, at context pressure thresholds, or explicitly via a tool call?
Pattern completion granularity: When a user references "the tool issue from earlier," how much context should the system retrieve and inject?
Forgetting policy: Who decides what is ephemeral — the AI, the user, a config threshold, or a combination?
Cross-session retrieval: When the user starts a new session, what (if anything) should be pre-loaded into the working memory?
Relationship to existing code: The
notetool, session persistence, and compaction infrastructure already exist. How should a hippocampal system build on or replace these?Desired Outcome
A design discussion. Not an implementation ticket. This should produce a documented architecture that the community can review, critique, and eventually implement in slices.