design: hippocampal memory system for infinite-context and cross-session recall #2935

@cy2311

Problem

CodeWhale currently has a 1M-token context window, but beyond that there is no real memory system. The current mechanisms are:

  • /compact: manual compression of early turns into a natural-language summary
  • note tool: agents can persist key-value facts
  • Session persistence (SQLite): stores raw transcripts on disk
  • Ctrl+R session picker: manually switch between sessions

These do not add up to a memory system. They are flat storage: no indexing, no cross-session retrieval, no consolidation, no active forgetting. When the user starts a new session, the AI remembers nothing unless explicitly told.

A True Hippocampal Memory System for AI

Biological hippocampal memory does four things that current AI context management does not:

1. Binding / Indexing

When an AI performs related actions (edits dispatch.rs, adds started_at to subagent error, opens PR #2933), these facts should be cross-indexed as a graph, not stored as independent text fragments:

edit dispatch.rs ── partOf ── PR #2933 ── fixes ── issue #2657
                   │                       │
                   │                       └─ alsoContains ── yolo.md edit
                   │
                   └─ reason ── format_tool_error generic suffixes mislead the agent

This enables pattern completion. Given the fragment "tool error message issue", the system reconstructs the full graph: format_tool_error → dispatch.rs → PR #2933 → linked yolo.md and subagent changes.
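As a rough illustration of the cross-indexing idea, the relations in the diagram above could be stored as labeled triples and walked with a recursive query. This is only a sketch under assumptions: the `edge` table, its column names, and the `connected_entities` helper are hypothetical, and the edge list is one reading of the diagram, not an existing CodeWhale schema.

```python
import sqlite3

# Hypothetical schema: one row per labeled relation between two entities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [
        ("edit dispatch.rs", "partOf", "PR #2933"),
        ("PR #2933", "fixes", "issue #2657"),
        ("PR #2933", "alsoContains", "yolo.md edit"),
        ("edit dispatch.rs", "reason",
         "format_tool_error generic suffixes mislead the agent"),
    ],
)


def connected_entities(conn, seed):
    """Return every entity reachable from `seed`, following edges in both directions."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT CASE WHEN e.src = r.node THEN e.dst ELSE e.src END
            FROM edge e JOIN reach r ON e.src = r.node OR e.dst = r.node
        )
        SELECT node FROM reach
        """,
        (seed,),
    ).fetchall()
    return {row[0] for row in rows}


print(sorted(connected_entities(conn, "issue #2657")))
```

Starting from any single node, the query returns the whole connected cluster, which is the property the graph representation is meant to provide. The `UNION` (rather than `UNION ALL`) deduplicates nodes, so the recursion terminates even when the graph has cycles.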

2. Pattern Completion

A true memory system doesn't do literal full-text search. It takes a partial cue and reconstructs the full context:

  • Cue: "that tool error message problem..."
  • Completion: format_tool_error → dispatch.rs → the generic suffix removal fix → plus related changes in the same PR

This is fundamentally different from keyword search or vector similarity. It requires a structured index that models relationships, not just proximity.
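A minimal sketch of the cue-to-context step, assuming an in-memory graph. The entity descriptions, the token-overlap seeding, and the `complete_pattern` name are all hypothetical stand-ins. The seeding step in particular is a crude placeholder for whatever cue matching the real system would use; the part being illustrated is the traversal that follows relationships outward from the seed.

```python
from collections import defaultdict, deque

# Illustrative relations taken from the example above (undirected here).
EDGES = [
    ("format_tool_error", "dispatch.rs"),
    ("dispatch.rs", "PR #2933"),
    ("PR #2933", "yolo.md edit"),
    ("PR #2933", "subagent started_at change"),
]

# Hypothetical free-text descriptions attached to each entity.
DESCRIPTIONS = {
    "format_tool_error": "formats tool error message with generic suffix",
    "dispatch.rs": "tool dispatch source file",
    "PR #2933": "pull request removing generic suffix",
    "yolo.md edit": "documentation edit",
    "subagent started_at change": "adds started_at to subagent error",
}


def build_adjacency(edges):
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def complete_pattern(cue, max_hops=3):
    """Pick the best-matching entity for a partial cue, then expand along relations."""
    cue_tokens = set(cue.lower().split())

    def overlap(entity):
        return len(cue_tokens & set(DESCRIPTIONS[entity].lower().split()))

    seed = max(DESCRIPTIONS, key=overlap)
    if overlap(seed) == 0:
        return []  # nothing in the index resembles the cue

    adj = build_adjacency(EDGES)
    seen, order, queue = {seed}, [], deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        order.append(node)
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return order


print(complete_pattern("tool error message issue"))
```

The result is ordered by distance from the seed, so a caller could truncate it to fit a context budget while keeping the most closely related entities.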

3. Consolidation (Offline Processing)

The hippocampus replays experiences during idle periods (sleep) and transfers important patterns to the cortex. For an AI agent, the equivalent would be:

  • During idle time (between user messages, or in a background task), scan recent conversation turns
  • Extract structured decisions: what files were changed, what architecture decisions were made, what approaches were tried and discarded
  • Discard ephemeral noise: specific error messages that were resolved, intermediate debug output
  • Commit the extracted structure to long-term storage
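The steps above could be sketched as a single idle-time pass over the turns that have not yet been consolidated. Everything here is an assumption for illustration: the `Turn` and `LongTermStore` types, the `kind` tags, and the fixed `DURABLE_KINDS` set, which stands in for an extraction step that the proposal expects to be model-driven.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    index: int
    kind: str  # hypothetical tags: "file_edit", "decision", "discarded_approach", "error", "debug"
    text: str


@dataclass
class LongTermStore:
    records: list = field(default_factory=list)
    watermark: int = -1  # index of the last turn already consolidated


# Placeholder for the extraction judgment; a real system would ask the model.
DURABLE_KINDS = {"file_edit", "decision", "discarded_approach"}


def consolidate(turns, store):
    """Scan turns newer than the watermark, keep durable ones, and return how many were scanned."""
    recent = [t for t in turns if t.index > store.watermark]
    for turn in recent:
        if turn.kind in DURABLE_KINDS:
            store.records.append(
                {"turn": turn.index, "kind": turn.kind, "summary": turn.text}
            )
        # Turns of any other kind (resolved errors, debug output) are skipped.
    if recent:
        store.watermark = max(t.index for t in recent)
    return len(recent)


turns = [
    Turn(0, "error", "compile error in dispatch.rs"),
    Turn(1, "file_edit", "edited dispatch.rs"),
    Turn(2, "debug", "intermediate debug output"),
    Turn(3, "decision", "remove generic suffixes from format_tool_error"),
]
store = LongTermStore()
scanned = consolidate(turns, store)
```

The watermark makes the pass safe to run repeatedly: a second call with no new turns scans nothing and commits nothing.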

4. Active Forgetting

Forgetting here does not mean "ran out of space." The system actively judges what is worth keeping:

  • Yesterday's lunch → not important → discarded
  • "The stove is hot, don't touch it" → important → consolidated
  • A specific compiler error that was fixed → transitional → discarded after the fix is applied
  • The architecture decision that led to the fix approach → valuable → kept

This judgment should be model-driven (the AI decides what matters), not rule-based.
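One way to keep the judgment model-driven while leaving the surrounding plumbing testable is to pass the judge in as a callable. The sketch below assumes a hypothetical `Memory` record and uses `stub_judge` purely as a placeholder for a model call. The stub is rule-based only so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Memory:
    text: str
    category: str  # hypothetical tags: "decision", "error", "chatter"
    resolved: bool = False


# A judge takes a memory and returns True to keep it, False to forget it.
Judge = Callable[[Memory], bool]


def stub_judge(memory: Memory) -> bool:
    """Placeholder for a model call; mirrors the examples in the list above."""
    if memory.category == "decision":
        return True  # architecture decisions are valuable
    if memory.category == "error":
        return not memory.resolved  # transitional: drop once the fix is applied
    return False  # everything else is treated as ephemeral


def forget(memories, judge: Judge = stub_judge):
    """Split memories into (kept, dropped) according to the judge."""
    kept, dropped = [], []
    for memory in memories:
        (kept if judge(memory) else dropped).append(memory)
    return kept, dropped


memories = [
    Memory("yesterday's lunch", "chatter"),
    Memory("compiler error E0308 in dispatch.rs", "error", resolved=True),
    Memory("remove generic suffixes instead of rewording them", "decision"),
    Memory("flaky test still failing", "error", resolved=False),
]
kept, dropped = forget(memories)
```

Swapping `stub_judge` for a function that prompts the model changes the policy without touching `forget`.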

Proposed Architecture

┌─────────────────────────────────────────┐
│           Working Memory                │
│      (Current 1M context window)        │
│  Active conversation + loaded memories  │
└────────────────┬────────────────────────┘
                 ↕ real-time binding
┌─────────────────────────────────────────┐
│      Hippocampal Index Layer            │
│  Entity graph (files, issues, PRs,      │
│  decisions, relationships)              │
│  Episodic records (timeline of events)  │
│  Pattern completion engine              │
└────────────────┬────────────────────────┘
                 ↕ consolidation (idle time)
┌─────────────────────────────────────────┐
│      Cortex (Long-term Storage)         │
│  Semantic knowledge (extracted rules,   │
│  user preferences, project conventions) │
│  Independent of raw transcripts         │
└─────────────────────────────────────────┘
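The "loaded memories" part of working memory implies some limit on how much retrieved material is injected. A minimal sketch of that step, assuming memories arrive already ranked by relevance: `select_memories` is a hypothetical helper, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
def select_memories(ranked, token_budget):
    """Greedily take memories in rank order, skipping any that would exceed the budget."""
    chosen, used = [], 0
    for text in ranked:
        cost = len(text.split())  # crude estimate; a real tokenizer would differ
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try the next one
        chosen.append(text)
        used += cost
    return chosen, used


ranked = [
    "format_tool_error suffix removed",
    "PR #2933 also edits yolo.md and subagent errors",
    "dispatch.rs edited",
]
chosen, used = select_memories(ranked, token_budget=5)
```

With a budget of 5, the first memory (3 words) fits, the second (8 words) is skipped, and the third (2 words) fills the remainder.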

Open Questions for Discussion

  1. Storage: Should the index use SQLite with structured relations, or is a graph database needed?

  2. Model-driven decisions: How much context budget should be allocated for the AI to decide what to consolidate/forget?

  3. Trigger: Should consolidation run on a timer, at context pressure thresholds, or explicitly via a tool call?

  4. Pattern completion granularity: When a user references "the tool issue from earlier," how much context should the system retrieve and inject?

  5. Forgetting policy: Who decides what is ephemeral — the AI, the user, a config threshold, or a combination?

  6. Cross-session retrieval: When the user starts a new session, what (if anything) should be pre-loaded into the working memory?

  7. Relationship to existing code: The note tool, session persistence, and compaction infrastructure already exist. How should a hippocampal system build on or replace these?
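To make the trigger question (3) concrete, the three options do not have to be mutually exclusive. A sketch of a combined policy follows. The `TriggerPolicy` name and both thresholds (300 seconds idle, 80% of the context window) are arbitrary assumptions chosen for illustration, not proposed defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerPolicy:
    idle_seconds: float = 300.0      # assumed idle threshold
    pressure_ratio: float = 0.8      # assumed fraction of the context window
    context_limit: int = 1_000_000   # the 1M-token window from the problem statement

    def should_consolidate(
        self, seconds_idle: float, tokens_used: int, explicit_request: bool = False
    ) -> Optional[str]:
        """Return the reason consolidation should run, or None if it should not."""
        if explicit_request:
            return "explicit"  # a tool call always wins
        if tokens_used / self.context_limit >= self.pressure_ratio:
            return "pressure"
        if seconds_idle >= self.idle_seconds:
            return "idle"
        return None


policy = TriggerPolicy()
print(policy.should_consolidate(seconds_idle=10, tokens_used=850_000))
```

Returning the reason rather than a bare boolean lets the caller pick a different consolidation depth per trigger, for example a quick pass under context pressure and a thorough one when idle.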

Desired Outcome

A design discussion. Not an implementation ticket. This should produce a documented architecture that the community can review, critique, and eventually implement in slices.

Labels: bug (Something isn't working)

Status: Done
