Persistent, research-backed user memory for AI assistants.
Your AI assistant forgets you after every conversation. memX gives it a memory that learns, decays, and evolves -- just like yours.
Every time you start a new conversation with an AI assistant, you start from zero. It doesn't know you prefer TypeScript, that you're building a specific project, or that you've already tried and rejected a particular approach. You repeat yourself. The assistant gives generic advice. The relationship never deepens.
This isn't a missing feature -- it's a missing layer.
memX is a Model Context Protocol (MCP) server that gives any MCP-compatible AI assistant -- Claude Desktop, Claude Code, or your own agent -- a persistent, structured memory of its human user. Facts are extracted from conversations, deduplicated against existing knowledge, and stored locally. On the next conversation, relevant memories surface automatically.
What makes memX different from a simple key-value store:
- Memories decay. Like human cognition, rarely accessed facts fade and frequently reinforced facts strengthen. This is modeled by an exponential half-life decay function with configurable parameters, inspired by spaced repetition research [1][2].
- Memories are deduplicated. Every incoming fact passes through an AUDN (Add/Update/Delete/Noop) pipeline. A local LLM classifies whether each fact is genuinely new, an update to existing knowledge, or redundant -- preventing the memory bloat that plagues simpler systems.
- Memories are searchable three ways. Hybrid search fuses vector similarity, BM25 full-text, and knowledge graph traversal into a single ranked result set. CJK-aware weight adjustment ensures quality across languages.
- Memories self-organize. A three-tier system (Core / Working / Peripheral) promotes and demotes facts based on access patterns, composite decay scores, and importance signals -- echoing the hierarchical memory architectures studied in recent agent memory research [3][4].
Everything runs on your machine. SQLite for storage, Ollama for inference, zero cloud dependencies.
                      MCP Client (Claude Desktop / Claude Code / Agent)
                                              |
                                    MCP Protocol (stdio)
                                              |
                                      +-------+-------+
                                      |  MCP Server   |  src/server.ts
                                      |  (13 tools)   |
                                      +-------+-------+
                                              |
                           +------------------+------------------+
                           |                                     |
                   +-------+-------+                     +-------+-------+
                   |    Bridge     |  src/bridge.ts      |      LLM      |
                   |  autoRecall   |                     |  Ollama HTTP  |  src/llm/
                   |  autoExtract  |                     |    extract    |
                   +-------+-------+                     |     embed     |
                           |                             +-------+-------+
                           |                                     |
        +------------------+------------------+------------------+
        |                  |                  |                  |
+-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
|    Search     |  |     AUDN      |  |     Graph     |  |     Decay     |
|    Engine     |  |   Pipeline    |  |     Store     |  |    Engine     |
+-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |                  |
+-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
|    Vector     |  |    Memory     |  |  Reflection   |  | Consolidation |
|     FTS5      |  |     Store     |  |   Temporal    |  |   Promotion   |
|     Graph     |  |   Feedback    |  |    Profile    |  |  Importance   |
+-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |                  |
        +------------------+--------+---------+------------------+
                                    |
                       +------------+------------+
                       |     SQLite Database     |
                       |    sqlite-vec | FTS5    |
                       |   memories | entities   |
                       | observations | relations|
                       +-------------------------+
Conversation
|
v
[Extract] ---- LLM parses durable facts from dialogue
| (3-layer filter: length, capture signals, LLM)
v
[AUDN] ------ Classify each fact: Add / Update / Delete / Noop
| (vector pre-filter + LLM classification)
v
[Store] ----- Persist to SQLite with content hash, embedding, FTS index
| (also extracts entities/relations for knowledge graph)
v
[Decay] ----- Composite score: recency x frequency x intrinsic value
| (exponential half-life, logarithmic saturation)
v
[Promote] ---- Evaluate tier transitions: Peripheral <-> Working <-> Core
| (based on access count, composite score, importance)
v
[Recall] ----- Hybrid search: vector + FTS5 + graph, freshness-weighted
| (CJK-aware weight adjustment, search result caching)
v
[Context] ----- Format for prompt injection: compact, structured, or JSON
(thematic clustering, token budget enforcement)
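
The [Decay] step above can be illustrated with a short sketch. This is not memX's actual decay.ts: it assumes the three components are combined as a weighted sum using the default `decay` weights from the configuration, and the function names and the `saturationCount` constant are hypothetical.

```typescript
// Hypothetical sketch of composite decay scoring. The weighted-sum
// combination and saturationCount are assumptions, not memX's code.
interface DecayConfig {
  recencyHalfLifeDays: number; // days until the recency score halves
  recencyWeight: number;
  frequencyWeight: number;
  intrinsicWeight: number;
}

interface MemoryStats {
  daysSinceAccess: number;
  accessCount: number;
  importance: number; // 0..1
  confidence: number; // 0..1
}

// Values taken from the default configuration shown in this README.
const defaultDecay: DecayConfig = {
  recencyHalfLifeDays: 30,
  recencyWeight: 0.4,
  frequencyWeight: 0.3,
  intrinsicWeight: 0.3,
};

// Exponential half-life: 1.0 when just accessed, 0.5 after one half-life.
function recencyScore(days: number, halfLifeDays: number): number {
  return Math.pow(0.5, days / halfLifeDays);
}

// Logarithmic saturation: rises quickly for the first accesses, capped at 1.0.
function frequencyScore(accessCount: number, saturationCount = 20): number {
  return Math.min(1, Math.log(1 + accessCount) / Math.log(1 + saturationCount));
}

function compositeScore(m: MemoryStats, cfg: DecayConfig = defaultDecay): number {
  const recency = recencyScore(m.daysSinceAccess, cfg.recencyHalfLifeDays);
  const frequency = frequencyScore(m.accessCount);
  const intrinsic = m.importance * m.confidence;
  return (
    cfg.recencyWeight * recency +
    cfg.frequencyWeight * frequency +
    cfg.intrinsicWeight * intrinsic
  );
}
```

Under these assumptions, a memory last touched 60 days ago with no recorded accesses, importance 0.5 and confidence 0.8 scores about 0.22, which would fall below the default stale_threshold of 0.3.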
- Node.js >= 22.12.0 (uses native node:sqlite)
- Ollama running locally with:
  - An extraction/reasoning model (default: qwen2.5:32b)
  - An embedding model (default: qwen3-embedding:8b)
# Install Ollama models
ollama pull qwen2.5:32b
ollama pull qwen3-embedding:8b

git clone https://github.com/toby-bridges/memx-memory.git
cd memx-memory
npm install
npm run build

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "memx": {
      "command": "node",
      "args": ["/path/to/memx/dist/index.js"],
      "env": {}
    }
  }
}

Add to your Claude Code MCP settings:
{
  "mcpServers": {
    "memx": {
      "command": "node",
      "args": ["/path/to/memx/dist/index.js"]
    }
  }
}

Once configured, the assistant will have access to all mem_* tools. Try:
"Remember that I prefer TypeScript over JavaScript."
The assistant will call mem_store and the fact will persist across conversations.
- MCP-native -- Works with Claude Desktop, Claude Code, and any MCP client. 13 tools covering the full memory lifecycle.
- Local-first -- SQLite + sqlite-vec for storage and vector search. Your data never leaves your machine.
- Hybrid search -- Three-channel weighted fusion: vector similarity (cosine), BM25 full-text (FTS5), and knowledge graph traversal. CJK-aware weight auto-adjustment.
- Memory decay model -- Composite scoring: recency (exponential half-life), frequency (logarithmic saturation), intrinsic value (importance x confidence). Memories naturally fade unless reinforced.
- AUDN dedup pipeline -- Every incoming fact classified as Add, Update, Delete, or Noop by the LLM, with vector pre-filtering for efficient comparison.
- Three-tier system -- Core (stable identity, decay floor 0.9), Working (active context, decay floor 0.7), Peripheral (aging/low-priority, decay floor 0.5). Promotion and demotion based on usage patterns.
- Knowledge graph -- Entities, temporal observations, typed relations. Point-in-time queries and causal chain traversal.
- Bilingual sentiment detection -- Chinese + English pattern matching for frustration, satisfaction, correction, confusion. Zero LLM calls. Feeds the retrieval feedback loop.
- Token-efficient formatting -- Compact output achieves ~60% token savings vs JSON, with stable prefix ordering optimized for LLM prompt caching.
- Creative ideas engine -- Generates hypotheses, expansions, inversions, analogies, and combinations from memory clusters.
- User profile synthesis -- LLM-generated structured identity summary cached for 24 hours.
- Retrieval feedback loop -- Positive/negative/irrelevant feedback adjusts memory importance over time.
- Health metrics -- IRR (Information Retention Rate), FRR (Failure Recovery Rate), MTPR (Memory-Task Proficiency Ratio) inspired by MemGUI-Bench [5].
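
To make the zero-LLM sentiment detection concrete, here is a minimal sketch of bilingual pattern matching. The pattern lists, the priority order, and the `detectSentiment` name are illustrative assumptions; memX's real patterns are not reproduced in this README.

```typescript
// Hypothetical sketch: rule-based sentiment detection for Chinese + English.
// Patterns and ordering are illustrative, not memX's actual lists.
type Sentiment = "frustration" | "satisfaction" | "correction" | "confusion" | "neutral";

const sentimentPatterns: Array<[Exclude<Sentiment, "neutral">, RegExp]> = [
  ["correction", /\b(no,? that's wrong|actually,? i meant)\b|不对|我的意思是/i],
  ["frustration", /\b(still (doesn't|does not) work|this is useless)\b|还是不行|没用/i],
  ["confusion", /\b(i don't understand|what do you mean)\b|不明白|什么意思/i],
  ["satisfaction", /\b(thanks|perfect|works now)\b|谢谢|完美|可以了/i],
];

// Returns the first matching label; no model call is involved.
function detectSentiment(text: string): Sentiment {
  for (const [label, pattern] of sentimentPatterns) {
    if (pattern.test(text)) return label;
  }
  return "neutral";
}
```

In a design like this, a "correction" or "frustration" signal could be turned into negative retrieval feedback for the memories that were just recalled, while "satisfaction" could reinforce them.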
memX was designed after a systematic review of both academic literature and open-source projects. The following table compares memX with representative projects from each major approach:
| Feature | memX | Mem0 | Letta/MemGPT | Graphiti/Zep | Official MCP Memory |
|---|---|---|---|---|---|
| MCP Native | Yes | Yes | No (REST) | Plugin | Yes |
| Local-first (no cloud) | Yes | Partial | No | No | Yes |
| Hybrid Search (Vec+BM25+Graph) | 3-channel | Vector only | Vector only | Vec+BM25+Graph | None |
| AUDN Dedup Pipeline | Yes | No | No | Edge dedup | No |
| Memory Decay Model | Half-life | No | No | No | No |
| Three-Tier Promotion | Yes | No | 2-tier | No | No |
| Knowledge Graph | Temporal | Graph variant | No | Temporal | Basic |
| Point-in-Time Queries | Yes | No | No | Yes | No |
| Local LLM (Ollama) | Yes | No | Optional | No | No |
| Bilingual Sentiment | Yes | No | No | No | No |
| Retrieval Feedback | Yes | No | No | No | No |
| Creative Ideas Engine | Yes | No | No | No | No |
| Storage Backend | SQLite | Qdrant+PG | Postgres | Neo4j | JSON file |
| Language | TypeScript | Python | Python | Python | TypeScript |
| Infrastructure | Zero | Docker stack | Cloud | Neo4j+Cloud | Zero |
"Why not just build a ChatGPT plugin or use EverMemOS?"
| Dimension | memX (MCP, Self-hosted) | Cloud Memory Services (e.g. EverMemOS) |
|---|---|---|
| Architecture | Local-first, single SQLite file | Cloud-hosted API service |
| Privacy | Data never leaves your machine | Data stored on third-party servers |
| LLM Cost | Zero (local Ollama inference) | Pay-per-API-call |
| Client Lock-in | MCP is an open protocol -- Claude Desktop, Claude Code, any MCP client | Bound to one ecosystem (e.g. ChatGPT only) |
| Setup | Requires Ollama + Node.js | Sign up and use |
| GUI | CLI / MCP tools | Web Dashboard |
| Memory Quality | AUDN dedup + Knowledge Graph + Importance-modulated decay | MemCell lifecycle + Foresight |
| Retrieval | Three-channel hybrid (vector + BM25 + graph), RRF fusion | RRF fusion + multi-round agentic retrieval |
| Customizability | Fully open source, all parameters tunable | Black box |
| Data Portability | Your SQLite file, export anytime | Locked in platform |
memX's core value proposition: Data sovereignty + Zero cost + Protocol openness.
Cloud services win on ease-of-use (no setup, has GUI). memX wins on privacy, cost, customizability, and multi-client support. Two different philosophies:
- Cloud memory = "A service that manages your memories" (SaaS)
- memX = "Your own memory infrastructure" (Self-hosted)
- Mem0 [6] is the most popular memory layer, but it's cloud-centric (Qdrant + Postgres) and lacks decay modeling or AUDN deduplication. memX provides a richer memory lifecycle in a lighter package.
- Letta/MemGPT [3] pioneered the idea of LLMs managing their own memory, but it's agent-centric (the LLM manages its context), not user-centric (extracting facts about the human). Different paradigm, different goals.
- Graphiti/Zep [7] shares memX's interest in temporal knowledge graphs and hybrid search, but requires Neo4j and cloud LLM APIs. memX achieves comparable functionality with SQLite + Ollama only.
- Official MCP Memory is a minimal reference implementation (JSON file, no search, no LLM). memX is what you upgrade to when you need production-grade memory.
| Tool | Description | Key Parameters |
|---|---|---|
| mem_store | Store a fact. Goes through AUDN dedup. | content (required), category |
| mem_recall | Semantic + keyword search across memories. | query (required), max_results |
| mem_extract | Extract durable facts from a conversation. | conversation (required), conversation_id |
| mem_context | Formatted memory context for system prompt injection. | format, max_working, max_tokens |
| mem_graph | Query the knowledge graph. | entity, entity_type, hops, at_time |
| mem_timeline | Entity evolution over time with causal chains. | entity (required), causal |
| mem_reflect | Cluster memories and synthesize insights. | category, dry_run |
| mem_consolidate | Find and merge overlapping memories. | dry_run, max_merges |
| mem_maintain | Health check: decay, staleness, tier distribution. | show_stale, show_tiers, show_metrics |
| mem_feedback | Submit retrieval feedback for importance adjustment. | memory_ids, feedback_type |
| mem_profile | Generate structured user profile from memories. | refresh, format |
| mem_insights | Learning insights report on system activity. | period_days, include_zombies |
| mem_status | System status: counts, health, storage info. | -- |
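
At the protocol level, an MCP client invokes any of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for mem_store by hand, purely for illustration; the `buildToolCall` and `encodeForStdio` helpers are hypothetical, and a real client would normally use an MCP SDK.

```typescript
// Hypothetical helpers showing the shape of an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Over the stdio transport, each message is a single line of JSON.
function encodeForStdio(req: ToolCallRequest): string {
  return JSON.stringify(req) + "\n";
}

const request = buildToolCall("mem_store", {
  content: "User uses Neovim as their primary editor",
  category: "preference",
});
const wire = encodeForStdio(request);
```

A real session also performs the MCP `initialize` handshake before any tool call, which this sketch omits.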
User: "Remember that I use Neovim as my primary editor."
The assistant calls mem_store:
{
  "content": "User uses Neovim as their primary editor",
  "category": "preference"
}

Response:

{
  "action": "ADD",
  "reason": "New preference fact not covered by existing memories",
  "stored": true
}

Later, in a different conversation:
User: "What editor do I use?"
The assistant calls mem_recall:
{ "query": "editor preference" }

Response:

{
  "results": [{
    "content": "User uses Neovim as their primary editor",
    "category": "preference",
    "tier": "working",
    "score": 0.847,
    "freshness": 0.923
  }]
}

The mem_context tool returns a compact format designed for system prompt injection:
=== USER MEMORY CONTEXT ===
[C:Personal] Software architect, based in Shanghai
[C:Preference] TypeScript over JavaScript, functional style
[W:Project] Building memX with MCP protocol (2 days ago)
[W:Preference] Uses Neovim as primary editor (1 day ago)
---
Legend: C=Core, W=Working, P=Peripheral
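
A formatter producing the layout above might look like the following sketch. The `MemoryLine` shape and `formatCompact` name are hypothetical, and the rule that only non-Core entries show an age is inferred from the sample output rather than confirmed by the source.

```typescript
// Hypothetical sketch of a compact context formatter.
type Tier = "core" | "working" | "peripheral";

interface MemoryLine {
  tier: Tier;
  category: string;
  content: string;
  ageDays?: number;
}

const tierOrder: Record<Tier, number> = { core: 0, working: 1, peripheral: 2 };
const tierTag: Record<Tier, string> = { core: "C", working: "W", peripheral: "P" };

function formatAge(days: number): string {
  return days === 1 ? "1 day ago" : `${days} days ago`;
}

function formatCompact(memories: MemoryLine[]): string {
  // Sort by tier only; the stable sort keeps input order within a tier,
  // so the Core prefix stays identical while Core memories are unchanged.
  const sorted = [...memories].sort((a, b) => tierOrder[a.tier] - tierOrder[b.tier]);
  const lines = sorted.map((m) => {
    const age = m.tier !== "core" && m.ageDays !== undefined ? ` (${formatAge(m.ageDays)})` : "";
    return `[${tierTag[m.tier]}:${m.category}] ${m.content}${age}`;
  });
  return [
    "=== USER MEMORY CONTEXT ===",
    ...lines,
    "---",
    "Legend: C=Core, W=Working, P=Peripheral",
  ].join("\n");
}
```

Placing the slow-changing Core lines first is what gives the output a stable prefix: later conversations can share the same leading tokens even as Working entries come and go.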
memX's architecture was shaped by a systematic review of the latest agent memory research. Key decisions and their rationale:
Simple cosine similarity dedup (threshold > 0.95) catches exact duplicates but misses semantic overlaps like "User prefers TypeScript" vs "User likes TypeScript for type safety". The AUDN pipeline uses vector pre-filtering to select candidates, then an LLM to classify the relationship: genuinely new (Add), refined version of existing knowledge (Update), contradicts old fact (Delete), or already covered (Noop). This mirrors the conflict detection approach described in the Mem0 paper [6] but formalizes it into a four-way taxonomy.
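
The two-stage shape of this pipeline can be sketched as follows. This is an illustration under assumptions, not memX's audn.ts: the `Classifier` callback stands in for the LLM call, and the `minSimilarity` and `topK` defaults are hypothetical.

```typescript
// Hypothetical sketch of AUDN: vector pre-filter, then LLM classification.
type AudnAction = "ADD" | "UPDATE" | "DELETE" | "NOOP";

interface StoredMemory {
  id: number;
  content: string;
  embedding: number[];
}

interface AudnDecision {
  action: AudnAction;
  targetId?: number; // existing memory affected by UPDATE / DELETE / NOOP
}

// Stand-in for the LLM classification call; the signature is an assumption.
type Classifier = (fact: string, candidates: StoredMemory[]) => Promise<AudnDecision>;

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: keep only the most similar existing memories as candidates.
function preFilter(
  embedding: number[],
  memories: StoredMemory[],
  minSimilarity = 0.6,
  topK = 5,
): StoredMemory[] {
  return memories
    .map((m) => ({ m, sim: cosine(embedding, m.embedding) }))
    .filter((x) => x.sim >= minSimilarity)
    .sort((a, b) => b.sim - a.sim)
    .slice(0, topK)
    .map((x) => x.m);
}

// Stage 2: ask the classifier only when there is something to compare against.
async function audn(
  fact: string,
  embedding: number[],
  memories: StoredMemory[],
  classify: Classifier,
): Promise<AudnDecision> {
  const candidates = preFilter(embedding, memories);
  if (candidates.length === 0) return { action: "ADD" };
  return classify(fact, candidates);
}
```

The pre-filter keeps the LLM prompt small: only a handful of nearby memories are shown to the classifier, and a fact with no neighbours is added without any model call.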
Flat importance scores create a ranking problem: a highly important but stale fact competes with a moderately important but fresh one. The three-tier system (Core / Working / Peripheral) with decay floors ensures that identity-level facts (Core, floor 0.9) never fade below the recall threshold, while project-level context (Working) naturally ages out when no longer reinforced. This echoes the hierarchical memory models in MemGPT [3] and BudgetMem [8], but applies them to user modeling rather than agent self-management.
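
A sketch of how tier evaluation and decay floors could fit together is shown below. The thresholds and floors come from this README, but how they are combined is an assumption: here all three Core thresholds must hold for promotion, both Peripheral conditions must hold for demotion, and the floor is applied as a lower bound on the decay value. The function names are hypothetical.

```typescript
// Hypothetical sketch of tier evaluation; combination rules are assumptions.
type Tier = "core" | "working" | "peripheral";

// Decay floors as listed for the three-tier system.
const decayFloor: Record<Tier, number> = { core: 0.9, working: 0.7, peripheral: 0.5 };

// Defaults from the promotion section of the configuration.
const promotion = {
  coreAccessThreshold: 10,
  coreCompositeThreshold: 0.7,
  coreImportanceThreshold: 0.8,
  peripheralCompositeThreshold: 0.15,
  peripheralAgeDays: 60,
};

interface TierInput {
  accessCount: number;
  composite: number; // composite decay score, 0..1
  importance: number; // 0..1
  ageDays: number;
}

function evaluateTier(m: TierInput): Tier {
  const qualifiesForCore =
    m.accessCount >= promotion.coreAccessThreshold &&
    m.composite >= promotion.coreCompositeThreshold &&
    m.importance >= promotion.coreImportanceThreshold;
  if (qualifiesForCore) return "core";

  const isFading =
    m.composite < promotion.peripheralCompositeThreshold &&
    m.ageDays > promotion.peripheralAgeDays;
  if (isFading) return "peripheral";

  return "working";
}

// The floor keeps a tier's decay value from dropping below its minimum.
function flooredDecay(tier: Tier, rawDecay: number): number {
  return Math.max(decayFloor[tier], rawDecay);
}
```

With these rules, a Core fact whose raw decay has fallen to 0.2 is still treated as 0.9, which is the sense in which identity-level facts never fade below the recall threshold.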
Vector search excels at semantic similarity but fails on exact terms (product names, version numbers). BM25 catches exact keywords but misses paraphrases. Knowledge graph traversal finds relational connections invisible to both. Fusing all three with configurable weights (default: 0.4 vector + 0.2 BM25 + 0.4 graph) produces more robust retrieval than any single channel. This approach is supported by the GraphRAG survey [9] and implemented by Graphiti [7], but memX adds CJK-aware weight auto-adjustment for multilingual users.
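
Weighted fusion of the three channels can be sketched as below. The sketch assumes each channel has already normalized its scores to the 0..1 range and leaves out the CJK-aware adjustment, whose exact rule is not described here; `fuse` and `checkWeights` are hypothetical names, while the default weights, min_score, and max_results come from the default configuration.

```typescript
// Hypothetical sketch of three-channel weighted fusion.
interface SearchWeights {
  vector: number;
  text: number;
  graph: number;
}

// Memory id -> score, assumed to be normalized to 0..1 by each channel.
type ChannelScores = Record<string, number>;

interface RankedResult {
  id: string;
  score: number;
}

const defaultWeights: SearchWeights = { vector: 0.4, text: 0.2, graph: 0.4 };

// The configuration asks for weights that sum to roughly 1.0.
function checkWeights(w: SearchWeights, tolerance = 0.01): void {
  const sum = w.vector + w.text + w.graph;
  if (Math.abs(sum - 1) > tolerance) {
    throw new Error(`search weights must sum to ~1.0, got ${sum}`);
  }
}

function fuse(
  vector: ChannelScores,
  text: ChannelScores,
  graph: ChannelScores,
  weights: SearchWeights = defaultWeights,
  minScore = 0.35,
  maxResults = 6,
): RankedResult[] {
  checkWeights(weights);

  // Union of ids seen in any channel.
  const ids: Record<string, true> = {};
  for (const channel of [vector, text, graph]) {
    for (const id of Object.keys(channel)) ids[id] = true;
  }

  const ranked: RankedResult[] = [];
  for (const id of Object.keys(ids)) {
    // A memory missing from a channel contributes 0 for that channel.
    const score =
      weights.vector * (vector[id] ?? 0) +
      weights.text * (text[id] ?? 0) +
      weights.graph * (graph[id] ?? 0);
    if (score >= minScore) ranked.push({ id, score });
  }
  return ranked.sort((a, b) => b.score - a.score).slice(0, maxResults);
}
```

A memory found by only one channel has to score highly there to clear min_score, while one that appears in all three accumulates weight from each, which is what makes the fused ranking more robust than any single channel.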
Personal memory is inherently private data. Cloud APIs introduce latency, cost, and data sovereignty concerns. With modern local LLMs (Qwen 2.5 32B for reasoning, Qwen3-Embedding-8B for vectors), quality is competitive with cloud APIs while keeping everything on-device. The embedding model was selected after evaluating 8 alternatives on C-MTEB Chinese retrieval benchmarks: Qwen3-Embedding-8B scored 78.21, ahead of BGE-M3 (69.20), while nomic-embed-text was ruled out because it lacks CJK support.
JSON context injection wastes 40-60% of tokens on structural overhead (keys, quotes, braces). The compact format ([C:Personal] content) achieves equivalent information density in fewer tokens, with a stable prefix ordering that maximizes LLM prompt cache hit rates. This design was directly inspired by analyzing prompt caching behavior in production.
memX is configured via YAML. Place config.yaml in ~/.memx/ or edit config/default.yaml:
# Data storage
data_dir: "~/.memx"
agent_id: "default"

# Ollama LLM configuration
ollama:
  base_url: "http://localhost:11434"
  extraction_model: "qwen2.5:32b"        # Fact extraction + AUDN + reflection
  embedding_model: "qwen3-embedding:8b"  # Vector embeddings
  reflection_model: "qwen2.5:32b"        # Reflection + consolidation + profile

# Hybrid search weights (must sum to ~1.0)
search:
  max_results: 6
  min_score: 0.35
  vector_weight: 0.4   # Semantic similarity
  text_weight: 0.2     # BM25 keyword matching
  graph_weight: 0.4    # Knowledge graph traversal

# Memory decay parameters
decay:
  recency_half_life_days: 30  # Days until recency score halves
  recency_weight: 0.4         # Weight of recency in composite
  frequency_weight: 0.3       # Weight of access frequency
  intrinsic_weight: 0.3       # Weight of importance x confidence
  stale_threshold: 0.3        # Below this = stale

# Tier promotion thresholds
promotion:
  core_access_threshold: 10
  core_composite_threshold: 0.7
  core_importance_threshold: 0.8
  peripheral_composite_threshold: 0.15
  peripheral_age_days: 60

# Vector store
vector_store:
  backend: "sqlite-vec"  # "sqlite-vec" or "qdrant"
  dims: 4096             # Must match embedding model

Environment variable overrides:
MEMX_DATA_DIR=/custom/path MEMX_AGENT_ID=myagent node dist/index.js

memX has comprehensive test coverage across unit, BDD, and E2E layers.

npm test              # Run all tests
npm run test:watch    # Watch mode
npx tsc --noEmit      # Type check

- 37 unit/integration test files covering store, search, decay, graph, AUDN, consolidation, reflection, feedback, sentiment, formatter, profile, insights, ideas, and more
- 15 BDD feature files (tests/bdd/) with 113 scenario tests covering extraction, recall, store, context, decay, graph, feedback, sentiment, cache, vector-store, tier lifecycle, ideas, profile, Chinese search, and bridge integration
- E2E tests (tests/e2e.test.ts) spawning the actual MCP server process and communicating via JSON-RPC over stdio
- Performance benchmarks (tests/performance/) for cache, vector, ideas, and stress testing
memories -- Core user facts with tier, importance, access tracking
memory_history -- Full audit trail (ADD/UPDATE/DELETE/NOOP/MERGE/PROMOTE/DEMOTE)
memory_vectors -- sqlite-vec embeddings for semantic search
memories_fts -- FTS5 full-text index (unicode61 tokenizer, CJK-aware)
embedding_cache -- Cached embeddings keyed by model + content hash
entities -- Knowledge graph nodes (person, device, project, tool, ...)
observations -- Temporal facts about entities (valid_from / valid_until)
observation_vectors -- sqlite-vec embeddings for observation search
observations_fts -- FTS5 full-text index for observations
relations -- Typed directed edges between entities (temporal)
feedback_log -- Retrieval feedback entries for importance adjustment
search_logs -- Search query logs for MTPR metrics calculation
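
The valid_from / valid_until columns on observations are what make point-in-time queries possible. The sketch below shows the idea in memory rather than in SQL; the camelCase field names, the `observationsAt` helper, the half-open interval convention, and the sample data are all illustrative assumptions.

```typescript
// Hypothetical sketch of a point-in-time filter over temporal observations.
interface Observation {
  entity: string;
  content: string;
  validFrom: string; // ISO 8601 timestamp
  validUntil: string | null; // null = still valid
}

// An observation holds at time t when validFrom <= t < validUntil
// (assumed half-open interval; open-ended when validUntil is null).
function observationsAt(observations: Observation[], entity: string, atTime: string): Observation[] {
  const t = Date.parse(atTime);
  return observations.filter(
    (o) =>
      o.entity === entity &&
      Date.parse(o.validFrom) <= t &&
      (o.validUntil === null || t < Date.parse(o.validUntil)),
  );
}

// Illustrative data: an editor preference that changed over time.
const editorHistory: Observation[] = [
  {
    entity: "user",
    content: "Uses VS Code",
    validFrom: "2024-01-01T00:00:00Z",
    validUntil: "2025-03-01T00:00:00Z",
  },
  {
    entity: "user",
    content: "Uses Neovim",
    validFrom: "2025-03-01T00:00:00Z",
    validUntil: null,
  },
];
```

Closing an old observation by setting its end time, instead of deleting it, keeps the history queryable: asking about mid-2024 returns the earlier fact, and asking about today returns the current one.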
memx/
  src/
    index.ts  # Entry point (MCP server startup)
    server.ts  # 13 MCP tool handlers
    bridge.ts  # Auto-recall / auto-extract API
    config.ts  # YAML config loader with validation
    types.ts  # Core type definitions
    db/
      connection.ts  # SQLite + sqlite-vec + FTS5 initialization
    llm/
      ollama.ts  # Ollama HTTP client (generate, embed, health)
      extractor.ts  # Fact extraction pipeline
      prompts.ts  # System prompts (extraction, AUDN, graph, reflection, merge)
    memory/
      store.ts  # Memory CRUD with audit trail
      search.ts  # Three-channel hybrid search engine
      search-cache.ts  # LRU search result cache with auto-invalidation
      audn.ts  # ADD/UPDATE/DELETE/NOOP dedup pipeline
      decay.ts  # Recency/frequency/intrinsic decay scoring
      graph.ts  # Knowledge graph store (entities, observations, relations)
      graph-extractor.ts  # LLM-based entity/relation extraction
      temporal.ts  # Timeline queries + causal chain traversal
      reflection.ts  # Memory cluster reflection (higher-level insights)
      consolidation.ts  # Semantic memory merging
      feedback.ts  # Retrieval feedback loop (importance adjustment)
      sentiment.ts  # Bilingual sentiment detection (CN + EN)
      formatter.ts  # Token-efficient compact/structured/JSON output
      working-compression.ts  # Decay-ranked clustering + token budget enforcement
      vector-store.ts  # Abstraction layer (sqlite-vec / Qdrant)
      embeddings.ts  # Embedding service with caching
      profile.ts  # LLM-generated user profile synthesis
      importance.ts  # LLM-based importance evaluation
      insights.ts  # Learning insights report (pure SQL analytics)
      metrics.ts  # IRR/FRR/MTPR health score calculation
      shared-utils.ts  # Greedy clustering, cosine similarity, JSON parsing
    ideas/
      index.ts  # Creative ideas engine (report + markdown output)
      spark.ts  # Spark generators (hypothesis, expansion, inversion, analogy, combination)
      types.ts  # Ideas type definitions
  tests/  # 37 test files + 15 BDD features + benchmarks
  config/
    default.yaml  # Default configuration
memX is under active development. Near-term priorities:
- Embedding model migration -- Switch from nomic-embed-text to Qwen3-Embedding-8B for proper CJK support (C-MTEB 78.21)
- Thematic clustering -- Hierarchical memory organization inspired by xMemory's sparsity-semantics decoupling [10]
- Query rewriting -- Integrate InfMem-style query expansion into mem_recall for better retrieval quality [4]
- Temporal reasoning -- Three-date model (observation, reference, relative) for richer time-aware retrieval
Longer-term explorations:
- Multi-agent shared context -- Shared memory namespace for multi-bot deployments
- Learnable memory skills -- MemSkill-inspired pluggable operations [11]
- Benchmark suite -- Custom evaluation benchmark for personal memory assistant quality (no existing benchmark covers this use case)
Academic papers and open-source projects that informed memX's design:
[1] Settles, B. & Meeder, B. "A Trainable Spaced Repetition Model for Language Learning." ACL 2016 -- Half-life regression (HLR) model foundational to memX's decay scoring.
[2] Tabibian, B. et al. "Enhancing Human Learning via Spaced Repetition Optimization." PNAS 2019 -- Mathematical framework for optimal memory scheduling.
[3] Packer, C. et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, 2023 -- Pioneered hierarchical memory tiers for LLMs.
[4] Wang, X. et al. "InfMem: Learning System-2 Memory Control for Long-Context Agent." arXiv:2602.02704, 2026 -- System-2 PreThink-Retrieve-Write paradigm.
[5] Liu, G. et al. "MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments." arXiv:2602.06075, 2026 -- Inspired memX's IRR/FRR/MTPR health metrics.
[6] Chhikara, P. et al. "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." arXiv:2504.19413, 2025 -- Closest commercial competitor; memX's AUDN extends their conflict detection.
[7] Rasmussen, P. et al. "Zep: A Temporal Knowledge Graph Architecture for Agent Memory." arXiv:2501.13956, 2025 -- Bi-temporal knowledge graph; comparable to memX's temporal observations.
[8] Zhang, H. et al. "BudgetMem: Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory." arXiv:2602.06025, 2026 -- RL-trained budget-tier routing analogous to memX's three-tier system.
[9] "Graph Retrieval-Augmented Generation: A Survey." ACM Transactions on Information Systems -- Theoretical basis for memX's graph search channel.
[10] Hu, Z. et al. "xMemory: Beyond RAG for Agent Memory -- Retrieval by Decoupling and Aggregation." ICML 2026. arXiv:2602.02007 -- Four-level hierarchy; sparsity-semantics decoupling informs future clustering.
[11] Zhang, H. et al. "MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents." arXiv:2602.02474, 2026 -- Learnable, evolvable memory operations.
[12] Wang, T. et al. "AI PERSONA: Towards Life-long Personalization of LLMs." arXiv:2412.13103, 2024 -- Life-long user profile learning closest to memX's mission.
[13] Xu, W. et al. "A-MEM: Agentic Memory for LLM Agents." NeurIPS 2025. arXiv:2502.12110 -- Zettelkasten-style dynamic indexing.
[14] Hu, Y. et al. "Memory in the Age of AI Agents: A Survey." arXiv:2512.13564, 2025 -- Comprehensive taxonomy of agent memory forms and functions.
[15] Cheng, Y. et al. "TAME: A Trustworthy Test-Time Evolution of Agent Memory." arXiv:2602.03224, 2026 -- Safety evaluation framework (studied for applicability to memX's single-user context).
- sqlite-vec -- Vector search extension for SQLite. Core infrastructure for memX.
- Ollama -- Local LLM inference. Powers all memX reasoning and embedding.
- Model Context Protocol -- The protocol that makes memX interoperable with any MCP client.
- mcp-memory-service -- Alternative MCP memory with BM25+vector (no graph, no AUDN, no decay).
- basic-memory -- File-first Markdown memory (different paradigm, human-readable storage).
- Cognee -- Document-oriented ECL pipeline with knowledge graph construction.
- HippoRAG -- Neurobiologically inspired long-term memory using knowledge graphs + PageRank.
Contributions are welcome. To get started:
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Install dependencies (npm install)
- Make your changes
- Run the test suite (npm test)
- Run the type checker (npx tsc --noEmit)
- Commit your changes and open a pull request
Please keep in mind:
- Write tests for new features. BDD feature files in tests/bdd/ are preferred for user-facing behavior.
- Maintain TypeScript strict mode. No any types.
- Follow existing code patterns (factory functions, explicit interfaces, ESM imports with .js extensions).
- Prompts are bilingual (Chinese + English) by design -- maintain this if modifying LLM prompts.
MIT
Built with Claude Code and Ollama.