Tiered Retrieval Memory System for AI Agents
An AI memory system with a native desktop app, 100% local storage, and automatic conflict resolution.
Language: English | 繁體中文 | 日本語
No Docker. No Python. No configuration. Just download and go!
1️⃣ Download → Kiroku Memory.app from GitHub Releases
2️⃣ Install → npx skills add yelban/kiroku-memory
3️⃣ Restart → Restart Claude Code and enjoy persistent memory!
| Feature | Kiroku | mem0 | claude-mem |
|---|---|---|---|
| 🖥️ Desktop GUI | ✅ Native App | ❌ Cloud | ❌ Web |
| 🔒 100% Local | ✅ | ❌ Cloud-first | ✅ |
| 🔄 Conflict Resolution | ✅ | ❌ | ❌ |
| ⏰ Time Decay | ✅ | ❌ | ❌ |
Core differentiators:
- Native Desktop App — Visual memory browser, not just CLI
- Fully Local — Your data never leaves your machine
- Smart Memory — Auto-detects contradictions, confidence decays over time
A production-ready memory system for AI agents that implements persistent, evolving memory with tiered retrieval. Built on the principles from Rohit's "How to Build an Agent That Never Forgets" and community feedback.
Traditional RAG (Retrieval-Augmented Generation) faces fundamental challenges at scale:
- Semantic similarity ≠ Factual truth: Embeddings capture similarity, not correctness
- No temporal context: Cannot handle "user liked A before, now prefers B"
- Memory contradictions: Information accumulated over time may conflict
- Scalability issues: Retrieval performance degrades with tens of thousands of memories
This system addresses these challenges with a Hybrid Memory Stack architecture.
Leading researchers in AI agents and cognitive science emphasize why persistent memory is crucial:
In her influential article "LLM Powered Autonomous Agents", Lilian Weng identifies memory as a core component:
Memory enables agents to go beyond stateless interactions, accumulating knowledge across sessions.
Kiroku implements this through Tiered Retrieval — summaries first, then drill-down — avoiding the semantic drift problem of naive RAG.
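The summaries-first flow can be sketched as follows. This is a minimal illustration with a hypothetical data model and function names, not Kiroku's actual code: Tier 1 always returns the cheap, bounded category summaries, and Tier 2 drills down to individual facts only for categories whose summary looks relevant to the query.

```python
# Illustrative sketch of tiered retrieval (hypothetical data model).

def tiered_retrieve(query, summaries, facts, max_facts=3):
    """Tier 1: always include category summaries.
    Tier 2: drill down to facts only for categories whose
    summary mentions a query term."""
    terms = query.lower().split()
    context = {"summaries": summaries, "facts": []}
    for category, summary in summaries.items():
        if any(t in summary.lower() for t in terms):
            matching = [f for f in facts if f["category"] == category]
            context["facts"].extend(matching[:max_facts])
    return context

summaries = {
    "preferences": "User prefers Neovim and dark mode.",
    "facts": "User works at Google as a software engineer.",
}
facts = [
    {"category": "preferences", "text": "prefers Neovim"},
    {"category": "facts", "text": "works at Google"},
]
result = tiered_retrieve("which editor does the user prefer", summaries, facts)
```

Because summaries are consulted first, the prompt stays small even with tens of thousands of stored facts; only the relevant categories are expanded.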
The LangChain team outlines three layers of agent memory: Episodic (events), Semantic (facts), Procedural (skills).
| LangChain Concept | Kiroku Implementation |
|---|---|
| Episodic | events category |
| Semantic | facts, preferences categories |
| Procedural | skills category |
Plus: Conflict Resolution automatically detects contradicting facts, and Cross-project Sharing via global:user scope.
From Daniel Kahneman's "Thinking, Fast and Slow" — System 1 (intuition) vs. System 2 (analysis).
Kiroku's implementation:
| Mode | Feature | Benefit |
|---|---|---|
| System 1 | Auto-load context | Claude "knows" you instantly |
| System 2 | /remember command | Explicit marking of important info |
Real impact: No more repeating "I prefer uv for Python" every session.
These experts converge on one insight: Memory transforms AI from a tool into a partner.
- Continuity — Conversations aren't isolated islands
- Personalization — AI truly "knows" you
- Efficiency — Eliminates cognitive overhead of re-explaining context
- Evolution — Memory accumulates, making AI smarter over time
- Append-only Raw Logs: Immutable provenance tracking
- Atomic Facts Extraction: LLM-powered structured fact extraction (subject-predicate-object)
- Category-based Organization: 6 default categories with evolving summaries
- Tiered Retrieval: Summaries first, drill down to facts when needed
- Conflict Resolution: Automatic detection and archival of contradicting facts
- Time Decay: Exponential decay of memory confidence over time
- Vector Search: pgvector-powered semantic similarity search
- Knowledge Graph: Relationship mapping between entities
- Scheduled Maintenance: Nightly, weekly, and monthly maintenance jobs
- Production Ready: Structured logging, metrics, and health checks
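Conflict resolution over atomic (subject, predicate, object) facts can be sketched as follows. This is a hedged illustration with invented names, not Kiroku's actual implementation: a new fact with the same subject and predicate but a different object supersedes the old one, which is archived rather than deleted, preserving provenance.

```python
# Sketch of conflict resolution on atomic facts (names illustrative).

def resolve_conflict(active, new_fact):
    """active: dict keyed by (subject, predicate) -> fact dict.
    Returns the archived old fact when a contradiction is found."""
    key = (new_fact["subject"], new_fact["predicate"])
    archived = None
    if key in active and active[key]["object"] != new_fact["object"]:
        # Contradiction: same subject+predicate, different object.
        archived = active[key] | {"status": "archived"}
    active[key] = new_fact
    return archived

active = {}
resolve_conflict(active, {"subject": "user", "predicate": "prefers_editor", "object": "Vim"})
old = resolve_conflict(active, {"subject": "user", "predicate": "prefers_editor", "object": "Neovim"})
```

After the second call, `active` holds the Neovim preference and `old` holds the archived Vim fact, so "user liked A before, now prefers B" is representable without losing history.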
```mermaid
flowchart TB
    subgraph KM["Kiroku Memory"]
        direction TB
        Ingest["Ingest<br/>(Raw Log)"] --> Resources[("Resources<br/>(immutable)")]
        Resources --> Extract["Extract<br/>(Facts)"]
        Extract --> Classify["Classify<br/>(Category)"]
        Classify --> Conflict["Conflict<br/>Resolver"]
        Conflict --> Items[("Items<br/>(active)")]
        Items --> Embeddings["Embeddings<br/>(pgvector)"]
        Items --> Summary["Summary<br/>Builder"]
        Embeddings --> Retrieve["Retrieve<br/>(Tiered + Priority)"]
        Summary --> Retrieve
    end
```
The easiest way to run Kiroku Memory — no Docker, no Python setup required.
Download the latest release for your platform from GitHub Releases:
| Platform | Architecture | Format |
|---|---|---|
| macOS | Apple Silicon (M1/M2/M3) | .dmg |
| macOS | Intel | .dmg |
| Windows | x86_64 | .msi |
| Linux | x86_64 | .AppImage |
- Install: Double-click the downloaded file to install
- Run: Launch "Kiroku Memory" from your applications
- Configure (Optional): Click settings icon to add your OpenAI API Key for semantic search
The Desktop App uses embedded SurrealDB — all data is stored locally with zero external dependencies.
- Zero Configuration: Works out of the box, no Docker or database setup
- Embedded Database: SurrealDB stores data in your app data directory
- Cross-Platform: Native apps for macOS, Windows, and Linux
- Same API: Full REST API available at `http://127.0.0.1:8000`
For developers who want to run from source or customize the system.
- Python 3.11+
- Docker (for PostgreSQL + pgvector) OR SurrealDB (embedded, no Docker needed)
- OpenAI API Key
New to development? See the detailed installation guide with step-by-step instructions.
```bash
# Clone the repository
git clone https://github.com/yelban/kiroku-memory.git
cd kiroku-memory

# Install dependencies using uv
uv sync

# Copy environment file
cp .env.example .env
# Edit .env and set your OPENAI_API_KEY
```

Option A: PostgreSQL (Docker)

```bash
# Start PostgreSQL with pgvector
docker compose up -d

# Start the API server
uv run uvicorn kiroku_memory.api:app --reload
# The API will be available at http://localhost:8000
```

Option B: SurrealDB (embedded, no Docker needed)

```bash
# Configure backend in .env
echo "BACKEND=surrealdb" >> .env

# Start the API server
uv run uvicorn kiroku_memory.api:app --reload
# Data stored in ./data/kiroku/
```

Verify the server is running:

```bash
# Health check
curl http://localhost:8000/health
# Expected: {"status":"ok","version":"0.1.0"}

# Detailed health status
curl http://localhost:8000/health/detailed
```

Ingest a raw message:

```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "content": "My name is John and I work at Google as a software engineer. I prefer using Neovim.",
    "source": "user:john",
    "metadata": {"channel": "chat"}
  }'
```

Extract facts from the ingested resource:

```bash
curl -X POST http://localhost:8000/extract \
  -H "Content-Type: application/json" \
  -d '{"resource_id": "YOUR_RESOURCE_ID"}'
```

This extracts structured facts like:

- `John` works at `Google` (category: facts)
- `John` is a `software engineer` (category: facts)
- `John` prefers `Neovim` (category: preferences)

Build category summaries, then retrieve:

```bash
curl -X POST http://localhost:8000/summarize

# Tiered retrieval (summaries + items)
curl "http://localhost:8000/retrieve?query=What%20does%20John%20do"

# Get context for agent prompt
curl "http://localhost:8000/context"
```

| Method | Path | Description |
|---|---|---|
| POST | `/ingest` | Ingest raw message into memory |
| GET | `/resources` | List raw resources |
| GET | `/resources/{id}` | Get specific resource |
| GET | `/retrieve` | Tiered memory retrieval |
| GET | `/items` | List extracted items |
| GET | `/categories` | List categories with summaries |
| Method | Path | Description |
|---|---|---|
| POST | `/extract` | Extract facts from resource |
| POST | `/process` | Batch process pending resources |
| POST | `/summarize` | Build category summaries |
| GET | `/context` | Get memory context for agent prompt |
| Method | Path | Description |
|---|---|---|
| POST | `/jobs/nightly` | Run nightly consolidation |
| POST | `/jobs/weekly` | Run weekly maintenance |
| POST | `/jobs/monthly` | Run monthly re-indexing |
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Basic health check |
| GET | `/health/detailed` | Detailed health status |
| GET | `/metrics` | Application metrics |
| POST | `/metrics/reset` | Reset metrics |
Install launchd jobs for automatic maintenance:
```bash
bash launchd/install.sh
```

| Job | Schedule | Description |
|---|---|---|
| nightly | 03:00 daily | Decay calculation, cleanup, summaries |
| weekly | 04:00 Sunday | Archive, compress |
| monthly | 05:00 1st | Embeddings rebuild, graph rebuild |

Verify installation:

```bash
launchctl list | grep kiroku
```

Install as a Claude Code skill:

```bash
npx skills add yelban/kiroku-memory
```

Or via the plugin marketplace:

```bash
# Step 1: Add the marketplace
/plugin marketplace add https://github.com/yelban/kiroku-memory.git

# Step 2: Install the plugin
/plugin install kiroku-memory
```

Or with the install script:

```bash
# One-click install
curl -fsSL https://raw.githubusercontent.com/yelban/kiroku-memory/main/skill/assets/install.sh | bash

# Or clone and install
git clone https://github.com/yelban/kiroku-memory.git
cd kiroku-memory/skill/assets && ./install.sh
```

After installation, restart Claude Code and use:
```
/remember User prefers dark mode   # Save memory
/recall editor preferences         # Search memories
/memory-status                     # Check status
```

Features:
- Auto-load: SessionStart hook injects memory context
- Smart-save: Stop hook automatically saves important facts
- Priority ordering: preferences > facts > goals (hybrid static+dynamic weights)
- Smart truncation: Never truncates mid-category, maintains completeness
- Cross-project: Global + project-specific memory scopes
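The priority ordering and smart truncation described above can be sketched as follows. The weights and budget here are illustrative assumptions, not Kiroku's actual values; the key behavior is that a category is either included whole or skipped, never cut mid-category.

```python
# Sketch of priority-ordered, category-atomic truncation
# (illustrative weights; not Kiroku's actual implementation).

PRIORITY = {"preferences": 3, "facts": 2, "goals": 1}

def build_context(categories, budget):
    """categories: {name: text}. Include whole categories in
    priority order; skip a category rather than truncate it."""
    out, used = [], 0
    ordered = sorted(categories, key=lambda n: PRIORITY.get(n, 0), reverse=True)
    for name in ordered:
        block = f"### {name}\n{categories[name]}"
        if used + len(block) > budget:
            continue  # skip the whole category instead of cutting it
        out.append(block)
        used += len(block)
    return "\n".join(out)
```

With a tight budget, high-priority categories like preferences survive intact while an oversized low-priority category is dropped entirely, keeping the injected context well-formed.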
When hooks are working correctly, you'll see this at conversation start:
```
SessionStart:startup hook success: <kiroku-memory>
## User Memory Context
### Preferences
...
</kiroku-memory>
```
This confirms:
- ✅ SessionStart hook executed successfully
- ✅ API service is connected
- ✅ Memory context has been injected
If memory content is empty (only category headers), no memories have been stored yet. Use /remember to store manually.
Stop Hook uses a Fast + Slow dual-phase architecture:
Phase 1: Fast Path (<1s, sync)
Regex-based pattern matching for immediate capture:
| Pattern Type | Examples | Min Weighted Length |
|---|---|---|
| Preferences | `I prefer...`, `I like...` | 10 |
| Decisions | `decided to use...`, `chosen...` | 10 |
| Discoveries | `discovered...`, `found that...`, `solution is...` | 10 |
| Learnings | `learned...`, `root cause...`, `the issue was...` | 10 |
| Facts | `work at...`, `live in...` | 10 |
| No pattern | General content | 35 |
Also extracts conclusion markers from Claude's responses:
`Solution`, `Discovery`, `Conclusion`, `Recommendation`, `Root cause`
Weighted length: CJK chars × 2.5 + other chars × 1
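The fast path can be sketched as a combination of trigger patterns and the weighted-length rule. The patterns below are an abbreviated, illustrative subset, and the CJK check covers only the basic Unified Ideographs block; the real hook is more thorough.

```python
import re

# Sketch of the fast path: regex triggers plus CJK-weighted length
# (illustrative patterns, not the full set used by the hook).

PATTERNS = [r"\bI prefer\b", r"\bdecided to use\b", r"\bdiscovered\b",
            r"\blearned\b", r"\bwork at\b"]

def weighted_length(text):
    # CJK characters count 2.5x; everything else counts 1x.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk * 2.5 + (len(text) - cjk)

def should_capture(text):
    # A pattern match lowers the length threshold from 35 to 10.
    matched = any(re.search(p, text, re.IGNORECASE) for p in PATTERNS)
    threshold = 10 if matched else 35
    return weighted_length(text) >= threshold
```

The weighting means a short CJK sentence can clear the same threshold as a longer English one, since CJK packs more meaning per character.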
Phase 2: Slow Path (5-15s, async)
Background LLM analysis using Claude CLI:
- Runs in detached subprocess (doesn't block Claude Code)
- Analyzes last 6 user + 4 assistant messages
- Extracts up to 5 memories with type/confidence
- Memory types: `discovery`, `decision`, `learning`, `preference`, `fact`

Filtered out (noise):

- Short responses: `OK`, `好的`, `Thanks`
- Questions: `What is...`, `How to...`
- Errors: `error`, `failed`
For long conversations, memories are captured incrementally during the session:
- Trigger: After each tool use, with throttling
- Throttle conditions: ≥5 min interval AND ≥10 new messages
- Offset tracking: Only analyzes new messages since last capture
- Smart skip: Skips if content too short
This distributes the capture load and ensures early conversation content isn't lost.
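The throttle rule can be sketched as follows; state field names are illustrative. Capture fires only when both conditions hold, and the offset is advanced so the next pass analyzes only newer messages.

```python
from datetime import datetime, timedelta

# Sketch of incremental-capture throttling (illustrative state fields).

MIN_INTERVAL = timedelta(minutes=5)
MIN_NEW_MESSAGES = 10

def should_capture_incremental(now, state, message_count):
    """Capture only if >=5 min elapsed AND >=10 new messages arrived."""
    if now - state["last_capture"] < MIN_INTERVAL:
        return False
    if message_count - state["last_offset"] < MIN_NEW_MESSAGES:
        return False
    # Record progress so the next run only sees newer messages.
    state["last_capture"] = now
    state["last_offset"] = message_count
    return True
```

Requiring both conditions prevents rapid tool-use bursts from triggering repeated captures while still catching long-running sessions.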
See Claude Code Integration Guide for details.
For custom MCP server integration:
```python
# memory_mcp.py
from mcp.server import Server

from kiroku_memory.db.database import get_session
from kiroku_memory.summarize import get_tiered_context

app = Server("memory-system")

@app.tool("memory_context")
async def memory_context():
    async with get_session() as session:
        return await get_tiered_context(session)
```

Configure in `~/.claude/mcp.json`:

```json
{
  "mcpServers": {
    "memory": {
      "command": "uv",
      "args": ["run", "python", "memory_mcp.py"]
    }
  }
}
```

A minimal chat bot integration in JavaScript:

```javascript
const MEMORY_API = "http://localhost:8000";

// Get memory context before responding
async function getMemoryContext(userId) {
  const response = await fetch(`${MEMORY_API}/context`);
  const data = await response.json();
  return data.context;
}

// Save important information after conversation
async function saveToMemory(userId, content) {
  await fetch(`${MEMORY_API}/ingest`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      content,
      source: `bot:${userId}`
    })
  });
}

// Use in your bot
const memoryContext = await getMemoryContext(userId);
const enhancedPrompt = `${memoryContext}\n\n${SYSTEM_PROMPT}`;
```

See the Integration Guide for detailed examples.
Set up cron jobs for automatic maintenance:

```bash
# Nightly: Merge duplicates, promote hot memories
0 2 * * * curl -X POST http://localhost:8000/jobs/nightly

# Weekly: Apply time decay, archive old items
0 3 * * 0 curl -X POST http://localhost:8000/jobs/weekly

# Monthly: Rebuild embeddings and knowledge graph
0 4 1 * * curl -X POST http://localhost:8000/jobs/monthly
```

Memories decay exponentially with a configurable half-life (default: 30 days):

```python
from datetime import datetime, timezone

def time_decay_score(created_at, half_life_days=30):
    # Confidence halves every half_life_days days
    age_days = (datetime.now(timezone.utc) - created_at).days
    return 0.5 ** (age_days / half_life_days)
```

| Variable | Default | Description |
|---|---|---|
| `BACKEND` | `postgres` | Backend selection: `postgres` or `surrealdb` |
| `DATABASE_URL` | `postgresql+asyncpg://...` | PostgreSQL connection string |
| `SURREAL_URL` | `file://./data/kiroku` | SurrealDB URL (`file://` for embedded) |
| `SURREAL_NAMESPACE` | `kiroku` | SurrealDB namespace |
| `SURREAL_DATABASE` | `memory` | SurrealDB database name |
| `OPENAI_API_KEY` | (required) | OpenAI API key for embeddings |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EMBEDDING_DIMENSIONS` | `1536` | Vector dimensions |
| `DEBUG` | `false` | Enable debug mode |
```
.
├── kiroku_memory/        # Core Python package
│   ├── api.py            # FastAPI endpoints
│   ├── ingest.py         # Resource ingestion
│   ├── extract.py        # Fact extraction (LLM)
│   ├── classify.py       # Category classification
│   ├── conflict.py       # Conflict resolution
│   ├── summarize.py      # Summary generation
│   ├── embedding.py      # Vector search
│   ├── observability.py  # Metrics & logging
│   ├── db/               # Database layer
│   └── jobs/             # Maintenance jobs
├── skill/                # Claude Code Skill
│   ├── SKILL.md          # Skill documentation (EN)
│   ├── SKILL.zh-TW.md    # Traditional Chinese
│   ├── SKILL.ja.md       # Japanese
│   ├── scripts/          # Commands & hooks
│   ├── references/       # Reference docs
│   └── assets/           # Install script
├── tests/
├── docs/
├── docker-compose.yml
├── pyproject.toml
└── README.md
```
- Installation Guide - Step-by-step installation for beginners
- Architecture Design - System architecture and design decisions
- Development Journey - From idea to implementation
- User Guide - Comprehensive usage guide
- Integration Guide - Integration with chat bots and custom agents
- Claude Code Integration - Claude Code skill setup and usage
- Renaming Changelog - Project renaming history
- Language: Python 3.11+
- Framework: FastAPI + asyncio
- Database: PostgreSQL 16 + pgvector OR SurrealDB (embedded)
- ORM: SQLAlchemy 2.x / SurrealDB Python SDK
- Embeddings: OpenAI text-embedding-3-small
- Package Manager: uv
Contributions are welcome! Please read our contributing guidelines before submitting a pull request.
This project is licensed under the PolyForm Noncommercial License 1.0.0.
Free for: Personal use, academic research, non-profit organizations, evaluation.
Commercial use: Please contact yelban@gmail.com for licensing.
- Rohit (@rohit4verse) for the original "How to Build an Agent That Never Forgets" article
- MemoraX team for open-source implementation reference
- Rishi Sood for LC-OS Context Engineering papers
- The community for valuable feedback and suggestions
