GitHub - LLM-VLM-GSL/AriadneMem: Code for Paper: AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents

Threading the Maze of Lifelong Memory for LLM Agents

AriadneMem is a structured memory system that addresses disconnected evidence and state update challenges in long-horizon LLM agents through a decoupled two-phase pipeline.

🌐 Project Page | 📥 Paper (PDF)

If you find our work useful in your research, please consider giving the repository a star ⭐ and citing:

@misc{zhu2026ariadnememthreadingmazelifelong,
      title={AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents}, 
      author={Wenhui Zhu and Xiwen Chen and Zhipeng Wang and Jingjing Wang and Xuanzhao Dong and Minzhou Huang and Rui Cai and Hejian Sang and Hao Wang and Peijie Qiu and Yueyue Deng and Prayag Tiwari and Brendan Hogan Rappazzo and Yalin Wang},
      year={2026},
      eprint={2603.03290},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.03290}, 
}

🚀 Platform Compatibility

Works seamlessly with any AI platform supporting MCP or Python integration.

Cursor	Claude	Copilot	Python	MCP Client
✅ Fully Tested	🤝 Compatible	🤝 Compatible	✅ Fully Tested	🔗 Universal

🌟 Key Features

Two-Phase Pipeline: Decouples memory retrieval and state updates for enhanced stability.
Evidence Threading: Successfully bridges disconnected information across long-horizon tasks.
Plug & Play: Easy integration with modern AI IDEs and development workflows.

Quick Start

💡 Hardware: AriadneMem runs on both GPU and CPU. It calls remote LLM APIs (OpenAI/Qwen) and runs the embedding model locally.

Installation

pip install -r requirements.txt

Configuration

cp config.py.example config.py

Edit config.py:

OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None  # or Qwen: "https://dashscope.aliyuncs.com/compatible-mode/v1"
LLM_MODEL = "gpt-4o"    # or "qwen-plus-2025-07-28"

# Per-component model overrides (optional, falls back to LLM_MODEL)
BUILDER_LLM_MODEL = None   # Phase I: e.g. "gpt-4.1-mini" for cost savings
ANSWER_LLM_MODEL  = None   # Phase II: e.g. "gpt-4o" for better quality

# Reasoning mode
REASONING_MODE = "eco"     # "eco" | "pro" | "custom"

# Local Embedding Model (no API needed)
# Lightweight option (fast on CPU):
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
# Or for better retrieval quality (GPU accelerates):
# EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B"

Basic Usage

from main import AriadneMemSystem
from models.memory_entry import Dialogue

# Initialize system
system = AriadneMemSystem(clear_db=True)

# Add dialogues
dialogues = [
    Dialogue(speaker="Alice", content="Let's meet at Starbucks tomorrow at 2pm", timestamp="2024-01-15T14:30:00"),
    Dialogue(speaker="Bob", content="Sorry, can we change to 3pm?", timestamp="2024-01-15T15:00:00"),
    Dialogue(speaker="Alice", content="Sure, 3pm works for me", timestamp="2024-01-15T15:05:00"),
]
system.add_dialogues(dialogues)

# Build memory graph
system.finalize()

# Query
answer = system.ask("What time (in hour) will Alice and Bob meet?")
# Output: "3pm" (correctly handles the state update from 2pm to 3pm)

API Reference

AriadneMemSystem

class AriadneMemSystem:
    def __init__(
        self,
        api_key: str = None,           # Uses config.OPENAI_API_KEY if None
        model: str = None,             # Default LLM model (falls back to config.LLM_MODEL)
        base_url: str = None,          # Uses config.OPENAI_BASE_URL if None
        clear_db: bool = False,        # Clear existing database
        db_path: str = None,           # Custom database path
        redundancy_threshold: float = None,
        coarsening_threshold: float = None,
        builder_model: str = None,     # Phase I model override (extraction + coarsening)
        answer_model: str = None,      # Phase II model override (topology-aware synthesis)
        reasoning_mode: str = None     # "eco" | "pro" | "custom"
    )
    
    def add_dialogue(self, speaker: str, content: str, timestamp: str = None)
    def add_dialogues(self, dialogues: List[Dialogue])
    def finalize(self)  # Build memory graph
    def ask(self, question: str) -> str
    def get_all_memories(self) -> List[MemoryEntry]
    def print_memories(self)

Per-Component LLM Models

Different phases can use different LLM models. Set via __init__ params or config.py (init params take priority):

# Option 1: via __init__ (runtime)
system = AriadneMemSystem(
    builder_model="gpt-4.1-mini",   # Phase I: cheaper model
    answer_model="gpt-4o",          # Phase II: stronger model
)

# Option 2: via config.py (global default)
BUILDER_LLM_MODEL = "gpt-4.1-mini"
ANSWER_LLM_MODEL  = "gpt-4o"

If both are None, all phases use model (or config.LLM_MODEL).
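
The fallback chain described above can be sketched as a small helper. This is an illustration only, not the repository's code: the function name `resolve_model` is hypothetical, and the relative order of the config-level override and the `model` argument is an assumption, since the text only states that init params take priority and that `model` / `config.LLM_MODEL` is the final fallback.

```python
from typing import Optional


def resolve_model(
    component_arg: Optional[str],     # e.g. builder_model passed to __init__
    component_config: Optional[str],  # e.g. config.BUILDER_LLM_MODEL
    default_arg: Optional[str],       # e.g. model passed to __init__
    default_config: str,              # e.g. config.LLM_MODEL
) -> str:
    """Return the first non-None candidate (hypothetical priority order)."""
    for candidate in (component_arg, component_config, default_arg):
        if candidate is not None:
            return candidate
    # Nothing more specific was set, so use the global default.
    return default_config


# Phase I gets the cheaper override; Phase II falls back to the default.
builder = resolve_model("gpt-4.1-mini", None, None, "gpt-4o")
answer = resolve_model(None, None, None, "gpt-4o")
print(builder, answer)
```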

Reasoning Mode

Control retrieval depth and prompt verbosity. Set via __init__ or config.py:

Mode	`MAX_REASONING_PATHS`	`MAX_REASONING_PATH_DEPTH`	Reasoning Length
`"eco"` (default)	10	3	1-2 sentences
`"pro"`	25	3	9-10 sentences
`"custom"`	User-defined	User-defined	User-defined template

# Option 1: via __init__ (runtime)
system = AriadneMemSystem(reasoning_mode="pro")

# Option 2: via config.py (global default)
REASONING_MODE = "eco"   # "eco" | "pro" | "custom"
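
As a rough sketch of how a mode name could map to the two limits in the table, the snippet below keeps the `eco` and `pro` values in a lookup and requires explicit values for `custom`. The names `ReasoningPreset`, `PRESETS`, and `preset_for` are hypothetical and only illustrate the idea; the actual selection logic lives in `config.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReasoningPreset:
    max_reasoning_paths: int
    max_reasoning_path_depth: int


# Values taken from the mode table; the container itself is an assumption.
PRESETS = {
    "eco": ReasoningPreset(max_reasoning_paths=10, max_reasoning_path_depth=3),
    "pro": ReasoningPreset(max_reasoning_paths=25, max_reasoning_path_depth=3),
}


def preset_for(mode: str, custom: Optional[ReasoningPreset] = None) -> ReasoningPreset:
    """Return the limits for a mode; "custom" needs user-supplied values."""
    if mode == "custom":
        if custom is None:
            raise ValueError('mode "custom" requires user-defined limits')
        return custom
    if mode not in PRESETS:
        raise ValueError(f"unknown reasoning mode: {mode!r}")
    return PRESETS[mode]


print(preset_for("pro"))
```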

Configuration Reference

LLM Configuration

Parameter	Description	Example
`LLM_MODEL`	Default model for all components	`gpt-4o`, `qwen-plus-2025-07-28`
`BUILDER_LLM_MODEL`	Phase I model override (set `None` to use default)	`gpt-4.1-mini`
`ANSWER_LLM_MODEL`	Phase II model override (set `None` to use default)	`gpt-4o`
`OPENAI_BASE_URL`	API endpoint	`https://dashscope.aliyuncs.com/compatible-mode/v1`
`ENABLE_THINKING`	Qwen deep thinking mode	`True` / `False`
`USE_JSON_FORMAT`	Force JSON output	`True` (recommended)

Reasoning Modes

Mode	`MAX_REASONING_PATHS`	Reasoning Depth	Token Cost	Use Case
`"eco"`	10	1-2 sentences	Low	Simple queries, batch testing
`"pro"`	25	9-10 sentences	High	Multi-hop, complex reasoning
`"custom"`	User-defined	User-defined	Varies	Fine-tuned for specific tasks

# Switch mode in config.py (set exactly one)
REASONING_MODE = "eco"       # Fast & token-efficient
# REASONING_MODE = "pro"     # Thorough & detailed
# REASONING_MODE = "custom"  # Your own settings + prompt template

Phase I Parameters (Memory Construction)

Parameter	Default	Paper	Description
`REDUNDANCY_THRESHOLD`	0.6	λ_red (Eq.3)	Entropy-aware gating threshold
`COARSENING_THRESHOLD`	0.6	λ_coal (Eq.6)	Merge vs Link decision threshold
`WINDOW_SIZE`	40	-	Dialogues per processing window
`OVERLAP_SIZE`	2	-	Window overlap for context continuity
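
To make `WINDOW_SIZE` and `OVERLAP_SIZE` concrete, here is a minimal sketch of overlapping windows over a dialogue list. It assumes each window starts `WINDOW_SIZE - OVERLAP_SIZE` items after the previous one; the helper name `sliding_windows` is hypothetical and the builder in `core/` may chunk differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(
    items: Sequence[T], window_size: int = 40, overlap_size: int = 2
) -> List[List[T]]:
    """Split items into windows that share overlap_size items with the next one."""
    if window_size <= 0 or not 0 <= overlap_size < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap_size < window_size")
    step = window_size - overlap_size
    windows: List[List[T]] = []
    start = 0
    while start < len(items):
        windows.append(list(items[start:start + window_size]))
        if start + window_size >= len(items):
            break  # this window already reaches the end of the stream
        start += step
    return windows


windows = sliding_windows(list(range(100)))
print([(w[0], w[-1]) for w in windows])
```

With 100 dialogues and the defaults, the last two items of each window reappear at the start of the next one, which is what carries context across window boundaries.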

Phase II Parameters (Retrieval & Reasoning)

Parameter	Default	Paper	Description
`SEMANTIC_TOP_K`	25	-	Max nodes from semantic search
`KEYWORD_TOP_K`	5	-	Max nodes from keyword search
`MAX_REASONING_PATH_DEPTH`	3	L (Eq.10)	Max hops in DFS path discovery (auto-set by mode)
`MAX_REASONING_PATHS`	10/25	-	Max reasoning paths (eco=10, pro=25, auto-set by mode)
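
The interaction between `MAX_REASONING_PATH_DEPTH` and `MAX_REASONING_PATHS` can be illustrated with a bounded depth-first search. This is a sketch under assumptions, not the retriever's implementation: the adjacency-dict graph, the `mine_paths` name, and the rule that a path ends at the first terminal node it reaches are all hypothetical.

```python
from typing import Dict, List, Sequence

Graph = Dict[str, List[str]]


def mine_paths(
    graph: Graph, terminals: Sequence[str], max_depth: int = 3, max_paths: int = 10
) -> List[List[str]]:
    """Collect simple paths between terminal nodes, bounded in hops and count."""
    terminal_set = set(terminals)
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if len(paths) >= max_paths:
            return  # path budget exhausted
        if len(path) > 1 and node in terminal_set:
            paths.append(list(path))  # reached another terminal: record and stop
            return
        if len(path) - 1 >= max_depth:
            return  # hop limit reached without hitting a terminal
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # keep paths simple (no revisits)
                path.append(neighbor)
                dfs(neighbor, path)
                path.pop()

    for start in terminals:
        dfs(start, [start])
    return paths


graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}
print(mine_paths(graph, ["a", "d"], max_depth=3))
```

In this toy graph the only chain from `a` to `d` needs three hops, so lowering `max_depth` to 2 returns no paths. This mirrors the troubleshooting advice to raise `MAX_REASONING_PATH_DEPTH` when longer chains are missed.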

Prompt Templates (Customizable)

Prompt templates are auto-selected based on REASONING_MODE. You can also define a fully custom template:

# System prompt for topology-aware synthesis
ANSWER_SYSTEM_PROMPT = "You are a QA system with graph-based memory..."

# Custom mode: define your own template
REASONING_MODE = "custom"
_CUSTOM_USER_PROMPT_TEMPLATE = """Q: {query}
{entity_hint}{graph_hint}
{context_str}
... your own reasoning instructions ...
"""

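The placeholders in the custom template (`{query}`, `{entity_hint}`, `{graph_hint}`, `{context_str}`) look like Python `str.format` fields. Assuming that is how the template is filled, the sketch below shows how a custom template would render; `render_prompt` and the sample template text are hypothetical.

```python
CUSTOM_TEMPLATE = """Q: {query}
{entity_hint}{graph_hint}
{context_str}
Answer in one short sentence.
"""


def render_prompt(
    template: str,
    query: str,
    context_str: str,
    entity_hint: str = "",
    graph_hint: str = "",
) -> str:
    """Fill the four placeholders used by the custom template."""
    return template.format(
        query=query,
        entity_hint=entity_hint,
        graph_hint=graph_hint,
        context_str=context_str,
    )


prompt = render_prompt(
    CUSTOM_TEMPLATE,
    query="What time will Alice and Bob meet?",
    context_str="[1] Alice proposed 2pm. [2] Bob asked for 3pm. [3] Alice agreed.",
    entity_hint="Entities: Alice, Bob\n",
)
print(prompt)
```

If the template is processed with `str.format`, any literal braces in your own instructions (for example a JSON sample) must be doubled as `{{` and `}}`.
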
Running Tests

Quick Test

python quick_test.py

LoCoMo Benchmark

# Run on 3 sessions with parallel question processing
python test_locomo10.py --num_sessions 3 --parallel_questions

# Run with LLM-as-Judge evaluation
python test_locomo10.py --num_sessions 3 --use_llm_judge

Multi-hop Reasoning Demo

python demo_multihop.py

MCP Server (Cursor Integration)

AriadneMem can be used as an MCP server in Cursor, providing long-term memory tools directly in your AI chat.

stdio mode (recommended for Cursor):

Edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "ariadnemem": {
      "command": "/path/to/python",
      "args": ["/path/to/MCP/server/stdio_server.py"]
    }
  }
}

For remote compute (e.g. Slurm clusters), use SSH to jump to the GPU node:

{
  "mcpServers": {
    "ariadnemem": {
      "command": "ssh",
      "args": [
        "-o", "StrictHostKeyChecking=no",
        "-o", "LogLevel=ERROR",
        "gpu-node-name",
        "/path/to/python",
        "/path/to/MCP/server/stdio_server.py"
      ]
    }
  }
}

HTTP mode (for programmatic clients):

cd MCP
pip install -r requirements.txt
python run.py

See MCP/README.md for the full setup guide, including a step-by-step CoreWeave/Slurm example, the tool reference, and troubleshooting.

🚧 Under Active Development: We are currently optimizing memory construction for code and math domains to better handle technical content and formal reasoning.

Key Features

Feature	Paper Reference	Benefit
Entropy-Aware Gating	Eq. 2-3	Filters noise before LLM extraction
Conflict-Aware Coarsening	Eq. 5-6	Merges duplicates while preserving state updates
Hybrid Retrieval	Eq. 7	Semantic + Lexical search for terminal nodes
Bridge Discovery	Eq. 9	Steiner tree approximation for missing links
Multi-Hop Path Mining	Eq. 10	DFS-based reasoning chain discovery
Topology-Aware Synthesis	Eq. 11	Single LLM call with graph-guided reasoning
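
To give a feel for the first row, the sketch below gates messages by Shannon entropy over their word tokens and blocks low-information ones. The README does not give the paper's exact definition (Eq. 2-3), so the entropy estimate, the threshold value, and the names `token_entropy` and `passes_gate` are all assumptions made for illustration.

```python
import math
from collections import Counter


def token_entropy(message: str) -> float:
    """Shannon entropy (in bits) of the word distribution of a message."""
    tokens = message.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def passes_gate(message: str, tau: float = 1.0) -> bool:
    """Keep a message only if its entropy reaches the (hypothetical) threshold."""
    return token_entropy(message) >= tau


for text in ["ok ok ok", "Let's meet at Starbucks tomorrow at 2pm"]:
    print(f"{text!r}: H={token_entropy(text):.2f}, keep={passes_gate(text)}")
```

A repetitive acknowledgement scores zero bits and is dropped before any LLM call, while a message with several distinct tokens passes through to extraction.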

Comparison with Baselines

Dimension	Flat RAG	Planning-based	AriadneMem
Retrieval	Vector search	Multi-round LLM	Graph + Algorithm
Multi-hop	Not supported	3-4 LLM calls	DFS (0 LLM calls)
State Updates	Keep all / Conflict	Keep all	Smart merge + temporal edges
LLM Calls/Query	1	4-6	1
Latency	Fast	Slow	Fast

System Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                      AriadneMem Pipeline                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ═══════════════════════════════════════════════════════════════   │
│  ║           PHASE I: Asynchronous Memory Construction          ║   │
│  ═══════════════════════════════════════════════════════════════   │
│                                                                     │
│  [Dialogue Stream D]                                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Entropy-Aware Gating (Eq.3)  │  ← Φ_gate: filter low-info     │
│  │  H(m) < τ → block             │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Atomic Extraction F_θ (Eq.4) │  ← LLM: dialogue → entries     │
│  │  De-linearization transform   │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Conflict-Aware Coarsening    │  ← Merge/Link/Add (Eq.6)       │
│  │  (Eq.5-6)                     │                                │
│  │  • Static duplicates → Merge  │                                │
│  │  • State updates → Link edge  │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  VectorStore (LanceDB)        │  ← Multi-view indexing         │
│  │  • Semantic (dense vectors)   │                                │
│  │  • Lexical (keyword/BM25)     │                                │
│  │  • Symbolic (metadata)        │                                │
│  └────────────────────────────────┘                                │
│                                                                     │
│  ═══════════════════════════════════════════════════════════════   │
│  ║           PHASE II: Real-Time Structural Reasoning           ║   │
│  ═══════════════════════════════════════════════════════════════   │
│                                                                     │
│  [Query q]                                                          │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Fast Paths (O(1) lookup)     │  ← Cache/regex short-circuit   │
│  │  Count/List/Relation queries  │                                │
│  └────────────────────────────────┘                                │
│         │ (if miss)                                                 │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Hybrid Retrieval (Eq.7)      │  ← Find terminal nodes V_term  │
│  │  score = α·sim_sem + β·sim_lex│                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Base Graph Construction      │  ← Entity/temporal edges       │
│  │  (Eq.8)                       │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Bridge Discovery (Eq.9)      │  ← Steiner tree approximation  │
│  │  Find b* to connect V_term    │     (no LLM calls!)            │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Multi-Hop Path Mining (Eq.10)│  ← DFS reasoning chains        │
│  │  Discover logical paths P_q   │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  ┌────────────────────────────────┐                                │
│  │  Topology-Aware Synthesis     │  ← Single LLM call             │
│  │  (Eq.11)                      │                                │
│  │  a = LLM(q, Serialize(G_q))   │                                │
│  └────────────────────────────────┘                                │
│         │                                                           │
│         ▼                                                           │
│  [Answer a]                                                         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
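
The hybrid retrieval step in the diagram combines two similarities as `score = α·sim_sem + β·sim_lex`. A minimal sketch of that weighted sum follows, assuming cosine similarity for the semantic term and Jaccard token overlap as a stand-in for the lexical term; the diagram mentions keyword/BM25 search, so the lexical function, the weights, and the function names here are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a simple lexical similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def hybrid_score(
    query_vec: Sequence[float],
    node_vec: Sequence[float],
    query_text: str,
    node_text: str,
    alpha: float = 0.7,  # hypothetical weight for the semantic term
    beta: float = 0.3,   # hypothetical weight for the lexical term
) -> float:
    return alpha * cosine(query_vec, node_vec) + beta * jaccard(query_text, node_text)


print(hybrid_score([1.0, 0.0], [1.0, 1.0], "meeting time", "meeting time changed"))
```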

Project Structure

AriadneMem/
├── main.py                          # Main system entry point
├── config.py                        # Configuration (LLM, thresholds, prompts, modes)
├── requirements.txt                 # Dependencies
│
├── core/
│   ├── ariadne_memory_builder.py    # Phase I: Memory Construction
│   ├── ariadne_graph_retriever.py   # Phase II: Structural Reasoning
│   ├── ariadne_answer_generator.py  # Topology-Aware Synthesis
│   ├── semantic_normalizer.py       # Answer post-processing
│   └── aggregation_builder.py       # Entity aggregation
│
├── models/
│   ├── memory_entry.py              # MemoryEntry, Dialogue dataclasses
│   └── enhanced_structures.py       # EnhancedMemoryIndex, caches
│
├── database/
│   └── vector_store.py              # LanceDB vector store
│
├── utils/
│   ├── llm_client.py                # OpenAI-compatible LLM client
│   └── embedding.py                 # SentenceTransformers embeddings
│
├── dataset/
│   └── locomo10.json                # LoCoMo benchmark data
│
├── MCP/                             # MCP Server (Model Context Protocol)
│   ├── README.md                    # MCP documentation
│   ├── run.py                       # HTTP server entry point
│   ├── requirements.txt             # MCP dependencies
│   ├── mcp_config/
│   │   └── settings.py              # Server settings (inherits from config.py)
│   └── server/
│       ├── stdio_server.py          # stdio transport (recommended for Cursor)
│       ├── http_server.py           # HTTP transport (FastAPI + Streamable HTTP)
│       └── mcp_handler.py           # MCP protocol handler (7 tools)
│
├── test_locomo10.py                 # Full benchmark evaluation
├── quick_test.py                    # Quick functionality test
└── demo_multihop.py                 # Multi-hop reasoning demo

Troubleshooting

Q: How to switch to Qwen models?

# config.py
OPENAI_API_KEY = "your-qwen-api-key"
OPENAI_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
LLM_MODEL = "qwen-plus-2025-07-28"
ENABLE_THINKING = True  # Enable Qwen's deep thinking mode

Q: Multi-hop reasoning not working?

Check:

Verify that the relevant nodes share entities or are close in time (the base graph links nodes through entity and temporal edges)
Inspect the discovered paths: graph_path.reasoning_paths
Increase MAX_REASONING_PATH_DEPTH for longer chains

Q: How to adjust filtering strength?

# More aggressive filtering (fewer nodes, faster)
REDUNDANCY_THRESHOLD = 0.5
COARSENING_THRESHOLD = 0.5

# More permissive (more nodes, better recall)
REDUNDANCY_THRESHOLD = 0.7
COARSENING_THRESHOLD = 0.7

Citation

@article{zhu2026ariadnemem,
  title   = {AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents},
  author  = {Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Wang, Jingjing and Dong, Xuanzhao and Huang, Minzhou and Cai, Rui and Sang, Hejian and Wang, Hao and Qiu, Peijie and Deng, Yueyue and Tiwari, Prayag and Hogan Rappazzo, Brendan and Wang, Yalin},
  journal = {Preprint},
  year    = {2026},
  url     = {https://github.com/LLM-VLM-GSL/AriadneMem}
}

Acknowledgments

We would like to thank the following projects and teams:

Codebase: SimpleMem (Special thanks to their open-source contribution!)
Embedding Models:
- all-MiniLM-L6-v2 (Sentence Transformers) - Lightweight and CPU-friendly
- Qwen3-Embedding - State-of-the-art retrieval performance
Vector Database: LanceDB - High-performance columnar storage
Benchmark: LoCoMo - Long-context memory evaluation framework

License

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

You are free to use, share, and adapt this work for non-commercial purposes with proper attribution. For commercial licensing, please contact the authors.
