ContextCore: Enhanced Context Management for Local LLMs

ContextCore is a Python library designed to overcome the context window limitations of smaller local LLMs, such as Ollama's smollm2:1.7b. It implements a unified memory system that combines high-level thinking memory with detailed raw memory to provide an extended effective context.

Installation

# Clone the repository
git clone https://github.com/Priyanshu-i/ContextCore.git
cd ContextCore

# Install the package
pip install -e .

# Optional but recommended dependencies
pip install sentence-transformers  # For better embeddings
pip install hnswlib  # For vector storage
pip install redis  # For faster key-value storage
pip install requests  # For Ollama API communication

Quick Start

from contextcore import ContextCore

# Initialize ContextCore with your local LLM
context = ContextCore(
    model_name="smollm2:1.7b",  # Your Ollama model
    ollama_url="http://localhost:11434"  # Ollama API URL
)

# Initialize a new session with an objective
context.initialize_session("Building a robust memory system for local LLMs")

# Process user inputs and get responses
response = context.process_user_input("How can I implement a vector store for text embeddings?")
print(response)

# Save the session for later use
context.save()

# Load a saved session
loaded_context = ContextCore.load("./contextcore_storage")

Key Features

  1. Two-Tier Memory System:

    • Thinking Memory (TME): High-level reasoning, concepts, and session strategies
    • Raw Memory (RME): Detailed facts, user inputs, and specific technical information
  2. Semantic Search: Find relevant memories based on semantic similarity

  3. Session Management: Maintain coherent, ongoing conversations with automatic summarization

  4. Local LLM Integration: Seamless integration with Ollama-based local models

  5. Persistence: Save and load sessions to continue conversations later

Advanced Usage

Customizing Memory Storage

# Use Redis for faster key-value storage
context = ContextCore(
    model_name="smollm2:1.7b",
    use_redis=True  # Enable Redis storage
)

# Customize vector dimensions (if using a different embedding model)
context = ContextCore(
    model_name="smollm2:1.7b",
    vector_dim=768  # For larger embedding models
)

Working with Different LLMs

# Use a different Ollama model
context = ContextCore(
    model_name="llama3:8b",  # Any model you have in Ollama
)

# Connect to a remote Ollama instance
context = ContextCore(
    model_name="mistral:7b",
    ollama_url="http://your-ollama-server:11434"
)
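Under the hood, the OllamaClient talks to Ollama's REST API. The following is a minimal sketch of such a call against the standard /api/generate endpoint, using the requests dependency installed above (the function name ollama_generate is illustrative, not part of the library):

import requests

def ollama_generate(prompt, model="smollm2:1.7b", url="http://localhost:11434"):
    # POST to Ollama's /api/generate endpoint; stream=False returns a single JSON object
    response = requests.post(
        f"{url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]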

Memory Management

# Manually add thinking memory
context.memory_store.add_thinking_memory(
    content="The key insight is to use hierarchical summarization",
    importance=0.9,
    metadata={"topic": "architecture", "source": "design_doc"}
)

# Manually add raw memory
context.memory_store.add_raw_memory(
    content="User prefers Python over JavaScript for this project",
    category="user",  # user, session, or agent
    relevance_score=0.7,
    metadata={"source": "conversation"}
)

# Search memories
memories = context.memory_store.search_memories(
    query="vector databases",
    k=5,  # Return top 5 results
    filter_type="raw",  # Only raw memories
    min_score=0.6  # Minimum similarity threshold
)

Implementation Details

Memory Types

  1. ThinkingMemory: Used for high-level concepts and reasoning

    • Contains: content, timestamp, importance score, metadata
  2. RawMemory: Used for detailed facts and specific information

    • Contains: content, timestamp, category, relevance score, metadata
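The fields listed above map naturally onto simple dataclasses. A minimal sketch, assuming this field layout (the actual classes in contextcore may differ in detail):

import time
from dataclasses import dataclass, field

@dataclass
class ThinkingMemory:
    # High-level concepts and reasoning distilled from the session
    content: str
    importance: float = 0.5          # 0.0-1.0, weights retrieval
    timestamp: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)

@dataclass
class RawMemory:
    # Detailed facts and specific information
    content: str
    category: str = "session"        # user, session, or agent
    relevance_score: float = 0.5
    timestamp: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)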

Components

  1. VectorStore: Stores and retrieves memories using semantic search

    • Uses HNSWlib for efficient similarity search
  2. SimpleEmbedder: Converts text to vector embeddings

    • Uses sentence-transformers if available, with a simple fallback (sketched after this list)
  3. MemoryStore: Combines vector storage with metadata-based retrieval

    • Optional Redis integration for faster lookups
  4. OllamaClient: Interfaces with Ollama API for text generation

  5. ContextCore: Main class that coordinates all components
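The optional-dependency pattern behind SimpleEmbedder might look like the following sketch. This is a simplification that assumes a hash-seeded random fallback; the library's actual fallback may differ:

import numpy as np

try:
    from sentence_transformers import SentenceTransformer
    _model = SentenceTransformer("all-MiniLM-L6-v2")

    def embed(text, dim=384):
        # Proper semantic embeddings when sentence-transformers is installed
        return _model.encode(text)
except ImportError:
    def embed(text, dim=384):
        # Crude fallback: a deterministic pseudo-random vector seeded by the text hash.
        # Far weaker than real embeddings, but keeps the pipeline functional.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)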

Best Practices

  1. Initialization:

    • Always provide a clear session objective
    • Use the most powerful local LLM you have available
  2. Memory Management:

    • Let the system handle memory management automatically
    • For critical information, manually add high-importance memories
  3. Performance Optimization:

    • Install sentence-transformers for better embeddings
    • Use Redis for faster key-value lookups in production
  4. Troubleshooting:

    • Check that Ollama is running and the model is loaded (see the snippet after this list)
    • Ensure you have sufficient RAM for vector operations
    • Look at the logs for detailed information about operations
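For the first troubleshooting step, a quick way to confirm Ollama is reachable and see which models are loaded is to query its /api/tags endpoint. This is a standalone check, not part of the ContextCore API:

import requests

try:
    # /api/tags lists the models currently available to the local Ollama server
    tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
    models = [m["name"] for m in tags.get("models", [])]
    print("Ollama is up. Models:", models)
except requests.ConnectionError:
    print("Ollama is not reachable; is `ollama serve` running?")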

Memory System Architecture

ContextCore implements a unified memory system that combines:

  1. Hierarchical Summarization: Continuously distills conversation into structured summaries
  2. Incremental Updates: Updates high-level summaries with new insights
  3. Semantic Retrieval: Fetches the most relevant detailed memories
  4. Dynamic Injection: Combines high-level thinking with detailed context

This approach enables small local LLMs to maintain coherent conversations even when the raw input exceeds their context window.
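Conceptually, the dynamic injection step assembles a prompt from both tiers before each generation. A minimal sketch of that idea, reusing the search_memories API shown earlier; the filter_type="thinking" value, the .content attribute on results, and build_prompt itself are assumptions for illustration:

def build_prompt(context, user_input, k=5):
    # High-level session strategy from thinking memory (filter value assumed)
    thinking = context.memory_store.search_memories(
        query=user_input, k=3, filter_type="thinking"
    )
    # The most relevant detailed memories for this turn
    raw = context.memory_store.search_memories(
        query=user_input, k=k, filter_type="raw", min_score=0.6
    )
    sections = [
        "## Session strategy",
        "\n".join(m.content for m in thinking),
        "## Relevant details",
        "\n".join(m.content for m in raw),
        "## User",
        user_input,
    ]
    return "\n\n".join(sections)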