ContextCore: Enhanced Context Management for Local LLMs

ContextCore is a Python library designed to overcome the context window limitations of smaller local LLMs, such as Ollama's smollm2:1.7b. It implements a unified memory system that combines high-level thinking memory with detailed raw memory to provide an extended effective context.

Installation

# Clone the repository
git clone https://github.com/Priyanshu-i/ContextCore.git
cd ContextCore

# Install the package
pip install -e .

# Optional but recommended dependencies
pip install sentence-transformers  # For better embeddings
pip install hnswlib  # For vector storage
pip install redis  # For faster key-value storage
pip install requests  # For Ollama API communication

Quick Start

from contextcore import ContextCore

# Initialize ContextCore with your local LLM
context = ContextCore(
    model_name="smollm2:1.7b",  # Your Ollama model
    ollama_url="http://localhost:11434"  # Ollama API URL
)

# Initialize a new session with an objective
context.initialize_session("Building a robust memory system for local LLMs")

# Process user inputs and get responses
response = context.process_user_input("How can I implement a vector store for text embeddings?")
print(response)

# Save the session for later use
context.save()

# Load a saved session
loaded_context = ContextCore.load("./contextcore_storage")

Key Features

  1. Two-Tier Memory System:

    • Thinking Memory (TME): High-level reasoning, concepts, and session strategies
    • Raw Memory (RME): Detailed facts, user inputs, and specific technical information
  2. Semantic Search: Find relevant memories based on semantic similarity

  3. Session Management: Maintain coherent, ongoing conversations with automatic summarization

  4. Local LLM Integration: Seamless integration with Ollama-based local models

  5. Persistence: Save and load sessions to continue conversations later

Advanced Usage

Customizing Memory Storage

# Use Redis for faster key-value storage
context = ContextCore(
    model_name="smollm2:1.7b",
    use_redis=True  # Enable Redis storage
)

# Customize vector dimensions (if using a different embedding model)
context = ContextCore(
    model_name="smollm2:1.7b",
    vector_dim=768  # For larger embedding models
)

Working with Different LLMs

# Use a different Ollama model
context = ContextCore(
    model_name="llama3:8b",  # Any model you have in Ollama
)

# Connect to a remote Ollama instance
context = ContextCore(
    model_name="mistral:7b",
    ollama_url="http://your-ollama-server:11434"
)
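Under the hood, the OllamaClient talks to Ollama's REST API. The following is a minimal sketch of such a call against the standard /api/generate endpoint, using the requests dependency installed above (the function name ollama_generate is illustrative, not part of the library):

import requests

def ollama_generate(prompt, model="smollm2:1.7b", url="http://localhost:11434"):
    # POST to Ollama's /api/generate endpoint; stream=False returns a single JSON object
    response = requests.post(
        f"{url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]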

Memory Management

# Manually add thinking memory
context.memory_store.add_thinking_memory(
    content="The key insight is to use hierarchical summarization",
    importance=0.9,
    metadata={"topic": "architecture", "source": "design_doc"}
)

# Manually add raw memory
context.memory_store.add_raw_memory(
    content="User prefers Python over JavaScript for this project",
    category="user",  # user, session, or agent
    relevance_score=0.7,
    metadata={"source": "conversation"}
)

# Search memories
memories = context.memory_store.search_memories(
    query="vector databases",
    k=5,  # Return top 5 results
    filter_type="raw",  # Only raw memories
    min_score=0.6  # Minimum similarity threshold
)

Implementation Details

Memory Types

  1. ThinkingMemory: Used for high-level concepts and reasoning

    • Contains: content, timestamp, importance score, metadata
  2. RawMemory: Used for detailed facts and specific information

    • Contains: content, timestamp, category, relevance score, metadata
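The fields listed above map naturally onto simple dataclasses. A minimal sketch, assuming this field layout (the actual classes in contextcore may differ in detail):

import time
from dataclasses import dataclass, field

@dataclass
class ThinkingMemory:
    # High-level concepts and reasoning distilled from the session
    content: str
    importance: float = 0.5          # 0.0-1.0, weights retrieval
    timestamp: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)

@dataclass
class RawMemory:
    # Detailed facts and specific information
    content: str
    category: str = "session"        # user, session, or agent
    relevance_score: float = 0.5
    timestamp: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)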

Components

  1. VectorStore: Stores and retrieves memories using semantic search

    • Uses HNSWlib for efficient similarity search
  2. SimpleEmbedder: Converts text to vector embeddings

    • Uses sentence-transformers if available, with a simple fallback (sketched after this list)
  3. MemoryStore: Combines vector storage with metadata-based retrieval

    • Optional Redis integration for faster lookups
  4. OllamaClient: Interfaces with Ollama API for text generation

  5. ContextCore: Main class that coordinates all components
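The optional-dependency pattern behind SimpleEmbedder might look like the following sketch. This is a simplification that assumes a hash-seeded random fallback; the library's actual fallback may differ:

import numpy as np

try:
    from sentence_transformers import SentenceTransformer
    _model = SentenceTransformer("all-MiniLM-L6-v2")

    def embed(text, dim=384):
        # Proper semantic embeddings when sentence-transformers is installed
        return _model.encode(text)
except ImportError:
    def embed(text, dim=384):
        # Crude fallback: a deterministic pseudo-random vector seeded by the text hash.
        # Far weaker than real embeddings, but keeps the pipeline functional.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)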

Best Practices

  1. Initialization:

    • Always provide a clear session objective
    • Use the most powerful local LLM you have available
  2. Memory Management:

    • Let the system handle memory management automatically
    • For critical information, manually add high-importance memories
  3. Performance Optimization:

    • Install sentence-transformers for better embeddings
    • Use Redis for faster key-value lookups in production
  4. Troubleshooting:

    • Check that Ollama is running and the model is loaded (see the snippet after this list)
    • Ensure you have sufficient RAM for vector operations
    • Look at the logs for detailed information about operations
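For the first troubleshooting step, a quick way to confirm Ollama is reachable and see which models are loaded is to query its /api/tags endpoint. This is a standalone check, not part of the ContextCore API:

import requests

try:
    # /api/tags lists the models currently available to the local Ollama server
    tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
    models = [m["name"] for m in tags.get("models", [])]
    print("Ollama is up. Models:", models)
except requests.ConnectionError:
    print("Ollama is not reachable; is `ollama serve` running?")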

Memory System Architecture

ContextCore implements a unified memory system that combines:

  1. Hierarchical Summarization: Continuously distills conversation into structured summaries
  2. Incremental Updates: Updates high-level summaries with new insights
  3. Semantic Retrieval: Fetches the most relevant detailed memories
  4. Dynamic Injection: Combines high-level thinking with detailed context

This approach enables small local LLMs to maintain coherent conversations even when the raw input exceeds their context window.
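Conceptually, the dynamic injection step assembles a prompt from both tiers before each generation. A minimal sketch of that idea, reusing the search_memories API shown earlier; the filter_type="thinking" value, the .content attribute on results, and build_prompt itself are assumptions for illustration:

def build_prompt(context, user_input, k=5):
    # High-level session strategy from thinking memory (filter value assumed)
    thinking = context.memory_store.search_memories(
        query=user_input, k=3, filter_type="thinking"
    )
    # The most relevant detailed memories for this turn
    raw = context.memory_store.search_memories(
        query=user_input, k=k, filter_type="raw", min_score=0.6
    )
    sections = [
        "## Session strategy",
        "\n".join(m.content for m in thinking),
        "## Relevant details",
        "\n".join(m.content for m in raw),
        "## User",
        user_input,
    ]
    return "\n\n".join(sections)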