feat(memory): Cross Agent Memory Experimentation#65

Merged
aiosfoundation merged 19 commits into agiresearch:main from RyamL1221:main
Apr 23, 2026

@RyamL1221
Contributor

PR: System-Wide Personalization via Kernel-Managed Shared Memory

Summary

Adds a kernel-managed shared memory architecture to the Cerebrum SDK that enables system-wide personalization across agents. Three new example agents (ProfileAgent, TaskAgent, AssistantAgent) demonstrate the pattern: specialized agents write user context as shared memories with standardized metadata, and the AIOS kernel automatically injects that context into other agents' LLM calls. A two-phase benchmark harness validates that shared memory improves personalization quality over a private-only baseline.

What Changed

New Example Agents (cerebrum/example/agents/)

  • ProfileAgent — Extracts stable user attributes (name, preferred tools, language, response style) via structured LLM output. Stores as memory with memory_type="profile" and configurable sharing_policy.
  • TaskAgent — Extracts working context (project, experiment, goals, blockers, next steps) via structured LLM output. Stores as memory with memory_type="task_context".
  • AssistantAgent — Responds to user queries with plain llm_chat calls. No retrieval logic — the kernel's auto_inject handles shared context injection, and auto_extract handles conversation memory storage.
  • shared_memory_utils.py: build_memory_metadata() helper with input validation for constructing standardized metadata dicts. filter_shared_memories() retained for optional debug/ablation use.
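
A metadata helper of the kind described for shared_memory_utils.py might look like the sketch below. The signature, the default policy, and the sharing_policy values ("private", "shared") are assumptions for illustration; the PR names only the memory types "profile" and "task_context" and does not list the accepted policies.

```python
from typing import Any, Dict

# "profile" and "task_context" come from the PR text.
ALLOWED_MEMORY_TYPES = {"profile", "task_context"}
# Assumed values: the PR does not enumerate the sharing policies.
ALLOWED_SHARING_POLICIES = {"private", "shared"}


def build_memory_metadata(
    user_id: str,
    memory_type: str,
    sharing_policy: str = "shared",  # assumed default
    **extra: Any,
) -> Dict[str, Any]:
    """Return a standardized metadata dict, rejecting empty or unknown values."""
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValueError("user_id must be a non-empty string")
    if memory_type not in ALLOWED_MEMORY_TYPES:
        raise ValueError(f"unknown memory_type: {memory_type!r}")
    if sharing_policy not in ALLOWED_SHARING_POLICIES:
        raise ValueError(f"unknown sharing_policy: {sharing_policy!r}")

    # Extra keyword arguments are preserved alongside the standard fields.
    metadata: Dict[str, Any] = dict(extra)
    metadata.update(
        user_id=user_id,
        memory_type=memory_type,
        sharing_policy=sharing_policy,
    )
    return metadata
```

Under these assumptions, ProfileAgent would call something like build_memory_metadata("user-1", "profile") before storing a memory, and TaskAgent would pass "task_context".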

Memory API Extension (cerebrum/memory/apis.py)

  • search_memories() now accepts optional user_id and sharing_policy keyword parameters for cross-agent memory queries, with input validation and kernel contract documentation.
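
As a rough illustration of the optional keyword parameters, the sketch below assembles a query payload and only includes user_id and sharing_policy when they are supplied. The function name build_search_params, the query and k fields, and the payload shape are hypothetical; the actual search_memories() signature in cerebrum/memory/apis.py is not shown in this PR description.

```python
from typing import Any, Dict, Optional


def build_search_params(
    query: str,
    k: int = 5,  # hypothetical result limit
    *,
    user_id: Optional[str] = None,
    sharing_policy: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a search payload; cross-agent filters are added only when given."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")

    params: Dict[str, Any] = {"query": query, "k": k}

    # Optional cross-agent filters: validate, then forward unchanged.
    for name, value in (("user_id", user_id), ("sharing_policy", sharing_policy)):
        if value is None:
            continue
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string when provided")
        params[name] = value

    return params
```

Keeping both parameters keyword-only and optional means existing callers that pass only a query are unaffected.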

CLI Extension (cerebrum/commands/run_agent.py)

  • Added --share-memory flag to run-agent CLI command, propagated to agent instances.
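
A boolean flag like this is typically parsed and handed to the agent constructor. The sketch below uses argparse from the standard library; the --agent argument, the DemoAgent class, and the make_agent helper are hypothetical stand-ins, since the real run_agent.py may be structured differently.

```python
import argparse
from typing import List


class DemoAgent:
    """Hypothetical stand-in for an agent that accepts a share_memory option."""

    def __init__(self, name: str, share_memory: bool = False) -> None:
        self.name = name
        self.share_memory = share_memory


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-agent")
    parser.add_argument("--agent", required=True)  # hypothetical argument
    # store_true: the flag defaults to False and becomes True when present.
    parser.add_argument(
        "--share-memory",
        action="store_true",
        help="write memories as shared instead of private",
    )
    return parser


def make_agent(argv: List[str]) -> DemoAgent:
    args = build_parser().parse_args(argv)
    # argparse exposes --share-memory as args.share_memory.
    return DemoAgent(args.agent, share_memory=args.share_memory)
```

With this shape, make_agent(["--agent", "profile", "--share-memory"]) yields an agent whose share_memory attribute is True, and omitting the flag leaves it False.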

Benchmark Harness (benchmarks/shared_memory/)

  • Two-phase experiment: Phase 1 (private baseline) vs Phase 2 (shared memory with kernel auto-inject). Kernel restart between phases clears the memory store.
  • Synthetic data generation (synth.py): Generates unique user profiles, task contexts, vague follow-up queries, and plausible actions per trial.
  • HybridJudge (judge.py): Combines deterministic keyword matching with LLM-based scoring. Keyword matching provides a reliable signal for whether the response references injected profile/task attributes. LLM scoring assesses quality and integration. Content-based rubric with no generic_penalty.
  • Pipeline (pipeline.py): Runs ProfileAgent → TaskAgent → AssistantAgent per trial. Captures written memory metadata. RetrievalLog with injection_status field ("confirmed"/"audit_inferred"/"unknown") for observability.
  • Orchestrator (run_evaluation.py): Manages per-phase share_memory flag, collects metrics, computes comparative analysis, writes JSON/CSV output.
  • Models (models.py): Pydantic models for synthetic data, judge scores, trial results, injection diagnostics, and experiment output.
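
The hybrid scoring idea can be sketched as follows: a deterministic keyword check measures how many injected attributes the response mentions, and that signal is blended with an LLM-assigned score. The 1 to 5 scale, the equal weighting, and all function names here are assumptions for illustration; the PR does not specify how judge.py combines the two signals.

```python
from typing import Iterable


def keyword_coverage(response: str, attributes: Iterable[str]) -> float:
    """Fraction of attributes that appear (case-insensitively) in the response."""
    text = response.lower()
    attrs = [a.strip().lower() for a in attributes if a.strip()]
    if not attrs:
        return 0.0
    hits = sum(1 for a in attrs if a in text)
    return hits / len(attrs)


def clamp_score(value: float, low: float = 1.0, high: float = 5.0) -> float:
    """Keep a score inside the assumed 1 to 5 range."""
    return max(low, min(high, value))


def hybrid_score(
    response: str,
    attributes: Iterable[str],
    llm_score: float,
    keyword_weight: float = 0.5,  # assumed weighting
) -> float:
    """Blend the keyword signal with an LLM score on the same scale."""
    # Map coverage in [0, 1] onto the 1 to 5 scale.
    keyword_score = 1.0 + 4.0 * keyword_coverage(response, attributes)
    blended = keyword_weight * keyword_score + (1.0 - keyword_weight) * clamp_score(llm_score)
    return clamp_score(blended)
```

The keyword component cannot be swayed by fluent but generic answers, which is the reliability property the description attributes to it; the LLM component covers quality aspects that substring matching cannot see.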

Documentation (docs/)

  • system_wide_personalization.md — Architecture overview and design rationale
  • shared-memory-experiment-report.md — Full experiment report with results, ablation study, methodology, and reproduction instructions

CI (.github/workflows/test.yml)

  • Added step to create default AIOS kernel config.yaml if missing, preventing CI failures when the kernel repo doesn't ship one.

Experimental Results (30 trials, qwen2.5:7b, HybridJudge)

| Metric | Phase 1 (private) | Phase 2 (shared) | Delta | Improvement |
|---|---|---|---|---|
| Profile Usage | 2.30 ± 0.60 | 3.67 ± 0.80 | +1.37 | +59% |
| Task Usage | 2.37 ± 0.85 | 3.63 ± 1.03 | +1.27 | +54% |
| Integration | 2.17 ± 0.46 | 3.03 ± 0.93 | +0.87 | +40% |

In this 30-trial run, the shared-memory condition (Phase 2) scored higher than the private baseline (Phase 1) on all three evaluation dimensions, with mean gains of +0.87 to +1.37 points (+40% to +59%).
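
The Delta and Improvement columns can be read as a difference of phase means and that difference relative to the Phase 1 mean. The sketch below shows this arithmetic on made-up scores, not the experiment's data. Whether the harness reports sample or population standard deviation is not stated; sample standard deviation is assumed here, and the function names are illustrative.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Mean and sample standard deviation (assumed) for one phase."""
    return {"mean": mean(scores), "std": stdev(scores)}


def compare(phase1: Sequence[float], phase2: Sequence[float]) -> Dict[str, float]:
    """Absolute delta and relative improvement of phase 2 over phase 1."""
    m1, m2 = mean(phase1), mean(phase2)
    if m1 == 0:
        raise ValueError("phase 1 mean is zero; relative improvement is undefined")
    delta = m2 - m1
    return {"delta": delta, "improvement_pct": 100.0 * delta / m1}


# Illustrative scores only; these are not the PR's trial results.
example = compare([2, 2, 3, 3], [3, 4, 4, 5])
```

Small mismatches between a recomputed delta and the table (for example 3.63 minus 2.37) are consistent with the table's means having been rounded after the delta was computed.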

Test Coverage

| Test File | What It Covers |
|---|---|
| test_shared_memory_utils.py | Metadata helper defaults, empty string rejection, extra kwargs |
| test_shared_memory_utils_props.py | PBT: metadata field preservation, invalid enum rejection (Hypothesis) |
| test_assistant_agent.py | AssistantAgent refactor: no search_memories, no create_memory, no filter_shared_memories import |
| test_benchmark_orchestrator.py | Orchestrator share_memory flag per phase, no config.update calls |
| test_benchmark_metrics.py | TrialResult field completeness, comparative analysis stdout output |
| test_results_props.py | PBT: summary statistics arithmetic correctness (Hypothesis) |
| test_benchmark_harness_preservation.py | PBT: _clamp_score, _retrieval_log_from_diagnostics, _build_retrieval_log_from_search, rubric structure, key normalization (Hypothesis) |
| test_search_memories_cross_agent.py | search_memories cross-agent parameter forwarding |
| test_share_memory_flag.py | CLI --share-memory flag parsing and propagation |

How to Run

# Unit tests (no kernel needed)
python3 tests/agents/test_shared_memory_utils.py
python3 tests/agents/test_shared_memory_utils_props.py
python3 tests/agents/test_assistant_agent.py
python3 tests/agents/test_benchmark_orchestrator.py
python3 tests/agents/test_benchmark_metrics.py
python3 tests/agents/test_results_props.py
python3 tests/agents/test_benchmark_harness_preservation.py

# Benchmark (requires AIOS kernel with auto_inject/auto_extract enabled)
python benchmarks/shared_memory/run_evaluation.py --trials 30 --output results/phase1/ --condition phase1 --csv
# restart kernel
python benchmarks/shared_memory/run_evaluation.py --trials 30 --output results/phase2/ --condition phase2 --csv

Companion Kernel Changes (AIOS repo)

This PR requires corresponding changes in the AIOS kernel for full functionality:

  • Context injector: resolve user_id from memory metadata, enforce sharing_policy filter
  • Conversation extractor: propagate resolved user_id from injection context
  • Memory formatting: convert JSON content to natural language at inject time
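
To make the kernel-side steps concrete, the sketch below filters stored memories by user_id and sharing_policy and renders JSON content as a plain-language line before injection. This is not the AIOS kernel's code: the memory dict layout, the "shared" policy value, and every function name are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def select_injectable(memories: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
    """Keep memories for this user whose policy allows sharing (assumed value "shared")."""
    selected = []
    for memory in memories:
        meta = memory.get("metadata", {})
        if meta.get("user_id") == user_id and meta.get("sharing_policy") == "shared":
            selected.append(memory)
    return selected


def format_memory(memory: Dict[str, Any]) -> str:
    """Render JSON object content as 'key: value' phrases; pass other content through."""
    raw = memory["content"]
    try:
        content = json.loads(raw)
    except (TypeError, ValueError):
        return str(raw)  # not JSON, inject as-is
    if not isinstance(content, dict):
        return str(content)
    return "; ".join(f"{key.replace('_', ' ')}: {value}" for key, value in content.items())


def build_injection_context(memories: List[Dict[str, Any]], user_id: str) -> str:
    """Join the formatted, policy-filtered memories into one context string."""
    return "\n".join(format_memory(m) for m in select_injectable(memories, user_id))
```

Filtering before formatting keeps private memories out of the prompt entirely, which is the property the sharing_policy enforcement is meant to guarantee.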

@aiosfoundation aiosfoundation merged commit be132ad into agiresearch:main Apr 23, 2026
1 check failed