feat(memory): Cross Agent Memory Experimentation by RyamL1221 · Pull Request #65 · agiresearch/Cerebrum

RyamL1221 · 2026-04-23T14:14:49Z

PR: System-Wide Personalization via Kernel-Managed Shared Memory

Summary

Adds a kernel-managed shared memory architecture to the Cerebrum SDK that enables system-wide personalization across agents. Three new example agents (ProfileAgent, TaskAgent, AssistantAgent) demonstrate the pattern: specialized agents write user context as shared memories with standardized metadata, and the AIOS kernel automatically injects that context into other agents' LLM calls. A two-phase benchmark harness validates that shared memory improves personalization quality over a private-only baseline.

What Changed

New Example Agents (`cerebrum/example/agents/`)

ProfileAgent — Extracts stable user attributes (name, preferred tools, language, response style) via structured LLM output. Stores as memory with memory_type="profile" and configurable sharing_policy.
TaskAgent — Extracts working context (project, experiment, goals, blockers, next steps) via structured LLM output. Stores as memory with memory_type="task_context".
AssistantAgent — Responds to user queries with plain llm_chat calls. No retrieval logic — the kernel's auto_inject handles shared context injection, and auto_extract handles conversation memory storage.
shared_memory_utils.py — build_memory_metadata() helper with input validation for constructing standardized metadata dicts. filter_shared_memories() retained for optional debug/ablation use.
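
As a rough illustration of the metadata helper described above, the following is a minimal sketch and not the code shipped in `shared_memory_utils.py`. The function name `sketch_build_memory_metadata`, its parameter order, the default policy, the `"conversation"` memory type, and both `sharing_policy` values are assumptions; only `memory_type="profile"` and `memory_type="task_context"` are named in this PR.

```python
from typing import Any

# Assumed enum values. Only "profile" and "task_context" appear in the PR text;
# "conversation" and both sharing policies are hypothetical placeholders.
ALLOWED_MEMORY_TYPES = {"profile", "task_context", "conversation"}
ALLOWED_SHARING_POLICIES = {"private", "shared"}


def sketch_build_memory_metadata(
    user_id: str,
    memory_type: str,
    sharing_policy: str = "private",
    **extra: Any,
) -> dict[str, Any]:
    """Build a standardized metadata dict, rejecting empty or unknown values."""
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValueError("user_id must be a non-empty string")
    if memory_type not in ALLOWED_MEMORY_TYPES:
        raise ValueError(f"unknown memory_type: {memory_type!r}")
    if sharing_policy not in ALLOWED_SHARING_POLICIES:
        raise ValueError(f"unknown sharing_policy: {sharing_policy!r}")
    # Extra keyword arguments pass through, but the standardized fields are
    # written last so they cannot be overridden by accident.
    metadata: dict[str, Any] = dict(extra)
    metadata.update(
        user_id=user_id,
        memory_type=memory_type,
        sharing_policy=sharing_policy,
    )
    return metadata
```

Validating at construction time means a malformed record fails in the writing agent rather than silently dropping out of injection later.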

Memory API Extension (`cerebrum/memory/apis.py`)

search_memories() now accepts optional user_id and sharing_policy keyword parameters for cross-agent memory queries, with input validation and kernel contract documentation.
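
The optional-filter pattern described above might look roughly like the sketch below. It does not reproduce the real `search_memories()` signature, which this PR text does not show; `build_search_params`, the `query` and `k` parameters, and the shape of the returned dict are all hypothetical.

```python
from typing import Any, Optional


def build_search_params(
    query: str,
    *,
    k: int = 5,  # hypothetical result limit, not taken from the PR
    user_id: Optional[str] = None,
    sharing_policy: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble search parameters, including cross-agent filters only when set."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    params: dict[str, Any] = {"query": query, "k": k}
    for name, value in (("user_id", user_id), ("sharing_policy", sharing_policy)):
        if value is None:
            continue  # Omitted filters keep the original private-search behavior.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string when provided")
        params[name] = value
    return params
```

Keeping both filters keyword-only and defaulting to `None` leaves existing callers unaffected, which matches the "optional" wording in the description.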

CLI Extension (`cerebrum/commands/run_agent.py`)

Added --share-memory flag to run-agent CLI command, propagated to agent instances.
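
A boolean flag like this is commonly wired up with `argparse` as sketched below. This is an illustration only: the actual `run_agent.py` parser is not shown in this PR text, and the `--agent` option, `DemoAgent`, and `make_agent` are hypothetical stand-ins.

```python
import argparse
from typing import Sequence


class DemoAgent:
    """Hypothetical stand-in for an agent that receives the flag."""

    def __init__(self, name: str, share_memory: bool = False) -> None:
        self.name = name
        self.share_memory = share_memory


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-agent")
    # "--agent" is an assumed option used only to make the sketch complete.
    parser.add_argument("--agent", required=True, help="agent to run")
    # store_true makes the flag default to False when it is not passed.
    parser.add_argument(
        "--share-memory",
        action="store_true",
        help="write memories with a shared policy so other agents can use them",
    )
    return parser


def make_agent(argv: Sequence[str]) -> DemoAgent:
    args = build_parser().parse_args(argv)
    # argparse exposes "--share-memory" as the attribute "share_memory".
    return DemoAgent(args.agent, share_memory=args.share_memory)
```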

Benchmark Harness (`benchmarks/shared_memory/`)

Two-phase experiment: Phase 1 (private baseline) vs Phase 2 (shared memory with kernel auto-inject). Kernel restart between phases clears the memory store.
Synthetic data generation (synth.py): Generates unique user profiles, task contexts, vague follow-up queries, and plausible actions per trial.
HybridJudge (judge.py): Combines deterministic keyword matching with LLM-based scoring. Keyword matching provides a reliable signal for whether the response references injected profile/task attributes. LLM scoring assesses quality and integration. Content-based rubric with no generic_penalty.
Pipeline (pipeline.py): Runs ProfileAgent → TaskAgent → AssistantAgent per trial. Captures written memory metadata. RetrievalLog with injection_status field ("confirmed"/"audit_inferred"/"unknown") for observability.
Orchestrator (run_evaluation.py): Manages per-phase share_memory flag, collects metrics, computes comparative analysis, writes JSON/CSV output.
Models (models.py): Pydantic models for synthetic data, judge scores, trial results, injection diagnostics, and experiment output.
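
To make the hybrid scoring idea concrete, here is a minimal sketch of combining a deterministic keyword signal with an LLM score. It is not the `HybridJudge` implementation from `judge.py`: the 1 to 5 scale, the equal weighting, and the names `clamp_score`, `keyword_hit_rate`, and `hybrid_score` are assumptions, and the LLM call is replaced by a plain callable.

```python
from typing import Callable, Iterable


def clamp_score(value: float, low: float = 1.0, high: float = 5.0) -> float:
    """Force a score into the assumed [low, high] range."""
    return max(low, min(high, value))


def keyword_hit_rate(response: str, keywords: Iterable[str]) -> float:
    """Fraction of expected keywords that appear in the response (case-insensitive)."""
    cleaned = [k.strip().lower() for k in keywords if k.strip()]
    if not cleaned:
        return 0.0
    text = response.lower()
    return sum(k in text for k in cleaned) / len(cleaned)


def hybrid_score(
    response: str,
    keywords: Iterable[str],
    llm_scorer: Callable[[str], float],
    keyword_weight: float = 0.5,  # assumed weighting, not from the PR
) -> float:
    """Blend the keyword signal and the LLM score on a shared 1-5 scale."""
    llm_part = clamp_score(llm_scorer(response))
    # Map the hit rate from [0, 1] onto the same 1-5 scale as the LLM score.
    keyword_part = 1.0 + 4.0 * keyword_hit_rate(response, keywords)
    blended = keyword_weight * keyword_part + (1.0 - keyword_weight) * llm_part
    return clamp_score(blended)
```

The deterministic half gives a floor of evidence that injected attributes actually appear in the response, while the LLM half judges how well they are used.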

Documentation (`docs/`)

system_wide_personalization.md — Architecture overview and design rationale
shared-memory-experiment-report.md — Full experiment report with results, ablation study, methodology, and reproduction instructions

CI (`.github/workflows/test.yml`)

Added step to create default AIOS kernel config.yaml if missing, preventing CI failures when the kernel repo doesn't ship one.

Experimental Results (30 trials, qwen2.5:7b, HybridJudge)

| Metric | Phase 1 (private) | Phase 2 (shared) | Delta | Improvement |
| --- | --- | --- | --- | --- |
| Profile Usage | 2.30 ± 0.60 | 3.67 ± 0.80 | +1.37 | +59% |
| Task Usage | 2.37 ± 0.85 | 3.63 ± 1.03 | +1.27 | +54% |
| Integration | 2.17 ± 0.46 | 3.03 ± 0.93 | +0.87 | +40% |

Across the 30 trials, kernel-managed shared memory (Phase 2) scored higher than the private-only baseline (Phase 1) on all three evaluation dimensions, with mean improvements of 40% to 59%.
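
The Delta and Improvement columns appear to be the Phase 2 mean minus the Phase 1 mean, and that difference relative to the Phase 1 mean. The sketch below recomputes them from the rounded means in the table; the `compare` helper is hypothetical. Results recomputed this way can differ from the reported figures by one unit in the last digit, presumably because the harness works from unrounded per-trial values.

```python
# Means copied from the results table: (phase 1, phase 2).
MEANS = {
    "Profile Usage": (2.30, 3.67),
    "Task Usage": (2.37, 3.63),
    "Integration": (2.17, 3.03),
}


def compare(phase1_mean: float, phase2_mean: float) -> tuple[float, float]:
    """Return (absolute delta, relative improvement in percent)."""
    delta = phase2_mean - phase1_mean
    return round(delta, 2), round(100.0 * delta / phase1_mean, 1)


if __name__ == "__main__":
    for metric, (p1, p2) in MEANS.items():
        delta, pct = compare(p1, p2)
        print(f"{metric}: delta={delta:+.2f}, improvement={pct:+.1f}%")
```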

Test Coverage

| Test File | What It Covers |
| --- | --- |
| `test_shared_memory_utils.py` | Metadata helper defaults, empty string rejection, extra kwargs |
| `test_shared_memory_utils_props.py` | PBT: metadata field preservation, invalid enum rejection (Hypothesis) |
| `test_assistant_agent.py` | AssistantAgent refactor: no search_memories, no create_memory, no filter_shared_memories import |
| `test_benchmark_orchestrator.py` | Orchestrator share_memory flag per phase, no config.update calls |
| `test_benchmark_metrics.py` | TrialResult field completeness, comparative analysis stdout output |
| `test_results_props.py` | PBT: summary statistics arithmetic correctness (Hypothesis) |
| `test_benchmark_harness_preservation.py` | PBT: _clamp_score, _retrieval_log_from_diagnostics, _build_retrieval_log_from_search, rubric structure, key normalization (Hypothesis) |
| `test_search_memories_cross_agent.py` | search_memories cross-agent parameter forwarding |
| `test_share_memory_flag.py` | CLI --share-memory flag parsing and propagation |

How to Run

# Unit tests (no kernel needed)
python3 tests/agents/test_shared_memory_utils.py
python3 tests/agents/test_shared_memory_utils_props.py
python3 tests/agents/test_assistant_agent.py
python3 tests/agents/test_benchmark_orchestrator.py
python3 tests/agents/test_benchmark_metrics.py
python3 tests/agents/test_results_props.py
python3 tests/agents/test_benchmark_harness_preservation.py

# Benchmark (requires AIOS kernel with auto_inject/auto_extract enabled)
python benchmarks/shared_memory/run_evaluation.py --trials 30 --output results/phase1/ --condition phase1 --csv
# Restart the AIOS kernel here so that Phase 2 starts with an empty memory store
python benchmarks/shared_memory/run_evaluation.py --trials 30 --output results/phase2/ --condition phase2 --csv

Companion Kernel Changes (AIOS repo)

This PR requires corresponding changes in the AIOS kernel for full functionality:

Context injector: resolve user_id from memory metadata, enforce sharing_policy filter
Conversation extractor: propagate resolved user_id from injection context
Memory formatting: convert JSON content to natural language at inject time
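
The kernel-side behavior listed above lives in the AIOS repo and is not part of this PR, so the following is only a sketch of what the sharing_policy filter and inject-time formatting could look like. The record layout (`content`, `metadata`, `agent_name`), the `"shared"` policy value, and the function names `select_injectable` and `format_memory` are all assumptions.

```python
import json
from typing import Any


def select_injectable(
    memories: list[dict[str, Any]], user_id: str, requesting_agent: str
) -> list[dict[str, Any]]:
    """Keep this user's memories that are shared or owned by the requesting agent."""
    selected = []
    for memory in memories:
        meta = memory.get("metadata", {})
        if meta.get("user_id") != user_id:
            continue  # Never inject another user's context.
        is_shared = meta.get("sharing_policy") == "shared"
        is_own = meta.get("agent_name") == requesting_agent
        if is_shared or is_own:
            selected.append(memory)
    return selected


def format_memory(content: str) -> str:
    """Turn a JSON object string into a short natural-language style line."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return content  # Plain-text memories are injected unchanged.
    if not isinstance(data, dict):
        return content
    parts = []
    for key, value in data.items():
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        parts.append(f"{key.replace('_', ' ')}: {value}")
    return "; ".join(parts)
```

Formatting at inject time rather than at write time keeps the stored record structured, so other consumers can still parse it.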


RyamL1221 and others added 19 commits April 18, 2026 14:53

0d95440 added Kiro to gitignore
cbb0b13 updated memory documentation and storage parameters
100a5e4 Merge pull request #1 from RyamL1221/feature/aios-memory (Feature/aios memory)
f44d37c created profile agent
10e589d created task agent
6bb501c Created assistant agent
9d24d5a created shared memory module
fc826f0 added a flag to CLI for shared memory
aed18c9 added tqdm to dependencies
d8fe802 added results to gitignore
a78dbc8 added documentation for system wide personalization
c731b6b code for experimentation with system wide personalization
eab397d changed experiment to focus on shared vs nonshared memory
420c4a8 fixed judge
155f01f made judge instructions more explicit
257a6be Created hybrid judge of LLM and keyword checker
fd9cb66 added experiment resutls
757026b fixed tests
3b0059f Merge pull request #2 from RyamL1221/feature/system-wide-personalization (Feature/system wide personalization)

aiosfoundation approved these changes Apr 23, 2026

aiosfoundation merged commit be132ad into agiresearch:main Apr 23, 2026
1 check failed

aiosfoundation merged 19 commits into agiresearch:main from RyamL1221:main
