
Fix content management and working memory #99

Open
bellaland wants to merge 9 commits into ChicagoHAI:main from bellaland:fix_content_management

Conversation

@bellaland

Summary

This PR fixes Context management and working memory #51 on the roadmap by introducing a stateful, context-aware, and validated execution pipeline.

Features implemented:

  1. Maintain a STATE.md file updated at each phase
  2. Add periodic working directory checks
  3. Generate a summary of prior phases to keep the context focused
  4. Validate each phase's outputs before the next phase starts
  5. Limit execution to the top-K most promising directions

What I have learned

1. Context structure matters more than context size

Improving agent performance is not just a matter of adding more context; how that context is structured matters as much.

By adding the following components, this PR reduces context drift, inconsistent reasoning, and repeated work:

  • StateManager -> persistent execution memory via STATE.md
  • ContextSummarizer -> summarize prior phases for next prompts
  • Validators -> enforce correctness, prevent silent failures

2. Stateful execution improves reliability and user trust

This PR provides observability into the execution pipeline through STATE.md, .neurico/state.json, and phase summaries, which improves user trust and yields:

  • more focused outputs
  • more consistent reasoning across stages
  • more explicit failures

3. Evaluation is still an open question for multi-agent design

The multi-agent framework works well for research tasks, but several open questions about evaluation remain:

  • evaluation and scoring standards should be more mature
  • quality metrics need more validation
  • more user studies are needed to define "good research output"

Next steps

1. Context drift detection

Add automated drift detection via an LLM judge that compares the expected goal with actual outputs.

2. Adaptive Top-K selection

Adjust K dynamically based on task complexity to improve scoring functions and balance context breadth and depth.
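
One simple way to sketch adaptive Top-K: rank candidate directions by score, then let a task-complexity signal decide how many to keep. The complexity input and bounds below are assumptions, not a proposed design:

```python
def adaptive_top_k(candidates, scores, complexity, k_min=1, k_max=5):
    """Keep more directions for complex tasks, fewer for simple ones.

    complexity is assumed to be a float in [0, 1]; K is interpolated
    between k_min and k_max and clamped to that range.
    """
    k = max(k_min, min(k_max, round(k_min + complexity * (k_max - k_min))))
    # Rank candidates by score, highest first, and keep the top K.
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in ranked[:k]]
```

A simple task gets a single focused direction; a complex one keeps more breadth at the cost of more context.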

3. Cache optimization

Optimize memory and cache via:

  • Improve the storage structure of files under .neurico
  • Manage lifecycle of state and summaries
  • Reuse intermediate results across runs

4. User experience

Improve user experience via:

  • Add interactive UI
  • Visualize progress tracking
  • Add human-in-the-loop feedback

Research Questions

1. What information matters most across phases?

Current state / phase, key findings, decision rationale, constraints and failures, next steps and Top-K candidates.

2. How to detect context drift during execution?

  • Compare expected vs actual outputs
  • Analyze semantic similarity between planning and execution
  • LLM-driven evaluation of alignment with the original goal
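
The expected-vs-actual comparison above could be prototyped with any similarity signal before wiring in an LLM judge or embeddings; the token-overlap stand-in and threshold below are placeholder assumptions:

```python
def drift_score(expected_goal, actual_output):
    """Return 1 minus the Jaccard overlap of word sets; higher means more drift."""
    e = set(expected_goal.lower().split())
    a = set(actual_output.lower().split())
    if not e and not a:
        return 0.0
    return 1.0 - len(e & a) / len(e | a)

def is_drifting(expected_goal, actual_output, threshold=0.8):
    """Flag the execution as drifting when the score exceeds the threshold."""
    return drift_score(expected_goal, actual_output) > threshold
```

Swapping `drift_score` for an embedding similarity or an LLM judge keeps the same flagging interface while improving the signal.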

3. What is the right tradeoff between context breadth and depth?

Context breadth improves coverage but increases cost and drift risk, while context depth improves rigor but may miss alternative approaches. To balance them, this PR uses Top-K selection to focus on the most promising directions. In the future, dynamically adjusting K based on task complexity could optimize this tradeoff.
