
Fix content management and working memory #99

Open
bellaland wants to merge 9 commits into ChicagoHAI:main from bellaland:fix_content_management

Conversation

@bellaland

Summary

This PR fixes Context management and working memory #51 on the roadmap by introducing a stateful, context-aware, and validated execution pipeline.

Features implemented:

  1. Maintain a STATE.md file updated at each phase
  2. Add periodic working directory checks
  3. Generate a summary of prior phases to keep the context focused
  4. Validate each phase's outputs before the next phase starts
  5. Limit execution to the top-K most promising directions

What I have learned

1. Context structure matters more than context size

Improving agent performance is not just a matter of adding more context; how that context is structured matters as much.

By adding the following components, this PR reduces context drift, inconsistent reasoning, and repeated work:

  • StateManager -> persistent execution memory via STATE.md
  • ContextSummarizer -> summarize prior phases for next prompts
  • Validators -> enforce correctness, prevent silent failures

2. Stateful execution improves reliability and user trust

This PR provides observability into the execution pipeline through STATE.md, .neurico/state.json, and phase summaries, which improves user trust and yields:

  • more focused outputs
  • more consistent reasoning across stages
  • more explicit failures

3. Evaluation is still an open question for multi-agent design

The multi-agent framework works well for research tasks, but several open questions about evaluation remain:

  • evaluation and scoring standards should be more mature
  • quality metrics need more validation
  • more user studies are needed to define "good research output"

Next steps

1. Context drift detection

Add automated drift detection via an LLM judge that compares the expected goal with actual outputs.

2. Adaptive Top-K selection

Adjust K dynamically based on task complexity to improve scoring functions and balance context breadth and depth.
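
One simple way to sketch adaptive Top-K: rank candidate directions by score, then let a task-complexity signal decide how many to keep. The complexity input and bounds below are assumptions, not a proposed design:

```python
def adaptive_top_k(candidates, scores, complexity, k_min=1, k_max=5):
    """Keep more directions for complex tasks, fewer for simple ones.

    complexity is assumed to be a float in [0, 1]; K is interpolated
    between k_min and k_max and clamped to that range.
    """
    k = max(k_min, min(k_max, round(k_min + complexity * (k_max - k_min))))
    # Rank candidates by score, highest first, and keep the top K.
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in ranked[:k]]
```

A simple task gets a single focused direction; a complex one keeps more breadth at the cost of more context.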

3. Cache optimization

Optimize memory and cache via:

  • Improve the storage structure of files under .neurico
  • Manage lifecycle of state and summaries
  • Reuse intermediate results across runs

4. User experience

Improve user experience via:

  • Add interactive UI
  • Visualize progress tracking
  • Add human-in-the-loop feedback

Research Questions

1. What information matters most across phases?

Current state / phase, key findings, decision rationale, constraints and failures, next steps and Top-K candidates.

2. How to detect context drift during execution?

  • Compare expected vs actual outputs
  • Analyze semantic similarity between planning and execution
  • LLM-driven evaluation of alignment with the original goal
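
The expected-vs-actual comparison above could be prototyped with any similarity signal before wiring in an LLM judge or embeddings; the token-overlap stand-in and threshold below are placeholder assumptions:

```python
def drift_score(expected_goal, actual_output):
    """Return 1 minus the Jaccard overlap of word sets; higher means more drift."""
    e = set(expected_goal.lower().split())
    a = set(actual_output.lower().split())
    if not e and not a:
        return 0.0
    return 1.0 - len(e & a) / len(e | a)

def is_drifting(expected_goal, actual_output, threshold=0.8):
    """Flag the execution as drifting when the score exceeds the threshold."""
    return drift_score(expected_goal, actual_output) > threshold
```

Swapping `drift_score` for an embedding similarity or an LLM judge keeps the same flagging interface while improving the signal.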

3. What is the right tradeoff between context breadth and depth?

Context breadth improves coverage but increases cost and drift risk, while context depth improves rigor but may miss alternative approaches. To balance them, this PR uses Top-K selection to focus on the most promising directions. In the future, dynamically adjusting K based on task complexity could optimize this tradeoff.
