feat: persistent session state and experiment attempt logging for session execution agent (experiment runner agent) by bimu233 · Pull Request #98 · ChicagoHAI/NeuriCo

bimu233 · 2026-04-27T22:44:01Z

Summary

Adds durable inter-phase memory to the session executor so that interrupted sessions (rate limit, token exhaustion, 5-hour Claude Code session limit) can resume without repeating completed work or re-encountering already-fixed bugs.

Problem

The session executor runs all 6 phases in a single continuous session. When that session is interrupted mid-run, all in-session context is lost. On the next run, the agent restarts from Phase 0, repeats phases it already completed, and re-discovers errors it already fixed — wasting tokens and time. This is common when users share their Claude Code session with other work.

Changes

templates/base/researcher.txt

Added RESUME CHECK block at the top: agent reads STATE.md immediately on startup, checks the phase status table, and skips all DONE phases. Writes a session-start Worklog entry before any other action so the session is recorded even if interrupted again.
Added mandatory CHECKPOINT writes at each phase boundary: phase status, worklog entry, and section-specific content (files, results, design decisions).
Added per-attempt experiment logging protocol (STEP A/B/C): agent writes Status: RUNNING to STATE.md before each experiment run, then updates to FAILED (with exact error and fix) or SUCCESS (with metrics and output files) immediately after. A hard constraint blocks code changes until the current attempt entry is resolved — preventing the agent from batching all error writes to the end of the session.
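The RESUME CHECK described above can be sketched as follows. This is a minimal illustration, not the code in this PR (the PR changes prompt templates, and the agent performs the check itself). It assumes the phase status table in STATE.md is a two-column Markdown table of phase number and status; the real layout is not shown here, and `SAMPLE_STATE`, `parse_phase_status`, `first_phase_to_run`, and the `IN_PROGRESS`/`PENDING` labels are hypothetical names chosen for the example.

```python
import re

# Hypothetical STATE.md excerpt; the real table layout is not shown in this PR.
SAMPLE_STATE = """\
# Current State

| Phase | Status |
|-------|--------|
| 0 | DONE |
| 1 | DONE |
| 2 | IN_PROGRESS |
| 3 | PENDING |

# Worklog
"""

# Matches table rows such as "| 2 | IN_PROGRESS |"; header and separator rows do not match.
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*([A-Z_]+)\s*\|\s*$")


def parse_phase_status(state_text):
    """Return {phase_number: status} from the rows of the phase table."""
    statuses = {}
    for line in state_text.splitlines():
        match = ROW.match(line)
        if match:
            statuses[int(match.group(1))] = match.group(2)
    return statuses


def first_phase_to_run(statuses, total_phases=6):
    """Return the first phase not marked DONE, or None if all are DONE."""
    for phase in range(total_phases):
        if statuses.get(phase) != "DONE":
            return phase
    return None


statuses = parse_phase_status(SAMPLE_STATE)
print(first_phase_to_run(statuses))  # prints 2
```

A phase missing from the table is treated as not done, so an empty or freshly created STATE.md starts at Phase 0.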

templates/base/deliverables/state_template.md

Added STATE.md template with structured sections: # Current State (phase status table), # Worklog, # Research Specification, # Files and Resources, # Workflow, # Experiment Design, # Learnings, # Experiment Results, # Experiment Attempts.
# Experiment Attempts is the append-only per-run log. Each entry records the command, error, fix applied, and output files — giving the agent a complete anti-repetition record on resume.
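The per-attempt protocol (write RUNNING before the run, then resolve to FAILED or SUCCESS before anything else) could look roughly like this if expressed as code. It is a sketch under assumptions: the entry layout, the helper names `start_attempt` and `resolve_attempt`, and the command `python train.py` are hypothetical, and the angle-bracket values are placeholders rather than real results.

```python
import tempfile
from pathlib import Path

RUNNING = "Status: RUNNING"


def start_attempt(state_path, number, command):
    """STEP A: append a RUNNING entry before the experiment is launched."""
    text = state_path.read_text()
    if RUNNING in text:
        # Mirrors the hard constraint: resolve the open attempt first.
        raise RuntimeError("previous attempt is still unresolved")
    entry = f"\n## Attempt {number}\nCommand: {command}\n{RUNNING}\n"
    state_path.write_text(text + entry)


def resolve_attempt(state_path, status, details):
    """STEP B/C: replace RUNNING with FAILED or SUCCESS plus details."""
    if status not in ("FAILED", "SUCCESS"):
        raise ValueError("status must be FAILED or SUCCESS")
    text = state_path.read_text()
    if RUNNING not in text:
        raise RuntimeError("no open attempt to resolve")
    resolved = f"Status: {status}\n{details}"
    state_path.write_text(text.replace(RUNNING, resolved, 1))


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "STATE.md"
    state.write_text("# Experiment Attempts\n")
    start_attempt(state, 1, "python train.py")  # hypothetical command
    resolve_attempt(state, "FAILED", "Error: <exact error>\nFix: <fix applied>")
    start_attempt(state, 2, "python train.py")
    resolve_attempt(state, "SUCCESS", "Metrics: <metrics>\nOutput files: <paths>")
    final_text = state.read_text()

print(final_text)
```

Because `start_attempt` refuses to run while an entry is still RUNNING, a session interrupted mid-experiment leaves a visible unresolved entry for the next session to inspect.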

Why researcher.txt

researcher.txt is always injected at {{ prompt }} by Jinja2 regardless of domain override. Placing STATE.md write instructions here means all domains get consistent phase logging without needing to duplicate instructions in every domains/X/session_instructions.txt.

Verified

Tested on an L2 regularization experiment. STATE.md was updated promptly at each phase transition and after each failed attempt, including exact error messages, fixes applied, and metric values. Resume from the correct phase was confirmed, with no prior work repeated.

bimu233 and others added 4 commits April 25, 2026 11:07

inter phase memory

729198b

experiment attempts error and fix log enabled

601e0f4

Merge branch 'ChicagoHAI:main' into main

6de129e

add resume instruction to researcher.txt

1fdffeb

bimu233 wants to merge 4 commits into ChicagoHAI:main from bimu233:main
