feat: persistent session state and experiment attempt logging for session execution agent (experiment runner agent) #98
Open
bimu233 wants to merge 4 commits into ChicagoHAI:main from
Summary
Adds durable inter-phase memory to the session executor so that interrupted sessions (rate limit, token exhaustion, 5-hour Claude Code session limit) can resume without repeating completed work or re-encountering already-fixed bugs.
Problem
The session executor runs all 6 phases in a single continuous session. When that session is interrupted mid-run, all in-session context is lost. On the next run, the agent restarts from Phase 0, repeats phases it already completed, and re-discovers errors it already fixed — wasting tokens and time. This is common when users share their Claude Code session with other work.
Changes
templates/base/researcher.txt
The agent writes Status: RUNNING to STATE.md before each experiment run, then updates the entry to FAILED (with exact error and fix) or SUCCESS (with metrics and output files) immediately after. A hard constraint blocks code changes until the current attempt entry is resolved, which prevents the agent from batching all error writes to the end of the session.
templates/base/deliverables/state_template.md
Sections: # Current State (phase status table), # Worklog, # Research Specification, # Files and Resources, # Workflow, # Experiment Design, # Learnings, # Experiment Results, # Experiment Attempts. The # Experiment Attempts section is the append-only per-run log. Each entry records the command, error, fix applied, and output files, giving the agent a complete anti-repetition record on resume.
Why researcher.txt
researcher.txt is always injected at {{ prompt }} by Jinja2 regardless of domain override. Placing the STATE.md write instructions here means all domains get consistent phase logging without duplicating instructions in every domains/X/session_instructions.txt.
Verified
Tested on an L2 regularization experiment. STATE.md was updated promptly at each phase transition and after each failed attempt, including exact error messages, fixes applied, and metric values. Confirmed successful resume from the correct phase without repeating prior work.
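For reviewers, the attempt-logging protocol described under Changes can be pictured with a minimal Python sketch. This is illustrative only: in this PR the behaviour is driven by prompt instructions in researcher.txt, not by code, and the entry layout (`## Attempt N`, `Command:`, `Error:`, `Fix:`) and the helper names `start_attempt` and `resolve_attempt` are assumptions made for the example, not taken from the templates.

```python
from pathlib import Path
import re

# Heading name comes from the PR description; everything below it is a
# hypothetical entry layout chosen for this sketch.
ATTEMPTS_HEADING = "# Experiment Attempts"
OPEN_MARKER = "Status: RUNNING"


def has_open_attempt(state_path: Path) -> bool:
    """Return True if STATE.md still contains an unresolved RUNNING entry."""
    return state_path.exists() and OPEN_MARKER in state_path.read_text()


def start_attempt(state_path: Path, command: str) -> int:
    """Append a RUNNING entry before the experiment is launched."""
    text = state_path.read_text() if state_path.exists() else ATTEMPTS_HEADING + "\n"
    if ATTEMPTS_HEADING not in text:
        text = text.rstrip("\n") + "\n\n" + ATTEMPTS_HEADING + "\n"
    # Mirrors the hard constraint: no new work while an attempt is unresolved.
    if OPEN_MARKER in text:
        raise RuntimeError("resolve the current attempt before starting another")
    number = len(re.findall(r"^## Attempt \d+$", text, flags=re.M)) + 1
    entry = f"\n## Attempt {number}\nCommand: {command}\n{OPEN_MARKER}\n"
    state_path.write_text(text.rstrip("\n") + "\n" + entry)
    return number


def resolve_attempt(state_path: Path, status: str, details: dict) -> None:
    """Replace the RUNNING marker with FAILED or SUCCESS plus its details."""
    if status not in ("FAILED", "SUCCESS"):
        raise ValueError("status must be FAILED or SUCCESS")
    text = state_path.read_text()
    if OPEN_MARKER not in text:
        raise RuntimeError("no open attempt to resolve")
    lines = [f"Status: {status}"] + [f"{key}: {value}" for key, value in details.items()]
    state_path.write_text(text.replace(OPEN_MARKER, "\n".join(lines), 1))
```

On resume, a leftover RUNNING entry signals that the previous session was interrupted mid-run, so it would be resolved first (for example as FAILED with the interruption noted) before any new attempt is started.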