feat: persistent session state and experiment attempt logging for session execution agent (experiment runner agent) #98

Open
bimu233 wants to merge 4 commits into ChicagoHAI:main from bimu233:main

bimu233 commented Apr 27, 2026

Summary

Adds durable inter-phase memory to the session executor so that interrupted sessions (rate limit, token exhaustion, 5-hour Claude Code session limit) can resume without repeating completed work or re-encountering already-fixed bugs.

Problem

The session executor runs all 6 phases in a single continuous session. When that session is interrupted mid-run, all in-session context is lost. On the next run, the agent restarts from Phase 0, repeats phases it already completed, and rediscovers errors it already fixed, wasting tokens and time. Such interruptions are common when users share their Claude Code session with other work.

Changes

templates/base/researcher.txt

  • Added RESUME CHECK block at the top: agent reads STATE.md immediately on startup, checks the phase status table, and skips all DONE phases. Writes a session-start Worklog entry before any other action so the session is recorded even if interrupted again.
  • Added mandatory CHECKPOINT writes at each phase boundary: phase status, worklog entry, and section-specific content (files, results, design decisions).
  • Added per-attempt experiment logging protocol (STEP A/B/C): agent writes Status: RUNNING to STATE.md before each experiment run, then updates to FAILED (with exact error and fix) or SUCCESS (with metrics and output files) immediately after. A hard constraint blocks code changes until the current attempt entry is resolved — preventing the agent from batching all error writes to the end of the session.
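The RESUME CHECK above is carried out by the agent following prose instructions, but its logic can be sketched in Python. This is a minimal illustration under stated assumptions, not the template's actual behavior: the table layout, the `IN_PROGRESS` and `PENDING` status names, the 0 to 5 phase numbering, and the helpers `parse_phase_status` and `first_pending_phase` are all hypothetical.

```python
import re

# Hypothetical excerpt of STATE.md; the real table layout may differ.
SAMPLE_STATE = """\
# Current State

| Phase | Status |
|-------|--------|
| Phase 0 | DONE |
| Phase 1 | DONE |
| Phase 2 | IN_PROGRESS |
| Phase 3 | PENDING |
"""

# Matches data rows such as "| Phase 2 | IN_PROGRESS |" but not the header.
ROW = re.compile(r"^\|\s*Phase\s+(\d+)\s*\|\s*(\w+)\s*\|\s*$")


def parse_phase_status(state_text):
    """Return {phase_number: status} from the phase status table."""
    statuses = {}
    for line in state_text.splitlines():
        match = ROW.match(line)
        if match:
            statuses[int(match.group(1))] = match.group(2).upper()
    return statuses


def first_pending_phase(statuses, total_phases=6):
    """Return the first phase not marked DONE, or None if all are done."""
    for phase in range(total_phases):
        if statuses.get(phase) != "DONE":
            return phase
    return None


statuses = parse_phase_status(SAMPLE_STATE)
resume_from = first_pending_phase(statuses)
print(f"Resuming from Phase {resume_from}")
```

Phases missing from the table are treated as not done, so a truncated or partially written STATE.md causes work to be repeated rather than skipped.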

templates/base/deliverables/state_template.md

  • Added STATE.md template with structured sections: # Current State (phase status table), # Worklog, # Research Specification, # Files and Resources, # Workflow, # Experiment Design, # Learnings, # Experiment Results, # Experiment Attempts.
  • # Experiment Attempts is the append-only per-run log. Each entry records the command, error, fix applied, and output files — giving the agent a complete anti-repetition record on resume.
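The per-attempt protocol (write `Status: RUNNING` first, resolve it to FAILED or SUCCESS afterwards, and block further work while an entry is open) can be sketched as follows. This is an illustration only: in the PR the agent edits STATE.md itself, and the entry layout, the field names, and the helpers `start_attempt` and `resolve_attempt` are assumptions. The command and error strings are placeholders, not real logs.

```python
import tempfile
from pathlib import Path

ATTEMPTS_HEADER = "# Experiment Attempts"


def start_attempt(state_path, number, command):
    """Record the attempt as RUNNING before the command is launched."""
    text = state_path.read_text()
    # Hard constraint: an unresolved entry blocks any new attempt.
    if "Status: RUNNING" in text:
        raise RuntimeError("resolve the open attempt before starting a new one")
    entry = f"\n## Attempt {number}\nCommand: {command}\nStatus: RUNNING\n"
    state_path.write_text(text + entry)


def resolve_attempt(state_path, status, details):
    """Replace the open RUNNING status with FAILED or SUCCESS plus details."""
    if status not in ("FAILED", "SUCCESS"):
        raise ValueError(f"unexpected status: {status}")
    text = state_path.read_text()
    if "Status: RUNNING" not in text:
        raise RuntimeError("no open attempt to resolve")
    lines = [f"Status: {status}"] + [f"{k}: {v}" for k, v in details.items()]
    state_path.write_text(text.replace("Status: RUNNING", "\n".join(lines), 1))


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "STATE.md"
    state.write_text(ATTEMPTS_HEADER + "\n")

    start_attempt(state, 1, "python run_experiment.py")  # placeholder command
    resolve_attempt(state, "FAILED", {
        "Error": "ExampleError: placeholder message",
        "Fix": "placeholder description of the fix",
    })
    start_attempt(state, 2, "python run_experiment.py")
    final_text = state.read_text()

print(final_text)
```

Writing the RUNNING entry before the run means that an interruption mid-experiment still leaves a record of what was being tried, which is what lets a resumed session avoid repeating it.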

Why researcher.txt

researcher.txt is always injected at {{ prompt }} by Jinja2 regardless of domain override. Placing STATE.md write instructions here means all domains get consistent phase logging without needing to duplicate instructions in every domains/X/session_instructions.txt.

Verified

Tested on an L2 regularization experiment. STATE.md was updated promptly at each phase transition and after each failed attempt, including exact error messages, fixes applied, and metric values. The agent then resumed from the correct phase without repeating prior work.
