Effective Harnesses for Long-Running Agents

Best practices for building agents that work reliably across multiple sessions and context windows.

Core Problem

Long-running agents struggle because they work in discrete sessions, and each new session starts with no memory of what came before. Even frontier models fail at complex tasks that span multiple context windows when given only a high-level prompt.

Two-Part Solution Architecture

1. Initializer Agent (First Session Only)

Sets up the foundational environment:

init.sh script - Enables the development server to run reliably
claude-progress.txt file - Maintains a log of completed work
Initial git commit - Documents what files were added
Feature list (JSON) - Comprehensive breakdown of all end-to-end features, each initially marked as "passes": false
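
As a rough illustration of the setup above, the first session's artifacts could be scaffolded along these lines. This is a minimal sketch, not a prescribed implementation: the function name `initialize_workspace`, the file name `features.json` at the project root, and the placeholder contents of init.sh and the first log entry are all assumptions made for the example.

```python
import json
from pathlib import Path


def initialize_workspace(root: Path, features: list[dict]) -> None:
    """Create the first-session artifacts; every feature starts as failing."""
    root.mkdir(parents=True, exist_ok=True)

    # Feature list: force "passes" to False no matter what the caller supplied.
    feature_list = {"features": [{**f, "passes": False} for f in features]}
    (root / "features.json").write_text(
        json.dumps(feature_list, indent=2) + "\n", encoding="utf-8"
    )

    # Progress log begins with one entry describing the setup session
    # (hypothetical wording).
    (root / "claude-progress.txt").write_text(
        "Session 1 (initializer)\n"
        "- Created init.sh, features.json, claude-progress.txt\n",
        encoding="utf-8",
    )

    # Placeholder startup script; the real server command is project-specific.
    init_sh = root / "init.sh"
    init_sh.write_text(
        "#!/usr/bin/env bash\nset -euo pipefail\n# TODO: start the dev server here\n",
        encoding="utf-8",
    )
    init_sh.chmod(0o755)
```

The initial git commit would follow this step (for example `git add .` and `git commit`), so that the next session can see exactly which files the initializer added.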

2. Coding Agent (Subsequent Sessions)

Follows a structured workflow:

Reads progress files and git history to understand context
Works on only one feature at a time (critical constraint)
Commits progress with descriptive messages
Updates progress documentation
Leaves code in production-ready state

Key Implementation Patterns

Feature List Management

Use JSON instead of Markdown - models are less likely to inappropriately modify structured data.

{
  "features": [
    {
      "id": "auth-001",
      "category": "Authentication",
      "description": "User can sign in with email OTP",
      "steps": [
        "Enter email on login screen",
        "Receive OTP code",
        "Enter code and gain access"
      ],
      "passes": false
    }
  ]
}
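
One way a harness might enforce that a session changes only the status flags is to compare the feature list before and after the session. The sketch below assumes the structure shown above (a top-level "features" array with a boolean "passes" per entry); the helper name `only_passes_changed` is hypothetical.

```python
def only_passes_changed(before: dict, after: dict) -> bool:
    """Return True if the two feature lists differ at most in their "passes" flags."""

    def without_status(doc: dict) -> list[dict]:
        # Drop the status flag so that everything else can be compared directly.
        return [
            {key: value for key, value in feature.items() if key != "passes"}
            for feature in doc["features"]
        ]

    # Every feature in the new version must still carry a boolean status.
    statuses_valid = all(
        isinstance(feature.get("passes"), bool) for feature in after["features"]
    )
    return statuses_valid and without_status(before) == without_status(after)
```

A harness could run this check before accepting a commit and reject sessions that reworded, reordered, or deleted features.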

Startup Checklist

Each session must begin with:

Run pwd to confirm working directory
Read git logs and progress files
Select next highest-priority incomplete feature
Run basic end-to-end tests before new development
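
The feature-selection step of this checklist can be sketched as follows. The feature list shown earlier has no explicit priority field, so this example assumes that the order of the "features" array encodes priority; the function name `next_feature` is hypothetical.

```python
from typing import Optional


def next_feature(feature_list: dict) -> Optional[dict]:
    """Return the first feature that is not yet passing, or None if all pass.

    Assumption: earlier entries in the "features" array have higher priority.
    """
    for feature in feature_list["features"]:
        if feature.get("passes") is not True:
            return feature
    return None
```

Returning a single feature, rather than a batch, matches the one-feature-per-session constraint.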

Testing Strategy

Provide browser automation tools (e.g., Puppeteer MCP)
Explicitly prompt for user-level testing rather than just unit tests; this significantly improves bug detection

Critical Success Factors

Factor	Why It Matters
Incremental progress	Addressing one feature per session prevents context exhaustion and undocumented half-implementations
Clean state enforcement	Require proper documentation and git commits so that later sessions do not have to debug unrelated issues
Clear artifact trails	Progress files and git history enable quick context recovery despite context window limitations

Common Failure Modes & Prevention

Problem	Prevention
Agent declares victory prematurely	Maintain comprehensive feature list; only mark items complete after testing
Environment left in broken state	Require git commits and progress documentation before session ends
Features marked complete without testing	Mandate end-to-end browser/device automation testing
Time wasted on setup	Provide `init.sh` for automatic server startup
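
The first and third rows of this table can be sketched as a gate around the status flag. In this sketch `run_e2e_check` is a hypothetical callable standing in for whatever browser or device automation the project uses; it is not a real library API, and the function names are assumptions.

```python
from typing import Callable


def mark_if_verified(feature: dict, run_e2e_check: Callable[[dict], bool]) -> bool:
    """Set feature["passes"] to True only when the end-to-end check succeeds."""
    try:
        verified = run_e2e_check(feature) is True
    except Exception:
        # A check that crashes counts as a failure, not as a pass.
        verified = False
    feature["passes"] = verified
    return verified


def all_done(feature_list: dict) -> bool:
    """The project is finished only when the list is non-empty and every feature passes."""
    features = feature_list["features"]
    return bool(features) and all(f.get("passes") is True for f in features)
```

Because the flag is written only by `mark_if_verified`, a feature cannot be marked complete without its check having run, and `all_done` gives the harness a test for completion that does not depend on the agent's own judgment.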

Session Handoff Protocol

Before ending any session:

Commit all changes with a descriptive message
Update claude-progress.txt with:
- What was completed
- What issues were encountered
- What should be tackled next
Ensure tests pass or document known failures
Leave no uncommitted work
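
The progress-file update in this protocol could look roughly like the sketch below. The entry layout (a session heading followed by three labeled sections) and the helper names are assumptions made for illustration; the document specifies only what the entry should contain.

```python
from pathlib import Path


def format_handoff(
    session_label: str,
    completed: list[str],
    issues: list[str],
    next_steps: list[str],
) -> str:
    """Build one progress-log entry covering the three required sections."""

    def section(title: str, items: list[str]) -> list[str]:
        # An empty section is written explicitly so that it is not mistaken for an omission.
        body = [f"- {item}" for item in items] or ["- (none)"]
        return [f"{title}:"] + body

    lines = [f"== {session_label} =="]
    lines += section("Completed", completed)
    lines += section("Issues encountered", issues)
    lines += section("Next", next_steps)
    return "\n".join(lines) + "\n"


def append_handoff(progress_file: Path, entry: str) -> None:
    """Append the entry; earlier sessions' notes are never rewritten."""
    with progress_file.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
```

Appending rather than overwriting keeps the full session history readable. For the last item in the list, an empty result from `git status --porcelain` is one way to confirm that nothing is left uncommitted.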

Multi-Agent Architecture (Future)

Consider specialized agents with distinct roles:

Coding Agent - Implements features
Testing Agent - Validates implementations with E2E tests
QA Agent - Reviews code quality and consistency
Cleanup Agent - Refactors and removes dead code

Applying to This Project

For Early Reader, implement these patterns:

/earlyreader
├── init.sh                    # Dev server startup script
├── claude-progress.txt        # Session-by-session log
├── features.json              # All features with pass/fail status
└── src/

Feature Categories for Early Reader

Authentication - Supabase email OTP flow
Onboarding - Child management, subscription
Card Generation - Gemini word generation, Imagen images
Voice Interaction - OpenAI Realtime API integration
Spaced Repetition - SM-2 algorithm, card queue management
UI/UX - Swipe detection, blur reveal, orthography rendering

Each feature should have clear acceptance criteria and be testable via device automation.
