Best practices for building agents that work reliably across multiple sessions and context windows.
Long-running agents struggle because they work in discrete sessions with no memory of prior context. Even frontier models fail at complex tasks spanning multiple context windows when given only high-level prompts.
Sets up the foundational environment:
- `init.sh` script - Enables the development server to run reliably
- `claude-progress.txt` file - Maintains a log of completed work
- Initial git commit - Documents what files were added
- Feature list (JSON) - Comprehensive breakdown of all end-to-end features, each initially marked as `"passes": false`
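The `init.sh` script can stay small. A minimal sketch, assuming an npm-based project with a `dev` script (adapt the commands and port to your stack):

```shell
#!/usr/bin/env bash
# init.sh (sketch) -- idempotent dev-server startup, assuming an npm-based
# project with a "dev" script. Safe to re-run at the start of every session.
set -u

start_dev_server() {
  local port="${PORT:-3000}"
  echo "Working directory: $(pwd)"

  if [ ! -f package.json ]; then
    echo "no package.json found; nothing to start"
    return 0
  fi

  # Install dependencies only on first run; later sessions reuse node_modules.
  [ -d node_modules ] || npm install

  # Kill any stale server holding the dev port so restarts are reliable.
  lsof -ti ":$port" 2>/dev/null | xargs kill 2>/dev/null || true

  # Start the server in the background and log output for later inspection.
  npm run dev >dev-server.log 2>&1 &
  echo "$!" > .dev-server.pid
  echo "dev server starting on port $port (logs: dev-server.log)"
}

start_dev_server
```

Because the script checks for `package.json` and an existing `node_modules`, the agent can run it blindly at the top of every session without breaking an already-running environment.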
Follows a structured workflow:
- Reads progress files and git history to understand context
- Works on only one feature at a time (critical constraint)
- Commits progress with descriptive messages
- Updates progress documentation
- Leaves code in production-ready state
Use JSON instead of Markdown - models are less likely to inappropriately modify structured data.
```json
{
  "features": [
    {
      "id": "auth-001",
      "category": "Authentication",
      "description": "User can sign in with email OTP",
      "steps": [
        "Enter email on login screen",
        "Receive OTP code",
        "Enter code and gain access"
      ],
      "passes": false
    }
  ]
}
```

Each session must begin with:
- Run `pwd` to confirm the working directory
- Read git logs and progress files
- Select the next highest-priority incomplete feature
- Run basic end-to-end tests before new development
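The session-start steps above can be sketched as a short routine; the `jq` query assumes the `features.json` layout shown earlier:

```shell
# Session-start routine (sketch): recover context before writing any code.
pwd                                      # confirm working directory
git log --oneline -n 10 2>/dev/null \
  || echo "no git history yet"           # recent commits = what already works
[ -f claude-progress.txt ] && tail -n 20 claude-progress.txt

# Pick the first feature not yet passing (jq is optional but convenient).
if command -v jq >/dev/null 2>&1 && [ -f features.json ]; then
  jq -r '[.features[] | select(.passes == false)][0].id' features.json
fi
```

Keeping this routine mechanical means every session pays a fixed, small cost to rebuild context instead of guessing at prior state.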
- Provide browser automation tools (e.g., Puppeteer MCP)
- Explicitly prompt for user-level testing rather than just unit tests; this significantly improves bug detection
| Factor | Why It Matters |
|---|---|
| Incremental progress | Addressing one feature per session prevents context exhaustion and undocumented half-implementations |
| Clean state enforcement | Require proper documentation and git commits to prevent downstream sessions from debugging unrelated issues |
| Clear artifact trails | Progress files and git history enable quick context recovery despite context window limitations |
| Problem | Prevention |
|---|---|
| Agent declares victory prematurely | Maintain comprehensive feature list; only mark items complete after testing |
| Environment left in broken state | Require git commits and progress documentation before session ends |
| Features marked complete without testing | Mandate end-to-end browser/device automation testing |
| Time wasted on setup | Provide an `init.sh` script for automatic server startup |
Before ending any session:
- Commit all changes with a descriptive message
- Update `claude-progress.txt` with:
  - What was completed
  - What issues were encountered
  - What should be tackled next
- Ensure tests pass or document known failures
- Leave no uncommitted work
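A sketch of the session-end checklist as commands; the commit message and log entry are illustrative placeholders:

```shell
# Session-end routine (sketch): commit, document, and verify a clean state.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git add -A
  git commit -m "auth-001: email OTP flow passes end-to-end" \
    || echo "nothing to commit"
  git status --porcelain    # should print nothing: no uncommitted work left
fi

# Append a structured entry to the progress log.
cat >> claude-progress.txt <<'EOF'
## Session summary
Completed: auth-001 (email OTP sign-in), verified via browser automation
Issues: none blocking
Next: onboarding flow (child profile creation)
EOF
```

The `git status --porcelain` check gives the agent an objective signal that nothing was left uncommitted before it hands off to the next session.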
Consider specialized agents with distinct roles:
- Coding Agent - Implements features
- Testing Agent - Validates implementations with E2E tests
- QA Agent - Reviews code quality and consistency
- Cleanup Agent - Refactors and removes dead code
For Early Reader, implement these patterns:
```
/earlyreader
├── init.sh               # Dev server startup script
├── claude-progress.txt   # Session-by-session log
├── features.json         # All features with pass/fail status
└── src/
```
- Authentication - Supabase email OTP flow
- Onboarding - Child management, subscription
- Card Generation - Gemini word generation, Imagen images
- Voice Interaction - OpenAI Realtime API integration
- Spaced Repetition - SM-2 algorithm, card queue management
- UI/UX - Swipe detection, blur reveal, orthography rendering
Each feature should have clear acceptance criteria and be testable via device automation.