If processing crashes at image 800/1246, the next run restarts from zero. For large datasets this wastes significant compute time and is a production reliability gap.
Scope:
- Create internal/checkpoint/checkpoint.go, managing a checkpoint.json in the output directory
- Track completed file paths with their output hash
- On engine startup, load existing checkpoint and skip already-processed files
- Atomic checkpoint writes (write to temp file, rename) to prevent corruption on crash
- Add a -no-resume flag to force full reprocessing
checkpoint.json shape:
    {
      "version": 1,
      "started_at": "2026-02-18T10:00:00Z",
      "completed": ["images/photo1.jpg", "images/photo2.png"],
      "total_processed": 800
    }
Acceptance Criteria:
- Interrupted run resumes from last checkpoint on restart
- checkpoint.json is never left in a corrupted state
- The -no-resume flag bypasses the checkpoint entirely
- Checkpoint file excluded from output metrics
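
Two of the criteria above (the -no-resume bypass and keeping the checkpoint file out of output metrics) can be sketched with the standard `flag` package. `resumeEnabled` and `countsTowardMetrics` are hypothetical names, and how the engine actually parses its flags or computes metrics is an assumption here.

```go
package main

import (
	"flag"
	"path/filepath"
)

const checkpointName = "checkpoint.json"

// resumeEnabled parses args and reports whether the engine should load an
// existing checkpoint. Passing -no-resume turns resuming off.
func resumeEnabled(args []string) (bool, error) {
	fs := flag.NewFlagSet("engine", flag.ContinueOnError)
	noResume := fs.Bool("no-resume", false, "ignore checkpoint.json and reprocess every file")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return !*noResume, nil
}

// countsTowardMetrics reports whether a file in the output directory should
// be included in output metrics; the checkpoint file itself is skipped.
func countsTowardMetrics(path string) bool {
	return filepath.Base(path) != checkpointName
}
```

Go's `flag` package accepts both `-no-resume` and `--no-resume`, so either spelling would work with this sketch.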