Issue
Learner.checkpoint_interval is currently declared as a Param:
https://github.com/experimaestro/xpm-torch/blob/main/src/xpm_torch/learner.py#L146
checkpoint_interval: Param[int] = field(default=1, ignore_default=True)
This means changing it (e.g. from 1 to 15 to reduce checkpoint I/O on short epochs) invalidates the task hash and forces a fresh task directory. Since checkpoint_interval only controls how often state is persisted to disk — not the optimisation trajectory — it should be Meta so it can be tuned across runs without losing cached training state.
Suggested fix
checkpoint_interval: Meta[int] = field(default=1, ignore_default=True)
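To illustrate why the annotation matters, here is a minimal toy sketch of identity hashing. It is not experimaestro's actual implementation: `ToyLearner`, `task_hash`, the `max_epochs` field and the `metadata={"meta": True}` flag are all hypothetical names used only for illustration. The assumption is simply that fields flagged as meta are skipped when the task identifier is computed.

```python
import hashlib
import json
from dataclasses import dataclass, field, fields


@dataclass
class ToyLearner:
    # Hypothetical identity-defining parameter: contributes to the hash.
    max_epochs: int = 100
    # Flagged as meta in this toy model: excluded from the hash.
    checkpoint_interval: int = field(default=1, metadata={"meta": True})


def task_hash(config) -> str:
    """Hash only the fields that are not flagged as meta."""
    identity = {
        f.name: getattr(config, f.name)
        for f in fields(config)
        if not f.metadata.get("meta", False)
    }
    payload = json.dumps(identity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Changing a meta field keeps the hash, so the task directory is reused.
same_dir = task_hash(ToyLearner(checkpoint_interval=1)) == task_hash(
    ToyLearner(checkpoint_interval=15)
)

# Changing an identity field changes the hash, so a new directory is needed.
new_dir = task_hash(ToyLearner(max_epochs=100)) != task_hash(
    ToyLearner(max_epochs=200)
)
```

In this sketch `same_dir` and `new_dir` are both true, which is the behaviour requested for checkpoint_interval: tuning it should behave like the first comparison, not the second.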
Context
Hit while tuning the checkpoint cadence on a multi-day distillation run: after switching steps_per_epoch from 8000 to 200, checkpointing every epoch wrote to disk far too often, but raising checkpoint_interval from 1 to 15 forced a re-submission with a fresh hash.
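For reference, the cadence arithmetic behind those numbers, as a rough sketch. It assumes checkpoint_interval is counted in epochs (consistent with an interval of 1 meaning a checkpoint every epoch); `steps_between_checkpoints` is a hypothetical helper, not part of xpm-torch.

```python
def steps_between_checkpoints(steps_per_epoch: int, checkpoint_interval: int) -> int:
    """Training steps between two checkpoints, assuming the interval is in epochs."""
    return steps_per_epoch * checkpoint_interval


# Original setup: long epochs, checkpoint every epoch.
before = steps_between_checkpoints(8000, 1)

# Short epochs with the same interval: checkpoints become much more frequent.
after_short = steps_between_checkpoints(200, 1)

# Short epochs with the interval raised to 15.
after_tuned = steps_between_checkpoints(200, 15)

# How many times more often checkpoints are written after shortening epochs.
frequency_increase = before // after_short
```

Under that assumption, shortening the epochs alone makes checkpoints 40 times more frequent (every 200 steps instead of every 8000), and an interval of 15 brings the spacing back to 3000 steps.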