I have repeatedly seen premature releases where validate_env runs before move_and_rebuild.py has run. This should never happen. I believe we used to hard gate on some host and cloud-level values such as provisioned. We need to double-check this.
Just now I was able to replicate this:
How to Replicate
- Have move_and_rebuild disabled
- Have an active schedule on hosts that would have been processed
- Manually kickstart/clean some hosts on a cloud that would have gone out
Outcomes
- validate_env runs anyway even though move_and_rebuild has not, and passes the cloud prematurely
- notify.py sends notifications, cloud is "released"
- move_and_rebuild.py then runs, pushes cloud back into validating phase
- cloud re-releases
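To make the expected behavior concrete, here is a minimal sketch of the kind of hard gate I have in mind. It is not the actual QUADS code: the Cloud and Host classes, the moved flag, and ready_for_validation are hypothetical names used only for illustration, and provisioned stands in for whatever cloud-level value we used to gate on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    target_cloud: str
    # Hypothetical flag: set only once move_and_rebuild has finished this host.
    moved: bool = False


@dataclass
class Cloud:
    name: str
    # Assumed cloud-level flag, set after all hosts have been moved and rebuilt.
    provisioned: bool = False


def ready_for_validation(cloud: Cloud, hosts: List[Host]) -> bool:
    """Return True only if move_and_rebuild has completed for this cloud."""
    scheduled = [h for h in hosts if h.target_cloud == cloud.name]
    if not scheduled:
        # Nothing scheduled into this cloud, so there is nothing to validate.
        return False
    # Hard gate: both the cloud-level and the host-level values must agree.
    return cloud.provisioned and all(h.moved for h in scheduled)
```

With a gate like this, manually kickstarting or cleaning hosts would not be enough to let validation pass, because neither flag is set until move_and_rebuild itself has run.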
Functional Example
- Prior to a 10:30 UTC schedule start, cloud36 below had move_and_rebuild disabled as I worked through cleaning some disk cruft left by a previous tenant
- I kickstarted and cleaned the hosts manually
- validate_env kicked off anyway at schedule start time regardless of move_and_rebuild being disabled
=== Validating Started == @ Wed Feb 4 10:30:01 AM UTC 2026
Validating cloud36
You're still within the configurable validation grace period. Skipping validation for cloud36.
Quads assignments validation executed.
=== Validating Finished == @ Wed Feb 4 10:30:05 AM UTC 2026
=== Validating Started == @ Wed Feb 4 10:45:01 AM UTC 2026
Validating cloud36