Description
copying a Slack DM discussion here
@aidanheerdegen and Utkarsh discovered that ACCESS-OM2 is not reproducible across restarts, i.e. 2x1-day runs give different results from a 1x2-day run. The non-reproducibility was detected via this test in this PR.
I've done some test runs in ~aek156/payu/om2-restart-repro and confirmed this problem occurs even when comparing 2x2-timestep vs 1x4-timestep runs (the shortest possible, since we can't run for a single timestep), e.g. see all the md5 differences in
diff /g/data/v45/aek156/outputs/om2-restart-repro/1deg_jra55_iaf_2step/output002/manifests/restart.yaml /g/data/v45/aek156/outputs/om2-restart-repro/1deg_jra55_iaf_4step/output001/manifests/restart.yaml
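For anyone who wants a list of just the files that differ rather than a raw diff, here is a rough Python sketch. It assumes each manifest has already been loaded into a dict mapping a file path to an entry with a nested `hashes` -> `md5` value; that layout, the function name `md5_differences` and the sample entries are all assumptions for illustration, not taken from the actual manifests.

```python
# Sentinel so a path missing from one manifest always counts as a difference.
_MISSING = object()


def md5_differences(manifest_a, manifest_b):
    """Return sorted paths whose md5 differs or that appear in only one manifest.

    Assumed layout (hypothetical): {path: {"hashes": {"md5": "<hex digest>"}}}
    """
    def md5s(manifest):
        return {
            path: entry.get("hashes", {}).get("md5")
            for path, entry in manifest.items()
        }

    a, b = md5s(manifest_a), md5s(manifest_b)
    return sorted(
        path
        for path in a.keys() | b.keys()
        if a.get(path, _MISSING) != b.get(path, _MISSING)
    )


# Made-up entries standing in for the 2x2-step and 1x4-step restart manifests.
two_step = {
    "work/ocean/INPUT/ocean_temp_salt.res.nc": {"hashes": {"md5": "aaa"}},
    "work/ice/RESTART/iced.nc": {"hashes": {"md5": "bbb"}},
}
four_step = {
    "work/ocean/INPUT/ocean_temp_salt.res.nc": {"hashes": {"md5": "ccc"}},
    "work/ice/RESTART/iced.nc": {"hashes": {"md5": "bbb"}},
}

for path in md5_differences(two_step, four_step):
    print(path)
```

An empty result would mean the two runs produced identical restarts, at least as far as the md5 hashes go.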
Aidan found some old TWG notes suggesting we used to have reproducibility across restarts https://cosima.org.au/index.php/2018/12/13/technical-working-group-meeting-december-2018/
Nic's COSIMA repro tests use MOM built with a --repro flag https://github.com/COSIMA/access-om2/blob/master/test/exp_test_helper.py#L217-L218
but it was apparently never turned on for production builds https://github.com/COSIMA/access-om2/blame/master/install.sh#L49