Problem
babs merge runs interactively on the login/submit node today. Two reasons
this doesn't scale to M4:
- It's heavy compute, not bookkeeping. At 1000 subjects, the octopus merge
plus the git-annex-branch union is heavy I/O and compute, and it gets heavier
once optional zipping (#364) is removed and outputs are no longer a single
zip key per subject. That's the wrong thing to run on a login node.
- Multi-cluster needs merge to fire automatically when an array finishes, on
the cluster that ran the work, rather than as a manual step someone has to
remember to run later from a submit node.
Proposal
Submit babs merge as its own SLURM job with --dependency=afterany on the job
array's ID. afterany waits until every array task has terminated, whether it
succeeded or failed; failures are OK because merge folds in whatever branches
succeeded. It's a real compute job, not a tiny finalizer, so the queue-slot
and AssocGrpCpuLimit cost is justified: it does work that has to happen
somewhere, rather than babysitting.
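A minimal Python sketch of the submission, assuming the array itself was submitted with sbatch --parsable (which prints "jobid", or "jobid;cluster" when a cluster is named) and that babs merge needs no extra arguments when run from the project root. parse_parsable, build_merge_sbatch and submit_merge are hypothetical helpers, not babs API, and the CPU, memory and time values are placeholders. Passing --clusters keeps the merge on the cluster that ran the array.

```python
import subprocess
from typing import Optional


def parse_parsable(output: str) -> tuple[str, Optional[str]]:
    """Split `sbatch --parsable` output, "jobid" or "jobid;cluster"."""
    job_id, _, cluster = output.strip().partition(";")
    return job_id, (cluster or None)


def build_merge_sbatch(
    array_job_id: str,
    project_root: str,
    cluster: Optional[str] = None,
    cpus: int = 4,           # placeholder sizing, not a recommendation
    mem: str = "16G",        # placeholder
    time: str = "12:00:00",  # placeholder
) -> list[str]:
    """Build the sbatch argv for a merge job that waits on the whole array."""
    argv = [
        "sbatch",
        "--parsable",
        "--job-name=babs-merge",
        # afterany on the array job ID: start once every task has terminated,
        # whatever its exit state, so merge folds in the branches that succeeded.
        f"--dependency=afterany:{array_job_id}",
        f"--chdir={project_root}",
        f"--cpus-per-task={cpus}",
        f"--mem={mem}",
        f"--time={time}",
    ]
    if cluster:
        # Run the merge on the same cluster that ran the array.
        argv.append(f"--clusters={cluster}")
    # Assumption: `babs merge` needs no extra arguments inside the project root.
    # No shell quoting needed: subprocess passes this as one argv element.
    argv.append("--wrap=babs merge")
    return argv


def submit_merge(array_sbatch_output: str, project_root: str) -> tuple[str, Optional[str]]:
    """Submit the dependent merge job; returns (merge_job_id, cluster)."""
    array_job_id, cluster = parse_parsable(array_sbatch_output)
    argv = build_merge_sbatch(array_job_id, project_root, cluster=cluster)
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return parse_parsable(result.stdout)
```

afterok would be the wrong dependency type here: for a job array, SLURM judges afterok on the highest exit code of any task, so a single failed subject would keep the merge from ever starting.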
Post-merge hook
The merge job's last act, cheap enough to run on a login node, writes the
per-derivative merge state and pushes it. That pushed state is the durable
signal a study/superdataset ledger consumes. Keep the heavy work in the job
and the bookkeeping in a cheap hook. This needs upstream babs support for
(a) submitting merge as a dependent job and (b) a post-merge hook point.
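The hook side can stay small. A sketch, assuming a hypothetical JSON record per derivative (the field names and the <derivative>.merge-state.json filename are illustrative, not an existing babs format), written atomically so a ledger never reads a half-written file, plus the git commands that would commit and push it. write_merge_state and push_commands are hypothetical names; the hook point itself is what needs upstream babs support.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_merge_state(state_dir: Path, derivative: str,
                      merged: list[str], failed: list[str]) -> Path:
    """Atomically write a per-derivative merge-state record (hypothetical schema)."""
    record = {
        "derivative": derivative,
        "merged_branches": sorted(merged),
        "failed_or_missing": sorted(failed),
        "merged_at": datetime.now(timezone.utc).isoformat(),
    }
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / f"{derivative}.merge-state.json"
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader sees either the old record or the new one.
    fd, tmp = tempfile.mkstemp(dir=str(state_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
    os.replace(tmp, target)
    return target


def push_commands(target: Path) -> list[list[str]]:
    """Build (but do not run) the git commands that commit and push the record."""
    repo = ["-C", str(target.parent)]
    return [
        ["git", *repo, "add", target.name],
        ["git", *repo, "commit", "-m", f"merge state: {target.stem}"],
        ["git", *repo, "push"],
    ]
```

In a real hook the push commands would run in order with subprocess.run(cmd, check=True); the sketch only builds them so it stays side-effect free.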
Relations
- #13 (Recording finished-job state: how is "done" stored?): its
"--dependency=afterany finalizer" verdict assumed merge was login-node-cheap
bookkeeping; it's compute.
- … (datalad run babs merge for merge-commit provenance).
- #12 (babs status --done/--merged: machine-readable gates): #12's --merged
confirms the result machine-readably.
Not filed upstream yet (upstream-NOT-FILED).