
babs merge as a dependent (afterany) compute job + post-merge hook #46

@asmacdo


Problem

Today, babs merge runs interactively on the login/submit node. There are two
reasons this doesn't scale to M4:

  • It's heavy compute, not bookkeeping. At 1000 subjects, the octopus merge
    plus the git-annex-branch union is heavy on both I/O and compute,
    especially once optional-zipping (#364) is removed and outputs are no
    longer a single zip key per subject. That is the wrong thing to run on a
    login node.
  • Multi-cluster operation needs the merge to fire automatically when an
    array finishes, on the cluster that ran the work, rather than as a manual
    step someone has to remember to run later from a submit node.

Proposal

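Submit babs merge as its own SLURM job with --dependency=afterany on the
job array. Failed tasks are acceptable: afterany releases the merge job once
every array task has terminated, whatever its exit state, and the merge folds
in whichever branches succeeded. Because this is a real compute job rather
than a tiny finalizer, the cost of a queue slot and of headroom under
AssocGrpCpuLimit is justified: the job does work that has to happen
somewhere, instead of just babysitting.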
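A minimal Python sketch of the submission step, assuming the array's job ID is already known (for example, captured from `sbatch --parsable` when the array was submitted). The `babs merge <project_root>` invocation, the `babs-merge` job name, and the helper names are hypothetical placeholders; only the `sbatch` options used (`--parsable`, `--job-name`, `--dependency=afterany:<id>`, `--wrap`) are standard SLURM. Depending on the array job ID, rather than on individual task IDs, makes the merge wait for every task in the array.

```python
import shlex
import subprocess


def build_merge_submission(array_job_id: str, project_root: str,
                           job_name: str = "babs-merge") -> list[str]:
    """Build an sbatch argv that runs the merge after the whole array ends.

    afterany:<array_job_id> releases this job once every task of the array
    has terminated, regardless of whether individual tasks failed.
    """
    # Hypothetical merge command; the real babs CLI spelling may differ.
    merge_cmd = f"babs merge {shlex.quote(project_root)}"
    return [
        "sbatch",
        "--parsable",                            # print just "jobid[;cluster]"
        f"--job-name={job_name}",
        f"--dependency=afterany:{array_job_id}",
        f"--wrap={merge_cmd}",                   # run merge_cmd as the job script
    ]


def submit_merge(array_job_id: str, project_root: str) -> str:
    """Submit the dependent merge job and return its SLURM job ID."""
    argv = build_merge_submission(array_job_id, project_root)
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    # With --parsable, sbatch prints "jobid" or "jobid;cluster".
    return out.strip().split(";")[0]
```

For instance, `shlex.join(build_merge_submission("123456", "/data/proj"))` shows the exact command line without touching the scheduler. Keeping argv construction separate from `subprocess.run` lets the dependency logic be checked off-cluster.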

Post-merge hook

As its last act, the merge job runs a step cheap enough for a login node: it
writes the per-derivative merge state and pushes it. That pushed state is the
durable signal consumed by a study/superdataset ledger. Keep the heavy work in
the job and the bookkeeping in a cheap hook. This needs upstream babs support
for (a) submitting merge as a dependent job and (b) a post-merge hook point.
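A minimal Python sketch of what such a post-merge hook could do, assuming the per-derivative state is kept as a JSON file inside a git repository that the ledger pulls from. The file naming pattern, the JSON fields, and the function names are all hypothetical, since the hook point itself still needs upstream babs support.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def write_merge_state(ledger_dir: Path, derivative: str,
                      merged: list[str], missing: list[str]) -> Path:
    """Write one derivative's merge state as JSON and return the file path.

    Hypothetical schema: branches folded into the merge, branches that were
    absent (failed or never produced), and an overall status.
    """
    state = {
        "derivative": derivative,
        "merged_branches": sorted(merged),
        "missing_branches": sorted(missing),
        "status": "complete" if not missing else "partial",
        "merged_at": datetime.now(timezone.utc).isoformat(),
    }
    ledger_dir.mkdir(parents=True, exist_ok=True)
    path = ledger_dir / f"{derivative}.merge-state.json"
    path.write_text(json.dumps(state, indent=2) + "\n")
    return path


def push_merge_state(repo: Path, state_file: Path, remote: str = "origin") -> None:
    """Commit the state file and push it as the durable signal for the ledger.

    Only cheap git bookkeeping happens here; the heavy merge is already done.
    """
    for argv in (
        ["git", "-C", str(repo), "add", str(state_file.resolve())],
        ["git", "-C", str(repo), "commit", "-m",
         f"Record merge state: {state_file.name}"],
        ["git", "-C", str(repo), "push", remote, "HEAD"],
    ):
        subprocess.run(argv, check=True)
```

As the final step of the merge job, this might be called as `push_merge_state(repo, write_merge_state(repo / "merge-state", "fmriprep", merged, missing))`, where `fmriprep` is just an example derivative name. Writing the state and pushing it are split so a failed push can be retried without redoing the merge.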

Relations

Not filed upstream yet (upstream-NOT-FILED).

Metadata


Assignees

No one assigned

Labels

  • babs-upstream: Fix lands in PennLINC/babs; carries the upstream #N
  • upstream-NOT-FILED: The upstream issue hasn't been filed yet (any upstream repo)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
