Add pre_run/post_run hooks; make zipping an optional built-in hook #381

Open
asmacdo wants to merge 18 commits into PennLINC:main from asmacdo:hooks-pr1-splice-points
@asmacdo asmacdo commented Jun 10, 2026

Collaborator

Implements the hooks proposal discussed in #365 — the splice-point mechanism plus its first real consumer: output zipping moves out of the generated run script into an optional built-in post_run hook, making zipping opt-in (#327, #364).
(Originally split as two stacked PRs; combined per the 2026-06-10 community meeting, since a release will be cut before this merges and the split bought nothing.)

Part 1 — the hooks mechanism

A new optional hooks: section in the container config YAML splices user-supplied shell commands into each participant job at two splice points that bracket the BIDS App run:

  • pre_run — after job setup, just before the datalad run that executes the BIDS App.
  • post_run — after that datalad run completes (its outputs committed), before results are pushed to the output RIA.
hooks:
  pre_run:
    - echo "starting ${subid}"               # snippet: spliced verbatim, runs inline
    - script: /path/to/validate-inputs.sh    # script: copied into code/hooks/ at init
  post_run:
    - script: /path/to/validate-outputs.sh
    - builtin: zip                           # built-in: ships with BABS

Three entry forms:

  • snippet: a bare string, spliced verbatim into participant_job.sh.
  • script: {script: <absolute path>}. Copied into analysis/code/hooks/<basename>.sh at babs init (the same mechanism as imported_files, so the hook is git-tracked in the analysis dataset) and run as bash ./code/hooks/<name>.sh.
  • built-in: {builtin: <name>}. A static script shipped inside BABS, copied in like a script hook. Its params become shell arguments at the splice site, resolved at config time, so several instances of one built-in share a single script and differ only in args. zip is the first built-in.
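
To make the three forms concrete, here is a minimal sketch of how one entry could be classified without touching the filesystem. It is not the code in babs/hooks.py: the class names, the error messages, and the choice to keep the source file name as the destination name are assumptions made for illustration, and the registry contents mirror only the zip params described in this PR.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumed registry shape: built-in name -> accepted params (only `zip` is named in this PR).
BUILTIN_PARAMS = {"zip": {"path", "name"}}


@dataclass(frozen=True)
class Snippet:
    text: str  # spliced verbatim into the job script


@dataclass(frozen=True)
class Script:
    source: str       # absolute path from the config
    destination: str  # path inside the analysis dataset


@dataclass(frozen=True)
class Builtin:
    name: str
    params: tuple  # sorted (key, value) pairs, so equal configs compare equal


def classify(entry):
    """Classify one `hooks:` list entry; raise ValueError on a bad entry."""
    if isinstance(entry, str):
        return Snippet(entry)
    if not isinstance(entry, dict):
        raise ValueError(f"hook entry must be a string or a mapping: {entry!r}")
    if "script" in entry and "builtin" not in entry:
        extra = set(entry) - {"script"}
        if extra:
            raise ValueError(f"unexpected keys for a script hook: {sorted(extra)}")
        source = entry["script"]
        if not isinstance(source, str) or not PurePosixPath(source).is_absolute():
            raise ValueError(f"script path must be absolute: {source!r}")
        # Assumption: the destination keeps the source file name.
        return Script(source, f"code/hooks/{PurePosixPath(source).name}")
    if "builtin" in entry and "script" not in entry:
        name = entry["builtin"]
        if name not in BUILTIN_PARAMS:
            raise ValueError(f"unknown built-in hook: {name!r}")
        params = {k: v for k, v in entry.items() if k != "builtin"}
        unknown = set(params) - BUILTIN_PARAMS[name]
        if unknown:
            raise ValueError(f"unknown params for {name!r}: {sorted(unknown)}")
        return Builtin(name, tuple(sorted(params.items())))
    raise ValueError(f"hook mapping needs exactly one of 'script' or 'builtin': {entry!r}")
```

Under these assumptions, a misspelled built-in name or param raises while the config is being read, which is the "fails babs init, not the job" behavior described under Design points.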

Design points:

  • Hooks splice outside the datalad run wrapper. The hook author owns commit semantics: the safe default use is validation that fails the job (a non-zero post_run hook aborts before push, so bad results never leave the node); a hook that persists output runs its own datalad run/save — which is exactly what the zip built-in does. This is also what lets a future preprocessing hook (NORDIC) produce an intermediate without committing it.
  • A subshell-scoped exported runtime contract. Each splice region is a subshell exporting subid (+ sesid at session level), BRANCH, PROJECT_ROOT, JOB_SCRATCH_DIR, so a separate-process hook can read them; a hook's cd/variable changes can't leak into the rest of the job, and the export can't leak into the BIDS App environment. set -e is preserved: a failing hook fails the job.
  • Resolution is a pure function. babs/hooks.py:resolve_hooks() classifies config entries (no I/O); bootstrap executes the copies through the existing _init_import_files. Collisions key on conflicting content, not name reuse: the same hook used at both splice points is copied once; two different sources claiming the same destination raise at init. Built-in names and params are validated against a registry, so a typo fails babs init, not the job.
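
The collision rule in the last point can be sketched as a small pure function. This is an illustration only: the function name and the input shape are hypothetical, and equality of the source path stands in for whatever descriptor equality the real resolver uses.

```python
def plan_copies(hooks_by_point):
    """Return the unique (source, destination) copies for all splice points.

    hooks_by_point maps a splice point name to a list of
    (source, destination) pairs. The same source reused for the same
    destination is copied once; two different sources claiming one
    destination raise ValueError. No file I/O happens here.
    """
    claimed = {}  # destination -> (source, point that first claimed it)
    copies = []
    for point, entries in hooks_by_point.items():
        for source, destination in entries:
            if destination not in claimed:
                claimed[destination] = (source, point)
                copies.append((source, destination))
                continue
            prior_source, prior_point = claimed[destination]
            if prior_source == source:
                continue  # same hook reused: nothing new to copy
            # Name a single point when both claims come from it, the pair otherwise.
            if prior_point == point:
                where = f"in '{point}'"
            else:
                where = f"in '{prior_point}' and '{point}'"
            raise ValueError(
                f"{prior_source!r} and {source!r} both map to {destination!r} ({where})"
            )
    return copies
```

Because the function only inspects its arguments, the copy step can stay in bootstrap while the decision logic remains unit-testable without a dataset.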

Part 2 — zip as a built-in hook, and the config swap

The zip/output config is redefined around one new top-level key, output_dir — the folder the app writes into (relative to the dataset root), carrying the full versioned derivative name. It replaces both zip_foldernames (the name was derived from key + version) and all_results_in_one_zip:

# before                                   # after
all_results_in_one_zip: true               output_dir: outputs/fmriprep_anat-24-1-1
zip_foldernames:                           hooks:
    fmriprep_anat: "24-1-1"                  post_run:
                                               - builtin: zip   # zips `output_dir`

output_dir is the single source for the versioned name — app write dir = datalad run -o declaration = zip source = zip name, one string that can't drift. Zipping itself becomes just a hook: no zip hook = no zipping.

  • The zip hook takes two optional parameters: path: (the folder to zip; defaults to the top-level output_dir) and name: (the archive-name stem — the X in ${subid}[_${sesid}]_X.zip; defaults to path's basename). In the old schema the map value only ever versioned the archive name; a free-form stem keeps that without any derivation.

  • Multi-zip (what the multi-entry zip_foldernames map did): one zip hook per folder, each with its own path: (+ name: when the app controls the folder name and the archive should stay versioned):

    output_dir: outputs    # the app writes here, creating the subfolders below
    hooks:
      post_run:
        - builtin: zip
          path: outputs/fmriprep_func
          name: fmriprep_func-24-1-1     # -> ${subid}_fmriprep_func-24-1-1.zip
        - builtin: zip
          path: outputs/freesurfer
          name: freesurfer-24-1-1        # -> ${subid}_freesurfer-24-1-1.zip
  • The zip built-in is a static script taking its params as args at the splice site (visible in participant_job.sh); all instances share one code/hooks/zip.sh. Nothing zip varies on needs render time — sesid presence encodes the processing level at runtime, per the splice contract. (An init-time rendered built-in form was prototyped during this work and removed as unconsumed; it returns alongside the container-running built-ins only if runtime arg/env composition turns out not to suffice there.)

  • The hook owns its own commit: datalad unlock → 7z inside its own datalad run --explicit → a separate git rm of the granular outputs. The removal is separate because datalad run --explicit doesn't track deletions (datalad/datalad#7822, "datalad run --explicit does not save file deletions in --output paths", since fixed upstream); an in-script TODO covers folding it back in once babs's minimum datalad has the fix.

  • The generated run script no longer zips: it is renamed <container>_zip.sh → <container>_run.sh, the in-script 7z + rm -rf outputs tail is gone, and the participant job's datalad run -o declares the granular output_dir.

  • All shipped example configs (notebooks/eg_*.yaml), test fixtures, and docs are migrated; the legacy derivation survives only for pipeline mode, which the NORDIC follow-up deletes.
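
The argument and naming rules for the zip hook can be restated as a short sketch. The function names are hypothetical; the defaults follow the description above (path falls back to output_dir at config time, name falls back to the basename of path, and a non-empty sesid adds a session part to the archive name).

```python
import shlex
from pathlib import PurePosixPath


def zip_command(output_dir, path=None, name=None):
    """Build the splice-site command line for one zip hook instance."""
    resolved_path = path if path is not None else output_dir
    args = [resolved_path]
    if name is not None:
        args.append(name)  # otherwise the script defaults to basename(path)
    return "bash ./code/hooks/zip.sh " + " ".join(shlex.quote(a) for a in args)


def archive_name(subid, path, name=None, sesid=None):
    """Mirror the script's naming: ${subid}[_${sesid}]_<stem>.zip."""
    stem = name if name is not None else PurePosixPath(path).name
    zip_id = f"{subid}_{sesid}" if sesid else subid
    return f"{zip_id}_{stem}.zip"


# An argless {builtin: zip} with output_dir: outputs/fmriprep_anat-24-1-1
print(zip_command("outputs/fmriprep_anat-24-1-1"))
# One instance from the multi-zip example, at subject level
print(archive_name("sub-01", "outputs/fmriprep_func", name="fmriprep_func-24-1-1"))
```

Quoting with shlex.quote is one way to keep a path or stem containing spaces as a single shell argument; whether the resolver quotes exactly this way is not shown in this description.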

Breaking change, on purpose: the legacy keys hard-error with a migration message — a clean break, no deprecation shim (accepted at the 2026-06-10 community meeting; a release is cut before this merges).
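
A hard error with a migration message might look like the sketch below. The function name and the message wording are hypothetical; only the two legacy key names, the output_dir key, and the key + version naming from the before/after example come from this PR.

```python
LEGACY_KEYS = ("all_results_in_one_zip", "zip_foldernames")


def reject_legacy_keys(config):
    """Raise ValueError if the container config still uses the old zip keys."""
    found = [key for key in LEGACY_KEYS if key in config]
    if not found:
        return
    hint = ""
    folders = config.get("zip_foldernames")
    if isinstance(folders, dict) and len(folders) == 1:
        # Single-entry map: the old name was derived as <key>-<version>.
        (key, version), = folders.items()
        hint = f" For this config that would be 'output_dir: outputs/{key}-{version}'."
    raise ValueError(
        f"Config key(s) {', '.join(found)} are no longer supported. "
        "Set the top-level 'output_dir' and add '- builtin: zip' under "
        f"'hooks: post_run:' to keep zipping.{hint}"
    )
```

Raising at config load keeps the failure at babs init time, consistent with how hook typos are handled.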

Demo: what babs init produces

From a config with output_dir: outputs/fmriprep_anat-24-1-1 and an argless {builtin: zip} at post_run (simbids, in the slurm-docker-ci container):

code/hooks/zip.sh: the static zip built-in, copied in verbatim (git-tracked, so you can read exactly what will run)
#!/bin/bash
# Built-in babs `zip` hook: archive a BIDS App output folder, commit the
# archive, and remove the granular outputs it replaces.
#
# Copied verbatim at `babs init` into `code/hooks/zip.sh`. Runs at the
# `post_run` splice point (cwd = the job's dataset clone) as a separate
# process: `subid` (and `sesid` at session level) arrive via the exported
# splice-point contract; what to zip arrives as arguments:
#
#   zip.sh <path> [<name>]
#
# <path> is the folder to zip, relative to the dataset root. <name> is the
# archive-name stem (the X in ${subid}[_${sesid}]_X.zip), defaulting to
# <path>'s basename. The archive is written to the dataset root and contains
# the folder itself (not its parents), matching the layout of babs zips to
# date.
set -e -u -x

path="$1"
name="${2:-$(basename "$path")}"
zip_dir="$(dirname "$path")"
zip_folder="$(basename "$path")"

# subid is exported by the splice-point subshell in participant_job.sh;
# sesid only at session level, so its presence encodes the processing level.
# shellcheck disable=SC2154
ZIP_ID="${subid}${sesid:+_${sesid}}"
ZIP_NAME="${ZIP_ID}_${name}.zip"

# Zip real file content, not annex symlinks:
datalad unlock "${path}"

# cd into the parent so the archive contains the folder at its top level;
# OLDPWD (the dataset root) is where the archive lands.
datalad run \
	--explicit \
	--output "${ZIP_NAME}" \
	-m "Zip ${path} for ${ZIP_ID}" \
	-- \
	"cd ${zip_dir} && 7z a \"\${OLDPWD}/${ZIP_NAME}\" ${zip_folder}"

# `datalad run --explicit` does not track deletions, so the granular outputs
# are removed in a separate commit (workaround for datalad/datalad#7822,
# since fixed upstream).
# TODO research which datalad version shipped the datalad/datalad#7822 fix;
# once babs's minimum supported datalad is at or above it, fold this removal
# into the datalad run above and drop this step.
git rm -rf -q --sparse "${path}"
git commit -m "Remove ${path} for ${ZIP_ID} (zipped)"

code/participant_job.sh: the run's -o declares the granular output_dir; the post_run splice runs the zip hook with its resolved argument
# datalad run:
datalad run \
	-i "code/simbids-0-0-3_run.sh" \
	-i "inputs/data/BIDS/${subid}" \
	-i "inputs/data/BIDS/dataset_description.json" \
	-i "containers/.datalad/environments/simbids-0-0-3/image" \
	--expand inputs \
	--explicit \
	-o "outputs/fmriprep_anat-24-1-1" \
	-m "simbids-0-0-3 ${subid}" \
    "bash ./code/simbids-0-0-3_run.sh ${subid} "
# post_run hooks: spliced after the run, before push; subshell exports the contract.
(
  export subid BRANCH PROJECT_ROOT JOB_SCRATCH_DIR
bash ./code/hooks/zip.sh outputs/fmriprep_anat-24-1-1
)

# Finish up:
# push result file content to output RIA storage:
echo '# Push result file content to output RIA storage:'
datalad push --to output-storage

analysis dataset history after babs init: the static hook rides the imported-files commit
80bdd80 Save anything in folder code/ that hasn't been saved
2f4b876 Template for job submission
dc34509 Record of inclusion/exclusion of participants/sessions
695f62a Import files
3d4cab1 Generate script of running container
1e4f86f [DATALAD] Added subdataset
465ea59 Register input data dataset 'BIDS' as a subdataset
059b1da Initial save of babs_proj_config.yaml
62358f3 Save .gitignore file
580eaf7 Apply YODA dataset setup
45194cd [DATALAD] new dataset

What this deliberately does not do

  • Pipeline mode is untouched (its call sites stay hooks-unaware; NORDIC-as-hook then deletes the whole codepath — see roadmap).
  • No user-supplied jinja templates, no rendered built-ins. User content is only ever copied, never rendered.

Testing

  • Resolver unit tests: form classification, content-collision semantics, builtin registry/param validation, arg resolution + shell quoting.
  • Render tests: no-hooks output unchanged; hooks spliced in order at the right positions; sesid exported only at session level.
  • Init-level tests: hook materialized + spliced (positive), no-hooks ⟹ no code/hooks/ (negative), zip builtin copied verbatim with its resolved arg visible.
  • The raw_bids e2e splices a contract-guard script hook at both points (: "${subid:?}" etc. fails the job at runtime if the exported contract breaks) and runs post_run: [guard, {builtin: zip}] — guard before zip because hook ordering is load-bearing (a validator must see the outputs before zip flattens them).
  • Known state at last push: 99/103 unit-level green; 4 session-level render-shellcheck failures on non-*prep apps (suspected: sesid="$2" now unused once in-script zip is gone) — to fix, plus a fresh full-e2e run.
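
The properties the render tests check can be illustrated with a stand-in renderer. This is a sketch in Python rather than the project's jinja template; the function name, the marker comment, and the position of sesid in the export list are assumptions.

```python
CONTRACT = ("subid", "BRANCH", "PROJECT_ROOT", "JOB_SCRATCH_DIR")


def render_splice(point, commands, session_level=False):
    """Render one splice region as a subshell, or nothing when there are no hooks."""
    if not commands:
        return ""  # no hooks: the job script gains no extra text
    exported = list(CONTRACT)
    if session_level:
        exported.insert(1, "sesid")  # assumed position; only exported at session level
    lines = [f"# {point} hooks", "(", "  export " + " ".join(exported)]
    lines.extend(commands)  # spliced verbatim, in config order
    lines.append(")")
    return "\n".join(lines) + "\n"


# Checks mirroring the render tests listed above
assert render_splice("pre_run", None) == render_splice("pre_run", []) == ""
block = render_splice("post_run", ["bash ./code/hooks/guard.sh", "bash ./code/hooks/zip.sh outputs/x"])
assert block.index("guard.sh") < block.index("zip.sh")
assert "sesid" not in block
assert "sesid" in render_splice("pre_run", ["echo hi"], session_level=True)
```

Wrapping the commands in parentheses is what scopes both the export and any cd or variable change to the hook region.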

Follow-ups (next PRs)

  1. A generic singularity built-in (container-running hook, no datalad) — NORDIC then becomes that hook plus its specifics as additional hooks, after which pipeline mode is deleted (NORDIC is its only user).
  2. babs merge as a dependent slurm compute job (upstream issue to come) — makes the two-commit zip's heavier merge acceptable at ~1k-subject scale.
  3. babs status when zipping is off: merged-results accounting currently assumes zips; likely derivable from branch ancestry in git history instead of new persisted state.

asmacdo and others added 9 commits June 10, 2026 10:02
Add two optional, ordered hook lists to the participant job script:
- pre_app: spliced before the `datalad run` wrapper
- post_run: spliced after the run, before the push to output storage

Each splice region is a subshell that exports the contract vars
(subid, BRANCH, PROJECT_ROOT, JOB_SCRATCH_DIR; plus sesid at session
level) so separate-process hook scripts read them by name, while the
export cannot leak into the datalad run / container. processing_level
is intentionally not exported: it is a render-time constant, and sesid
presence already distinguishes subject vs session.

Hooks reach the template as list[str]; resolving the YAML hooks: block
into those strings is a later step. With no hooks configured the render
is byte-identical to before (jinja {% if %} guard + trim/lstrip).

Thread hook_pre_app/hook_post_run (default None) through
generate_submit_script(); both existing call sites are unaffected.

Add render tests: no-hooks (None == [], no markers, shellcheck),
pre_app+post_run expansion (contract export, snippet order, position
around the run wrapper, shellcheck), and session-only sesid export.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First resolution step for the participant_job splice points (8b4185a):
turn a `hooks:` config block into the command strings those splice points
consume, plus the files babs materializes at init. This is what lets
cross-cutting concerns -- optional zip, NORDIC, a BIDS-validator gate --
live as user-configurable hooks in one codepath instead of diverged
templates. Design: PennLINC#365.

babs/hooks.py (pure, no I/O): resolve_hooks(hooks_config, *, source_base)
classifies each entry into Verbatim (form a, raw snippet) or CopyIn
(form b, `{script: <path>}` copied to code/hooks/<basename>.sh). Render
(templated built-ins / container hooks) is defined as the forward-compat
seam but not yet produced. Fail-fast ValueError on bad config.
CopyIn.as_import() emits the {original_path, analysis_path} shape
_init_import_files already consumes, so a later slice wires materialization
with no duplicated copy logic.

tests/test_hooks.py: 21 unit tests (run in the slurm-docker-ci container;
conftest pulls slurm deps at collection).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A hook collision is about the materialized file, not the name alone. Key the
check on descriptor equality: the *same* hook reused at multiple splice
points (e.g. a BIDS validator at both pre_app and post_run) is copied once
and referenced from each list, while two *different* descriptors claiming the
same code/hooks/<name>.sh still raise. Because Render carries `context`, this
also catches one template rendered two ways into the same name (a real
conflict) once Render is wired.

tests: same-source-at-both-points materializes once; different-source-same-
name still collides; Render equality distinguishes context (pins the
invariant the rule depends on, since resolve_hooks doesn't emit Render yet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow babs's existing convention for copied-in local files instead of
inventing a parallel one. `imported_files.original_path` -- which form (b)
reuses via _init_import_files -- is an absolute local path used as-is
(op.exists + copy, no cwd-join, no abspath validation); mechababs likewise
resolves the inclusion file to absolute before passing it ("so datalad run cp
doesn't need the cwd to match"). So resolve_hooks no longer takes source_base
and no longer joins: CopyIn.original_path is the `script:` value verbatim, and
_init_import_files raises FileNotFoundError on a bad path exactly as it does
for imported_files today.

This also lands on the right side of the path-handling review on
PennLINC#369: source paths are absolute (the explicitly-supported
copy-in case), while the destination (code/hooks/<name>.sh) stays relative
and is containment-validated by _validate_name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connect the resolver to babs for the single-app codepath:

- Render: Container.generate_bash_participant_job resolves
  `self.config.get('hooks')` and passes hook_pre_app/hook_post_run to
  generate_submit_script (container.py).
- Materialize: babs_bootstrap appends the resolved CopyIns'
  as_import() dicts to the existing shared _init_import_files call, so
  hook scripts copy into code/hooks/ with no duplicated import logic.
- _init_import_files now makedirs the destination's parent (hooks land in
  code/hooks/, which -- unlike flat code/ -- doesn't pre-exist).

The pipeline call site (bootstrap.py:508) is left unwired by design: it
defaults hook_pre_app=None so pipeline renders byte-identical, and pipeline
mode is slated for deletion once NORDIC is a hook. No hooks configured =>
no code/hooks/ dir and no splice subshell (both are guarded), so projects
without hooks are untouched.

Tests:
- test_babs_init_raw_bids: splice a form-(b) contract-guard hook at both
  pre_app and post_run. It runs as a separate process, so it only sees the
  contract vars (subid/BRANCH/PROJECT_ROOT/JOB_SCRATCH_DIR) because the
  splice subshell exports them; `${var:?}` fails the job under set -e if any
  is unset. Same source at both points also exercises copy-once dedup. Plus
  a post-init assertion that the guard is materialized + spliced -- a dropped
  hook would otherwise pass silently (no guard left to fail).
- test_babs_init_single_app_hooks: init-only wiring check (no job execution)
  -- asserts a configured hook lands in code/hooks/ and is referenced twice.
- test_babs_init_list_sub_file: assert no code/hooks/ dir when no hooks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the alphabetic form labels in docstrings/comments with names derived
from the config syntax, so a reader doesn't have to decode "(a)/(b)/(c)":

  (a) raw snippet  -> snippet      (Verbatim)
  (b) user script  -> script       (CopyIn)
  (c) container    -> templated built-in  (Render)

"templated built-in" names the Render *mode* (babs renders a shipped
*.sh.jinja2 into code/hooks/), which serves both a zip hook and a
container-running hook (e.g. nordic) alike -- they differ only in what the
shipped template does, not in how it's produced. So there is no distinct
"container" form; a container-running hook is just a {builtin: <name>} whose
template composes a singularity run.

Comments/docstrings only -- no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two splice points bracket the same `datalad run`, so name them for that
boundary: pre_run / post_run (was pre_app / post_run, which mixed "app" and
"run" vocab). Renames the config key, the hook_pre_app -> hook_pre_run template
param, and the test markers. Config-key churn is cheapest now, before the keys
are public. (Review feedback, PR1-review Q8.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a `hooks` section to the config-yaml docs: the pre_run/post_run splice
points, the snippet and script entry forms, the exported runtime contract
(subid/sesid/BRANCH/PROJECT_ROOT/JOB_SCRATCH_DIR) + subshell/set -e semantics,
and -- most importantly -- that hooks splice OUTSIDE the datalad run wrapper, so
a hook that writes files leaves uncommitted changes that aren't pushed; the safe
default is validation that fails the job, and persisting output requires the
hook to own its own datalad run/save. (Review feedback, PR1-review Q1/Q2/Q10/Q11.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two different scripts sharing a basename within one splice point produced
"(in 'pre_run' and 'pre_run')". Name the single point when prior_point ==
point, the pair otherwise. Adds test_same_point_same_name_collide and tightens
the cross-point test to assert the full location. (Review feedback, PR1-review Q12.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
asmacdo and others added 9 commits June 10, 2026 12:22
A {builtin: <name>} entry now resolves to a Render of the shipped
templates/hooks/<name>.sh.jinja2; keys beyond `builtin` become the
per-hook part of the render context (e.g. zip's optional `path`).
Bootstrap still gates with NotImplementedError until the next step
wires the actual rendering; CopyIns keep flowing to _init_import_files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
`_init_render_hooks` renders each Render's shipped template
(babs/templates/hooks/) into code/hooks/<name>.sh at init and datalad
saves, replacing the NotImplementedError gate. Render context = the
per-hook config params plus the derived part (processing_level), with
derived values taking precedence.

zip.sh.jinja2 is the first built-in: unlock -> 7z inside its own
`datalad run --explicit` -> remove the granular outputs in a separate
commit (the datalad/datalad#7822 workaround, TODO'd for removal once
babs's minimum datalad carries the fix). One hook = one zip; the
archive is named ${subid}[_${sesid}]_<basename(path)>.zip and contains
basename(path) at its top level.

Also wraps a pre-existing >99-char line in hooks.py that failed ruff.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Asserts {builtin: zip, path: ...} renders code/hooks/zip.sh (id, archive
name, 7z line, granular removal), splices once at post_run, and lands
the "Materialize built-in hook scripts" init commit. The TODO-fenced
demo block copies the rendered artifacts into demo-output/ for
inspection after a docker run; remove before PR assembly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
BUILTIN_PARAMS maps each built-in to its accepted per-hook params
(zip -> {path}). An unknown builtin or a typo'd param (e.g. `paht:`)
now fails at resolve time instead of flowing silently into the render
context (StrictUndefined only catches missing context, not unexpected
keys). The registry also subsumes name validation for built-ins: only
known names resolve, so a path-escaping name can't reach the template
loader.

Also wraps a >99-char line in test_hooks.py that failed ruff.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The inner run script stops being named for zipping (zip moves to the
post_run hook). generate_submit_script grows an optional `output_dir`
(mutually exclusive with the pipeline-only `zip_foldernames`): when
given, the datalad run declares the app output folder itself as the
output, committing granular results. Single-app still passes
zip_foldernames until the config-surface swap lands; pipeline mode is
untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t_dir

The top-level output_dir (the folder the app writes into, carrying the
versioned derivative name) is now the single source for the app write
dir, the datalad run -o declaration, and the default zip-hook path.
Legacy keys hard-error with a migration message
(app_output_settings_from_config stays for pipeline mode; dies in PR 3).

- bidsapp_run.sh.jinja2 de-zipped: cmd_zip gone, and the granular
  outputs survive the script (the run now commits them; the zip hook
  owns their removal)
- the zip built-in becomes a STATIC script taking <path> [<name>] as
  splice-site arguments (visible in participant_job.sh): path defaults
  to output_dir at resolve time, name (a free-form archive stem, the
  old map value's only real job) defaults to basename(path) at runtime,
  and sesid presence encodes the processing level. All zip instances
  share one code/hooks/zip.sh; multi-zip differs only in args. The
  Render seam stays reserved for the container-running built-ins (PR 3).
- notebooks/eg_*.yaml + test fixtures migrated; docs: output_dir
  section replaces zip_foldernames; raw_bids e2e flips post_run to
  [contract guard, zip] (guard before zip -- ordering is load-bearing)

WIP: not yet run in docker (units + e2e pending; docker held by the
PR-1 CI session).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With zip shipped as a static arg-taking script, nothing produces a
Render: drop the dataclass, bootstrap's _init_render_hooks, and the
templated-builtin resolution branch; built-in CopyIns ride
_init_import_files like script hooks. Reintroduce a rendered form
alongside the container-running built-ins (NORDIC) only if runtime
arg/env composition turns out not to suffice there.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
asmacdo changed the title from "Add pre_run and post_run hooks for user commands and scripts" to "Add pre_run/post_run hooks; make zipping an optional built-in hook" on Jun 10, 2026