Adding common_paths #376
Replaces the hard-coded `dataset_description.json` in sparse-checkout, `datalad get`, and `datalad run -i` with a configurable `common_paths` list on each input dataset entry. Defaults to `["dataset_description.json"]` to preserve existing behaviour. An empty list disables all common-path inclusion.
Closes PennLINC#374
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `common_paths` to the section overview list and optional sections list, add a stub `required_files` section, and add a full `common_paths` section with examples and usage notes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add common paths to the submit script test inputs. No need to add to the zipped ones, they are skipped.
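The default and empty-list semantics described in the commits above can be sketched in plain bash. This is only an illustration, not the code in this PR: the helper name `build_sparse_patterns` and the sample subject ID are hypothetical, and the sketch assumes the caller has already substituted the default when `common_paths` is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sparse-checkout pattern per line.
#   $1   = subject directory, e.g. sub-01
#   $2.. = common paths from the config (may be none)
build_sparse_patterns() {
  local subid=$1
  shift
  printf '%s\n' "$subid"
  # With zero remaining arguments the loop body never runs, so an empty
  # common_paths list adds nothing beyond the subject directory.
  local p
  for p in "$@"; do
    printf '%s\n' "$p"
  done
}

# Default configuration: common_paths = ["dataset_description.json"]
build_sparse_patterns sub-01 dataset_description.json

# Empty list: common-path inclusion is disabled
build_sparse_patterns sub-01
```

Looping over the arguments, rather than calling `printf '%s\n' "$@"` directly, avoids the blank line that `printf` emits when it is given a format but no arguments.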
Opened djarecka#1 to fixup the tests. Otherwise looks good to me!
Fixup tests
I've found an additional requirement for how this needs to function that changes how the PR should be implemented.
The problem I hit
sparse-checkout doesn't include root-level BIDS sidecars in the per-subject job (BIDS Inheritance Principle), e.g. a top-level
What we need to do
In inheritance terms the tiers are root →
AI-generated pseudocode (one way to implement) + gotchas
Gotchas found while testing that shape the approach:
Default (in
Template (
{# Resolve common_paths (globs allowed) at runtime, from git. ls-tree reads HEAD
-> independent of checkout/sparse state, and non-recursive -> root-anchored.
We resolve to LITERAL paths because `datalad get -n` can't take a glob. #}
resolve_tier() { # $1 = dir relative to dataset root ('' = dataset root)
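# A trailing slash makes ls-tree list the directory's contents instead of its own
# tree entry; for the root tier ('') no path argument is passed at all.
git -C "{{ input_dataset['path_in_babs'] }}" ls-tree HEAD ${1:+"$1/"} \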
| awk '$2 == "blob" { print $4 }' \
| while IFS= read -r f; do
for pat in {% for p in input_dataset['common_paths'] %}'{{ p }}' {% endfor %}; do
case "${f##*/}" in $pat) printf '%s\n' "$f"; break ;; esac
done
done
}
mapfile -t common_paths < <(resolve_tier '') # root tier (always)
{% if processing_level == 'session' %}
mapfile -O "${#common_paths[@]}" -t common_paths < <(resolve_tier "${subid}") # subject tier
{% endif %}
# Hand the resolved LITERAL paths to all three consumers:
for c in "${common_paths[@]}"; do
datalad get -n "{{ input_dataset['path_in_babs'] }}/${c}" # -n is --no-data: installs any subdataset needed to reach the path, does not download annexed content
done
{ echo "${subid}{% if processing_level == 'session' %}/${sesid}{% endif %}"
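# Guard the empty case: printf with no arguments would still emit one blank line.
if (( ${#common_paths[@]} )); then printf '%s\n' "${common_paths[@]}"; fi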
} | ( cd "{{ input_dataset['path_in_babs'] }}" && git sparse-checkout set --stdin )
inputs=(); for c in "${common_paths[@]}"; do inputs+=( -i "{{ input_dataset['path_in_babs'] }}/${c}" ); done
# datalad containers-run ... "${inputs[@]}" ...
Notes: this must run after the input subdataset is installed (so
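The glob-matching step inside `resolve_tier` above can be tried on its own, without a git repository or Jinja rendering. The following is a minimal sketch under stated assumptions: the helper name `filter_common_paths`, the sample file names, and the sample patterns are all hypothetical, and candidate paths are assumed to arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read paths on stdin and print those whose basename
# matches any of the glob patterns given as arguments.
filter_common_paths() {
  local f pat
  while IFS= read -r f; do
    for pat in "$@"; do
      # $pat is left unquoted inside `case` so it is treated as a glob;
      # ${f##*/} strips any leading directory components.
      case "${f##*/}" in
        $pat) printf '%s\n' "$f"; break ;;
      esac
    done
  done
}

# Sample input standing in for the file names listed by git
printf '%s\n' dataset_description.json task-rest_bold.json README \
  | filter_common_paths 'dataset_description.json' '*_bold.json'
```

Because `case` patterns are never expanded against the filesystem, the match depends only on the names fed in, which fits the stated goal of resolving paths independently of checkout or sparse state.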
closes #374
Adding `common_paths` to input dataset config for files from dataset top level (notnon-subject). Replaces the hard-coded `dataset_description.json` in sparse-checkout.