
Add extra_paths to input dataset config to retain non-subject files (e.g. nidm.ttl) #374

@asmacdo

Problem

The participant_job.sh generated by babs init hard-codes the per-subject
sparse-checkout of each non-zipped input dataset to only ${subid} (and ${sesid})
plus dataset_description.json:

babs/templates/participant_job.sh.jinja2:102

That means any other dataset-root files needed by the BIDS app are dropped from
the working tree.

Concrete case, found together with @djarecka: an NIDM input dataset has
sourcedata/NIDM/nidm.ttl, which is symlinked into each subject directory. After
the per-subject sparse-checkout runs, nidm.ttl is no longer in the working tree,
so the symlinks dangle and datalad run fails with "no such file".
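
To confirm this failure mode in a job's working tree before datalad run, one could list the dangling symlinks under a subject directory. This is a minimal sketch; list_dangling_symlinks is a hypothetical helper, not part of BABS or DataLad:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of BABS or DataLad).
# Print every symlink under DIR whose target does not exist, e.g. per-subject
# links to a nidm.ttl that the sparse-checkout removed from the working tree.
list_dangling_symlinks() {
    local dir="$1"
    # -type l matches the links themselves (find does not follow them);
    # [ -e ] follows the link, so it is false when the target is missing.
    find "$dir" -type l -print | while IFS= read -r link; do
        if [ ! -e "$link" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

For example, run list_dangling_symlinks sub-01 inside the checked-out input dataset. Caveat: in a DataLad dataset, annexed files whose content has not been fetched are also symlinks to missing targets, so they will be listed too.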

There should be a way to configure extra_paths so these files are included
in the sparse-checkout set.

The existing required_files field in the input dataset YAML doesn't help.
It is a per-subject inclusion filter, and according to
docs/preparation_config_yaml_file.rst:115 it is not implemented yet anyway.

Proposal

Add an optional extra_paths list to each entry in input_datasets, e.g.:

input_datasets:
  NIDM:
    is_zipped: false
    origin_url: ...
    path_in_babs: sourcedata/NIDM
    extra_paths:
      - nidm.ttl

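The proposal doesn't state what extra_paths entries are relative to; the example suggests the input dataset root (nidm.ttl lives at sourcedata/NIDM/nidm.ttl and path_in_babs is sourcedata/NIDM). Under that assumption, a minimal sketch of a sanity check for each entry might look like this; is_valid_extra_path is a hypothetical name, not existing BABS code:

```shell
#!/usr/bin/env bash
# Hypothetical guard, not existing BABS code: accept an extra_paths entry only
# if it is a non-empty relative path that cannot climb out of the input
# dataset root. Returns 0 (valid) or 1 (invalid).
is_valid_extra_path() {
    local p="$1"
    case "$p" in
        "" | /*)
            return 1 ;;  # empty or absolute
        .. | ../* | */.. | */../*)
            return 1 ;;  # contains a ".." path segment
    esac
    return 0
}
```

This is deliberately conservative: it also rejects harmless forms such as a/../b, while still allowing names like ..foo that merely start with dots.
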
In babs/templates/participant_job.sh.jinja2, each extra_path should be:

  1. Added to the per-subject git sparse-checkout set --stdin list (line 102).
    This is the load-bearing change: it is what makes the file visible to
    the BIDS app.
  2. Optionally added to datalad run -i so annex content (if any) is fetched
    and provenance is recorded.
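
A minimal sketch of the two steps above, in shell since that is what the template renders to. Both function names are hypothetical, not existing BABS code. It assumes (1) extra_paths entries are relative to the input dataset root, where the sparse-checkout patterns apply, and are passed through unchanged as patterns (how git interprets them depends on whether the checkout uses cone or non-cone mode), and (2) datalad run is invoked from the BABS project root, so the -i paths need the path_in_babs prefix.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual BABS template code.

# Step 1: read the existing per-subject sparse-checkout patterns on stdin,
# pass them through unchanged, then append each extra_paths entry given as
# an argument. Output is one pattern per line, as read by --stdin.
with_extra_paths() {
    cat
    local p
    for p in "$@"; do
        printf '%s\n' "$p"
    done
}

# Step 2 (optional): emit "-i <path>" pairs for datalad run, one token per
# line. Assumes datalad run runs from the project root, so each extra path
# is prefixed with the input dataset's path_in_babs (trailing "/" trimmed).
datalad_input_args() {
    local path_in_babs="${1%/}"
    shift
    local p
    for p in "$@"; do
        printf '%s\n' "-i" "${path_in_babs}/${p}"
    done
}
```

For example, inside the input dataset (the base pattern list on the first line is illustrative, not the template's exact one):

    printf '%s\n' "${subid}" dataset_description.json \
        | with_extra_paths nidm.ttl \
        | git sparse-checkout set --stdin

and, from the project root:

    mapfile -t extra_inputs < <(datalad_input_args sourcedata/NIDM nidm.ttl)
    datalad run "${extra_inputs[@]}" ...

Keeping the two helpers separate mirrors the proposal's split: the sparse-checkout change is required, the datalad run -i change is optional.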
