Default .gitattributes should be BIDS-friendly  #378

@asmacdo

Context

PR #369 lets a babs project be initialized in-place at the final derivatives
location (analysis_path: . inside <study>/derivatives/<Pipeline-Ver>/).
After babs merge + unzip, the project root has the BIDS-derivatives
shape (sub-XXXX/, dataset_description.json, code/, etc.), so it is
intended to look like a normal BIDS-derivatives dataset.

But babs init configures the dataset with cfg_proc: yoda, which gives it
yoda's default .gitattributes:

* annex.backend=MD5E
**/.git* annex.largefiles=nothing
CHANGELOG.md annex.largefiles=nothing
README.md annex.largefiles=nothing

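Everything not matched by those three largefiles rules goes to annex. In
particular, that includes small BIDS metadata files such as
dataset_description.json and the *.json / *.tsv sidecars.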

Reference: what Joe does

The .gitattributes Joe ships in published OpenNeuroDerivatives fmriprep
datasets (e.g.
https://github.com/OpenNeuroDerivatives/ds005374-fmriprep/blob/main/.gitattributes):

* annex.backend=MD5E
**/.git* annex.largefiles=nothing
**/.reproman/**/* annex.largefiles=largerthan=40kb
*.tsv annex.largefiles=largerthan=40kb
*.json annex.largefiles=largerthan=40kb
*.bvec annex.largefiles=largerthan=40kb
*.bval annex.largefiles=largerthan=40kb
README annex.largefiles=nothing
README.md annex.largefiles=nothing
CHANGES annex.largefiles=nothing
dataset_description.json annex.largefiles=nothing
desc-aparcaseg_dseg.tsv annex.largefiles=nothing
desc-aseg_dseg.tsv annex.largefiles=nothing
logs/CITATION* annex.largefiles=nothing
.bidsignore annex.largefiles=nothing

Key insight: size-thresholded annexing (largerthan=40kb) on the
extension globs keeps all small BIDS metadata sidecars (*.json, *.tsv,
*.bvec, *.bval) in git automatically, while still annexing the big JSON/TSV
files.
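To make the size-threshold behaviour concrete, here is a rough Python sketch of the decision those rules encode. It is not how git-annex evaluates annex.largefiles; it only mimics the outcome for a subset of the patterns above. Assumptions: "40kb" is read as 40,000 bytes, slash-free patterns match the file's basename, the last matching line wins, and a file with no matching largefiles rule is annexed. `goes_to_annex` and `RULES` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumption: git-annex reads "40kb" as 40 * 1000 bytes.
THRESHOLD_BYTES = 40_000

# Subset of the rules above, in file order.
# "largerthan" = annex only above the threshold; "nothing" = always in git.
RULES = [
    ("*.tsv", "largerthan"),
    ("*.json", "largerthan"),
    ("*.bvec", "largerthan"),
    ("*.bval", "largerthan"),
    ("README", "nothing"),
    ("README.md", "nothing"),
    ("CHANGES", "nothing"),
    ("dataset_description.json", "nothing"),
    (".bidsignore", "nothing"),
]


def goes_to_annex(path: str, size_bytes: int) -> bool:
    """Approximate the annex.largefiles outcome for a single file."""
    name = PurePosixPath(path).name
    decision = True  # no largefiles rule matched: the file is annexed
    for pattern, rule in RULES:
        if fnmatchcase(name, pattern):
            # A later matching line overrides an earlier one.
            decision = rule == "largerthan" and size_bytes > THRESHOLD_BYTES
    return decision
```

With this sketch, a 2 kB `*_bold.json` sidecar stays in git, a multi-megabyte confounds `.tsv` is annexed, and `dataset_description.json` stays in git regardless of size because its exact-name rule comes after `*.json`.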

Compare with datalad-neuroimaging's cfg_bids procedure
(https://github.com/datalad/datalad-neuroimaging/blob/master/datalad_neuroimaging/resources/procedures/cfg_bids.py),
which uses exhaustive enumeration instead of size thresholds:

force_in_git = [
    'README*', 'CHANGES', 'LICENSE',
    'dataset_description.json',
    '.bids-validator-config.json',
    '.bidsignore',
    'code/**',
    # '*.tsv' deliberately omitted — privacy
]

Proposed fix

Adopt Joe's patterns in datalad-neuroimaging, then use the result as the
babs default .gitattributes, plus a babs-specific addition for the
BIDS-study layout introduced in #369:

.babs/** annex.largefiles=nothing
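As a rough illustration of what the resulting default could look like, the sketch below renders a .gitattributes body from two pattern lists. It is only a sketch, not babs or datalad code: `render_gitattributes` is a hypothetical helper, and the pattern lists are a subset copied from Joe's file plus the `.babs/**` line. Size-gated globs are written first so that the exact-name "nothing" rules after them take precedence.

```python
def render_gitattributes(size_gated, always_in_git, threshold="40kb"):
    """Return .gitattributes text; hypothetical helper, not part of babs."""
    lines = [
        "* annex.backend=MD5E",
        "**/.git* annex.largefiles=nothing",
    ]
    # Extension globs: annex only files above the size threshold.
    lines += [f"{p} annex.largefiles=largerthan={threshold}" for p in size_gated]
    # Exact names and directories: always keep in git (later lines win).
    lines += [f"{p} annex.largefiles=nothing" for p in always_in_git]
    return "\n".join(lines) + "\n"


proposed = render_gitattributes(
    size_gated=["*.tsv", "*.json", "*.bvec", "*.bval"],
    always_in_git=[
        "README",
        "README.md",
        "CHANGES",
        "dataset_description.json",
        ".bidsignore",
        ".babs/**",  # babs-specific addition proposed above
    ],
)
```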
