Context
PR #369 lets a babs project be initialized in-place at the final derivatives
location (analysis_path: . inside <study>/derivatives/<Pipeline-Ver>/).
After babs merge + unzip, the project root contains the BIDS-derivatives
shape: sub-XXXX/, dataset_description.json, code/, etc. — i.e., it's
intended to look like a normal BIDS-derivatives dataset.
But babs init configures the dataset with cfg_proc: yoda, which gives it
yoda's default .gitattributes:
* annex.backend=MD5E
**/.git* annex.largefiles=nothing
CHANGELOG.md annex.largefiles=nothing
README.md annex.largefiles=nothing
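Everything not matched by those three annex.largefiles=nothing rules goes to annex. In particular:
- dataset_description.json: annexed
- .babs/babs_init_config.yaml (committed via #369, "Adding an option to fit BIDS-study layout (updated)"): annexed
- *.json next to [...] derivatives: annexed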
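Here is a minimal Python sketch of how yoda's rules resolve for a few paths in this layout. It approximates git rather than reproducing it. Patterns go through fnmatch, whose * also crosses /. Later lines override earlier ones, as in gitattributes. A path with no annex.largefiles rule is treated as annexed, which is the behaviour described above. The function names are hypothetical.

```python
import fnmatch
from typing import Optional

# annex.largefiles rules from yoda's default .gitattributes (the annex.backend
# line is omitted because it does not decide git vs. annex placement).
YODA_RULES = [
    ("**/.git*", "nothing"),
    ("CHANGELOG.md", "nothing"),
    ("README.md", "nothing"),
]


def pattern_matches(pattern, path):
    """Simplified gitattributes matching built on fnmatch, not git's wildmatch."""
    if pattern.startswith("**/"):
        # '**/x' matches x in any directory: try every trailing sub-path.
        parts = path.split("/")
        return any(fnmatch.fnmatchcase("/".join(parts[i:]), pattern[3:])
                   for i in range(len(parts)))
    if "/" not in pattern:
        # Slash-free patterns are compared with the basename at any depth.
        return fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    # Patterns containing a slash are anchored at the dataset root.
    return fnmatch.fnmatchcase(path, pattern)


def largefiles_rule(path, rules) -> Optional[str]:
    """Return the effective annex.largefiles value; the last matching line wins."""
    value = None
    for pattern, rule in rules:
        if pattern_matches(pattern, path):
            value = rule
    return value


def in_annex(path, rules=YODA_RULES):
    # Assumption: an unset annex.largefiles means the file is annexed.
    # Only "nothing" keeps a file in git in this yoda-only sketch.
    return largefiles_rule(path, rules) != "nothing"


for p in ["README.md", ".gitattributes", "dataset_description.json",
          ".babs/babs_init_config.yaml",
          "sub-01/func/sub-01_task-rest_bold.json"]:
    print(f"{p}: {'annex' if in_annex(p) else 'git'}")
```

Only README.md and the .git* files stay in git. Every BIDS metadata file in the sample ends up annexed.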
Reference: what Joe does
The .gitattributes Joe ships in published OpenNeuroDerivatives fmriprep
datasets (e.g.
https://github.com/OpenNeuroDerivatives/ds005374-fmriprep/blob/main/.gitattributes):
* annex.backend=MD5E
**/.git* annex.largefiles=nothing
**/.reproman/**/* annex.largefiles=largerthan=40kb
*.tsv annex.largefiles=largerthan=40kb
*.json annex.largefiles=largerthan=40kb
*.bvec annex.largefiles=largerthan=40kb
*.bval annex.largefiles=largerthan=40kb
README annex.largefiles=nothing
README.md annex.largefiles=nothing
CHANGES annex.largefiles=nothing
dataset_description.json annex.largefiles=nothing
desc-aparcaseg_dseg.tsv annex.largefiles=nothing
desc-aseg_dseg.tsv annex.largefiles=nothing
logs/CITATION* annex.largefiles=nothing
.bidsignore annex.largefiles=nothing
Key insight: size-thresholded annexing (largerthan=40kb) on the
extension globs keeps all small BIDS metadata sidecars (*.json, *.tsv,
*.bvec, *.bval) in git automatically, while still annexing the big JSON/TSV files.
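The sketch below adds a size check to show how Joe's rules split files. It rests on several assumptions. "kb" is read as 1000 bytes. fnmatch stands in for git's pattern matching. Files with no annex.largefiles rule, such as NIfTI images, are annexed. File sizes are passed in as hypothetical values rather than read from disk. The helper names (parse_size, in_annex) are illustrative only.

```python
import fnmatch
import re

# annex.largefiles rules from Joe's .gitattributes above, in file order.
JOE_RULES = [
    ("**/.git*", "nothing"),
    ("**/.reproman/**/*", "largerthan=40kb"),
    ("*.tsv", "largerthan=40kb"),
    ("*.json", "largerthan=40kb"),
    ("*.bvec", "largerthan=40kb"),
    ("*.bval", "largerthan=40kb"),
    ("README", "nothing"),
    ("README.md", "nothing"),
    ("CHANGES", "nothing"),
    ("dataset_description.json", "nothing"),
    ("desc-aparcaseg_dseg.tsv", "nothing"),
    ("desc-aseg_dseg.tsv", "nothing"),
    ("logs/CITATION*", "nothing"),
    (".bidsignore", "nothing"),
]

# Assumption: decimal (SI) units, so "40kb" is 40,000 bytes.
UNITS = {"b": 1, "kb": 1000, "mb": 1000**2, "gb": 1000**3}


def parse_size(text):
    match = re.fullmatch(r"(\d+)([a-z]+)", text.lower())
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unsupported size: {text}")
    return int(match.group(1)) * UNITS[match.group(2)]


def pattern_matches(pattern, path):
    """Simplified gitattributes matching built on fnmatch, not git's wildmatch."""
    if pattern.startswith("**/"):
        parts = path.split("/")
        return any(fnmatch.fnmatchcase("/".join(parts[i:]), pattern[3:])
                   for i in range(len(parts)))
    if "/" not in pattern:
        return fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    return fnmatch.fnmatchcase(path, pattern)


def in_annex(path, size, rules=JOE_RULES):
    rule = None
    for pattern, value in rules:  # later matching lines override earlier ones
        if pattern_matches(pattern, path):
            rule = value
    if rule is None:
        return True  # assumption: unset annex.largefiles means annexed
    if rule == "nothing":
        return False
    if rule.startswith("largerthan="):
        return size > parse_size(rule.split("=", 1)[1])
    raise ValueError(f"unsupported expression: {rule}")


samples = [  # hypothetical sizes in bytes
    ("sub-01/func/sub-01_task-rest_bold.json", 3_000),
    ("sub-01/func/sub-01_task-rest_desc-confounds_timeseries.tsv", 2_000_000),
    ("dataset_description.json", 100_000),
    ("desc-aseg_dseg.tsv", 100_000),
    ("sub-01/anat/sub-01_desc-preproc_T1w.nii.gz", 10_000_000),
]
for path, size in samples:
    print(f"{path} ({size} B): {'annex' if in_annex(path, size) else 'git'}")
```

The named files (dataset_description.json and the two dseg.tsv files) are listed after the extension globs. Because the later line wins, they stay in git regardless of size.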
Compare with datalad-neuroimaging's cfg_bids procedure
(https://github.com/datalad/datalad-neuroimaging/blob/master/datalad_neuroimaging/resources/procedures/cfg_bids.py),
which uses exhaustive enumeration instead of size thresholds:
force_in_git = [
'README*', 'CHANGES', 'LICENSE',
'dataset_description.json',
'.bids-validator-config.json',
'.bidsignore',
'code/**',
# '*.tsv' deliberately omitted — privacy
]
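The sketch below makes the enumeration approach concrete. It treats each force_in_git entry as an annex.largefiles=nothing rule and assumes everything else is annexed. It mirrors the list quoted above, not cfg_bids's actual code. Matching is simplified in the same way as a basename or root-anchored fnmatch, and the function name is hypothetical.

```python
import fnmatch

# force_in_git as quoted above from cfg_bids; every other path is assumed annexed.
FORCE_IN_GIT = [
    "README*", "CHANGES", "LICENSE",
    "dataset_description.json",
    ".bids-validator-config.json",
    ".bidsignore",
    "code/**",
]


def kept_in_git(path):
    """Enumeration strategy: only explicitly listed patterns stay in git.

    Simplified matching: slash-free patterns are compared with the basename,
    patterns containing a slash with the full root-relative path.
    """
    basename = path.rsplit("/", 1)[-1]
    for pattern in FORCE_IN_GIT:
        target = path if "/" in pattern else basename
        if fnmatch.fnmatchcase(target, pattern):
            return True
    return False


for p in ["README.md", "dataset_description.json", "code/script.sh",
          "participants.tsv", "sub-01/func/sub-01_task-rest_bold.json"]:
    print(f"{p}: {'git' if kept_in_git(p) else 'annex'}")
```

Here a few-kilobyte BOLD sidecar is annexed because no entry names it. Under the 40kb threshold, the same sidecar would stay in git.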
Proposed fix
Adopt Joe's patterns in datalad-neuroimaging, then use them as babs's default .gitattributes, plus a babs-specific
addition for the BIDS-study layout introduced in #369:
.babs/** annex.largefiles=nothing
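The helper below shows one way the proposed default could be written out: Joe's rules followed by the babs addition. The helper name and its choice to overwrite any existing .gitattributes are assumptions. How babs init would actually apply and commit the file is not shown.

```python
from pathlib import Path

# Joe's OpenNeuroDerivatives rules, copied from the listing above.
JOE_RULES = """\
* annex.backend=MD5E
**/.git* annex.largefiles=nothing
**/.reproman/**/* annex.largefiles=largerthan=40kb
*.tsv annex.largefiles=largerthan=40kb
*.json annex.largefiles=largerthan=40kb
*.bvec annex.largefiles=largerthan=40kb
*.bval annex.largefiles=largerthan=40kb
README annex.largefiles=nothing
README.md annex.largefiles=nothing
CHANGES annex.largefiles=nothing
dataset_description.json annex.largefiles=nothing
desc-aparcaseg_dseg.tsv annex.largefiles=nothing
desc-aseg_dseg.tsv annex.largefiles=nothing
logs/CITATION* annex.largefiles=nothing
.bidsignore annex.largefiles=nothing
"""

# babs-specific addition for the #369 layout. It is appended last so that, since
# later gitattributes lines win, it overrides any earlier glob for files under .babs/.
BABS_RULES = ".babs/** annex.largefiles=nothing\n"


def write_default_gitattributes(dataset_root):
    """Hypothetical helper: write the proposed default .gitattributes.

    Overwrites any existing file; merging with yoda's rules is not handled.
    """
    target = Path(dataset_root) / ".gitattributes"
    target.write_text(JOE_RULES + BABS_RULES)
    return target
```

The babs line goes last because, under gitattributes precedence, a later line overrides an earlier one. Any file under .babs/ that also matches *.json or *.tsv therefore stays in git regardless of size.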