This is the base repository for the Schizophrenia Canadian Neuroimaging Database preprocessing and sharing workflow. Clone or fork this repo once per study cohort, then run the staged pipelines on SciNet.
New here? Start with the stage overview table or Workflow automation (stage scripts).
The diagram below shows how major pipelines are grouped across stages (structural, functional, diffusion, and share/export). For step-by-step commands, use the stage overview table or Workflow automation (stage scripts).
| Resource | Purpose |
|---|---|
| docs/quick-start-workflow.md | Workflow automation with stage_*.sh |
| docs/qc-guide.md | Visual QC criteria for pipeline HTML reports |
| docs/share-folder-checklist.md | Checklist for data/share before consortium handoff |
After cloning, your study workspace is the SCanD_project folder (repo root).
${SCRATCH}/SCanD_project/
├── assets/
│   ├── figures/ # QC reference images for docs/qc-guide.md
│   └── pipeline-overview.png
├── docs/
│   ├── qc-guide.md
│   ├── quick-start-workflow.md
│   └── share-folder-checklist.md
├── LICENSE
├── README.md
├── stage_1.sh … stage_6.sh
├── templates/
│   └── parcellations/ # pre-downloaded fMRIPrep templates (see setup)
├── code/ # pipeline scripts and configs
├── containers/ # Singularity images (alphabetical)
│   ├── fmriprep-25.2.4.simg
│   ├── fmriprep_ciftity-v1.3.2-2.3.3.simg
│   ├── freesurfer-7.4.1.simg
│   ├── glm-0.0.1.simg
│   ├── magetbrain.sif
│   ├── mriqc-24.0.0.simg
│   ├── nipoppy.sif
│   ├── noddi_postproc-v.1.0.simg
│   ├── qsiprep-0.22.0.sif
│   ├── tbss_2023-10-10.simg
│   └── xcp_d-0.7.3.simg
├── data/
│   ├── local/
│   │   ├── bids/ # defaced BIDS dataset
│   │   ├── derivatives/ # pipeline outputs (alphabetical)
│   │   │   ├── ciftify/
│   │   │   ├── fmriprep/
│   │   │   ├── freesurfer/
│   │   │   ├── MAGeTbrain/
│   │   │   ├── mriqc/
│   │   │   ├── qsiprep/
│   │   │   ├── smriprep/
│   │   │   ├── xcp_d/
│   │   │   └── xcp_noGSR/
│   │   ├── dtifit/
│   │   ├── enigmaDTI/
│   │   ├── qsirecon/
│   │   └── qsirecon-FSL/
│   └── share/ # consortium handoff subset (see docs/share-folder-checklist.md)
│       ├── amico_noddi/
│       ├── ciftify/
│       ├── enigmaDTI/
│       ├── fmriprep/25.2.4/
│       ├── freesurfer_group/
│       ├── glm/0.0.1/
│       ├── magetbrain/
│       ├── manifest.tsv
│       ├── mriqc/24.0.0/
│       ├── noddireg/
│       ├── participants.tsv
│       ├── processing_status.tsv
│       ├── processing_status_fmriprep.tsv
│       ├── processing_status_qsiprep.tsv
│       ├── qsiprep/0.22.0/
│       ├── smriprep/25.2.4/
│       ├── tractify/
│       ├── xcp_d/0.7.3/
│       └── xcp_noGSR/
├── logs/ # cluster job logs
├── Neurobagel/
└── project_id/
This branch targets the SciNet Fir cluster. Use the matching git branch when cloning (Fir, nibi, or trillium for other clusters).
| Stage | Step | Task | Estimated runtime |
|---|---|---|---|
| Stage 0 | 0a | Setting up the SciNet environment | ~30 minutes |
| | 0b | Organize your data into BIDS | Varies |
| | 0c | Deface the BIDS data (if not done during BIDS conversion) | - |
| | 0d | Move BIDS data and label participants.tsv | Varies |
| | 0e | Initializing nipoppy trackers | ~2 minutes |
| | 0f | Edit fmap files | ~2 minutes |
| Stage 1 | 01a | Run MRIQC | ~8 hours on Slurm |
| | 01b | Run QSIPrep | ~6 hours on Slurm |
| | 01c | Run fMRIPrep fit | ~16 hours on Slurm |
| | 01d | Run FreeSurfer | ~23 hours on Slurm |
| | 01e | Run smriprep | ~10 hours on Slurm |
| | 01f | Run MAGeTbrain init | ~1 hour on Slurm |
| | 01g | Check TSV file | - |
| Stage 2 | 02a | Run fMRIPrep apply | ~3 hours on Slurm |
| | 02b | Run FreeSurfer atlas parcellation | ~6 hours on Slurm |
| | 02c | Run qsirecon FSL | ~20 minutes on Slurm |
| | 02d | Run AMICO NODDI | ~2 hours on Slurm |
| | 02e | Run tractography | ~12 hours on Slurm |
| | 02f | Run ciftify-anat | ~3 hours on Slurm |
| | 02g | Run MAGeTbrain register | ~24 hours on Slurm |
| | 02h | Check TSV file | - |
| Stage 3 | 03a | Run xcp-d | ~5 hours on Slurm |
| | 03b | Run xcp-noGSR | ~5 hours on Slurm |
| | 03c | Run qsirecon dtifit | ~1 hour on Slurm |
| | 03d | Run noddi-registration | ~2 hours on Slurm |
| | 03e | Run GLM surface | ~30 minutes on Slurm |
| | 03f | Run MAGeTbrain vote | ~10 hours on Slurm |
| | 03g | Check TSV file | - |
| Stage 4 | 04a | Run ENIGMA-DTI | ~1 hour on Slurm |
| | 04b | Check TSV file | - |
| Stage 5 | 05a | Run extract-NODDI | ~3 hours on Slurm |
| | 05b | Check TSV file | - |
| Stage 6 | 06a | Extract and share to consortium folder | ~8 hours on Slurm (Slurm + login-node scripts together) |
Note: Steps 01a to 02g follow the prompt order in stage_1.sh and stage_2.sh (ciftify-anat runs after tractography in stage 2).
Cloning creates a folder named SCanD_project in $SCRATCH. That folder is the repository root; run all stage_*.sh and ./code/... commands from there.
Each study should be kept in a separate SCanD_project folder to prevent overwriting or mixing data between studies.
Before starting a new study:
- Either rename the existing SCanD_project folder (e.g., SCanD_project_study1),
- Or move it elsewhere before cloning the repository again.
cd $SCRATCH
# Use the branch that matches your SciNet cluster:
git clone -b Fir --single-branch https://github.com/TIGRLab/SCanD_project.git # Fir
# git clone -b nibi --single-branch https://github.com/TIGRLab/SCanD_project.git # Nibi
# git clone -b trillium --single-branch https://github.com/TIGRLab/SCanD_project.git # Trillium
cd ${SCRATCH}/SCanD_project
source ./code/00_setup_data_directories.sh
This is the longest and most human-intensive step, but it makes everything else possible. BIDS is a naming convention for your MRI data that makes it easier for other people in the consortium (as well as the software pipelines you are using) to understand what your data contains (e.g. which scan types, how many participants, how many sessions). Converting your data into BIDS may require some renaming and reorganizing. No coding is required, and there are now many software projects that help with the process.
For amazing tools and tutorials for learning how to BIDS convert your data, check out the BIDS starter kit.
A useful tool is this BIDSonym BIDS app.
We want to put your data into:
./data/local/bids
You can copy (scp -r), link (ln -s), or move the data to this location, whichever you prefer.
If you are copying data from another computer or server, you should use the SciNet datamover (dm) node, not the login node!
To switch into the dm node:
ssh <cc_username>@fir.alliancecan.ca
rsync -av <local_server>@<local_server_address>:/<local>/<server>/<path>/<bids> ${SCRATCH}/SCanD_project/data/local/
To link existing data from another location on SciNet Fir to this folder:
ln -s /your/data/on/scinet/bids ${SCRATCH}/SCanD_project/data/local/bids
After organizing the BIDS folder, populate participant labels (for example, sub-CMH0047) in ${SCRATCH}/SCanD_project/data/local/bids/participants.tsv. The first row must be participant_id; list one subject ID per row below it.
For example:
participant_id
sub-CMH00000005
sub-CMH00000007
sub-CMH00000012
Also, make sure dataset_description.json exists inside your BIDS folder.
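The participants.tsv rules above can be checked before any job is submitted. The following is a minimal sketch, not part of the repository: `check_participants_tsv` is a hypothetical helper that only inspects the first column of the file text, and the duplicate-label check is an extra assumption rather than a documented requirement.

```python
def check_participants_tsv(text):
    """Return a list of problems found in the text of a participants.tsv file."""
    # Keep only the first tab-separated column of every non-empty line.
    rows = [line.split("\t")[0].strip() for line in text.splitlines() if line.strip()]
    problems = []
    has_header = bool(rows) and rows[0] == "participant_id"
    if not has_header:
        problems.append("first row must be participant_id")
    labels = rows[1:] if has_header else rows
    seen = set()
    for label in labels:
        if not label.startswith("sub-"):
            problems.append(f"missing sub- prefix: {label}")
        if label in seen:
            problems.append(f"duplicate label: {label}")
        seen.add(label)
    return problems


example = "participant_id\nsub-CMH00000005\nsub-CMH00000007\nsub-CMH00000012\n"
print(check_participants_tsv(example))  # an empty list means no problems were found
```

To use it on a real file, pass the file contents, for example `check_participants_tsv(open("data/local/bids/participants.tsv").read())`.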
In this step, we initialize the nipoppy trackers and set up a folder structure based on the nipoppy directory specification:
cd ${SCRATCH}/SCanD_project
source ./code/00_nipoppy_trackers.sh
## Back up the BIDS JSON sidecars before editing them
mkdir bidsbackup_json
rsync -zarv --include "*/" --include="*.json" --exclude="*" data/local/bids bidsbackup_json
In some cases, dcm2niix conversion fails to add the BIDS IntendedFor field to the fmap JSON files, which causes errors in the fmriprep_apply step. Edit those fieldmaps (or run the script below) using a YAML configuration that matches your dataset naming.
This script automatically fills the "IntendedFor" field in BIDS fieldmap JSON files. It reads a YAML configuration file that describes your dataset's naming patterns, then links each fieldmap to the correct fMRI or DWI files.
This helps make your dataset ready for tools like fMRIPrep, QSIPrep, and other BIDS-app pipelines.
## First load a python module
module load python/3.11.5
## Create a directory for virtual environments if it doesn't exist
mkdir -p ~/.virtualenvs
cd ~/.virtualenvs
virtualenv --system-site-packages ~/.virtualenvs/myenv
## Activate the virtual environment
source ~/.virtualenvs/myenv/bin/activate
python3 -m pip install pybids==0.15.6
cd $SCRATCH/SCanD_project
python3 ./code/fmap_intended_for.py ./data/local/bids --participant-label ./data/local/bids/participants.tsv --config ./code/config/EPIPHANI_query_config.yaml
- Searches your BIDS dataset for fieldmaps (/fmap)
- Searches for fMRI and/or DWI data (/func, /dwi)
- Uses patterns defined in your YAML config to determine which fieldmap file should be used for functional or diffusion scans
- Writes a correct "IntendedFor" entry into each fieldmap JSON
fmap/sub-001_ses-01_acq-rest_run-01_epi.json
→ IntendedFor: ["func/sub-001_ses-01_task-rest_run-01_bold.nii.gz",
                "func/sub-001_ses-01_task-rest_run-02_bold.nii.gz"]
sub-001/
  ses-01/
    fmap/
      sub-001_ses-01_acq-rest_dir-AP_run-01_epi.json
      sub-001_ses-01_acq-rest_dir-AP_run-02_epi.json
    func/
      sub-001_ses-01_task-rest_run-01_bold.nii.gz
      sub-001_ses-01_task-rest_run-02_bold.nii.gz
      sub-001_ses-01_task-rest_run-03_bold.nii.gz
    dwi/
      sub-001_ses-01_dwi.nii.gz
You describe how your dataset is structured by editing the YAML file. An example of the config file can be found here
Query blocks tell the script which files to search for in the dataset. Each key-value pair corresponds to a BIDS entity; the script builds a filename filter from these values and returns all matching files across subjects and sessions.
Rules:
- Use a single string when all relevant files share one value for that entity (e.g. task: rest)
- Use a list when files may carry different values for that entity (e.g. task: [rest, nback, gng])
- Set a field to null to omit it from the filter (i.e. match any value for that entity)
- Do not rename the block keys (bold_query, dwi_query, fmap_fmri_query, fmap_dwi_query); only change the values
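The string, list and null rules can be illustrated with a small matching function. This is a sketch of the filtering idea only, under the assumption that each file is described by a plain dictionary of BIDS entities; `matches` is a hypothetical name, and the repository script may implement the filter differently (for example through pybids).

```python
def matches(entities, query):
    """True if a file's BIDS entities satisfy every non-null field of the query."""
    for key, wanted in query.items():
        if wanted is None:  # null in the YAML config: match any value
            continue
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if entities.get(key) not in allowed:
            return False
    return True


# A query equivalent to a YAML block with a list-valued task field.
bold_query = {"datatype": "func", "suffix": "bold", "task": ["rest", "nback"], "extension": "nii.gz"}

rest_run = {"datatype": "func", "suffix": "bold", "task": "rest", "run": "01", "extension": "nii.gz"}
gng_run = dict(rest_run, task="gng")

print(matches(rest_run, bold_query))  # task "rest" is in the list
print(matches(gng_run, bold_query))   # task "gng" is not in the list
```

Entities that the query does not mention (such as `run` here) are ignored, which is why one query can return files from every run and session.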
BOLD query
bold_query:
  datatype: func
  suffix: bold
  task: rest # single task; change to a list if you have multiple: [rest, nback, gng]
  extension: nii.gz
This matches BOLD files like:
sub-XXX_ses-01_task-rest_run-XX_bold.nii.gz
For a multi-task dataset, use:
bold_query:
  datatype: func
  suffix: bold
  task: [rest, nback, gng]
  extension: nii.gz
This would match all three task variants:
sub-XXX_ses-01_task-rest_run-XX_bold.nii.gz
sub-XXX_ses-01_task-nback_run-XX_bold.nii.gz
sub-XXX_ses-01_task-gng_run-XX_bold.nii.gz
DWI query
dwi_query:
  datatype: dwi
  suffix: dwi
  extension: nii.gz
This matches DWI files like:
sub-XXX_ses-01_dwi.nii.gz
For a dataset with multiple DWI acquisitions (e.g. multi-shell labelled with acq-), add the acquisition entity:
dwi_query:
  datatype: dwi
  suffix: dwi
  acquisition: [multishell, singleshell]
  extension: nii.gz
This would match:
sub-XXX_ses-01_acq-multishell_run-01_dwi.nii.gz
sub-XXX_ses-01_acq-singleshell_run-01_dwi.nii.gz
Note: Most datasets have a single unlabelled DWI acquisition. In that case, the default config above (no acquisition field) is correct. Add acquisition only if your DWI filenames include an acq- entity.
Fieldmap query (fMRI)
fmap_fmri_query:
  datatype: fmap
  suffix: [epi, phasediff, phase1, fieldmap] # list covers all common fieldmap types
  acquisition: rest # matches the acq-rest label in the filename; set to null if absent
  extension: json
This matches fieldmap JSON sidecar files like:
sub-XXX_ses-01_acq-rest_dir-AP_run-XX_epi.json
sub-XXX_ses-01_acq-rest_dir-PA_run-XX_epi.json
Note: The acquisition field here corresponds to the acq-<label> entity in the fieldmap filename; it is not related to the task label. If your fieldmap filenames do not include an acq- entity, set acquisition: null.
Fieldmap query (DWI)
fmap_dwi_query:
  datatype: fmap
  suffix: [epi, phasediff, phase1, fieldmap] # list covers all common fieldmap types
  acquisition: dwi # matches the acq-dwi label in the filename; set to null if absent
  extension: json
This matches fieldmap JSON sidecar files like:
sub-XXX_ses-01_acq-dwi_dir-AP_epi.json
sub-XXX_ses-01_acq-dwi_dir-PA_epi.json
For a single-shell dataset with no acq- label in the fieldmap filename, use:
fmap_dwi_query:
  datatype: fmap
  suffix: [epi, phasediff, phase1, fieldmap]
  acquisition: null
  extension: json
This would match:
sub-XXX_ses-01_dir-AP_epi.json
sub-XXX_ses-01_dir-PA_epi.json
Note: The acquisition field here corresponds to the acq-<label> entity in the fieldmap filename; it is not the DWI acquisition label. If your DWI fieldmap filenames share an acq- label with your fMRI fieldmaps (e.g. both use acq-rest), you must use a distinct label (e.g. acq-dwi) on the DWI fieldmaps so the two queries return separate file sets. If no acq- entity is present in the DWI fieldmap filenames, set acquisition: null.
Important: You must include an entry for every session in fmap_to_bold and/or fmap_to_dwi. The script only assigns fieldmaps to sessions that are explicitly listed; any session omitted from the config will be skipped and its fieldmaps will not be assigned to any functional or diffusion data.
The fmap_to_bold block tells the script which fieldmap(s) should be assigned to which BOLD run(s), on a per-session basis.
How it works:
- Each entry under a session lists a fieldmap key (fmap) and one or more BOLD keys (bold_keys).
- The fmap value is a substring of the fieldmap filename; the script matches any fieldmap file whose name contains that string.
- The bold_keys values are substrings of the BOLD filenames; each matched BOLD file will have its path added to that fieldmap's IntendedFor field.
Example: Given this dataset structure:
Fieldmap: fmap/sub-XXX_ses-01_acq-rest_dir-AP_run-01_epi.json
          fmap/sub-XXX_ses-01_acq-rest_dir-AP_run-02_epi.json
          fmap/sub-XXX_ses-01_acq-rest_dir-PA_run-01_epi.json
          fmap/sub-XXX_ses-01_acq-rest_dir-PA_run-02_epi.json
BOLD:     func/sub-XXX_ses-01_task-rest_run-01_bold.nii.gz
          func/sub-XXX_ses-01_task-rest_run-02_bold.nii.gz
          func/sub-XXX_ses-01_task-rest_run-03_bold.nii.gz
The following config assigns the AP and PA run-01 fieldmaps to BOLD runs 01 & 02, and the AP and PA run-02 fieldmaps to BOLD run 03, for both sessions:
fmap_to_bold:
  ses-01:
    - fmap: "acq-rest_dir-AP_run-01"
      bold_keys: ["task-rest_run-01", "task-rest_run-02"]
    - fmap: "acq-rest_dir-AP_run-02"
      bold_keys: ["task-rest_run-03"]
    - fmap: "acq-rest_dir-PA_run-01"
      bold_keys: ["task-rest_run-01", "task-rest_run-02"]
    - fmap: "acq-rest_dir-PA_run-02"
      bold_keys: ["task-rest_run-03"]
  ses-02:
    - fmap: "acq-rest_dir-AP_run-01"
      bold_keys: ["task-rest_run-01", "task-rest_run-02"]
    - fmap: "acq-rest_dir-AP_run-02"
      bold_keys: ["task-rest_run-03"]
    - fmap: "acq-rest_dir-PA_run-01"
      bold_keys: ["task-rest_run-01", "task-rest_run-02"]
    - fmap: "acq-rest_dir-PA_run-02"
      bold_keys: ["task-rest_run-03"]
Reading the mapping:
| fmap key | Matches fieldmap file(s) containing... | Assigned to BOLD runs... |
|---|---|---|
| acq-rest_dir-AP_run-01 | ..._acq-rest_dir-AP_run-01_epi.json | run-01, run-02 |
| acq-rest_dir-AP_run-02 | ..._acq-rest_dir-AP_run-02_epi.json | run-03 |
| acq-rest_dir-PA_run-01 | ..._acq-rest_dir-PA_run-01_epi.json | run-01, run-02 |
| acq-rest_dir-PA_run-02 | ..._acq-rest_dir-PA_run-02_epi.json | run-03 |
Tip: If your study only has one session, include only ses-01 under fmap_to_bold. If all sessions share the same mapping, duplicate the block for each session.
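The substring matching described for fmap_to_bold can be sketched in a few lines. This is an illustration under stated assumptions, not the code of fmap_intended_for.py: `build_intended_for` is a hypothetical function, files are plain path strings for one subject and session, and the rules use the same `fmap` and `bold_keys` keys as the YAML config.

```python
def build_intended_for(fmap_files, bold_files, rules):
    """Map each fieldmap file to the BOLD files selected by substring rules."""
    intended = {}
    for rule in rules:
        # BOLD files whose name contains at least one of the rule's keys.
        targets = [b for b in bold_files if any(key in b for key in rule["bold_keys"])]
        for fmap in fmap_files:
            if rule["fmap"] in fmap:  # substring match on the fieldmap filename
                intended.setdefault(fmap, []).extend(targets)
    return intended


fmaps = [
    "fmap/sub-XXX_ses-01_acq-rest_dir-AP_run-01_epi.json",
    "fmap/sub-XXX_ses-01_acq-rest_dir-AP_run-02_epi.json",
]
bolds = [
    "func/sub-XXX_ses-01_task-rest_run-01_bold.nii.gz",
    "func/sub-XXX_ses-01_task-rest_run-02_bold.nii.gz",
    "func/sub-XXX_ses-01_task-rest_run-03_bold.nii.gz",
]
rules = [
    {"fmap": "acq-rest_dir-AP_run-01", "bold_keys": ["task-rest_run-01", "task-rest_run-02"]},
    {"fmap": "acq-rest_dir-AP_run-02", "bold_keys": ["task-rest_run-03"]},
]

result = build_intended_for(fmaps, bolds, rules)
for fmap, targets in result.items():
    print(fmap, "->", targets)
```

Because matching is by substring, a key that is too short (for example "run-0") would select more files than intended, so keys should be as specific as the filenames allow.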
Important: You must include an entry for every session in fmap_to_dwi. The script only assigns fieldmaps to sessions that are explicitly listed; any session omitted from the config will be skipped and its fieldmaps will not be assigned to any diffusion data.
The fmap_to_dwi block tells the script which fieldmap(s) should be assigned to which DWI run(s), on a per-session basis. It works the same way as fmap_to_bold, but targets diffusion-weighted imaging files instead of BOLD.
How it works:
- Each entry under a session lists a fieldmap key (fmap) and one or more DWI keys (dwi_keys).
- The fmap value is a substring of the fieldmap filename; the script matches any fieldmap file whose name contains that string.
- The dwi_keys values are substrings of the DWI filenames; each matched DWI file will have its path added to that fieldmap's IntendedFor field.
- Use a string (not a list) for dwi_keys when there is only one DWI file per session; use a list when there are multiple.
Example: Given this dataset structure:
Fieldmap: fmap/sub-XXX_ses-01_acq-dwi_dir-AP_run-01_epi.json
          fmap/sub-XXX_ses-01_acq-dwi_dir-AP_run-02_epi.json
DWI:      dwi/sub-XXX_ses-01_acq-multishell_run-01_dwi.nii.gz
          dwi/sub-XXX_ses-01_acq-multishell_run-02_dwi.nii.gz
The following config assigns the run-01 AP fieldmap to DWI run-01 and the run-02 AP fieldmap to DWI run-02, for both sessions:
fmap_to_dwi:
  ses-01:
    - fmap: "acq-dwi_dir-AP_run-01"
      dwi_keys: ["acq-multishell_run-01"]
    - fmap: "acq-dwi_dir-AP_run-02"
      dwi_keys: ["acq-multishell_run-02"]
  ses-02:
    - fmap: "acq-dwi_dir-AP_run-01"
      dwi_keys: ["acq-multishell_run-01"]
    - fmap: "acq-dwi_dir-AP_run-02"
      dwi_keys: ["acq-multishell_run-02"]
Reading the mapping:
| fmap key | Matches fieldmap file(s) containing... | Assigned to DWI runs... |
|---|---|---|
| acq-dwi_dir-AP_run-01 | ..._acq-dwi_dir-AP_run-01_epi.json | run-01 |
| acq-dwi_dir-AP_run-02 | ..._acq-dwi_dir-AP_run-02_epi.json | run-02 |
Simpler example: If each session has a single DWI acquisition with no run- or acq- label:
Fieldmap: fmap/sub-XXX_ses-01_acq-dwi_dir-AP_epi.json
DWI:      dwi/sub-XXX_ses-01_dwi.nii.gz
fmap_to_dwi:
  ses-01:
    - fmap: "acq-dwi_dir-AP"
      dwi_keys: "dwi"
  ses-02:
    - fmap: "acq-dwi_dir-AP"
      dwi_keys: "dwi"
Tip: If your study only has one session, include only ses-01 under fmap_to_dwi. If all sessions share the same mapping, duplicate the block for each session.
The script updates each fieldmap JSON like:
{
  "PhaseEncodingDirection": "j-",
  "IntendedFor": [
    "/ses-01/func/sub-001_ses-01_task-rest_run-01_bold.nii.gz",
    "/ses-01/func/sub-001_ses-01_task-rest_run-02_bold.nii.gz"
  ]
}
- Only edit values, not keys. Do not rename sections like bold_query or fmap_to_bold. Change only values, e.g., task: rest or bold_keys: ["task-rest_run-01","task-rest_run-02"].
- bold_keys / dwi_keys. These are filename patterns, not arbitrary numbers. Example: if a file is sub-001_ses-01_task-rest_run-01_bold.nii.gz, use bold_keys: ["task-rest_run-01"].
- Acquisition field
  - Use the label if present in filenames: acquisition: rest
  - Set to null if not in filenames: acquisition: null
If your study collected fieldmaps for functional or diffusion data and you plan to use them for distortion correction, make sure the IntendedFor field in your fieldmap JSON files is correct before running fMRIPrep fit (stage 1), fMRIPrep apply (stage 2), and QSIPrep (stage 1).
If IntendedFor is missing, QSIPrep and fMRIPrep will still run, but they will ignore your fieldmap and apply a synthetic fieldmap instead.
This guide shows:
- A correct example of a fieldmap file with an IntendedFor field
- How to check all participants before running QSIPrep
1. Verify a fieldmap manually
cd ${SCRATCH}/SCanD_project
grep "IntendedFor" -A10 data/local/bids/sub-CMH00000027/ses-01/fmap/sub-CMH00000027_ses-01_acq-dwi_dir-AP_epi.json # Replace this with the actual path
You should see something like:
"IntendedFor": [
  "ses-01/dwi/sub-CMH00000027_ses-01_dwi.nii.gz"
]
This confirms that the fieldmap is correctly linked to your DWI scan.
2. Run the QC script
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
## Create a directory for virtual environments if it doesn't exist
mkdir -p ~/.virtualenvs
cd ~/.virtualenvs
virtualenv --system-site-packages ~/.virtualenvs/myenv
## Activate the virtual environment
source ~/.virtualenvs/myenv/bin/activate
python3 -m pip install pybids==0.15.6 rich
## Go to the repo
cd ${SCRATCH}/SCanD_project
python3 ./code/check_fmap_json.py ./data/local/bids ./data/local/bids/participants.tsv
3. Interpret the output
You will see a summary table like this in the terminal:
| FileName | DataType | IntendedFor |
|---|---|---|
| sub-CMH0014_ses-01_dwi.nii.gz | dwi | ❌ Invalid/Missing |
| sub-CMH0014_ses-01_task-rest_run-01_bold.nii.gz | func | ❌ Invalid/Missing |
| sub-CMH0014_ses-01_task-rest_run-02_bold.nii.gz | func | ❌ Invalid/Missing |
| sub-CMH0014_ses-01_task-rest_run-03_bold.nii.gz | func | ❌ Invalid/Missing |
| sub-CMH0014_ses-02_dwi.nii.gz | dwi | ❌ Invalid/Missing |
| sub-CMH0014_ses-02_task-rest_run-01_bold.nii.gz | func | ❌ Invalid/Missing |
| sub-CMH0014_ses-02_task-rest_run-02_bold.nii.gz | func | ❌ Invalid/Missing |
| sub-CMH0014_ses-02_task-rest_run-03_bold.nii.gz | func | ❌ Invalid/Missing |
Summary:
- ✅ Passed: 0
- ❌ Failed: 8
- Total: 8
Action: If any file shows ❌ in the IntendedFor column, edit the corresponding fieldmap JSON to include the correct BOLD/DWI file paths before running fMRIPrep and QSIPrep.
Log File:
The same summary is saved in a log file for later reference:
cat ${SCRATCH}/SCanD_project/logs/fieldmap_qc_summary.log
After setting up the SciNet environment and organizing your BIDS folder and participants.tsv file, you can run pipelines by stage using Workflow automation (stage scripts), or run individual pipelines as described below.
- Note: if you are running the xcp-d pipeline (stage 3) for the first time, download the templateflow files before running the automated scripts. The download commands are given below in the xcp-d section.
Participant-array pipelines chunk participants.tsv the same way as the stage_*.sh scripts. When submitting manually, source the shared helper first:
cd ${SCRATCH}/SCanD_project
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_mriqc_scinet.sh 1
The examples below use scand_submit_participant_array with SUB_SIZE=1 unless noted otherwise.
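The chunking arithmetic behind a participant array can be sketched as follows. This is an assumption about how the helper behaves, inferred from the zero-based `--array=0-${max_task}` ranges used elsewhere in this README; `array_max_index` and `chunk` are hypothetical names and not functions from code/lib/slurm_array.sh.

```python
import math


def array_max_index(n_participants, sub_size):
    """Highest zero-based Slurm array index needed to cover all participants."""
    if n_participants < 1 or sub_size < 1:
        raise ValueError("both arguments must be positive")
    return math.ceil(n_participants / sub_size) - 1


def chunk(participants, sub_size, task_id):
    """Participants handled by one array task (the last chunk may be shorter)."""
    return participants[task_id * sub_size:(task_id + 1) * sub_size]


labels = [f"sub-{i:02d}" for i in range(1, 11)]  # ten made-up participant labels
print(array_max_index(len(labels), 1))  # one participant per task
print(array_max_index(len(labels), 4))  # four participants per task
print(chunk(labels, 4, 2))              # participants of the last task
```

With SUB_SIZE=1 every participant gets its own array task, which is the simplest setting to debug because each job log then belongs to exactly one participant.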
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_mriqc_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_freesurfer_long_scinet.sh 1
Note - the script enclosed uses some interesting extra options:
- it defaults to running all the fMRI tasks; the --task-id flag can be used to filter from there
- it runs synthetic distortion correction by default, instead of trying to work with the dataset's available fieldmaps, because fieldmap correction can go wrong; this does require that the phase encoding direction is specified in the JSON files (for example "PhaseEncodingDirection": "j-").
# module load singularity/3.8.0 - singularity already on most nodes
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_fmriprep_fit_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_qsiprep_scinet.sh 1
After QSIPrep completes, a per-pipeline sidecar with the distortion-correction method for each subject is written to:
./Neurobagel/derivatives/processing_status_qsiprep.tsv
This file is updated by the QSIPrep nipoppy tracker block (not the main Neurobagel status under .processing_statuses/). It contains:
- participant_id
- session_id
- qsiprep_sdc_method
Run this pipeline only if you want to process structural data alone. Otherwise, skip it.
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/01_smriprep_scinet.sh 1
The 01_magetbrain_init_scinet.sh script selects 21 template brains for MAGeTbrain registration. When possible, provide a demographic file so templates match your cohort by age and sex.
Recommended: Create data/local/bids/participants_demographic.tsv with the same subjects as participants.tsv plus two extra columns:
| Column | Header | Example |
|---|---|---|
| 1 | participant_id | sub-CMH00000005 |
| 2 | age | 32 |
| 3 | sex | Male or Female |
The script selects 10 male and 11 female templates stratified by age. If participants_demographic.tsv is missing, it randomly selects 21 subjects from participants.tsv and prints a warning in the job log.
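One way to picture a selection of 10 male and 11 female templates spread over age is sketched below. This is only one possible reading of "stratified by age" (evenly spaced picks from the age-sorted list within each sex); the actual logic in 01_magetbrain_init_scinet.sh may differ, and `pick_evenly_by_age`, `select_templates` and the cohort are made up for illustration.

```python
def pick_evenly_by_age(rows, n):
    """Pick up to n rows spread across the age range; each row is (participant_id, age, sex)."""
    ordered = sorted(rows, key=lambda row: row[1])
    if len(ordered) <= n:
        return ordered
    if n == 1:
        return [ordered[len(ordered) // 2]]
    # Evenly spaced positions from the youngest to the oldest participant.
    step = (len(ordered) - 1) / (n - 1)
    return [ordered[round(i * step)] for i in range(n)]


def select_templates(participants, n_male=10, n_female=11):
    """Combine the age-spread picks for each sex into one template list."""
    males = [row for row in participants if row[2] == "Male"]
    females = [row for row in participants if row[2] == "Female"]
    return pick_evenly_by_age(males, n_male) + pick_evenly_by_age(females, n_female)


# Made-up cohort: 30 male and 30 female participants aged 18 to 77.
cohort = [(f"sub-{i:03d}", 18 + i, "Male" if i % 2 else "Female") for i in range(60)]
templates = select_templates(cohort)
print(len(templates), "templates selected")
```

If a sex has fewer participants than requested, this sketch simply returns all of them, so the total can fall below 21 for small or unbalanced cohorts.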
By default, the labels in data/local/derivatives/MAGeTbrain/magetbrain_data/input/atlases/labels are based on hippocampus segmentation.
To change the segmentation to cerebellum, amygdala, or another region:
- Remove existing labels:
  rm data/local/derivatives/MAGeTbrain/magetbrain_data/input/atlases/labels/*
- Copy the desired labels from the shared directory:
  cp /scratch/arisvoin/shared/templateflow/atlases_all4/labels/* data/local/derivatives/MAGeTbrain/magetbrain_data/input/atlases/labels/
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## submit the array job to the queue
sbatch ./code/01_magetbrain_init_scinet.sh
Note - the script enclosed uses some interesting extra options:
- it defaults to running all the fMRI tasks; the --task-id flag can be used to filter from there
- it runs synthetic distortion correction by default, instead of trying to work with the dataset's available fieldmaps, because fieldmap correction can go wrong; this does require that the phase encoding direction is specified in the JSON files (for example "PhaseEncodingDirection": "j-").
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_fmriprep_apply_scinet.sh 1
After fMRIPrep apply completes, a per-pipeline sidecar with the distortion-correction method for each subject is written to:
./Neurobagel/derivatives/processing_status_fmriprep.tsv
This file is updated by the fMRIPrep apply nipoppy tracker block (not the main Neurobagel status under .processing_statuses/). It contains:
- participant_id
- fmriprep_method (e.g., topup fieldmaps, synthetic fieldmaps, or no sdc done)
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_qsirecon_FSL_scinet.sh 1
If your data is multi-shell, run the AMICO NODDI pipeline; otherwise, skip this step.
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_amico_noddi_scinet.sh 1
To complete the final step for AMICO NODDI, you need a graphical interface such as VNC to connect to a remote desktop, where you can create the figures and HTML files needed for QC. To connect to the remote desktop, follow these steps:
- Install and connect to VNC using login nodes.
- Open a terminal on VNC: navigate to Application > System Tools > MATE Terminal.
- Run the following command:
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/03_amico_VNC.sh
# module load singularity/3.8.0 - singularity already on most nodes
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_freesurfer_atlas_parcellate_scinet.sh 1
If you do not plan to run stage 6 (data sharing) and only want the FreeSurfer group outputs, run the FreeSurfer group merge code after the FreeSurfer atlas parcellation has completed:
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
bash ./code/freesurfer_group_merge.sh
For multi-shell data, run the following code. For single-shell data, use the single-shell version of the code.
The final output of the tractography pipeline is a .mat file containing brain connectivity matrices and associated metadata for several parcellation schemes. The variables include region IDs (e.g., aal116_region_ids), region labels (aal116_region_labels), and multiple connectivity matrices such as aal116_radius2_count_connectivity and aal116_sift_radius2_count_connectivity. These matrices hold connectivity values between brain regions, computed with different methods or preprocessing steps. Equivalent sets of variables exist for the other parcellations, including AICHA384, Brainnetome246, Gordon333, and Schaefer100/200/400. To inspect the contents, load the file with the scipy.io library in Python or open it directly in MATLAB.
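A short sketch of inspecting such a file from Python follows. It assumes SciPy is installed and that the file is not stored in MATLAB v7.3 (HDF5) format, which `scipy.io.loadmat` cannot read; `connectivity_keys` and `load_connectivity` are hypothetical helper names, and the variable naming pattern is taken from the examples in the paragraph above.

```python
def connectivity_keys(mat, atlas):
    """Names of the connectivity matrices stored for one atlas prefix, e.g. 'aal116'."""
    return sorted(
        key for key in mat
        if key.startswith(atlas + "_") and key.endswith("_connectivity")
    )


def load_connectivity(mat_path, atlas, measure):
    """Return (region_ids, matrix) for one atlas and measure, e.g. 'sift_radius2_count'."""
    from scipy.io import loadmat  # imported here so the helper above works without SciPy

    mat = loadmat(mat_path)
    return mat[f"{atlas}_region_ids"], mat[f"{atlas}_{measure}_connectivity"]


# Stand-in for the dictionary returned by loadmat, used here only to show the key filter.
fake_mat = {
    "__header__": b"",
    "aal116_region_ids": [1, 2],
    "aal116_radius2_count_connectivity": [[0, 1], [1, 0]],
    "aal116_sift_radius2_count_connectivity": [[0, 2], [2, 0]],
    "gordon333_radius2_count_connectivity": [[0]],
}
print(connectivity_keys(fake_mat, "aal116"))
```

On a real output file, `connectivity_keys(loadmat(path), "aal116")` lists what is available before any single matrix is pulled out with `load_connectivity`.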
Multishell:
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_tractography_multi_scinet.sh 1
Singleshell:
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/02_tractography_single_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
SUBJECTS_DIR=./data/local/derivatives/freesurfer/7.4.1
if compgen -G "${SUBJECTS_DIR}/*long*" > /dev/null; then
N_SUBJECTS=$(ls -d ${SUBJECTS_DIR}/*long* | wc -l)
else
N_SUBJECTS=$(ls -d ${SUBJECTS_DIR}/sub-* | wc -l)
fi
max_task=$(scand_slurm_array_max "$N_SUBJECTS" 1)
echo "Submitting ciftify_anat with array 0-${max_task}"
sbatch --array=0-${max_task} ./code/02_ciftify_anat_scinet.sh
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## submit the array job to the queue
sbatch ./code/02_magetbrain_register_scinet.sh
If you are running the pipeline for the first time, you must first download specific files from templateflow. Login nodes have internet access, but compute nodes do not, so download the required files from a login node before submitting jobs. Here are the steps for pre-download:
# First load a python module
module load python/3.6.8
# Create a directory for virtual environments if it doesn't exist
mkdir -p ~/.virtualenvs
cd ~/.virtualenvs
virtualenv --system-site-packages ~/.virtualenvs/myenv
# Activate the virtual environment
source ~/.virtualenvs/myenv/bin/activate
python3 -m pip install -U templateflow
# Run a Python script to import specified templates using the 'templateflow' package
python -c "from templateflow.api import get; get(['fsaverage','fsLR', 'Fischer344','MNI152Lin','MNI152NLin2009aAsym','MNI152NLin2009aSym','MNI152NLin2009bAsym','MNI152NLin2009bSym','MNI152NLin2009cAsym','MNI152NLin2009cSym','MNI152NLin6Asym','MNI152NLin6Sym'])"
# First load a python module
module load python/3.11.5
# Create a directory for virtual environments if it doesn't exist
mkdir -p ~/.virtualenvs
cd ~/.virtualenvs
virtualenv --system-site-packages ~/.virtualenvs/myenv
# Activate the virtual environment
source ~/.virtualenvs/myenv/bin/activate
python3 -m pip install -U templateflow
# Run a Python script to import specified templates using the 'templateflow' package
python -c "from templateflow.api import get; get(['fsLR', 'Fischer344','MNI152Lin'])"
If you have already set up the pipeline, skip the instructions above and run the XCP pipeline directly:
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/03_xcp_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/03_xcp_noGSR_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/03_noddi_reg_scinet.sh 1
Note: To run the GLM script correctly, you need:
- Task events files
- A study-specific model JSON
IMPORTANT:
You must provide a valid path to your MODEL file.
This file defines the regressors in your design matrix and the contrasts; the GLM will not run correctly if this path is missing or incorrect.
- Use an absolute path (recommended)
- Ensure all required files exist before submitting the job
- Example models are available in:
code/glm/examples/models/
For proper set-up, please read the instructions carefully, which can be found here
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull #in case you need to pull new code
## Provide a path to STUDY-specific model
MODEL=${PWD}/code/glm/examples/models/RTMSWM/model-001_smdl.json
## sanity check (recommended)
if [ ! -f "$MODEL" ]; then
echo "ERROR: MODEL file not found at $MODEL"
exit 1
fi
source ./code/lib/slurm_array.sh
## submit the array job to the queue, passing your task-specific model JSON as an argument
scand_submit_participant_array ./code/03_glm_surface_scinet.sh 1 "${MODEL}"
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
N_SUBJECTS=$(ls ./data/local/derivatives/MAGeTbrain/magetbrain_data/input/subjects/brains/*.mnc | wc -l)
max_task=$(scand_slurm_array_max "$N_SUBJECTS" 1)
echo "Submitting MAGeTbrain vote with array 0-${max_task}"
sbatch --array=0-${max_task} ./code/03_magetbrain_vote_scinet.sh
## Running enigma extract
```sh
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/ENIGMA_ExtractCortical.sh
```
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
source ./code/lib/slurm_array.sh
scand_submit_participant_array ./code/03_qsirecon_dtifit_scinet.sh 1
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## submit the array job to the queue
sbatch ./code/04_enigma_dti_scinet.sh
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
## submit the array job to the queue
sbatch ./code/05_extract_noddi_scinet.sh
Before moving on to the next stage, review the latest Neurobagel processing status file under Neurobagel/derivatives/.processing_statuses/processing_status-*.tsv (or the copy in data/share/processing_status.tsv after stage 6) for all pipelines from the previous stage. For instance, before running stage 3 you must examine the processing status of all stage 2 pipelines. If no participants have failed, you may proceed with the next stage. You can also upload the file to Neurobagel Digest to gain more insight into the status of your pipelines and to filter them for easier review.
If any participant has failed, edit data/local/bids/participants.tsv so that it lists only the participants you want to rerun. After fixing the errors, rerun the pipeline with the updated participant list.
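Finding the participants to rerun can be scripted instead of read off by eye. The sketch below is not part of the repository, and the column names (participant_id, pipeline_name, status) and the SUCCESS value are assumptions about the processing status layout; check the header of your own TSV and adjust them before relying on the result.

```python
import csv
import io


def failed_participants(tsv_text, pipelines, ok_status="SUCCESS"):
    """Participants with at least one non-successful row for the given pipelines."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    failed = set()
    for row in reader:
        if row["pipeline_name"] in pipelines and row["status"] != ok_status:
            failed.add(row["participant_id"])
    return sorted(failed)


# Made-up status table with the assumed columns.
example_status = (
    "participant_id\tsession_id\tpipeline_name\tstatus\n"
    "CMH0001\t01\tfmriprep\tSUCCESS\n"
    "CMH0002\t01\tfmriprep\tFAIL\n"
    "CMH0002\t01\tmriqc\tSUCCESS\n"
    "CMH0003\t01\tmriqc\tINCOMPLETE\n"
)
print(failed_participants(example_status, {"fmriprep"}))  # ['CMH0002']
```

The returned labels can then be written, one per row under the participant_id header, into the participants.tsv used for the rerun.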
This step calls group-level BIDS apps to build summary sheets and HTML index pages. It also copies metadata, QC pages, and a smaller subset of summary results into data/share.
Submit the Slurm extract job and run the login-node terminal script together (as in stage_6.sh and the commands below). The Slurm job may take up to ~8 hours depending on dataset size; the terminal script runs immediately on the login node for MAGeTbrain QC, qsiprep metrics, and related steps.
## go to the repo and pull new changes
cd ${SCRATCH}/SCanD_project
git pull
sbatch ./code/06_extract_to_share_slurm.sh
source ./code/06_extract_to_share_terminal.sh
When all pipelines are complete, verify data/share against docs/share-folder-checklist.md. Use docs/qc-guide.md for visual review of HTML QC reports before handoff. Replace <groupName_studyName> with your consortium group and study identifier (for example, CMH_study2024), then sync results to the shared space:
cd ${SCRATCH}/SCanD_project
mkdir -p /scratch/arisvoin/shared/<groupName_studyName>
rsync -av data/share /scratch/arisvoin/shared/<groupName_studyName>/
You're done! Hand off your data/share folder to the consortium.