This repository contains the code and notebooks required to run the analysis and generate the figures presented in our manuscript "The intrinsic geometry of reading."
Install all Python dependencies via:

```bash
pip install -r requirements.txt
```

Two dependencies are not on PyPI and are installed directly from GitHub. They are included in requirements.txt as editable installs but are listed explicitly here:
- brain2behaviour — core dataset class and CPM utilities used throughout the analysis
- surfdist — cortical surface geodesic distance computation
If pip install -r requirements.txt does not install them (e.g. in a fresh environment), install them manually:

```bash
pip install git+https://github.com/neurabenn/brain2behaviour.git
pip install git+https://github.com/neurabenn/surfdist.git
```

Several files required to recreate the analysis are not ours to distribute, but they are freely available through the Human Connectome Project:
- HCP unrestricted behavioral data — download here
- HCP restricted behavioral data — access instructions here
  - Required to respect family structure in cross-validation fold splits
  - Also contains confounds: age, height, weight, blood pressure
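Respecting family structure means that every member of a family lands in the same cross-validation fold, so related subjects never appear in both the training and the test set. The numpy sketch below illustrates the idea with a simple greedy assignment; it is not the repository's fold-splitting code. The function name family_aware_folds is hypothetical, and the family labels are assumed to come from a family identifier column in the restricted data (e.g. Family_ID; check your data dictionary). scikit-learn's GroupKFold offers an off-the-shelf alternative.

```python
import numpy as np


def family_aware_folds(family_ids, n_folds=10, seed=0):
    """Assign each subject to a CV fold so that all members of a family share a fold.

    Hypothetical helper for illustration. family_ids holds one family label per
    subject (assumed to come from a family identifier column in the HCP
    restricted file). Returns one integer fold index per subject.
    """
    family_ids = np.asarray(family_ids)
    rng = np.random.default_rng(seed)
    families, inverse, counts = np.unique(
        family_ids, return_inverse=True, return_counts=True
    )
    # Visit families in random order, largest first, to keep fold sizes balanced
    order = rng.permutation(len(families))
    order = order[np.argsort(-counts[order], kind="stable")]

    fold_sizes = np.zeros(n_folds, dtype=int)
    family_fold = np.empty(len(families), dtype=int)
    for fam in order:
        # Greedily place the whole family into the currently smallest fold
        k = int(np.argmin(fold_sizes))
        family_fold[fam] = k
        fold_sizes[k] += counts[fam]
    # Map each subject to the fold of its family
    return family_fold[inverse.ravel()]
```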
The analysis data (parcellated surface area, cortical distance matrices, and functional connectivity matrices) are hosted on Zenodo due to file size:
https://zenodo.org/records/20558695
Download and unpack the archive, then update the path variables at the top of each notebook to point to your local copy. In most notebooks this is the data_dir variable:
```python
data_dir = '/path/to/unpacked/data'
```

The first notebook (nb1) sets up the analysis dataset using the brain2behaviour dataset class. It loads the HCP behavioral and demographic data, selects confounds, and serializes subject-level data objects. The data_dir variable must point to your unpacked Zenodo data.
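As a rough illustration of the confound-selection step, the pandas sketch below joins the unrestricted and restricted HCP tables on subject ID and keeps only subjects with complete data. It deliberately does not use the brain2behaviour dataset class, whose API is not documented here. The function name build_subject_table is hypothetical, and the column names (Subject, ReadEng_Unadj, Age_in_Yrs, Height, Weight, BPSystolic, BPDiastolic) are assumptions to verify against your HCP data dictionary.

```python
import pandas as pd

# Assumed HCP column names; verify them against your data dictionary.
SUBJECT_COL = "Subject"
TARGET_COL = "ReadEng_Unadj"  # assumed oral reading recognition score column
CONFOUND_COLS = ["Age_in_Yrs", "Height", "Weight", "BPSystolic", "BPDiastolic"]


def build_subject_table(unrestricted, restricted,
                        target=TARGET_COL, confounds=CONFOUND_COLS):
    """Join unrestricted and restricted HCP tables and keep complete cases.

    Hypothetical helper for illustration; not the brain2behaviour API.
    """
    merged = unrestricted.merge(restricted, on=SUBJECT_COL, how="inner")
    cols = [SUBJECT_COL, target, *confounds]
    # Drop subjects missing the behavioral target or any confound
    return merged[cols].dropna().set_index(SUBJECT_COL)
```

In practice, both tables would be read from your local HCP downloads with pd.read_csv and passed in as DataFrames.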
These scripts use the datasets built in nb1 to run the full CPM pipeline. For the steps that run on a cluster (feature selection and permutation testing), parallelized scripts are provided alongside the notebooks:
- End2EndCPM_wPerms.py: the main script, which runs end-to-end CPM with permutation testing. Call it with a dataset, the permutations, and the task to be tested.
- select_features_batch.py: batch feature selection across CV folds; called by End2EndCPM_wPerms.py.
- CollectFeaturesandPredict.py: collects fold results and runs prediction; called by End2EndCPM_wPerms.py.
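For readers new to connectome-based predictive modeling (CPM), the numpy/scipy sketch below shows the kind of per-fold logic these scripts parallelize, following the commonly described CPM recipe: correlate every feature with the behavioral score in the training fold, keep features that pass a p-value threshold, sum them into a single network-strength score, and fit a linear model. It is not the code in End2EndCPM_wPerms.py. The p < 0.01 threshold, the combined positive-minus-negative strength, and the function names are assumptions, and confound handling is omitted.

```python
import numpy as np
from scipy import stats


def edge_correlations(X, y):
    """Pearson r between each column of X (n_subjects x n_features) and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc * yc[:, None]).sum(axis=0) / denom


def select_features(X, y, p_thresh=0.01):
    """Boolean masks of positively and negatively associated features (assumed threshold)."""
    n = len(y)
    r = edge_correlations(X, y)
    # Two-sided p-value for Pearson r via the t distribution with n - 2 dof
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)
    return (r > 0) & (p < p_thresh), (r < 0) & (p < p_thresh)


def cpm_fold(X_train, y_train, X_test, p_thresh=0.01):
    """Fit a network-strength CPM on one training fold and predict the test fold."""
    pos, neg = select_features(X_train, y_train, p_thresh)
    # Network strength: sum of positive features minus sum of negative features
    strength_train = X_train[:, pos].sum(axis=1) - X_train[:, neg].sum(axis=1)
    strength_test = X_test[:, pos].sum(axis=1) - X_test[:, neg].sum(axis=1)
    # np.polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(strength_train, y_train, deg=1)
    return slope * strength_test + intercept, pos, neg
```

Running cpm_fold once per fold and concatenating the test-fold predictions gives cross-validated predictions that can be compared with the observed scores.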
Permutation exchangeability blocks for HCP data are precomputed and stored in fold_permutations/. For background on HCP-compatible permutation testing, see Winkler et al. 2015 and the PALM exchangeability block guide.
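Exchangeability blocks constrain which subjects may be swapped when the behavioral scores are shuffled. The sketch below shows only the simplest case: shuffling within each block and never across blocks. The full HCP scheme described by Winkler et al. (2015) uses multi-level blocks that this sketch does not reproduce, so the precomputed permutations in fold_permutations/ remain the ones to use for the actual analysis. The function name and the block labels are assumptions for illustration.

```python
import numpy as np


def permute_within_blocks(blocks, rng):
    """Permutation of subject indices that only shuffles within exchangeability blocks.

    Hypothetical helper. blocks holds one block label per subject. Only the
    single-level within-block case is covered; multi-level HCP blocks
    (e.g. whole-block shuffling) are not.
    """
    blocks = np.asarray(blocks)
    perm = np.arange(len(blocks))
    for b in np.unique(blocks):
        idx = np.flatnonzero(blocks == b)
        # Shuffle subject indices only among members of the same block
        perm[idx] = rng.permutation(idx)
    return perm
```

A permuted target is then y[perm], and refitting the model on many such permutations yields a null distribution.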
Computes the significance of cross-validated model performance and tests for differences between models. Plots Figures 1a and 1b. Uses the outputs of the CPM runs on the datasets constructed in nb1.
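One common way to turn a permutation null distribution into a significance value is the add-one estimator p = (1 + #{b : T_b ≥ T_obs}) / (1 + B), where T_obs is the observed statistic and T_1, ..., T_B are the statistics from B permutations. This is a generic sketch and may not match the exact test used in the notebook. The function name is hypothetical, and the choice of statistic (e.g. the correlation between predicted and observed scores) is left to the caller.

```python
import numpy as np


def permutation_p_value(observed, null_stats):
    """One-sided add-one permutation p-value (hypothetical helper).

    Counts the observed statistic as one member of the null distribution,
    so the p-value can never be exactly zero.
    """
    null_stats = np.asarray(null_stats)
    return (1 + np.sum(null_stats >= observed)) / (1 + null_stats.size)
```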
Generates the circle plot and heatmap visualizations of the stable features (Figure 1c), that is, the features selected with a consistent association in every cross-validation fold.
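Selecting the features that are stable across folds reduces to a logical AND over the per-fold selection masks, taken separately for positively and negatively associated features. The sketch below assumes the masks are stacked as boolean arrays of shape (n_folds, n_features) and that functional connectivity edges are vectorized in the upper-triangle order of np.triu_indices; both are assumptions, as are the function names. edges_to_matrix rebuilds the symmetric node-by-node matrix that a heatmap or circle plot needs.

```python
import numpy as np


def stable_features(pos_masks, neg_masks):
    """Features selected with the same sign in every fold (hypothetical helper).

    pos_masks, neg_masks: boolean arrays of shape (n_folds, n_features).
    """
    pos = np.all(np.asarray(pos_masks, dtype=bool), axis=0)
    neg = np.all(np.asarray(neg_masks, dtype=bool), axis=0)
    return pos, neg


def edges_to_matrix(edge_values, n_nodes):
    """Rebuild a symmetric matrix from an upper-triangle edge vector.

    Assumes edges are ordered as np.triu_indices(n_nodes, k=1) returns them.
    """
    edge_values = np.asarray(edge_values, dtype=float)
    iu = np.triu_indices(n_nodes, k=1)
    if edge_values.size != iu[0].size:
        raise ValueError("edge vector length does not match n_nodes")
    mat = np.zeros((n_nodes, n_nodes))
    mat[iu] = edge_values
    # Mirror the upper triangle into the lower triangle; diagonal stays zero
    return mat + mat.T
```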
Projects CPM features onto the cortical surface and writes GIFTI metric files. To use the outputs, set the cortex structure and palette with Connectome Workbench (wb_command -set-structure, wb_command -metric-palette). Alternatively, open surfaces_and_scene_files/FeaturesSchaefer400.scene directly in Connectome Workbench to view the precalculated surface visualizations.
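Writing a parcel-level result to a GIFTI metric amounts to expanding one value per parcel into one value per vertex and saving it as a single data array. The sketch below assumes a per-hemisphere integer label array in which 0 marks the medial wall and parcel k corresponds to entry k - 1 of the value vector; check this convention against the Schaefer-400 label file you use. The function names are hypothetical, and nibabel is needed only for the saving step.

```python
import numpy as np


def parcel_to_vertex(parcel_values, vertex_labels):
    """Expand one value per parcel to one value per vertex (hypothetical helper).

    Assumes vertex_labels holds an integer parcel label per vertex, with 0 for
    the medial wall and parcel k mapping to parcel_values[k - 1].
    """
    parcel_values = np.asarray(parcel_values, dtype=np.float32)
    vertex_labels = np.asarray(vertex_labels)
    out = np.zeros(vertex_labels.shape, dtype=np.float32)
    inside = vertex_labels > 0
    out[inside] = parcel_values[vertex_labels[inside] - 1]
    return out


def save_metric(values, path):
    """Write a single-column GIFTI metric file, e.g. 'features.L.func.gii'."""
    import nibabel as nib  # imported here so parcel_to_vertex works without nibabel

    darray = nib.gifti.GiftiDataArray(np.asarray(values, dtype=np.float32))
    nib.save(nib.gifti.GiftiImage(darrays=[darray]), path)
```

The resulting file can then be given its cortex structure and palette with the wb_command calls listed above.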
Shows how the models were generalized from the discovery cohort (HCP Young Adult) to the validation cohort (HCP Lifespan Aging).
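Out-of-cohort generalization typically means freezing everything learned on the discovery cohort (the selected features and the fitted linear coefficients) and applying it unchanged to the validation cohort. The sketch below shows that pattern for a network-strength CPM. It assumes both cohorts share the same parcellation and feature ordering, and the function names are hypothetical rather than taken from the notebook.

```python
import numpy as np


def apply_discovery_model(X_val, pos_mask, neg_mask, slope, intercept):
    """Predict validation-cohort scores from a CPM frozen on the discovery cohort.

    Hypothetical helper. Masks and coefficients come from the discovery fit;
    X_val must use the same feature ordering as the discovery data.
    """
    strength = X_val[:, pos_mask].sum(axis=1) - X_val[:, neg_mask].sum(axis=1)
    return slope * strength + intercept


def prediction_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```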
Replicates the supplementary figures. Due to file size limits on Zenodo, the model outputs used to calculate between-model comparisons and model significance are available upon request.
Uses the dataset files to plot the stable surface area, cortical distance, and functional connectivity features in the discovery cohort alongside the oral reading recognition test scores and confounds.