brainhack-school2026/TSH-predict-CogPerformance

BrainHack2026_project | Can brain structure and Thyroid Stimulating Hormone levels predict cognitive performance? Prediction models and SHAP-based variable contribution analyses

This file is the main repository overview. The top-level README.md is intentionally left empty so that this file remains the single canonical overview document.

We used an OpenNeuro dataset to apply machine learning to the relationships among hormones, brain structure, and cognition. Specifically, we used the NIMH Healthy Research Volunteer Dataset to answer these questions:

  1. Primary: Can brain structure, combined with circulating levels of thyroid-stimulating hormone (TSH), predict cognitive performance?

  2. Exploratory: Which variables contribute most to the model predictions?

The initial plan also considered asking whether prediction improves when sex is included in the cognition-prediction model. As the project developed, this question was replaced by a SHAP-based assessment of variable contributions. Sex classification was instead used as a positive-control exercise to check whether the machine-learning pipeline could recover a signal for a better-established outcome.

Repository structure and code locations

The repository is organized to follow the analysis workflow.

  • 01_preprocessing/ contains the steps used before modeling: MRI and phenotypic data curation, MRI preprocessing notes, FreeSurfer output extraction, and preparation of the final machine-learning input dataset.

  • 02_ML_model/ contains the elastic-net modeling stage: environment setup, the final modeling dataset, cognition-prediction scripts, sex-classification positive-control scripts, and generated model outputs.

  • 03_results/ contains curated summaries of the final model results, including selected embedded figures. The raw tables and figures generated by the Python scripts remain under 02_ML_model/results/.

Each folder contains documentation that explains the purpose of that step and how to use the code in that folder. The machine-learning folder also includes more detailed documentation of the elastic-net models, predictors, outcomes, outputs, and generated figures.

README.md                         # intentionally empty
REPO_OVERVIEW.md                  # main repository overview

01_preprocessing/
├── A_data_curation/
│   ├── MRI_data_curation.md
│   ├── phenotypic_data_curation.md
│   └── code/
│       └── make_SCIC_T1w_BIDS.sh
├── B_MRI_preprocessing/
│   ├── MRI_preprocessing.md
│   └── code/
│       └── make_freesurfer_qc_images_and_html.sh
└── C_ML_input_preparation/
    ├── MRI_data_extraction.md
    ├── final_dataset_merge.md
    └── code/
        └── extract_DK_CTh_SA_GMV_ICV_wide.sh

02_ML_model/
├── README.md
├── environment/
│   ├── README.md
│   └── create_brainhack_ml_environment.sh
├── data/
│   └── elastic_net_complete_cases_wide.csv
├── cognition/
│   ├── README.md
│   ├── 01_elastic_net_full_readable.py
│   └── 02_elastic_net_cognition_residualized_age_ICV_with_SHAP.py
├── sex_classification/
│   ├── README.md
│   ├── 01_sex_classification_without_TSH_with_SHAP.py
│   ├── 02_sex_classification_with_TSH_with_SHAP.py
│   ├── 03_sex_classification_residualized_age_ICV_without_TSH_with_SHAP.py
│   └── 04_sex_classification_residualized_age_ICV_with_TSH_with_SHAP.py
└── results/
    ├── cognition/
    └── sex_classification_exercise/

03_results/
├── README.md
├── cognition/
│   ├── results.md
│   └── figures/
│       ├── observed_vs_predicted/
│       └── SHAP_global_bar/
└── sex_classification_exercise/
    ├── results.md
    └── figures/
        ├── ROC_curves/
        ├── SHAP_global_bar/
        └── confusion_matrices/

The machine-learning scripts resolve paths relative to the repository, so they can be run after downloading the repository and creating the conda environment described under 02_ML_model/environment/.
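A minimal sketch of how a script can locate the shared modeling table relative to its own file rather than the current working directory, assuming each script sits one folder below 02_ML_model/ (as in the tree above). The helper name resolve_data_path is hypothetical and is not taken from the repository's scripts.

```python
from pathlib import Path
from typing import Union


def resolve_data_path(script_path: Union[str, Path]) -> Path:
    """Return the path to the shared elastic-net input table.

    Hypothetical helper: assumes the script lives in 02_ML_model/<subfolder>/,
    e.g. 02_ML_model/cognition/01_elastic_net_full_readable.py.
    """
    script = Path(script_path).resolve()
    ml_dir = script.parents[1]  # parents[0] is <subfolder>, parents[1] is 02_ML_model
    return ml_dir / "data" / "elastic_net_complete_cases_wide.csv"


if __name__ == "__main__":
    # Inside a real script one would pass __file__ instead of this example string.
    example = "02_ML_model/cognition/01_elastic_net_full_readable.py"
    print(resolve_data_path(example))
```

Anchoring on the script's own location is what allows it to be launched from any directory once the repository is downloaded.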

Cognition outcomes

The dataset uses the NIH Toolbox cognition battery to measure cognitive ability in the participants. The scores used in this project were:

  1. Attention and executive functioning using the Flanker Inhibitory Control and Attention Task

  2. Executive functioning using the Dimensional Change Card Sort Task

  3. Working memory using the List Sorting Working Memory Task

Prior literature suggested that TSH may be related to attention, working memory, and executive function; therefore, these three domains were selected as cognition outcomes.

Models

The final project includes the originally planned cognition-prediction model, an additional cognition-prediction model using age/ICV-residualized brain variables, and a sex-classification positive-control exercise.

Model 1: original cognition model

This was the originally planned model, testing whether TSH, sex, age, intracranial volume (ICV), and brain features could predict cognitive subdomain scores.

Outcomes:

  1. Flanker Inhibitory Control and Attention Task score
  2. Dimensional Change Card Sort Task score
  3. List Sorting Working Memory Task score

Predictors: log(TSH), sex, 68 cortical-thickness variables, 68 surface-area variables, ICV, and age.
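The Model 1 setup can be sketched with scikit-learn, assuming an elastic-net regression whose penalty is tuned by inner cross-validation (ElasticNetCV) nested inside an outer K-fold loop that produces out-of-sample predictions. The data are synthetic, and the column names (TSH, CT_00, SA_00, flanker, and so on) are hypothetical placeholders rather than the names used in elastic_net_complete_cases_wide.csv; the repository's scripts may tune and evaluate the model differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the modeling table (hypothetical column names).
ct = pd.DataFrame(rng.normal(2.5, 0.2, (n, 68)), columns=[f"CT_{i:02d}" for i in range(68)])
sa = pd.DataFrame(rng.normal(1500, 200, (n, 68)), columns=[f"SA_{i:02d}" for i in range(68)])
base = pd.DataFrame({
    "TSH": rng.lognormal(mean=0.5, sigma=0.5, size=n),  # raw TSH, right-skewed
    "sex": rng.integers(0, 2, size=n),                  # 0/1 coding assumed
    "age": rng.uniform(18, 80, size=n),
    "ICV": rng.normal(1.5e6, 1.5e5, size=n),
})
df = pd.concat([base, ct, sa], axis=1)
df["log_TSH"] = np.log(df["TSH"])
# Synthetic outcome standing in for one NIH Toolbox score (e.g. Flanker).
df["flanker"] = 100 - 0.2 * df["age"] + rng.normal(0, 10, n)

# Model 1 predictors: log(TSH), sex, 68 CT, 68 SA, ICV, and age.
predictors = ["log_TSH", "sex"] + list(ct.columns) + list(sa.columns) + ["ICV", "age"]
X, y = df[predictors], df["flanker"]

# Standardize, then let ElasticNetCV choose alpha and l1_ratio by inner 5-fold CV.
model = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, max_iter=5000),
)
# Outer 5-fold CV gives an out-of-sample prediction for every participant.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=outer_cv)
print(f"Out-of-sample R^2 = {r2_score(y, y_pred):.3f}")
```

Keeping the scaler inside the pipeline means standardization is refit on each outer training fold, so held-out participants do not leak into preprocessing.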

Model 2: age/ICV-residualized brain cognition model

This model modifies Model 1 by residualizing the brain variables to account for head size and age. Age and ICV are used for residualization and are not direct predictors in the elastic-net model.

Outcomes:

  1. Flanker Inhibitory Control and Attention Task score
  2. Dimensional Change Card Sort Task score
  3. List Sorting Working Memory Task score

Predictors: log(TSH), sex, 68 cortical-thickness residuals after adjusting for age and ICV, and 68 surface-area residuals after adjusting for age and ICV.
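Residualizing for age and ICV can be sketched as one ordinary least-squares fit per brain variable (with an intercept), keeping the residuals. This sketch assumes a linear adjustment fitted on training participants and then applied unchanged to held-out participants to avoid leakage; this overview does not say whether the repository's script refits the adjustment within each cross-validation fold. The function names are hypothetical, and ICV is expressed in litres here only to keep the synthetic example well conditioned.

```python
import numpy as np


def _design(covariates: np.ndarray) -> np.ndarray:
    """Prepend an intercept column to the covariate matrix (e.g. age, ICV)."""
    return np.column_stack([np.ones(len(covariates)), covariates])


def fit_residualizer(covariates: np.ndarray, brain: np.ndarray) -> np.ndarray:
    """Least-squares coefficients regressing every brain column on the covariates."""
    beta, *_ = np.linalg.lstsq(_design(covariates), brain, rcond=None)
    return beta  # shape: (1 + n_covariates, n_brain)


def apply_residualizer(beta: np.ndarray, covariates: np.ndarray, brain: np.ndarray) -> np.ndarray:
    """Subtract the covariate-predicted part, leaving age/ICV-adjusted residuals."""
    return brain - _design(covariates) @ beta


rng = np.random.default_rng(1)
n = 150
age = rng.uniform(18, 80, n)
icv_l = rng.normal(1.5, 0.15, n)  # hypothetical ICV in litres
covariates = np.column_stack([age, icv_l])
# Synthetic CT-like data (68 regions) with built-in age and ICV effects.
brain = 2.8 - 0.005 * age[:, None] + 0.1 * icv_l[:, None] + rng.normal(0, 0.1, (n, 68))

train, test = slice(0, 100), slice(100, n)
beta = fit_residualizer(covariates[train], brain[train])
resid_train = apply_residualizer(beta, covariates[train], brain[train])
resid_test = apply_residualizer(beta, covariates[test], brain[test])  # reuses the training fit
print(resid_train.shape, resid_test.shape)  # (100, 68) (50, 68)
```

On the training rows the residuals are, by construction, uncorrelated with age and ICV; on held-out rows they are only approximately so, which is the expected cost of not peeking at test data.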

Model 3: sex-classification positive-control exercise

This model serves as a positive-control exercise, checking whether the pipeline can recover a signal when the predictors and outcome have stronger support in prior literature.

Outcome: sex.

  • Model 3.1: original (non-residualized) cortical thickness (CT) and surface area (SA) sex-classification model without TSH. Predictors: age, ICV, cortical thickness, and surface area.
  • Model 3.2: original CT/SA sex-classification model with log(TSH). Predictors: log(TSH), age, ICV, cortical thickness, and surface area.
  • Model 3.3: age/ICV-residualized CT/SA sex-classification model without TSH. Predictors: cortical thickness and surface area residualized for age and ICV.
  • Model 3.4: age/ICV-residualized CT/SA sex-classification model with log(TSH). Predictors: log(TSH), plus cortical thickness and surface area residualized for age and ICV.
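A positive-control check of this kind can be sketched as an elastic-net logistic regression scored by cross-validated ROC AUC, plus a confusion matrix at a 0.5 probability threshold. Everything here is an illustrative assumption: the data are synthetic with a planted sex difference in ten features, the 0/1 sex coding and the fixed penalty (C=1.0, l1_ratio=0.5) are arbitrary, and the two feature sets mirror Models 3.3 and 3.4 only in structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200
sex = rng.integers(0, 2, size=n)        # hypothetical 0/1 coding
brain = rng.normal(size=(n, 136))       # stand-in for 68 CT + 68 SA residuals
brain[:, :10] += sex[:, None]           # planted signal: 1-SD shift in 10 features
log_tsh = rng.normal(0.5, 0.5, size=n)  # synthetic log(TSH), unrelated to sex here

feature_sets = {
    "without_TSH": brain,                           # Model 3.3-style inputs
    "with_TSH": np.column_stack([log_tsh, brain]),  # Model 3.4-style inputs
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, X in feature_sets.items():
    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    )
    # Out-of-fold probability of class 1 for every participant.
    proba = cross_val_predict(clf, X, sex, cv=cv, method="predict_proba")[:, 1]
    auc = roc_auc_score(sex, proba)
    cm = confusion_matrix(sex, (proba >= 0.5).astype(int))
    results[name] = (auc, cm)
    print(f"{name}: ROC AUC = {auc:.3f}")
    print(cm)
```

Because the scaler and classifier are refit inside every fold, a clearly above-chance AUC indicates that the pipeline itself can recover a genuine signal, which is the purpose of a positive control.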

Curated results

Final curated result summaries are stored under 03_results/.

  • 03_results/cognition/results.md summarizes the cognition-prediction models and embeds the observed-versus-predicted plots, with SHAP global-bar plots in a collapsible section.

  • 03_results/sex_classification_exercise/results.md summarizes the sex-classification positive-control models and embeds ROC curves and SHAP global-bar plots, with confusion matrices in a collapsible section.
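SHAP global-bar plots typically rank features by mean absolute SHAP value. For a linear model such as an elastic net, SHAP values under a feature-independence assumption have a closed form, phi_ij = w_j * (x_ij - mean_j(x)), so the ranking can be sketched without the shap library. This is an illustration on synthetic data with hypothetical feature names; the repository's scripts may compute SHAP values with the shap package and a different background set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 200
# Hypothetical feature names mirroring Model 2: log(TSH), sex, 68 CT and 68 SA residuals.
feature_names = (
    ["log_TSH", "sex"]
    + [f"CT_resid_{i:02d}" for i in range(68)]
    + [f"SA_resid_{i:02d}" for i in range(68)]
)
X = rng.normal(size=(n, len(feature_names)))
X[:, 1] = rng.integers(0, 2, size=n)  # binary sex column (0/1 coding assumed)
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)  # synthetic outcome

model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.05, l1_ratio=0.5))
model.fit(X, y)

# make_pipeline names each step after its lower-cased class name.
Z = model.named_steps["standardscaler"].transform(X)
w = model.named_steps["elasticnet"].coef_

# Linear SHAP with the training data as background: phi_ij = w_j * (z_ij - mean_j(z)).
phi = w * (Z - Z.mean(axis=0))

# Additivity: the mean prediction plus a row's SHAP values recovers that row's prediction.
pred = model.predict(X)
assert np.allclose(pred.mean() + phi.sum(axis=1), pred)

# Global importance = mean |SHAP| per feature, the quantity a global bar plot displays.
global_importance = (
    pd.Series(np.abs(phi).mean(axis=0), index=feature_names)
    .sort_values(ascending=False)
)
print(global_importance.head(10))
```

The additivity check is a cheap guard that the SHAP values were computed in the same (standardized) feature space the model was trained in.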
