[From MathePhysics/FINDER] Suite of Classification Algorithms based on the following article:
Trajan Murphy*, Akshunna S. Dogra*, Hanfeng Gu, Caleb Meredith, Mark Kon, Julio Enrique Castrillion-Candas*, for the Alzheimer’s Disease Neuroimaging Initiative, FINDER: Feature Inference on Noisy Datasets using Eigenspace Residuals, 2025. (*Equal Contributions, correspondence should be directed to: adogra@nyu.edu).
The step-by-step instructions below reproduce the results reported in the article.
Step 1. Download the data (instructions are at the bottom, in the "Downloading Data Sets" section).
Step 2. Open MATLAB and make sure that the current folder is (your download path)/source. Type 'paths' (without quotes) in the command window to add all subfolders to the MATLAB path.
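The repository's own 'paths' script is authoritative, and its contents are not reproduced here. As a rough illustration of what this step amounts to, a minimal sketch using only MATLAB built-ins (addpath, genpath) might look like the following. The variable names sourceDir and onPath are illustrative, and the sketch assumes the current folder is already (your download path)/source.

```matlab
% Sketch only: the repository's own 'paths' script is authoritative.
% Assumes the current folder is (your download path)/source.
sourceDir = pwd;
addpath(genpath(sourceDir));          % add this folder and all of its subfolders
onPath = contains(path, sourceDir);   % true if the folder now appears on the search path
```

If onPath is false, the most likely cause is that MATLAB was not started in, or moved to, the source folder.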
Step 3. Open InitializeParameters.m. Fill out fields in InitializeParameters. Guidelines:
Set parameters.data.validationType = 'Kfold' and parameters.Kfold = 1 to perform LPOCV.
set parameters.data.path = (the location of your data file). If you download your datasets to the 'data' folder, you may put the empty string ''.
set parameters.data.normalize = 1
set parameters.multilevel.chooseTrunc = 'false';
set parameters.parallel.on = true to run the parallel loop (only if you have the Parallel Computing Toolbox)
set parameters.gpuarray.on = true to run on the GPU (only if you have a GPU). Setting both parameters.parallel.on and parameters.gpuarray.on to true is not recommended.
set parameters.snapshots.k1 = (the truncation parameter for each dataset). We use the following values:
ADNI: 5
CSF: 8
newAD: 8
GCM: 39
set parameters.data.label = 'Plasma_M12_ADCN' (ADNI AD vs. CN)
'Plasma_M12_CNLMCI' (ADNI CN vs. LMCI)
'Plasma_M12_ADLMCI' (ADNI AD vs. LMCI)
'SOMAscan7k_KNNimputed_AD_CN' (CSF)
'newAD' (newAD)
'GCM' (GCM)
set parameters.multilevel.Mres = 20:20:140 (ADNI)
700:700:7000 (CSF and newAD)
1600:1600:16000 (GCM)
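As an illustration, the guidelines above could be combined as follows for the ADNI AD vs. CN run. This is a sketch, assuming 'parameters' is the struct that InitializeParameters.m fills in; the actual file may define further fields that are not shown here, and the choice of parallel over GPU is only an example.

```matlab
% Sketch: example values for the ADNI AD vs. CN run, taken from the guidelines above.
% Assumes 'parameters' is the struct that InitializeParameters.m fills in.
parameters.data.validationType = 'Kfold';
parameters.Kfold = 1;                        % together with 'Kfold', selects LPOCV
parameters.data.path = '';                   % data file sits in the 'data' folder
parameters.data.normalize = 1;
parameters.multilevel.chooseTrunc = 'false';
parameters.parallel.on = true;               % requires the Parallel Computing Toolbox
parameters.gpuarray.on = false;              % enabling both parallel and GPU is not recommended
parameters.snapshots.k1 = 5;                 % truncation parameter for ADNI
parameters.data.label = 'Plasma_M12_ADCN';   % ADNI AD vs. CN
parameters.multilevel.Mres = 20:20:140;      % ADNI
```

For another dataset, change k1, label, and Mres together so that all three correspond to the same dataset.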
Step 4. (Optional) Batch processing.
To batch-process the results for a single dataset, run CompMultiSVM2 instead of CompMultiSVM (Step 5). Make sure that parameters.snapshots.k1 and parameters.multilevel.Mres are defined appropriately in InitializeParameters for each dataset when running CompMultiSVM2.
Step 5. To generate results one at a time, type CompMultiSVM in the command window. This generates the results from the fields you specified in InitializeParameters.m.
Step 6. Check the results
The results, including the tables of accuracy and AUC and the plots shown in the paper, will be stored in
/(your download path)/results/Manual_Hyperparameter_Selection
The results structure contains the following fields:
'array': 5-D array containing indices related to the (i,j)th iteration of LPOCV, the actual class value, predicted class value, and raw machine score
'notes': string array explaining each of the dimensions in 'array'
'DimRunTime': defunct field (ignore)
'AUC': AUC obtained for each value of Mres for MLS and ACA, and for each learner in Benchmark
'ROCs': ROC curves corresponding to each AUC value
'accuracy': same layout as 'AUC', but containing the accuracy
'accuracyBalanced': contains the balanced accuracy instead of the accuracy. For LPOCV this is the same as the accuracy
'run_time': total elapsed time to create result structure
'creation_time': date and time at which result structure was saved.
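To see these fields in practice, a saved result can be loaded and its field names listed. The sketch below rests on assumptions that the instructions do not confirm: that results are saved as .mat files, and that the results folder sits next to 'source' so the relative path below resolves. The variable names resultsDir, files, S, and vars are illustrative.

```matlab
% Sketch: list saved result files and print the fields of the first one.
% Assumptions (not confirmed by the instructions): results are saved as .mat
% files, and MATLAB's current folder is (your download path)/source.
resultsDir = fullfile('..', 'results', 'Manual_Hyperparameter_Selection');
files = dir(fullfile(resultsDir, '*.mat'));
if isempty(files)
    fprintf('No .mat files found in %s\n', resultsDir);
else
    S = load(fullfile(files(1).folder, files(1).name));   % struct of saved variables
    vars = fieldnames(S);
    for k = 1:numel(vars)
        v = S.(vars{k});
        if isstruct(v)
            % Print the field names, e.g. array, notes, AUC, ROCs, accuracy
            fprintf('%s has fields: %s\n', vars{k}, strjoin(fieldnames(v)', ', '));
        end
    end
end
```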
Step 7. Plot the results
The function call plotAUCs2 should plot each of the figures. Run this only after you have created the results for
-Plasma_M12_ADCN [ADNI (AD vs. CN) ]
-Plasma_M12_ADLMCI [ ADNI (AD vs. LMCI)]
-Plasma_M12_CNLMCI [ADNI (CN vs. LMCI)]
-GCM
-newAD
-SOMAscan7k_KNNimputed_AD_CN [CSF (AD vs. CN)]
Otherwise, MATLAB will throw an error because the result files that plotAUCs2 tries to load do not exist yet.
%%======================== %% Downloading Data Sets %%========================
Step 1. Get access to confidential ADNI data. Visit the source website https://ida.loni.usc.edu/login.jsp?project=ADNI.
Complete the data use agreement and submit your application.
Once approved, you'll receive login credentials for the ADNI Image & Data Archive (IDA).
Step 2. Download plasma data. Set 'select study' to ADNI. In the "Search & Download" dropdown menu, select "Study Files".
In the sidebar on the left, choose "Biospecimen" -> "Biospecimen Results".
Find and download "Biomarkers Consortium Plasma Proteomics Project RBM Multiplex Data and Primer (Zip file)"
From the extracted zip folder, take "adni_plasma_qc_multiplex_11Nov2010.csv" and save it to /data/
================== Step 3. Download phenotype data ==================================
Also in the "Search & Download" section, find "ADNIMERGE - Packages for R" and download "ADNIMERGE_0.0.1.tar.gz".
Run the following code in R:
install.packages("Hmisc")
install.packages("ADNIMERGE_0.0.1.tar.gz", repos = NULL, type = "source")
library(ADNIMERGE)
data("adnimerge")
m12 <- subset(adnimerge, VISCODE=='m12')
bl <- subset(adnimerge, VISCODE=='bl')
write.csv(m12, "/data/adni_phenotype_m12.csv", quote = FALSE, row.names = FALSE)
write.csv(bl, "/data/adni_phenotype_bl.csv", quote = FALSE, row.names = FALSE)
================== Step 4. Generate the data =====================================
Now we have original data ready for use:
"/data/adni_plasma_qc_multiplex_11Nov2010.csv"
"/data/adni_phenotype_m12.csv"
Go to /source/ and type paths
Type PrepADNI in the command window
The binary datasets will be stored in /data/
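Before running PrepADNI, it can help to confirm that both input files are where they are expected. The following is a minimal sketch, assuming the 'data' folder sits next to 'source' and that MATLAB's current folder is 'source'. It uses isfile, which requires MATLAB R2017b or later. The names dataDir, needed, and missing are illustrative and not part of the repository.

```matlab
% Sketch: confirm the raw ADNI files are in place before running PrepADNI.
% Assumption: the 'data' folder sits next to 'source' and MATLAB is in 'source'.
dataDir = fullfile('..', 'data');
needed = {'adni_plasma_qc_multiplex_11Nov2010.csv', 'adni_phenotype_m12.csv'};
% Keep only the file names that are NOT found in dataDir
missing = needed(~cellfun(@(f) isfile(fullfile(dataDir, f)), needed));
if isempty(missing)
    disp('All ADNI input files found; PrepADNI can be run.');
else
    fprintf('Missing from %s: %s\n', dataDir, strjoin(missing, ', '));
end
```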
=================================
CSF
Steps 1 and 2 are the same as for the ADNI data.
Step 3. Download CSF data
Set 'select study' to ADNI. In the "Search & Download" dropdown menu, select "Study Files".
In the sidebar on the left, choose "Biospecimen" -> "Biospecimen Results".
Find and download "CruchagaLab CSF SOMAscan7k Protein matrix postQC"
Save the file "CruchagaLab_CSF_SOMAscan7k_Protein_matrix_postQC_20230620.csv" to /data/
Step 4. Generate the data
Now we have the original data ready for use (adni_phenotype_bl.csv is the baseline file written by the R code in the ADNI phenotype step):
"/data/CruchagaLab_CSF_SOMAscan7k_Protein_matrix_postQC_20230620.csv"
"/data/adni_phenotype_bl.csv"
Then run PrepCSF.m. The binary datasets will be stored in /source/data
===================================
GCM
This data set is already included in the 'data' folder; in InitializeParameters.m you may therefore set parameters.data.path = ''
Remote Sensing