diff --git a/README.md b/README.md index b0f6ed3..c986033 100755 --- a/README.md +++ b/README.md @@ -1,44 +1,61 @@ -# [project name] - -## Project Goal -:point_right: Replace this text with a pithy description of the goal of the data analysis project. - - -## Background & Design -:point_right: Replace this text with a slightly longer (but still only max 250 words) description of the project background and its design. - - -## Roadmap -:point_right: Insert graphic of planned project releases. [This site](https://app.diagrams.net/) allows you to build a graphic that is connected to GitHub, so that changes you make are treated as GitHub commits. - - -## Contents -:point_right: This section should introduce the content that is published on `main`. Example: - - -#### 2021-07-01 SPR Poster -This is a description of the poster. There is a matching folder in the repo. - -:point_right: When you first set up this readme file, you might replace everything under "Contents" here with something like "Watch for our first release!" - - -## Work in Development -This `main` branch contains completed releases for this project. For all work-in-progress, please switch over to the `dev` branches. - -:point_right: Keep the "Work in Development" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file. - - -## Contributors -| Role | Name | -| --- | --- | -| add role | insert team member(s) | -| add role | insert team member(s) | -| add role | insert team member(s) | - -Learn more about us [here](https://www.ndclab.com/people). - - -## Contributing -If you are interested in contributing, please read our [CONTRIBUTING.md](CONTRIBUTING.md) file. - -:point_right: Keep the "Contributing" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file. +# Thrive: Theta DDM Analysis + +## Project Goal +This repository contains the data analysis pipeline for the "Thrive" project. The goal is to investigate the neural mechanisms underlying social influence on decision-making using the Shrinking Spotlight Protocol (SSP) Drift Diffusion Model (DDM) and EEG time-frequency analysis. We aim to understand how social observation affects cognitive control processes, specifically examining ERP components (ERN, Pe) and theta band oscillations. + +## Background & Design +The study employs a Flanker task with social (Observed) and non-social (Alone) conditions. +Key analyses include: +1. **Behavioral Analysis**: Analyzing accuracy and reaction times (RT) to model decision-making processes. +2. **Computational Modeling**: Fitting the SSP-DDM to behavioral data to estimate parameters like boundary separation, non-decision time, and attentional focus. +3. **EEG Analysis**: + * **Preprocessing**: Automated pipeline (MADE) including filtering, artifact rejection (FASTER/ADJUST), and ICA. + * **ERP Analysis**: Computing Event-Related Potentials (ERN, Pe) for error and correct trials. + * **Time-Frequency Analysis**: Examining power and phase synchrony (ITPS, ICPS) in the theta and delta bands. + +## Directory Structure + +```yml +project-name +├── code +│ ├── behavior # Scripts for behavioral data analysis and cleaning +│ ├── ddm # Scripts for SSP-DDM model fitting (R, C++, Python) +│ ├── figures # Scripts to generate plots (ERPs, Topographies) +│ ├── matlab # Helper MATLAB functions +│ ├── postprocessing # Scripts for aggregating EEG metrics (ERP, TF, ICPS) +│ ├── preprocessing-eeg # EEG preprocessing pipeline (MADE) +│ └── statistics # R scripts for statistical analysis (LMM) +├── derivatives # Generated data (preprocessed EEG, summary CSVs) +├── sourcedata # Raw input data (checked) +└── results # Figures and statistical outputs +``` + +## Setup & Usage + +### Prerequisites +* **Python**: `pandas`, `numpy`, `scipy`, `mne`, `matplotlib`, `h5py` +* **R**: `DEoptim`, `Rcpp`, `dplyr`, `flextable` +* **MATLAB**: EEGLAB with plugins (FASTER, ADJUST, firfilt) +* **C++**: Compiler compatible with Rcpp + +### Workflow +1. **Behavioral Processing**: + * Run `code/behavior/behavior_analysis.py` to aggregate PsychoPy data. + * Run `code/behavior/create_valid_behav.py` to filter subjects and create summary CSVs. + +2. **DDM Fitting**: + * Compile the C++ model: `code/ddm/simSSP_model_GB_noScale.cpp`. + * Run `code/ddm/run_ddm_batch.py` to submit fitting jobs. + * Aggregate results with `code/ddm/fitted.py`. + +3. **EEG Preprocessing**: + * Use `code/preprocessing-eeg/run_MADE_batch.py` to submit preprocessing jobs (calls `MADE_pipeline.m`). + * Verify outputs with `code/preprocessing-eeg/check_preprocessed_files.py`. + +4. **Post-Processing & Statistics**: + * Compute ERP means: `code/postprocessing/compute_erp_means.py`. + * Compute TF metrics: `code/postprocessing/compute_means_TF.py`. + * Run statistical models using R scripts in `code/statistics/`. + +## Contributors +* NDCLab Team diff --git a/code/behavior/behavior_analysis.py b/code/behavior/behavior_analysis.py index 472116e..fcc202d 100755 --- a/code/behavior/behavior_analysis.py +++ b/code/behavior/behavior_analysis.py @@ -7,9 +7,41 @@ import time import datetime +""" +Behavioral analysis script for processing PsychoPy data. + +This script reads behavioral data from PsychoPy CSV files, performs data cleaning, +calculates various behavioral metrics (accuracy, reaction times, error rates), +and generates summary CSV files for further analysis. It processes data for both +social and non-social conditions. + +Usage: + python behavior_analysis.py + +Arguments: + session_id (str): The session identifier (e.g., 's1_r1'). + +Outputs: + - Summary CSV file containing aggregated metrics for each subject. + - Individual trial data CSV files for each subject. + - A combined CSV file containing trial data for all subjects. + - A log file recording the processing details. +""" + pd.options.mode.chained_assignment = None def convert_to_list_rt(series): + """ + Converts a pandas Series of reaction times (RT) to a list of floats. + + Handles string representations of lists and single values, as well as actual lists. + + Args: + series (pd.Series): A pandas Series containing RT data, potentially as strings or lists. + + Returns: + list: A list of float RT values, with np.nan for missing or invalid entries. + """ float_list = [] for value in series: if isinstance(value, str): @@ -24,6 +56,17 @@ def convert_to_list_rt(series): return float_list def convert_to_list_resp(series): + """ + Converts a pandas Series of response keys to a list of response values. + + Parses string representations of lists to extract integer response keys. + + Args: + series (pd.Series): A pandas Series containing response data. + + Returns: + list: A list of lists containing integer response keys, or np.nan for missing entries. + """ resp_list = [] for value in series: if isinstance(value, str): @@ -35,6 +78,15 @@ def convert_to_list_resp(series): def sort_csvs_by_date_pd(csv_paths): + """ + Sorts a list of CSV file paths based on the date in the 'date' column of the first row. + + Args: + csv_paths (list): A list of file paths to CSV files. + + Returns: + list: The list of file paths sorted by date. + """ date_format = "%Y-%m-%d_%Hh%M.%S.%f" def get_first_date(path): diff --git a/code/behavior/check_status.py b/code/behavior/check_status.py index d27264e..aa91d4b 100755 --- a/code/behavior/check_status.py +++ b/code/behavior/check_status.py @@ -3,6 +3,20 @@ import re from pathlib import Path +""" +Script to check the processing status of behavioral data for a given session. + +This script scans the source directory for raw PsychoPy files, checks for +deviation files, identifies successfully processed subjects, and reports +which subjects are pending processing or have deviations. + +Usage: + python check_status.py + +Arguments: + session_id (str): The session identifier (e.g., 's1_r1'). +""" + def get_args(): """Parses command line arguments.""" parser = argparse.ArgumentParser(description="Check behavior processing status.") diff --git a/code/behavior/check_subject_csv.py b/code/behavior/check_subject_csv.py index ebbecb8..c01b760 100755 --- a/code/behavior/check_subject_csv.py +++ b/code/behavior/check_subject_csv.py @@ -2,6 +2,17 @@ import re import glob +""" +Script to verify the integrity of subject CSV files in the source directory. + +It checks if each subject has the expected number of files (3) and detects +any deviation files. It asserts that the main data CSV file exists and does +not have 'deviation' in its filename. + +Usage: + python check_subject_csv.py +""" + input_dataset_path = "/home/data/NDClab/datasets/thrive-dataset/" data_path = "sourcedata/checked/" sub_path = "s1_r1/psychopy/" diff --git a/code/behavior/create_valid_behav.py b/code/behavior/create_valid_behav.py index 910ef2b..2a86501 100755 --- a/code/behavior/create_valid_behav.py +++ b/code/behavior/create_valid_behav.py @@ -4,12 +4,35 @@ from glob import glob import os +""" +Script to filter and clean behavioral data, creating valid datasets for analysis. + +This script loads the behavioral summary data, removes excluded subjects and +outliers based on reaction time and accuracy criteria, and saves separate CSV +files for social and non-social conditions containing only valid data. + +Usage: + python create_valid_behav.py + +Arguments: + session_id (str): The session identifier (e.g., 's1_r1'). +""" + def replace_outliers_with_nan_cols(df, columns_to_check, sd_thresh=3): """ - Replaces outliers with NaN. - Raises KeyError if a column is missing. - Returns the modified dataframe if successful. + Replaces outliers in specified columns with NaN based on standard deviation threshold. + + Args: + df (pd.DataFrame): The input pandas DataFrame. + columns_to_check (list): A list of column names to check for outliers. + sd_thresh (float, optional): The number of standard deviations to use as the threshold. Defaults to 3. + + Returns: + pd.DataFrame: The DataFrame with outliers replaced by NaN. + + Raises: + KeyError: If a column in columns_to_check is missing from the DataFrame. """ # 1. Validation: Check all columns exist first # If this fails, the error is raised and nothing is returned. diff --git a/code/ddm/SSP_DDM_fitting.R b/code/ddm/SSP_DDM_fitting.R index 85b0602..4e3c653 100755 --- a/code/ddm/SSP_DDM_fitting.R +++ b/code/ddm/SSP_DDM_fitting.R @@ -1,3 +1,22 @@ +#' SSP-DDM Model Fitting Script +#' +#' This script loads human behavioral data and fits the Shrinking Spotlight Protocol (SSP) +#' Drift Diffusion Model (DDM) parameters to the data using Differential Evolution optimization. +#' It relies on an Rcpp function for model simulation. +#' +#' @description +#' The script defines a fitting function that calls the C++ simulation model, +#' compares simulated data with human data using Chi-Square statistics, and optimizes +#' parameters (a, ter, p, rd, sda) to minimize the difference. +#' It processes subjects in batches as defined by command line arguments. +#' +#' @usage +#' Rscript SSP_DDM_fitting.R +#' +#' @param startIdx The starting index of subjects to process. +#' @param endIdx The ending index of subjects to process. +#' @param session The session identifier (e.g., 's1_r1'). + ## This script loads in human data and then proceeds through the fitting Script to fit parameters to the data. ## The first section of code is a function that call the rcpp code (separate file), which contains the actual model, ## simulates data based on a set of parameters, and outputs a fit stat. The second section of code is the "main script" @@ -16,6 +35,24 @@ endIdx <- args[2] session <- args[3] #------------------------------------------------------------------------------ # Fit function for the SSP model + +#' Fit Function for the SSP Model +#' +#' Calculates the Chi-Square goodness-of-fit statistic between human data and +#' simulated data generated by the SSP model for a given set of parameters. +#' +#' @param parms Numeric vector of length 5 containing the model parameters: +#' 1. a (boundary separation) +#' 2. ter (non-decision time) +#' 3. p (perceptual strength) +#' 4. rd (shrinking rate) +#' 5. sda (attentional window width) +#' @param nTrials Integer. Number of trials to simulate per condition. +#' @param cutPoints List of length 4. RT bin boundaries for each accuracy/congruency condition. +#' @param humanProps List of length 4. Proportions of human data in each bin. +#' @param HumanTrialCounts Numeric vector of length 2. Total trial counts for congruent and incongruent conditions. +#' +#' @return A numeric value representing the Chi-Square statistic. Returns a very large number if infinite. fitFunctionSSP <- function( parms, # generated by DEoptim within specified boundaries when DEoptim is called nTrials, # how many trials to simulate per condition diff --git a/code/ddm/bulk_scancel.py b/code/ddm/bulk_scancel.py index 85b9599..14257bd 100755 --- a/code/ddm/bulk_scancel.py +++ b/code/ddm/bulk_scancel.py @@ -1,5 +1,15 @@ import subprocess +""" +Script to bulk cancel SLURM jobs within a specified range of job IDs. + +This script generates a list of job IDs from a start ID to an end ID and +executes the 'scancel' command for all of them. + +Usage: + python bulk_scancel.py +""" + start_id = 2648371 end_id = 2648451 diff --git a/code/ddm/fitted.py b/code/ddm/fitted.py index 12c5d66..b60a27b 100755 --- a/code/ddm/fitted.py +++ b/code/ddm/fitted.py @@ -3,6 +3,19 @@ from datetime import datetime import sys +""" +Script to aggregate DDM fitting results into a single CSV file. + +This script searches for output files from DDM fitting, concatenates them, +and saves a summary file listing the subjects that have been successfully fitted. + +Usage: + python fitted.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] current_datetime = datetime.now() formatted_date = current_datetime.strftime("%Y_%m_%d_%H_%M_%S") diff --git a/code/ddm/run_ddm_batch.py b/code/ddm/run_ddm_batch.py index 647d6dd..0126b46 100755 --- a/code/ddm/run_ddm_batch.py +++ b/code/ddm/run_ddm_batch.py @@ -6,6 +6,19 @@ from pathlib import Path import pandas as pd +""" +Script to submit DDM fitting jobs to the SLURM scheduler. + +This script identifies subjects that have not yet been fitted for the DDM model +and submits batch jobs for them. It allows for submitting jobs in batches. + +Usage: + python run_ddm_batch.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] #session = "s2_r1" data_dir = "/home/data/NDClab/analyses/thrive-theta-ddm/" diff --git a/code/ddm/simSSP_model_GB_noScale.cpp b/code/ddm/simSSP_model_GB_noScale.cpp index 2be3455..d108a0b 100755 --- a/code/ddm/simSSP_model_GB_noScale.cpp +++ b/code/ddm/simSSP_model_GB_noScale.cpp @@ -1,7 +1,39 @@ +/** + * @file simSSP_model_GB_noScale.cpp + * @brief Implementation of the Shrinking Spotlight Protocol (SSP) Drift Diffusion Model (DDM) simulation. + * + * This file contains the Rcpp function for simulating trial-level data for the SSP-DDM model. + * It simulates the decision-making process by modeling drift diffusion with a shrinking attentional spotlight. + */ + #include #include using namespace Rcpp; +/** + * @brief Simulates trial-level results for the SSP-DDM model. + * + * This function simulates the response times and accuracy for a set of trials + * based on the provided model parameters. It uses a diffusion process where + * the drift rate is modulated by a shrinking attentional spotlight. + * + * @param parms Numeric vector of length 5 containing the SSP parameters: + * - parms[0]: A (boundary separation) + * - parms[1]: tEr (non-decision time) + * - parms[2]: p (perceptual strength) + * - parms[3]: rd (shrinking rate) + * - parms[4]: sda (initial attentional window width) + * @param trialType Integer indicating the trial type: + * - 1: Congruent + * - 2: Incongruent + * @param nTrials Integer. The number of trials to simulate. + * @param dt Double. The time step size for the diffusion process simulation. + * @param vari Double. The variance of the drift rate for each step. + * + * @return NumericMatrix A matrix with `nTrials` rows and 2 columns: + * - Column 0: Simulated Reaction Time (RT) + * - Column 1: Simulated Accuracy (1 for correct, 0 for error) + */ // [[Rcpp::export]] NumericMatrix simSSP_model_GBnoScale(NumericVector parms, int trialType, int nTrials, double dt, double vari) { diff --git a/code/figures/plot_erp.m b/code/figures/plot_erp.m index 0d1ed8a..9f4266e 100755 --- a/code/figures/plot_erp.m +++ b/code/figures/plot_erp.m @@ -1,3 +1,10 @@ +% PLOT_ERP Plot Grand Average ERPs +% +% Usage: +% Run this script to plot Grand Average ERP waveforms for Social vs Non-Social +% and Error vs Correct conditions. It handles both standard and CSD data. +% It relies on precomputed .mat files containing ERP data. + clear % clear matlab workspace clc % clear matlab command window diff --git a/code/figures/plot_erp.py b/code/figures/plot_erp.py index 86f80f4..7027c33 100755 --- a/code/figures/plot_erp.py +++ b/code/figures/plot_erp.py @@ -11,7 +11,32 @@ import sys import os +""" +Script to plot Grand Average ERP waveforms. + +This script loads aggregated ERP data, applies baseline correction, +filters subjects based on valid behavioral data, and plots the Grand Average ERPs +for Social vs Non-Social and Error vs Correct conditions. It specifically targets +components like ERN and Pe. + +Usage: + python plot_erp.py + +Arguments: + session_id (str): The session identifier. + laplacian_flag (int): 1 for CSD data, 0 for voltage data. +""" + def find_newest_file(path): + """ + Finds the most recently modified file matching a glob pattern. + + Args: + path (str): Glob pattern to match files. + + Returns: + str: The path to the newest file found. + """ matching_files = glob(path) # Check if any files were found diff --git a/code/matlab/parsave.m b/code/matlab/parsave.m index 82f3dae..b9e6e0e 100755 --- a/code/matlab/parsave.m +++ b/code/matlab/parsave.m @@ -1,4 +1,17 @@ function parsave(fname, varargin) +% PARSAVE Save variables to a file inside a parfor loop. +% +% Usage: +% parsave(fname, 'var1', val1, 'var2', val2, ...) +% +% Inputs: +% fname - Name of the file to save (string/char). +% varargin - Alternating variable names (strings) and their values. +% +% Description: +% Matlab's 'save' function cannot be called directly inside a parfor loop +% because it accesses the workspace. This function serves as a wrapper. + % varargin contains alternating variable names and values % Extract names and values for i = 1:2:length(varargin) diff --git a/code/postprocessing/compute_erp_means.py b/code/postprocessing/compute_erp_means.py index 07a7cbf..7bbbaca 100755 --- a/code/postprocessing/compute_erp_means.py +++ b/code/postprocessing/compute_erp_means.py @@ -10,7 +10,31 @@ import sys import os +""" +Script to compute mean ERP amplitudes from preprocessed EEG data. + +This script loads ERP data from a MAT file, applies baseline correction, +and calculates mean amplitudes for specific time windows and electrode clusters +(e.g., ERN, Pe) for social and non-social conditions. It outputs the results to a CSV file. + +Usage: + python compute_erp_means.py + +Arguments: + session_id (str): The session identifier (e.g., 's1_r1'). + laplacian_flag (int): 1 to use CSD (Current Source Density) transformed data, 0 for standard voltage. +""" + def find_newest_file(path): + """ + Finds the most recently modified file matching a glob pattern. + + Args: + path (str): Glob pattern to match files. + + Returns: + str: The path to the newest file found. + """ matching_files = glob(path) # Check if any files were found diff --git a/code/postprocessing/compute_means_ICPS.py b/code/postprocessing/compute_means_ICPS.py index 1b1a205..d01997b 100755 --- a/code/postprocessing/compute_means_ICPS.py +++ b/code/postprocessing/compute_means_ICPS.py @@ -11,6 +11,21 @@ import os import sys +""" +Script to compute mean Inter-Channel Phase Synchrony (ICPS) values. + +This script loads precomputed ICPS arrays (from MAT/HDF5 files), extracts mean values +for specific frequency bands (theta, delta), time windows (early, late), and +electrode clusters (DLPFC, OCC, MOTOR). It merges these values with behavioral +summary data and saves the result to a CSV file. + +Usage: + python compute_means_ICPS.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] dataset_path = "/home/data/NDClab/analyses/thrive-theta-ddm/" diff --git a/code/postprocessing/compute_means_TF.py b/code/postprocessing/compute_means_TF.py index 0305813..183bc6c 100755 --- a/code/postprocessing/compute_means_TF.py +++ b/code/postprocessing/compute_means_TF.py @@ -11,6 +11,20 @@ import os import sys +""" +Script to compute mean Time-Frequency (TF) power and Inter-Trial Phase Synchrony (ITPS). + +Similar to the ICPS script, this extracts mean values for TF power and ITPS +from precomputed arrays for defined frequency bands, time windows, and channel clusters. +Results are merged with behavioral data and saved to CSV. + +Usage: + python compute_means_TF.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] dataset_path = "/home/data/NDClab/analyses/thrive-theta-ddm/" diff --git a/code/postprocessing/create_tf_arrays.py b/code/postprocessing/create_tf_arrays.py index 060354d..47867b2 100755 --- a/code/postprocessing/create_tf_arrays.py +++ b/code/postprocessing/create_tf_arrays.py @@ -8,6 +8,20 @@ import re import sys +""" +Script to aggregate individual Time-Frequency (TF) data into group-level arrays. + +This script iterates through all subjects and conditions, loads their individual +TF output files (HDF5/MAT), and concatenates them into large matrices (Subject x Channel x Time x Freq). +These aggregate arrays are saved for faster group-level analysis. + +Usage: + python create_tf_arrays.py + +Arguments: + session_id (str): The session identifier. +""" + # This code creates big arrays of n_sub * chan * times * freqs data from individual time-frequecy data arrays by concatenating individual participants' data. #session = "s2_r1" diff --git a/code/postprocessing/run_erp_processing.py b/code/postprocessing/run_erp_processing.py index aad4b6d..05d1d9f 100755 --- a/code/postprocessing/run_erp_processing.py +++ b/code/postprocessing/run_erp_processing.py @@ -1,6 +1,16 @@ import subprocess import time +""" +Script to submit batch jobs for ERP processing and plotting. + +This script iterates over defined sessions and conditions (Laplacian vs standard) +and submits SLURM jobs for computing ERP means and generating ERP plots. + +Usage: + python run_erp_processing.py +""" + # Define your iteration parameters sessions = [ "s1_r1", diff --git a/code/postprocessing/run_tf_processing.py b/code/postprocessing/run_tf_processing.py index 031bcbc..fc840b8 100755 --- a/code/postprocessing/run_tf_processing.py +++ b/code/postprocessing/run_tf_processing.py @@ -1,6 +1,16 @@ import subprocess import time +""" +Script to submit batch jobs for Time-Frequency mean computation. + +This script submits SLURM jobs to run `compute_means_TF.py` and `compute_means_ICPS.py` +for defined sessions. + +Usage: + python run_tf_processing.py +""" + # Define your iteration parameters sessions = [ "s1_r1", diff --git a/code/preprocessing-eeg/MADE_pipeline.m b/code/preprocessing-eeg/MADE_pipeline.m index 02c8ce0..fd83829 100755 --- a/code/preprocessing-eeg/MADE_pipeline.m +++ b/code/preprocessing-eeg/MADE_pipeline.m @@ -1,5 +1,18 @@ % Define the MADE processing pipeline as a function function [] = MADE_pipeline(dataset, subjects, session) +% MADE_PIPELINE Main processing pipeline for EEG data (MADE). +% +% Usage: +% MADE_pipeline(dataset, subjects, session) +% +% Inputs: +% dataset - Name of the dataset folder (e.g., 'thrive-dataset'). +% subjects - String with subject IDs separated by slashes (e.g., '3000001/3000002'). +% session - Session ID (e.g., 's1_r1'). +% +% Description: +% Executes the full preprocessing pipeline including filtering, artifact rejection (FASTER/ADJUST), +% ICA, and epoching. % dataset: str like 'thrive-dataset' % subjects: str like '3000001/3000002/3000003/3000004' diff --git a/code/preprocessing-eeg/check_preprocessed_files.py b/code/preprocessing-eeg/check_preprocessed_files.py index 3a71fa8..cabc02c 100755 --- a/code/preprocessing-eeg/check_preprocessed_files.py +++ b/code/preprocessing-eeg/check_preprocessed_files.py @@ -2,7 +2,27 @@ from pathlib import Path import sys +""" +Script to verify the existence and count of preprocessed EEG files. + +This script scans the derivatives directory for preprocessed EEG data (.set and .fdt files) +and alerts if a subject has more than the expected number of files, which might indicate +duplicates or processing issues. + +Usage: + python check_preprocessed_files.py [root_dir] + +Arguments: + root_dir (str, optional): Root directory of the dataset. Defaults to standard path. +""" + def check_eeg_files(root_dir): + """ + Scans the directory structure for EEG files and reports counts per subject. + + Args: + root_dir (str): The root directory to start the scan from. + """ root_path = Path(root_dir) if not root_path.exists(): diff --git a/code/preprocessing-eeg/compare_file_sizes.py b/code/preprocessing-eeg/compare_file_sizes.py index a910f61..1735e31 100755 --- a/code/preprocessing-eeg/compare_file_sizes.py +++ b/code/preprocessing-eeg/compare_file_sizes.py @@ -3,6 +3,13 @@ from pathlib import Path import pandas as pd +""" +Script to compare file sizes between two directory trees. + +Useful for verifying data integrity after copying or moving files, specifically +checking if processed files in the analysis directory match those in the dataset directory. +""" + def compare_subject_files(root_path_a, root_path_b, specific_subpath="**/*"): """ Iterates through subject folders (sub-*) in root_path_a, finds the corresponding @@ -14,6 +21,9 @@ def compare_subject_files(root_path_a, root_path_b, specific_subpath="**/*"): specific_subpath (str): Glob pattern to limit search inside subject folders. Use "**/*" for everything. Use "**/eeg/*" to only look inside eeg folders. + + Returns: + pd.DataFrame: A DataFrame containing details of any size mismatches or missing files. """ base_a = Path(root_path_a) base_b = Path(root_path_b) diff --git a/code/preprocessing-eeg/count_trials.py b/code/preprocessing-eeg/count_trials.py index 806a239..82ceb63 100755 --- a/code/preprocessing-eeg/count_trials.py +++ b/code/preprocessing-eeg/count_trials.py @@ -10,19 +10,33 @@ import datetime import time +""" +Script to count trials in preprocessed EEG data based on various conditions. + +This script loads EEG data (.set files), extracts events, and counts trials +categorized by accuracy, congruency, and observation condition (social/non-social). +It outputs a CSV with trial counts for each subject. + +Usage: + python count_trials.py +""" + def eeg_point2lat(lat_array, epoch_array, srate, timewin=None, timeunit=1): """ Convert latency in data points to latency in ms relative to the time locking. - Parameters: - lat_array (array-like): Latency array in data points assuming concatenated data epochs. - epoch_array (array-like or None): Epoch number corresponding to each latency value. - srate (float): Data sampling rate in Hz. - timewin (list or None): [min, max] time limits in 'timeunit' units. Default is None. - timeunit (float): Time unit in seconds. Default is 1 (seconds). + Args: + lat_array (array-like): Latency array in data points assuming concatenated data epochs. + epoch_array (array-like or None): Epoch number corresponding to each latency value. + srate (float): Data sampling rate in Hz. + timewin (list or None): [min, max] time limits in 'timeunit' units. Default is None. + timeunit (float): Time unit in seconds. Default is 1 (seconds). Returns: - numpy.ndarray: Converted latency values (in 'timeunit' units) for each epoch. + numpy.ndarray: Converted latency values (in 'timeunit' units) for each epoch. + + Raises: + ValueError: If input arrays have inconsistent lengths or timewin is invalid. """ lat_array = np.array(lat_array) @@ -57,6 +71,14 @@ def eeg_point2lat(lat_array, epoch_array, srate, timewin=None, timeunit=1): return np.round(newlat * 1e9) / 1e9 def disp_diff_arr(arr1, arr2, round = 5): + """ + Displays differences between two arrays. + + Args: + arr1 (np.ndarray): First array. + arr2 (np.ndarray): Second array. + round (int): Number of decimal places to round to before comparison. + """ mask = np.round(arr1, round) != np.round(arr2, round) diff_elements_1 = arr1[mask] diff --git a/code/preprocessing-eeg/create_mat_csd_s1.m b/code/preprocessing-eeg/create_mat_csd_s1.m index 69c8f13..9f83d6b 100755 --- a/code/preprocessing-eeg/create_mat_csd_s1.m +++ b/code/preprocessing-eeg/create_mat_csd_s1.m @@ -1,3 +1,9 @@ +% CREATE_MAT_CSD_S1 Extract ERPs from CSD data +% +% Usage: +% Run this script to load preprocessed CSD data (.set files), epoch based on +% conditions (social/nonsocial, error/correct), and save averaged ERPs to a .mat file. + % % Modified on 2024/04/11 to process thrive dataset flanker data % diff --git a/code/preprocessing-eeg/create_mat_s1.m b/code/preprocessing-eeg/create_mat_s1.m index 4f89a8d..3335616 100755 --- a/code/preprocessing-eeg/create_mat_s1.m +++ b/code/preprocessing-eeg/create_mat_s1.m @@ -1,3 +1,9 @@ +% CREATE_MAT_S1 Extract ERPs from standard EEG data +% +% Usage: +% Run this script to load preprocessed EEG data (.set files), epoch based on +% conditions, and save averaged ERPs to a .mat file. + % % Modified on 2024/04/11 to process thrive dataset flanker data % diff --git a/code/preprocessing-eeg/create_valid_eeg.py b/code/preprocessing-eeg/create_valid_eeg.py index 6743027..5baf411 100755 --- a/code/preprocessing-eeg/create_valid_eeg.py +++ b/code/preprocessing-eeg/create_valid_eeg.py @@ -4,12 +4,35 @@ from glob import glob import os +""" +Script to create valid EEG datasets by filtering subjects and trials. + +This script filters behavioral summary data to exclude subjects based on specific +EEG exclusion criteria and performance thresholds. It creates cleaned CSV files +for social and non-social conditions. + +Usage: + python create_valid_eeg.py + +Arguments: + session_id (str): The session identifier. +""" + def replace_outliers_with_nan_cols(df, columns_to_check, sd_thresh=3): """ - Replaces outliers with NaN. - Raises KeyError if a column is missing. - Returns the modified dataframe if successful. + Replaces outliers in specified columns with NaN based on standard deviation threshold. + + Args: + df (pd.DataFrame): The input pandas DataFrame. + columns_to_check (list): A list of column names to check for outliers. + sd_thresh (float, optional): The number of standard deviations to use as the threshold. Defaults to 3. + + Returns: + pd.DataFrame: The DataFrame with outliers replaced by NaN. + + Raises: + KeyError: If a column in columns_to_check is missing from the DataFrame. """ # 1. Validation: Check all columns exist first # If this fails, the error is raised and nothing is returned. diff --git a/code/preprocessing-eeg/deviation_output.py b/code/preprocessing-eeg/deviation_output.py index ab73821..33ddb23 100755 --- a/code/preprocessing-eeg/deviation_output.py +++ b/code/preprocessing-eeg/deviation_output.py @@ -4,6 +4,20 @@ import pandas as pd import sys +""" +Script to aggregate deviation reports from subject directories. + +This script scans for 'deviation.txt' files in the EEG directories of subjects, +reads their content, and compiles a CSV report listing deviations and the number +of EEG files present for each subject. + +Usage: + python deviation_output.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] base_dir = "/home/data/NDClab/datasets/thrive-dataset/sourcedata/checked/" subjects = sorted(glob.glob(os.path.join(base_dir, "sub-3*"))) diff --git a/code/preprocessing-eeg/diff_vhdr.py b/code/preprocessing-eeg/diff_vhdr.py index bf3b59a..220d8cb 100755 --- a/code/preprocessing-eeg/diff_vhdr.py +++ b/code/preprocessing-eeg/diff_vhdr.py @@ -5,7 +5,24 @@ from rich.table import Table from rich.syntax import Syntax +""" +Script to compare two BrainVision header (.vhdr) files. + +This script displays a rich, color-coded unified diff of two text files, +intended for comparing VHDR files. + +Usage: + python diff_vhdr.py +""" + def print_diff_rich(file1_path, file2_path): + """ + Prints a color-coded diff of two files to the console. + + Args: + file1_path (str): Path to the first file. + file2_path (str): Path to the second file. + """ console = Console() try: diff --git a/code/preprocessing-eeg/edit_event_markers_thrive.m b/code/preprocessing-eeg/edit_event_markers_thrive.m index ddc1b13..2b2d6d1 100755 --- a/code/preprocessing-eeg/edit_event_markers_thrive.m +++ b/code/preprocessing-eeg/edit_event_markers_thrive.m @@ -1,4 +1,15 @@ function EEG = edit_event_markers_thrive(EEG) +% EDIT_EVENT_MARKERS_THRIVE Add detailed event labels for the Thrive task. +% +% Usage: +% EEG = edit_event_markers_thrive(EEG) +% +% Inputs: +% EEG - EEGLAB data structure. +% +% Outputs: +% EEG - EEGLAB data structure with updated event field containing: +% observation, eventType, congruency, accuracy, rt, etc. % %%%%%% Label event markers%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % diff --git a/code/preprocessing-eeg/erp_check.m b/code/preprocessing-eeg/erp_check.m index 74b2b35..008c53d 100755 --- a/code/preprocessing-eeg/erp_check.m +++ b/code/preprocessing-eeg/erp_check.m @@ -1,3 +1,9 @@ +% ERP_CHECK Check ERPs and Count Trials +% +% Usage: +% Run this script to count trials per condition for each subject and +% optionally plot ERPs and topographies for quality checks. + % % Modified on 2024/04/11 to process thrive dataset flanker data % diff --git a/code/preprocessing-eeg/load_and_merge_2.m b/code/preprocessing-eeg/load_and_merge_2.m index 219d3f7..ca348fb 100755 --- a/code/preprocessing-eeg/load_and_merge_2.m +++ b/code/preprocessing-eeg/load_and_merge_2.m @@ -1,4 +1,15 @@ function [EEG] = load_and_merge(path, force_cut_list) +% LOAD_AND_MERGE Load and merge raw EEG files. +% +% Usage: +% EEG = load_and_merge(path, force_cut_list) +% +% Inputs: +% path - Directory containing .vhdr/.vmrk files. +% force_cut_list - Cell array of conditions to forcibly remove ('social', 'nonsocial'). +% +% Outputs: +% EEG - Merged EEGLAB data structure. if nargin < 2 force_cut_list = {}; diff --git a/code/preprocessing-eeg/make_eeg_pieces.m b/code/preprocessing-eeg/make_eeg_pieces.m index 81798c6..f5d5242 100755 --- a/code/preprocessing-eeg/make_eeg_pieces.m +++ b/code/preprocessing-eeg/make_eeg_pieces.m @@ -1,4 +1,14 @@ function [eeg_pieces] = make_eeg_pieces(EEG) +% MAKE_EEG_PIECES Split EEG data into segments based on boundary markers. +% +% Usage: +% eeg_pieces = make_eeg_pieces(EEG) +% +% Inputs: +% EEG - EEGLAB data structure. +% +% Outputs: +% eeg_pieces - Cell array of EEGLAB structures, each representing a segment. % Step 1: Extract boundary event latencies [~, boundary_indices] = pop_selectevent(EEG, 'type', 'boundary'); diff --git a/code/preprocessing-eeg/not_processed_checked.py b/code/preprocessing-eeg/not_processed_checked.py index 4413809..d4e1d37 100755 --- a/code/preprocessing-eeg/not_processed_checked.py +++ b/code/preprocessing-eeg/not_processed_checked.py @@ -5,6 +5,20 @@ from pathlib import Path import sys +""" +Script to identify subjects that still need preprocessing. + +This script compares the list of subjects in the source directory (checked data) +against the list of subjects in the derivatives directory (preprocessed data). +It reports which subjects are pending, and which of those have recorded deviations. + +Usage: + python not_processed_checked.py + +Arguments: + session_id (str): The session identifier. +""" + session = sys.argv[1] dataset_path = "/home/data/NDClab/datasets/thrive-dataset/" data_path = "derivatives/preprocessed/" diff --git a/code/preprocessing-eeg/plot_topo.m b/code/preprocessing-eeg/plot_topo.m index 198aa12..10bfdff 100755 --- a/code/preprocessing-eeg/plot_topo.m +++ b/code/preprocessing-eeg/plot_topo.m @@ -1,3 +1,9 @@ +% PLOT_TOPO Plot Topographic Maps of ERPs +% +% Usage: +% Run this script to load ERP data and plot topographic maps for specific +% time windows (e.g., 0-100 ms for ERN). + clear % clear matlab workspace clc % clear matlab command window diff --git a/code/preprocessing-eeg/preprocess_eeg_piece.m b/code/preprocessing-eeg/preprocess_eeg_piece.m index e8eb2a3..4d0ffdc 100755 --- a/code/preprocessing-eeg/preprocess_eeg_piece.m +++ b/code/preprocessing-eeg/preprocess_eeg_piece.m @@ -1,6 +1,19 @@ % this function is run inside MADE, so all EEGLab dependencies should be already imported and initialized function [bad_channels] = preprocess_eeg_piece(eeg_piece, channel_locations, stimulus_timeoffset) +% PREPROCESS_EEG_PIECE Preprocess a single segment of EEG data. +% +% Usage: +% bad_channels = preprocess_eeg_piece(eeg_piece, channel_locations, stimulus_timeoffset) +% +% Inputs: +% eeg_piece - EEGLAB structure for the segment. +% channel_locations - Channel locations structure/file. +% stimulus_timeoffset - Time offset to apply to stimulus markers (ms). +% +% Outputs: +% bad_channels - Indices of channels identified as bad by FASTER. + down_sample = 1; sampling_rate = 1000; adjust_time_offset = 1; diff --git a/code/preprocessing-eeg/run_MADE_batch.py b/code/preprocessing-eeg/run_MADE_batch.py index 1f84d86..70a0c42 100755 --- a/code/preprocessing-eeg/run_MADE_batch.py +++ b/code/preprocessing-eeg/run_MADE_batch.py @@ -7,6 +7,20 @@ from glob import glob from pathlib import Path +""" +Script to manage and submit batch jobs for EEG preprocessing (MADE pipeline). + +This script identifies subjects that need processing (source exists but derivatives do not), +categorizes them by deviation status, and allows the user to submit SLURM jobs +for a list of subjects. + +Usage: + python run_MADE_batch.py + +Arguments: + session_id (str): The session identifier. +""" + # --- Configuration --- if len(sys.argv) < 2: print("Usage: python script.py ") @@ -23,7 +37,13 @@ def get_subjects(glob_pattern): """ - Globs files and returns a set of unique subject IDs. + Finds unique subject IDs from file paths matching a glob pattern. + + Args: + glob_pattern (str): The pattern to match files. + + Returns: + set: A set of unique subject ID strings. """ subjects = set() files = glob(glob_pattern) @@ -70,6 +90,14 @@ def get_subjects(glob_pattern): # --- 3. Output --- def print_group(title, data_set, show_ids=True): + """ + Prints a summary of a group of subjects. + + Args: + title (str): The title of the group. + data_set (set): A set of subject IDs. + show_ids (bool, optional): Whether to list the individual IDs. Defaults to True. + """ sorted_list = sorted(list(data_set)) print("") print(f"{title}: {len(sorted_list)}") diff --git a/code/preprocessing-eeg/stim_cnt_check_csv.m b/code/preprocessing-eeg/stim_cnt_check_csv.m index d951d8b..6d64970 100755 --- a/code/preprocessing-eeg/stim_cnt_check_csv.m +++ b/code/preprocessing-eeg/stim_cnt_check_csv.m @@ -1,3 +1,9 @@ +% STIM_CNT_CHECK_CSV Count Stimulus Markers from Raw Data +% +% Usage: +% Run this script to scan raw VHDR files (specifically those with deviations) +% and count stimulus markers to verify data integrity. Saves results to CSV. + %clear all; %clc; diff --git a/code/preprocessing-eeg/stim_cnt_check_processed.m b/code/preprocessing-eeg/stim_cnt_check_processed.m index d68d16d..aa8e51f 100755 --- a/code/preprocessing-eeg/stim_cnt_check_processed.m +++ b/code/preprocessing-eeg/stim_cnt_check_processed.m @@ -1,3 +1,9 @@ +% STIM_CNT_CHECK_PROCESSED Count Trials in Processed Data +% +% Usage: +% Run this script to scan processed .set files and count valid trials +% per condition. Saves results to CSV. + %clear all; %clc; diff --git a/code/preprocessing-eeg/stim_cnt_check_processed_csd.m b/code/preprocessing-eeg/stim_cnt_check_processed_csd.m index 3a50a8a..066c54f 100755 --- a/code/preprocessing-eeg/stim_cnt_check_processed_csd.m +++ b/code/preprocessing-eeg/stim_cnt_check_processed_csd.m @@ -1,4 +1,10 @@ +% STIM_CNT_CHECK_PROCESSED_CSD Count Trials in Processed CSD Data +% +% Usage: +% Run this script to scan processed CSD .set files and count valid trials. +% Saves results to CSV. + %clear all; %clc; diff --git a/code/preprocessing-eeg/stim_cnt_check_thrive.m b/code/preprocessing-eeg/stim_cnt_check_thrive.m index b1c9d0a..196089f 100755 --- a/code/preprocessing-eeg/stim_cnt_check_thrive.m +++ b/code/preprocessing-eeg/stim_cnt_check_thrive.m @@ -1,4 +1,15 @@ function [stim_count_nonsoc, stim_count_soc] = stim_cnt_check_thrive(path) +% STIM_CNT_CHECK_THRIVE Count stimulus markers in a directory. +% +% Usage: +% [stim_count_nonsoc, stim_count_soc] = stim_cnt_check_thrive(path) +% +% Inputs: +% path - Directory containing .vhdr files. +% +% Outputs: +% stim_count_nonsoc - Count of non-social stimulus markers. +% stim_count_soc - Count of social stimulus markers. stim_events_nonsoc = {'S 41', 'S 42', 'S 43', 'S 44'}; stim_events_soc = {'S 51', 'S 52', 'S 53', 'S 54'}; diff --git a/code/preprocessing-eeg/transform_csd_s1.m b/code/preprocessing-eeg/transform_csd_s1.m index e9eb338..a8e4d28 100755 --- a/code/preprocessing-eeg/transform_csd_s1.m +++ b/code/preprocessing-eeg/transform_csd_s1.m @@ -1,4 +1,10 @@ +% TRANSFORM_CSD_S1 Apply CSD Transformation +% +% Usage: +% Run this script to apply Current Source Density (CSD) transformation +% to preprocessed EEG data. + %clear % clear matlab workspace cluster = parcluster('local'); %% Setting up other things diff --git a/code/readme.md b/code/readme.md index 2355644..4a6443e 100755 --- a/code/readme.md +++ b/code/readme.md @@ -1,7 +1,30 @@ # Code -### Instructions -This folder contains the code used to create post-processed derivatives, statistics, and figures. +## Overview +This directory contains all source code for the project, organized by analysis stage and modality. +## Subdirectories -### Project Notes +### `behavior/` +Python scripts for parsing, cleaning, and analyzing behavioral data from the Flanker task. Includes quality control checks (`check_status.py`, `check_subject_csv.py`). + +### `ddm/` +Code for the Shrinking Spotlight Protocol (SSP) Drift Diffusion Model. +* **R/C++**: `SSP_DDM_fitting.R` and `simSSP_model_GB_noScale.cpp` implement the model fitting and simulation. +* **Python**: Scripts for batch submission and result aggregation. + +### `preprocessing-eeg/` +The MADE (Maryland Analysis of Developmental EEG) preprocessing pipeline. +* **MATLAB**: Core pipeline functions (`MADE_pipeline.m`, `preprocess_eeg_piece.m`). +* **Python**: Batch management scripts (`run_MADE_batch.py`, `check_preprocessed_files.py`). + +### `postprocessing/` +Scripts to extract measures from preprocessed EEG data. +* **ERPs**: `compute_erp_means.py` +* **Time-Frequency**: `create_tf_arrays.py`, `compute_means_TF.py`, `compute_means_ICPS.py`. + +### `figures/` +Code to generate visualizations, such as Grand Average ERP plots (`plot_erp.py`, `plot_erp.m`). + +### `statistics/` +R scripts for running Linear Mixed-Effects Models (LMM) and exporting results to APA-style tables. diff --git a/code/statistics/find_newest_file.R b/code/statistics/find_newest_file.R index 7025b87..5741696 100644 --- a/code/statistics/find_newest_file.R +++ b/code/statistics/find_newest_file.R @@ -1,3 +1,16 @@ +#' Find Newest File Utility +#' +#' Provides a function to locate the most recently modified file matching a specific pattern. + +#' Find the Newest File Matching a Pattern +#' +#' Searches for files matching the given glob pattern and returns the path +#' to the file with the most recent modification time. +#' +#' @param path_pattern A string containing the glob pattern to match files (e.g., "path/to/*.csv"). +#' +#' @return A string containing the path to the newest file, or NULL if no files are found. +#' @export find_newest_file <- function(path_pattern) { # Sys.glob expands wildcards similarly to python's glob matching_files <- Sys.glob(path_pattern) diff --git a/code/statistics/lmer_export_apa.R b/code/statistics/lmer_export_apa.R index 8d3dfab..854c113 100644 --- a/code/statistics/lmer_export_apa.R +++ b/code/statistics/lmer_export_apa.R @@ -2,6 +2,21 @@ library(dplyr) library(tibble) library(flextable) +#' LMER Export to APA Table +#' +#' Functions to format and export results from Linear Mixed-Effects Models (lmer) +#' into APA-style Word documents. + +#' Export LMER Model Summary to APA-Style Word Document +#' +#' Extracts coefficients, confidence intervals, and p-values from an lmer model, +#' formats them into a nice table, and saves it as a .docx file. +#' +#' @param model An lmer model object. +#' @param path String. The file path where the .docx table should be saved. +#' +#' @return No return value, called for side effect of creating a file. +#' @export lmer_export_apa <- function(model, path) { output_model <- as.data.frame(summary(model)$coefficients) @@ -38,6 +53,14 @@ lmer_export_apa <- function(model, path) { } +#' Rename Model Parameters for Publication +#' +#' Renames raw variable names from the model output into human-readable labels +#' suitable for publication (e.g., "acc" -> "Accuracy"). Handles interaction terms. +#' +#' @param parameter_vector A character vector of parameter names to rename. +#' +#' @return A character vector of renamed parameters. rename_parameters <- function(parameter_vector) { # Define the mapping for single term renaming