NDCLab · google-labs-jules · Jan 19, 2026
diff --git a/README.md b/README.md
@@ -1,44 +1,61 @@
-# [project name]
-
-## Project Goal
-:point_right: Replace this text with a pithy description of the goal of the data analysis project.
-
-
-## Background & Design
-:point_right: Replace this text with a slightly longer (but still only max 250 words) description of the project background and its design. 
-
-
-## Roadmap
-:point_right: Insert graphic of planned project releases. [This site](https://app.diagrams.net/) allows you to build a graphic that is connected to GitHub, so that changes you make are treated as GitHub commits.
-
-
-## Contents
-:point_right: This section should introduce the content that is published on `main`. Example:
-
-
-#### 2021-07-01 SPR Poster
-This is a description of the poster. There is a matching folder in the repo.
-
-:point_right: When you first set up this readme file, you might replace everything under "Contents" here with something like "Watch for our first release!"
-
-
-## Work in Development
-This `main` branch contains completed releases for this project. For all work-in-progress, please switch over to the `dev` branches.
-
-:point_right: Keep the "Work in Development" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file.
-
-
-## Contributors
-| Role | Name |
-| ---  | ---  |
-| add role | insert team member(s) |
-| add role | insert team member(s) |
-| add role | insert team member(s) |
-
-Learn more about us [here](https://www.ndclab.com/people).
-
-
-## Contributing
-If you are interested in contributing, please read our [CONTRIBUTING.md](CONTRIBUTING.md) file.
-
-:point_right: Keep the "Contributing" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file.
+# Thrive: Theta DDM Analysis
+
+## Project Goal
+This repository contains the data analysis pipeline for the "Thrive" project. The goal is to investigate the neural mechanisms underlying social influence on decision-making using the Shrinking Spotlight (SSP) Drift Diffusion Model (DDM) and EEG time-frequency analysis. We aim to understand how social observation affects cognitive control processes, specifically examining error-related ERP components (ERN, Pe) and theta-band oscillations.
+
+## Background & Design
+The study employs a Flanker task with social (Observed) and non-social (Alone) conditions.
+Key analyses include:
+1.  **Behavioral Analysis**: Analyzing accuracy and reaction times (RT) to model decision-making processes.
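+2.  **Computational Modeling**: Fitting the SSP-DDM to behavioral data to estimate parameters such as boundary separation (`a`), non-decision time (`ter`), perceptual strength (`p`), and the initial width (`sda`) and shrinking rate (`rd`) of the attentional spotlight.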
+3.  **EEG Analysis**:
+    *   **Preprocessing**: Automated MADE (Maryland Analysis of Developmental EEG) pipeline, including filtering, artifact rejection (FASTER/ADJUST), and ICA.
+    *   **ERP Analysis**: Computing Event-Related Potentials (ERN, Pe) for error and correct trials.
+    *   **Time-Frequency Analysis**: Examining power and phase synchrony (inter-trial phase synchrony, ITPS; inter-channel phase synchrony, ICPS) in the theta and delta bands.
+
+## Directory Structure
+
+```yml
+thrive-theta-ddm
+├── code
+│   ├── behavior            # Scripts for behavioral data analysis and cleaning
+│   ├── ddm                 # Scripts for SSP-DDM model fitting (R, C++, Python)
+│   ├── figures             # Scripts to generate plots (ERPs, Topographies)
+│   ├── matlab              # Helper MATLAB functions
+│   ├── postprocessing      # Scripts for aggregating EEG metrics (ERP, TF, ICPS)
+│   ├── preprocessing-eeg   # EEG preprocessing pipeline (MADE)
+│   └── statistics          # R scripts for statistical analysis (LMM)
+├── derivatives             # Generated data (preprocessed EEG, summary CSVs)
+├── sourcedata              # Raw input data (quality-checked files under sourcedata/checked/)
+└── results                 # Figures and statistical outputs
+```
+
+## Setup & Usage
+
+### Prerequisites
+*   **Python**: `pandas`, `numpy`, `scipy`, `mne`, `matplotlib`, `h5py`
+*   **R**: `DEoptim`, `Rcpp`, `dplyr`, `flextable`
+*   **MATLAB**: EEGLAB with plugins (FASTER, ADJUST, firfilt)
+*   **C++**: Compiler compatible with Rcpp
+
+### Workflow
+1.  **Behavioral Processing**:
+    *   Run `code/behavior/behavior_analysis.py <session_id>` to aggregate PsychoPy data.
+    *   Run `code/behavior/create_valid_behav.py <session_id>` to filter subjects and create summary CSVs.
+
+2.  **DDM Fitting**:
+    *   Compile the C++ model: `code/ddm/simSSP_model_GB_noScale.cpp`.
+    *   Run `code/ddm/run_ddm_batch.py <session_id>` to submit SLURM fitting jobs.
+    *   Once the jobs finish, aggregate results with `code/ddm/fitted.py <session_id>`.
+
+3.  **EEG Preprocessing**:
+    *   Use `code/preprocessing-eeg/run_MADE_batch.py` to submit preprocessing jobs (calls `MADE_pipeline.m`).
+    *   Verify outputs with `code/preprocessing-eeg/check_preprocessed_files.py`.
+
+4.  **Post-Processing & Statistics**:
+    *   Compute ERP means: `code/postprocessing/compute_erp_means.py`.
+    *   Compute TF metrics: `code/postprocessing/compute_means_TF.py`.
+    *   Run statistical models using R scripts in `code/statistics/`.
+
+## Contributors
+*   NDCLab Team
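The workflow in the README can be driven from one place. Below is a minimal Python sketch that assumes it is launched from the repository root. It calls only the entry points whose usage strings document a single `<session_id>` argument. `build_commands` and `run_pipeline` are hypothetical helpers and are not part of the repository. The C++ compile step and the EEG and statistics steps are left out because their invocation is not documented in this diff.

```python
import subprocess
import sys

# Entry points whose docstrings document a single <session_id> argument,
# in the order given in the README workflow.
SESSION_STEPS = [
    "code/behavior/behavior_analysis.py",
    "code/behavior/create_valid_behav.py",
    "code/ddm/run_ddm_batch.py",
]


def build_commands(session_id: str) -> list:
    """Return one argv list per step; nothing is executed here."""
    return [[sys.executable, script, session_id] for script in SESSION_STEPS]


def run_pipeline(session_id: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute each step in order."""
    for cmd in build_commands(session_id):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first step that exits non-zero.
            subprocess.run(cmd, check=True)
```

`fitted.py` is deliberately excluded. It should run only after the SLURM jobs submitted by `run_ddm_batch.py` have finished, and a sequential driver has no way of knowing when that happens.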
diff --git a/code/behavior/behavior_analysis.py b/code/behavior/behavior_analysis.py
@@ -7,9 +7,41 @@
 import time
 import datetime
 
+"""
+Behavioral analysis script for processing PsychoPy data.
+
+This script reads behavioral data from PsychoPy CSV files, performs data cleaning,
+calculates various behavioral metrics (accuracy, reaction times, error rates),
+and generates summary CSV files for further analysis. It processes data for both
+social and non-social conditions.
+
+Usage:
+    python behavior_analysis.py <session_id>
+
+Arguments:
+    session_id (str): The session identifier (e.g., 's1_r1').
+
+Outputs:
+    - Summary CSV file containing aggregated metrics for each subject.
+    - Individual trial data CSV files for each subject.
+    - A combined CSV file containing trial data for all subjects.
+    - A log file recording the processing details.
+"""
+
 pd.options.mode.chained_assignment = None
 
 def convert_to_list_rt(series):
+    """
+    Converts a pandas Series of reaction times (RT) to a list of floats.
+
+    Handles string representations of lists and single values, as well as actual lists.
+
+    Args:
+        series (pd.Series): A pandas Series containing RT data, potentially as strings or lists.
+
+    Returns:
+        list: A list of float RT values, with np.nan for missing or invalid entries.
+    """
     float_list = []
     for value in series:
         if isinstance(value, str):
@@ -24,6 +56,17 @@ def convert_to_list_rt(series):
     return float_list
 
 def convert_to_list_resp(series):
+    """
+    Converts a pandas Series of response keys to a list of response values.
+
+    Parses string representations of lists to extract integer response keys.
+
+    Args:
+        series (pd.Series): A pandas Series containing response data.
+
+    Returns:
+        list: A list of lists containing integer response keys, or np.nan for missing entries.
+    """
     resp_list = []
     for value in series:
         if isinstance(value, str):
@@ -35,6 +78,15 @@ def convert_to_list_resp(series):
 
 
 def sort_csvs_by_date_pd(csv_paths):
+    """
+    Sorts a list of CSV file paths based on the date in the 'date' column of the first row.
+
+    Args:
+        csv_paths (list): A list of file paths to CSV files.
+
+    Returns:
+        list: The list of file paths sorted by date.
+    """
     date_format = "%Y-%m-%d_%Hh%M.%S.%f"
 
     def get_first_date(path):
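`sort_csvs_by_date_pd` orders files by the timestamp in each file's first `date` value. The following is a minimal sketch of that idea. It assumes every CSV has a `date` column written in the format string shown in the diff (`%Y-%m-%d_%Hh%M.%S.%f`). `sort_csvs_by_date` is a hypothetical stand-in and does not reproduce the script's actual implementation.

```python
import datetime

import pandas as pd

DATE_FORMAT = "%Y-%m-%d_%Hh%M.%S.%f"  # format string taken from the diff


def sort_csvs_by_date(csv_paths):
    """Sort CSV paths by the parsed 'date' value in each file's first data row."""

    def first_date(path):
        # Read only the first row; the rest of the trial data is not needed.
        first_row = pd.read_csv(path, nrows=1)
        return datetime.datetime.strptime(str(first_row["date"].iloc[0]), DATE_FORMAT)

    return sorted(csv_paths, key=first_date)
```

Reading with `nrows=1` keeps the sort cheap, even when individual trial files are large.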

diff --git a/code/behavior/check_status.py b/code/behavior/check_status.py
@@ -3,6 +3,20 @@
 import re
 from pathlib import Path
 
+"""
+Script to check the processing status of behavioral data for a given session.
+
+This script scans the source directory for raw PsychoPy files, checks for
+deviation files, identifies successfully processed subjects, and reports
+which subjects are pending processing or have deviations.
+
+Usage:
+    python check_status.py <session_id>
+
+Arguments:
+    session_id (str): The session identifier (e.g., 's1_r1').
+"""
+
 def get_args():
     """Parses command line arguments."""
     parser = argparse.ArgumentParser(description="Check behavior processing status.")
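The status report described for `check_status.py` reduces to set arithmetic over subject IDs. This sketch assumes the raw, deviation, and processed subject IDs have already been extracted from file names. The hypothetical `classify_subjects` helper does not touch the file system.

```python
def classify_subjects(raw_ids, deviation_ids, processed_ids):
    """Split raw subject IDs into processed, deviation, and pending groups."""
    raw = set(raw_ids)
    processed = raw & set(processed_ids)
    deviation = raw & set(deviation_ids)
    # Pending: raw data present, not yet processed, and no deviation flag.
    pending = raw - processed - deviation
    return {
        "processed": sorted(processed),
        "deviation": sorted(deviation),
        "pending": sorted(pending),
    }
```

Here, deviation subjects are kept out of the pending list so they can be reviewed by hand. The diff does not say whether the real script handles them the same way.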

diff --git a/code/behavior/check_subject_csv.py b/code/behavior/check_subject_csv.py
@@ -2,6 +2,17 @@
 import re
 import glob
 
+"""
+Script to verify the integrity of subject CSV files in the source directory.
+
+It checks whether each subject has the expected number of files (3) and detects
+any deviation files. It asserts that the main data CSV file exists and does
+not have 'deviation' in its filename.
+
+Usage:
+    python check_subject_csv.py
+"""
+
 input_dataset_path = "/home/data/NDClab/datasets/thrive-dataset/"
 data_path = "sourcedata/checked/"
 sub_path = "s1_r1/psychopy/"
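The per-subject checks in `check_subject_csv.py` can be expressed as a single grouping pass. This sketch rests on two assumptions: that file names carry a `sub-<digits>` prefix, and that the three expected files are PsychoPy's usual `.csv`, `.log`, and `.psydat` outputs. `audit_subject_files` is hypothetical. It collects problems rather than asserting, so every subject is reported in one run.

```python
import re
from collections import defaultdict

EXPECTED_FILES = 3  # expected count stated in the script's docstring
SUBJECT_RE = re.compile(r"sub-(\d+)")  # hypothetical naming convention


def audit_subject_files(filenames):
    """Group file names by subject and report count or deviation problems."""
    by_subject = defaultdict(list)
    for name in filenames:
        match = SUBJECT_RE.search(name)
        if match:
            by_subject[match.group(1)].append(name)

    problems = {}
    for sub, files in sorted(by_subject.items()):
        issues = []
        if len(files) != EXPECTED_FILES:
            issues.append(f"expected {EXPECTED_FILES} files, found {len(files)}")
        if any("deviation" in f for f in files):
            issues.append("has deviation file")
        # The main data file is a CSV without 'deviation' in its name.
        if not [f for f in files if f.endswith(".csv") and "deviation" not in f]:
            issues.append("no main data CSV")
        if issues:
            problems[sub] = issues
    return problems
```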

diff --git a/code/behavior/create_valid_behav.py b/code/behavior/create_valid_behav.py
@@ -4,12 +4,35 @@
 from glob import glob
 import os
 
+"""
+Script to filter and clean behavioral data, creating valid datasets for analysis.
+
+This script loads the behavioral summary data, removes excluded subjects and
+outliers based on reaction time and accuracy criteria, and saves separate CSV
+files for social and non-social conditions containing only valid data.
+
+Usage:
+    python create_valid_behav.py <session_id>
+
+Arguments:
+    session_id (str): The session identifier (e.g., 's1_r1').
+"""
+
 
 def replace_outliers_with_nan_cols(df, columns_to_check, sd_thresh=3):
     """
-    Replaces outliers with NaN.
-    Raises KeyError if a column is missing.
-    Returns the modified dataframe if successful.
+    Replaces outliers in specified columns with NaN based on a standard deviation threshold.
+
+    Args:
+        df (pd.DataFrame): The input pandas DataFrame.
+        columns_to_check (list): A list of column names to check for outliers.
+        sd_thresh (float, optional): The number of standard deviations to use as the threshold. Defaults to 3.
+
+    Returns:
+        pd.DataFrame: The DataFrame with outliers replaced by NaN.
+
+    Raises:
+        KeyError: If a column in columns_to_check is missing from the DataFrame.
     """
     # 1. Validation: Check all columns exist first
     # If this fails, the error is raised and nothing is returned.
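Below is a minimal sketch of the documented behavior of `replace_outliers_with_nan_cols`. It assumes "outlier" means a value more than `sd_thresh` sample standard deviations from its column mean; the diff does not show the exact rule. The sketch validates every column before changing anything, matching the docstring's `KeyError` contract, and it works on a copy rather than in place.

```python
import numpy as np
import pandas as pd


def replace_outliers_with_nan_cols(df, columns_to_check, sd_thresh=3):
    """Return a copy of df with values beyond sd_thresh SDs of the column mean set to NaN."""
    # Validate all columns first so a missing one leaves nothing half-modified.
    missing = [col for col in columns_to_check if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")

    out = df.copy()
    for col in columns_to_check:
        values = out[col].astype(float)  # NaN requires a float column
        mean = values.mean()
        sd = values.std()  # sample SD (ddof=1); NaNs are skipped
        is_outlier = (values - mean).abs() > sd_thresh * sd
        out[col] = values.mask(is_outlier)  # mask() writes NaN where True
    return out
```

Note that the mean and SD are computed with the outlier still included. With very few rows, a 3 SD cutoff may therefore never trigger.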

diff --git a/code/ddm/SSP_DDM_fitting.R b/code/ddm/SSP_DDM_fitting.R
@@ -1,3 +1,22 @@
+#' SSP-DDM Model Fitting Script
+#'
+#' This script loads human behavioral data and fits the Shrinking Spotlight Protocol (SSP)
+#' Drift Diffusion Model (DDM) parameters to the data using Differential Evolution optimization.
+#' It relies on an Rcpp function for model simulation.
+#'
+#' @description
+#' The script defines a fitting function that calls the C++ simulation model,
+#' compares simulated data with human data using a chi-square statistic, and optimizes
+#' parameters (a, ter, p, rd, sda) to minimize that statistic.
+#' It processes subjects in batches as defined by command line arguments.
+#'
+#' @usage
+#' Rscript SSP_DDM_fitting.R <startIdx> <endIdx> <session>
+#'
+#' @param startIdx The starting index of subjects to process.
+#' @param endIdx The ending index of subjects to process.
+#' @param session The session identifier (e.g., 's1_r1').
+
 ## This script loads in human data and then proceeds through the fitting Script to fit parameters to the data. 
 ## The first section of code is a function that call the rcpp code (separate file), which contains the actual model,
 ## simulates data based on a set of parameters, and outputs a fit stat. The second section of code is the "main script"
@@ -16,6 +35,24 @@ endIdx <- args[2]
 session <- args[3]
 #------------------------------------------------------------------------------
 # Fit function for the SSP model
+
+#' Fit Function for the SSP Model
+#'
+#' Calculates the Chi-Square goodness-of-fit statistic between human data and
+#' simulated data generated by the SSP model for a given set of parameters.
+#'
+#' @param parms Numeric vector of length 5 containing the model parameters:
+#'   1. a (boundary separation)
+#'   2. ter (non-decision time)
+#'   3. p (perceptual strength)
+#'   4. rd (shrinking rate)
+#'   5. sda (attentional window width)
+#' @param nTrials Integer. Number of trials to simulate per condition.
+#' @param cutPoints List of length 4. RT bin boundaries for each accuracy/congruency condition.
+#' @param humanProps List of length 4. Proportions of human data in each bin.
+#' @param HumanTrialCounts Numeric vector of length 2. Total trial counts for congruent and incongruent conditions.
+#'
+#' @return A numeric value representing the Chi-Square statistic. Returns a very large finite value if the statistic is infinite.
 fitFunctionSSP <- function(
     parms, # generated by DEoptim within specified boundaries when DEoptim is called
     nTrials, # how many trials to simulate per condition 
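The sketch below makes the five parameters and the fit statistic concrete. It gives a simplified SSP drift rate and a Pearson chi-square over RT-bin proportions, under three assumptions:

- the target spans [-0.5, 0.5] in item-width units;
- all attention outside that region goes to the flankers;
- the spotlight SD shrinks linearly, down to a small floor.

It is not a port of `simSSP_model_GB_noScale.cpp`, and `ssp_drift` and `chi_square` are hypothetical names.

```python
import math


def _norm_cdf(x, sd):
    """CDF of a zero-mean normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def ssp_drift(t, p, sda, rd, congruent, min_sd=0.001):
    """Drift rate at time t under a simplified shrinking-spotlight model.

    The spotlight SD starts at sda and shrinks linearly at rate rd (floored
    at min_sd). Attention inside [-0.5, 0.5] goes to the target; the rest
    goes to the flankers, which push toward (congruent) or away from
    (incongruent) the correct response with the same strength p.
    """
    sd = max(sda - rd * t, min_sd)
    a_target = _norm_cdf(0.5, sd) - _norm_cdf(-0.5, sd)
    a_flanker = 1.0 - a_target
    flanker_sign = 1.0 if congruent else -1.0
    return p * (a_target + flanker_sign * a_flanker)


def chi_square(observed_props, predicted_props, n_trials, floor=1e-10):
    """Pearson chi-square between observed and predicted bin proportions."""
    stat = 0.0
    for obs, pred in zip(observed_props, predicted_props):
        pred = max(pred, floor)  # avoid division by zero for empty model bins
        stat += n_trials * (obs - pred) ** 2 / pred
    return stat
```

On congruent trials the drift stays at `p`. On incongruent trials it starts negative while the spotlight is wide and approaches `p` as the spotlight narrows. This sketch floors empty predicted bins, whereas `fitFunctionSSP` returns a very large number when the statistic is infinite.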

diff --git a/code/ddm/bulk_scancel.py b/code/ddm/bulk_scancel.py
@@ -1,5 +1,15 @@
 import subprocess
 
+"""
+Script to bulk cancel SLURM jobs within a specified range of job IDs.
+
+This script generates a list of job IDs from a start ID to an end ID and
+executes the 'scancel' command for all of them.
+
+Usage:
+    python bulk_scancel.py
+"""
+
 start_id = 2648371
 end_id = 2648451
 
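`bulk_scancel.py` cancels a contiguous range of SLURM job IDs. Because `scancel` accepts several job IDs in one call, a sketch can issue a single command instead of one per job. This sketch assumes the range includes `end_id` and defaults to a dry run. `build_scancel_command` and `cancel_range` are hypothetical helpers.

```python
import subprocess


def build_scancel_command(start_id, end_id):
    """One scancel invocation covering start_id..end_id (inclusive, assumed)."""
    if end_id < start_id:
        raise ValueError("end_id must not be smaller than start_id")
    return ["scancel", *(str(job) for job in range(start_id, end_id + 1))]


def cancel_range(start_id, end_id, dry_run=True):
    """Print the planned cancellation, or run it when dry_run is False."""
    cmd = build_scancel_command(start_id, end_id)
    if dry_run:
        print(f"would cancel {len(cmd) - 1} jobs: {start_id}..{end_id}")
    else:
        subprocess.run(cmd, check=True)
```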

diff --git a/code/ddm/fitted.py b/code/ddm/fitted.py
@@ -3,6 +3,19 @@
 from datetime import datetime
 import sys
 
+"""
+Script to aggregate DDM fitting results into a single CSV file.
+
+This script searches for output files from DDM fitting, concatenates them,
+and saves a summary file listing the subjects that have been successfully fitted.
+
+Usage:
+    python fitted.py <session_id>
+
+Arguments:
+    session_id (str): The session identifier.
+"""
+
 session = sys.argv[1]
 current_datetime = datetime.now()
 formatted_date = current_datetime.strftime("%Y_%m_%d_%H_%M_%S")
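The aggregation in `fitted.py` comes down to three steps: glob, concatenate, and write. This sketch makes two assumptions: each fitting job writes one CSV per subject matching a hypothetical `*_ssp_fit.csv` pattern, and the summary goes to a hypothetical `fitted_summary_<timestamp>.csv`. The timestamp format matches the `strftime` call in the diff.

```python
import glob
import os
from datetime import datetime

import pandas as pd


def aggregate_fits(results_dir, pattern="*_ssp_fit.csv"):
    """Concatenate per-subject fit CSVs and write a timestamped summary."""
    paths = sorted(glob.glob(os.path.join(results_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern} in {results_dir}")
    combined = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
    # Same timestamp format as fitted.py.
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    out_path = os.path.join(results_dir, f"fitted_summary_{stamp}.csv")
    combined.to_csv(out_path, index=False)
    return combined, out_path
```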

diff --git a/code/ddm/run_ddm_batch.py b/code/ddm/run_ddm_batch.py
@@ -6,6 +6,19 @@
 from pathlib import Path
 import pandas as pd
 
+"""
+Script to submit DDM fitting jobs to the SLURM scheduler.
+
+This script identifies subjects that have not yet been fitted with the DDM and
+submits SLURM jobs for them in batches.
+
+Usage:
+    python run_ddm_batch.py <session_id>
+
+Arguments:
+    session_id (str): The session identifier.
+"""
+
 session = sys.argv[1]
 #session = "s2_r1"
 data_dir = "/home/data/NDClab/analyses/thrive-theta-ddm/"
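The selection-and-batching step in `run_ddm_batch.py` can be sketched as a set difference followed by chunking. The helper names, the batch size, and the submission script passed to `sbatch` are all assumptions. `sbatch` forwards any arguments after the script name to that script.

```python
def pending_subjects(all_subjects, fitted_subjects):
    """Subjects not yet fitted, in their original order."""
    fitted = set(fitted_subjects)
    return [s for s in all_subjects if s not in fitted]


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def sbatch_command(script, session, batch):
    """Build an sbatch argv; arguments after the script are passed to it."""
    return ["sbatch", script, session, *batch]
```

Keeping the pending list in its input order makes batches reproducible between runs, so a partially finished submission can be resumed by rerunning the script.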