105 changes: 61 additions & 44 deletions README.md
@@ -1,44 +1,61 @@
# [project name]

## Project Goal
:point_right: Replace this text with a pithy description of the goal of the data analysis project.


## Background & Design
:point_right: Replace this text with a slightly longer (but still only max 250 words) description of the project background and its design.


## Roadmap
:point_right: Insert graphic of planned project releases. [This site](https://app.diagrams.net/) allows you to build a graphic that is connected to GitHub, so that changes you make are treated as GitHub commits.


## Contents
:point_right: This section should introduce the content that is published on `main`. Example:


#### 2021-07-01 SPR Poster
This is a description of the poster. There is a matching folder in the repo.

:point_right: When you first set up this readme file, you might replace everything under "Contents" here with something like "Watch for our first release!"


## Work in Development
This `main` branch contains completed releases for this project. For all work-in-progress, please switch over to the `dev` branches.

:point_right: Keep the "Work in Development" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file.


## Contributors
| Role | Name |
| --- | --- |
| add role | insert team member(s) |
| add role | insert team member(s) |
| add role | insert team member(s) |

Learn more about us [here](https://www.ndclab.com/people).


## Contributing
If you are interested in contributing, please read our [CONTRIBUTING.md](CONTRIBUTING.md) file.

:point_right: Keep the "Contributing" text above as the default NDCLab "blurb" to help visitors navigate the repo. But delete this note before publishing the readme file.
# Thrive: Theta DDM Analysis

## Project Goal
This repository contains the data analysis pipeline for the "Thrive" project. The goal is to investigate the neural mechanisms underlying social influence on decision-making, using the shrinking spotlight (SSP) variant of the drift diffusion model (DDM) together with EEG time-frequency analysis. We aim to understand how social observation affects cognitive control processes, focusing on the error-related negativity (ERN) and error positivity (Pe) ERP components and on theta band oscillations.

## Background & Design
The study employs a Flanker task with social (Observed) and non-social (Alone) conditions.
Key analyses include:
1. **Behavioral Analysis**: Analyzing accuracy and reaction times (RT) to model decision-making processes.
2. **Computational Modeling**: Fitting the SSP-DDM to behavioral data to estimate parameters such as boundary separation, non-decision time, and attentional focus.
3. **EEG Analysis**:
   * **Preprocessing**: Automated pipeline (MADE) including filtering, artifact rejection (FASTER/ADJUST), and ICA.
   * **ERP Analysis**: Computing event-related potentials (ERN, Pe) for error and correct trials.
   * **Time-Frequency Analysis**: Examining power and phase synchrony (inter-trial phase synchrony, ITPS; inter-channel phase synchrony, ICPS) in the theta and delta bands.
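To make the "attentional focus" parameters of the SSP-DDM concrete, here is a minimal sketch of how the shrinking spotlight turns `p`, `sda`, and `rd` into a time-varying drift rate. It assumes the standard shrinking spotlight formulation (every item carries the same perceptual input `p`, the target spans [-0.5, 0.5] in stimulus units, and all spotlight mass outside that interval falls on the flankers); the function names `normal_cdf` and `ssp_drift` and the `min_sd` floor are illustrative and not taken from `simSSP_model_GB_noScale.cpp`, which may differ in detail.

```python
import math


def normal_cdf(x: float, sd: float) -> float:
    """CDF of a zero-mean normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def ssp_drift(t: float, p: float, sda: float, rd: float,
              congruent: bool, min_sd: float = 0.001) -> float:
    """Drift rate of a shrinking spotlight model at time t (hypothetical sketch).

    Assumed formulation: equal perceptual input p for every item, target
    region [-0.5, 0.5], and all remaining spotlight mass on the flankers.
    """
    # The spotlight's standard deviation shrinks linearly at rate rd,
    # floored at min_sd so it never reaches zero or goes negative.
    sd_t = max(sda - rd * t, min_sd)
    a_target = normal_cdf(0.5, sd_t) - normal_cdf(-0.5, sd_t)
    a_flankers = 1.0 - a_target
    # Congruent flankers push toward the correct boundary, incongruent ones away.
    flanker_sign = 1.0 if congruent else -1.0
    return p * a_target + flanker_sign * p * a_flankers


if __name__ == "__main__":
    # Illustrative parameter values only, not fitted estimates from this project.
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, round(ssp_drift(t, p=0.4, sda=1.8, rd=5.0, congruent=False), 3))
```

Under these assumptions the congruent drift stays at `p`, while the incongruent drift starts negative (flankers dominate a wide spotlight) and climbs toward `p` as the spotlight narrows onto the target.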

## Directory Structure

```text
project-name
├── code
│ ├── behavior # Scripts for behavioral data analysis and cleaning
│ ├── ddm # Scripts for SSP-DDM model fitting (R, C++, Python)
│ ├── figures # Scripts to generate plots (ERPs, Topographies)
│ ├── matlab # Helper MATLAB functions
│ ├── postprocessing # Scripts for aggregating EEG metrics (ERP, TF, ICPS)
│ ├── preprocessing-eeg # EEG preprocessing pipeline (MADE)
│ └── statistics # R scripts for statistical analysis (LMM)
├── derivatives # Generated data (preprocessed EEG, summary CSVs)
├── sourcedata # Raw input data (checked)
└── results # Figures and statistical outputs
```

## Setup & Usage

### Prerequisites
* **Python**: `pandas`, `numpy`, `scipy`, `mne`, `matplotlib`, `h5py`
* **R**: `DEoptim`, `Rcpp`, `dplyr`, `flextable`
* **MATLAB**: EEGLAB with plugins (FASTER, ADJUST, firfilt)
* **C++**: Compiler compatible with Rcpp

### Workflow
1. **Behavioral Processing**:
   * Run `code/behavior/behavior_analysis.py` to aggregate PsychoPy data.
   * Run `code/behavior/create_valid_behav.py` to filter subjects and create summary CSVs.

2. **DDM Fitting**:
   * Compile the C++ model: `code/ddm/simSSP_model_GB_noScale.cpp`.
   * Run `code/ddm/run_ddm_batch.py` to submit fitting jobs.
   * Aggregate results with `code/ddm/fitted.py`.

3. **EEG Preprocessing**:
   * Use `code/preprocessing-eeg/run_MADE_batch.py` to submit preprocessing jobs (calls `MADE_pipeline.m`).
   * Verify outputs with `code/preprocessing-eeg/check_preprocessed_files.py`.

4. **Post-Processing & Statistics**:
   * Compute ERP means: `code/postprocessing/compute_erp_means.py`.
   * Compute TF metrics: `code/postprocessing/compute_means_TF.py`.
   * Run statistical models using R scripts in `code/statistics/`.

## Contributors
* NDCLab Team
52 changes: 52 additions & 0 deletions code/behavior/behavior_analysis.py
@@ -7,9 +7,41 @@
import time
import datetime

"""
Behavioral analysis script for processing PsychoPy data.

This script reads behavioral data from PsychoPy CSV files, performs data cleaning,
calculates various behavioral metrics (accuracy, reaction times, error rates),
and generates summary CSV files for further analysis. It processes data for both
social and non-social conditions.

Usage:
python behavior_analysis.py <session_id>

Arguments:
session_id (str): The session identifier (e.g., 's1_r1').

Outputs:
- Summary CSV file containing aggregated metrics for each subject.
- Individual trial data CSV files for each subject.
- A combined CSV file containing trial data for all subjects.
- A log file recording the processing details.
"""

pd.options.mode.chained_assignment = None

def convert_to_list_rt(series):
    """
    Converts a pandas Series of reaction times (RT) to a list of floats.

    Handles string representations of lists and single values, as well as actual lists.

    Args:
        series (pd.Series): A pandas Series containing RT data, potentially as strings or lists.

    Returns:
        list: A list of float RT values, with np.nan for missing or invalid entries.
    """
    float_list = []
    for value in series:
        if isinstance(value, str):
@@ -24,6 +56,17 @@ def convert_to_list_rt(series):
    return float_list

def convert_to_list_resp(series):
    """
    Converts a pandas Series of response keys to a list of response values.

    Parses string representations of lists to extract integer response keys.

    Args:
        series (pd.Series): A pandas Series containing response data.

    Returns:
        list: A list of lists containing integer response keys, or np.nan for missing entries.
    """
    resp_list = []
    for value in series:
        if isinstance(value, str):
@@ -35,6 +78,15 @@ def convert_to_list_resp(series):


def sort_csvs_by_date_pd(csv_paths):
    """
    Sorts a list of CSV file paths by the date in the 'date' column of each file's first row.

    Args:
        csv_paths (list): A list of file paths to CSV files.

    Returns:
        list: The list of file paths sorted by date.
    """
    date_format = "%Y-%m-%d_%Hh%M.%S.%f"

    def get_first_date(path):
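The docstrings above describe two parsing steps: turning PsychoPy RT cells (strings such as `'[0.512]'`, plain numbers, or real lists) into floats, and ordering files by the timestamp in their first row. The sketch below illustrates both under stated assumptions: `parse_rt` and `sort_paths_by_date` are hypothetical helpers, the cell shapes are assumed rather than read from the hidden function bodies, and the sort takes a ready-made `path -> date string` mapping instead of opening CSVs. Only `DATE_FORMAT` is copied from the diff.

```python
import ast
import math
from datetime import datetime

# Timestamp format used by sort_csvs_by_date_pd in behavior_analysis.py.
DATE_FORMAT = "%Y-%m-%d_%Hh%M.%S.%f"


def parse_rt(value) -> float:
    """Return the first RT stored in a PsychoPy cell as a float, or NaN.

    Hypothetical helper: assumes cells look like '[0.512]', '0.512', a real
    list, a number, or are empty/missing.
    """
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return math.nan
        try:
            value = ast.literal_eval(text)  # safely parses '[0.512]' or '0.512'
        except (ValueError, SyntaxError):
            return math.nan
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None  # keep the first recorded RT
    if value is None:
        return math.nan
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def sort_paths_by_date(first_dates: dict) -> list:
    """Sort file paths by the PsychoPy timestamp associated with each file.

    first_dates maps path -> the 'date' value from that file's first row.
    """
    return sorted(first_dates,
                  key=lambda path: datetime.strptime(first_dates[path], DATE_FORMAT))
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to Python literals, so a malformed cell yields NaN instead of executing arbitrary text.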
14 changes: 14 additions & 0 deletions code/behavior/check_status.py
@@ -3,6 +3,20 @@
import re
from pathlib import Path

"""
Script to check the processing status of behavioral data for a given session.

This script scans the source directory for raw PsychoPy files, checks for
deviation files, identifies successfully processed subjects, and reports
which subjects are pending processing or have deviations.

Usage:
python check_status.py <session_id>

Arguments:
session_id (str): The session identifier (e.g., 's1_r1').
"""

def get_args():
    """Parses command line arguments."""
    parser = argparse.ArgumentParser(description="Check behavior processing status.")
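The status check described in the `check_status.py` docstring reduces to set arithmetic on subject IDs. A minimal sketch follows, assuming a hypothetical `sub-<digits>` naming convention and treating any filename containing "deviation" as a deviation file; the real script's paths and patterns are not shown in this diff.

```python
import re

# Hypothetical naming convention: subject IDs appear as 'sub-<digits>'.
SUBJECT_RE = re.compile(r"sub-(\d+)")


def subject_ids(filenames):
    """Collect the subject IDs mentioned in an iterable of filenames."""
    ids = set()
    for name in filenames:
        match = SUBJECT_RE.search(name)
        if match:
            ids.add(match.group(1))
    return ids


def processing_status(raw_files, processed_files):
    """Compare raw PsychoPy files with processed outputs (illustrative only)."""
    raw = subject_ids(raw_files)
    deviations = subject_ids(f for f in raw_files if "deviation" in f.lower())
    processed = subject_ids(processed_files)
    return {
        "processed": sorted(raw & processed),   # raw data with an output
        "pending": sorted(raw - processed),     # raw data still awaiting processing
        "deviation": sorted(deviations),        # subjects flagged with a deviation file
    }
```

A subject can appear in both "pending" and "deviation" here; whether the real script reports them separately or excludes deviations from the pending list is an open assumption.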
11 changes: 11 additions & 0 deletions code/behavior/check_subject_csv.py
@@ -2,6 +2,17 @@
import re
import glob

"""
Script to verify the integrity of subject CSV files in the source directory.

It checks if each subject has the expected number of files (3) and detects
any deviation files. It asserts that the main data CSV file exists and does
not have 'deviation' in its filename.

Usage:
python check_subject_csv.py
"""

input_dataset_path = "/home/data/NDClab/datasets/thrive-dataset/"
data_path = "sourcedata/checked/"
sub_path = "s1_r1/psychopy/"
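The integrity rules in the `check_subject_csv.py` docstring (three files per subject, flag deviation files, require a main CSV without "deviation" in its name) can be sketched as a grouping pass. This version collects problems instead of asserting, and it assumes, hypothetically, that the subject ID is the filename prefix before the first underscore; three files per subject would be consistent with PsychoPy's usual `.csv`, `.log`, and `.psydat` outputs, but that is also an assumption.

```python
from collections import defaultdict


def check_subject_files(filenames, expected_count=3):
    """Return {subject: [problems]} for subjects whose files look wrong.

    Hypothetical sketch: the subject key is the text before the first '_'.
    """
    by_subject = defaultdict(list)
    for name in filenames:
        subject = name.split("_", 1)[0]
        by_subject[subject].append(name)

    problems = {}
    for subject, files in sorted(by_subject.items()):
        issues = []
        if len(files) != expected_count:
            issues.append(f"expected {expected_count} files, found {len(files)}")
        if any("deviation" in f.lower() for f in files):
            issues.append("deviation file present")
        # The main data file must be a CSV whose name does not mention 'deviation'.
        main_csvs = [f for f in files
                     if f.endswith(".csv") and "deviation" not in f.lower()]
        if not main_csvs:
            issues.append("no main data CSV")
        if issues:
            problems[subject] = issues
    return problems
```

Returning a report rather than asserting lets one run list every problematic subject instead of stopping at the first failure.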
29 changes: 26 additions & 3 deletions code/behavior/create_valid_behav.py
@@ -4,12 +4,35 @@
from glob import glob
import os

"""
Script to filter and clean behavioral data, creating valid datasets for analysis.

This script loads the behavioral summary data, removes excluded subjects and
outliers based on reaction time and accuracy criteria, and saves separate CSV
files for social and non-social conditions containing only valid data.

Usage:
python create_valid_behav.py <session_id>

Arguments:
session_id (str): The session identifier (e.g., 's1_r1').
"""


def replace_outliers_with_nan_cols(df, columns_to_check, sd_thresh=3):
    """
    Replaces outliers in specified columns with NaN based on a standard deviation threshold.

    Args:
        df (pd.DataFrame): The input pandas DataFrame.
        columns_to_check (list): A list of column names to check for outliers.
        sd_thresh (float, optional): The number of standard deviations to use as the threshold. Defaults to 3.

    Returns:
        pd.DataFrame: The DataFrame with outliers replaced by NaN.

    Raises:
        KeyError: If a column in columns_to_check is missing from the DataFrame.
    """
    # 1. Validation: Check all columns exist first
    # If this fails, the error is raised and nothing is returned.
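The docstring for `replace_outliers_with_nan_cols` describes a validate-then-mask procedure. Below is a minimal sketch of that idea, assuming outliers are values farther than `sd_thresh` sample standard deviations from the column mean; `replace_outliers_sd` is a hypothetical name, and the repository's version may compute the mean and SD differently (e.g., per condition or with a different `ddof`).

```python
import pandas as pd


def replace_outliers_sd(df: pd.DataFrame, columns, sd_thresh: float = 3.0) -> pd.DataFrame:
    """Return a copy of df with values beyond mean +/- sd_thresh * SD set to NaN."""
    # Validate every column before modifying anything.
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")

    out = df.copy()
    for col in columns:
        mean = out[col].mean()   # skips NaN
        sd = out[col].std()      # sample SD (ddof=1), skips NaN
        is_outlier = (out[col] - mean).abs() > sd_thresh * sd
        out[col] = out[col].mask(is_outlier)  # masked values become NaN
    return out
```

`Series.mask` returns a new Series with NaN where the condition holds, which avoids chained-assignment pitfalls and upcasts integer columns to float as needed.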
37 changes: 37 additions & 0 deletions code/ddm/SSP_DDM_fitting.R
@@ -1,3 +1,22 @@
#' SSP-DDM Model Fitting Script
#'
#' This script loads human behavioral data and fits the parameters of the shrinking spotlight (SSP)
#' drift diffusion model (DDM) to the data using Differential Evolution optimization.
#' It relies on an Rcpp function for model simulation.
#'
#' @description
#' The script defines a fitting function that calls the C++ simulation model,
#' compares simulated data with human data using Chi-Square statistics, and optimizes
#' parameters (a, ter, p, rd, sda) to minimize the difference.
#' It processes subjects in batches as defined by command line arguments.
#'
#' @usage
#' Rscript SSP_DDM_fitting.R <startIdx> <endIdx> <session>
#'
#' @param startIdx The starting index of subjects to process.
#' @param endIdx The ending index of subjects to process.
#' @param session The session identifier (e.g., 's1_r1').

## This script loads in human data and then runs the fitting procedure to fit parameters to the data.
## The first section of code is a function that calls the Rcpp code (separate file), which contains the actual model,
## simulates data based on a set of parameters, and outputs a fit statistic. The second section of code is the "main script"
@@ -16,6 +35,24 @@ endIdx <- args[2]
session <- args[3]
#------------------------------------------------------------------------------
# Fit function for the SSP model

#' Fit Function for the SSP Model
#'
#' Calculates the Chi-Square goodness-of-fit statistic between human data and
#' simulated data generated by the SSP model for a given set of parameters.
#'
#' @param parms Numeric vector of length 5 containing the model parameters:
#' 1. a (boundary separation)
#' 2. ter (non-decision time)
#' 3. p (perceptual strength)
#' 4. rd (shrinking rate)
#' 5. sda (attentional window width)
#' @param nTrials Integer. Number of trials to simulate per condition.
#' @param cutPoints List of length 4. RT bin boundaries for each accuracy/congruency condition.
#' @param humanProps List of length 4. Proportions of human data in each bin.
#' @param HumanTrialCounts Numeric vector of length 2. Total trial counts for congruent and incongruent conditions.
#'
#' @return A numeric value representing the Chi-Square statistic. Returns a very large number if the statistic is infinite.
fitFunctionSSP <- function(
  parms, # generated by DEoptim within specified boundaries when DEoptim is called
  nTrials, # how many trials to simulate per condition
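The roxygen block for `fitFunctionSSP` describes binning simulated RTs at the human cut points and scoring the mismatch with a chi-square statistic. A minimal Python sketch of that computation follows, assuming the common Pearson form N * sum((obs - pred)^2 / pred) applied per congruency condition, with correct and error bins concatenated and proportions taken over all trials of that condition. `binned_proportions`, `chi_square`, and the `LARGE_STAT` value are illustrative; the R/C++ code may bin or penalize differently.

```python
import numpy as np

LARGE_STAT = 1e10  # stand-in for "a very large number"; the R script's value may differ


def binned_proportions(sim_rts, cut_points, n_total):
    """Share of n_total simulated trials falling in each RT bin.

    Interior cut points c_1 < ... < c_k define k + 1 bins. Dividing by the
    total trial count of the congruency condition (correct + error) keeps
    correct and error proportions jointly summing to 1.
    """
    cuts = np.asarray(cut_points, dtype=float)
    bin_index = np.searchsorted(cuts, np.asarray(sim_rts, dtype=float), side="right")
    counts = np.bincount(bin_index, minlength=len(cuts) + 1)
    return counts / n_total


def chi_square(human_props, sim_props, n_human):
    """Pearson chi-square: N * sum((obs - pred)^2 / pred) over bins."""
    obs = np.asarray(human_props, dtype=float)
    pred = np.asarray(sim_props, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (obs - pred) ** 2 / pred
    # A bin empty in both data sets gives 0/0 = NaN and contributes nothing;
    # an empty simulated bin with human data gives inf.
    stat = n_human * np.nansum(terms)
    return float(stat) if np.isfinite(stat) else LARGE_STAT
```

Under these assumptions the total fit statistic would be the sum of `chi_square(...)` over the congruent and incongruent conditions, each weighted by its own human trial count.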
10 changes: 10 additions & 0 deletions code/ddm/bulk_scancel.py
@@ -1,5 +1,15 @@
import subprocess

"""
Script to bulk cancel SLURM jobs within a specified range of job IDs.

This script generates a list of job IDs from a start ID to an end ID and
executes the 'scancel' command for all of them.

Usage:
python bulk_scancel.py
"""

start_id = 2648371
end_id = 2648451

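`scancel` accepts several job IDs in one call, so the bulk cancel can be a single command built from the ID range. The sketch below assumes the range is inclusive of `end_id` (the hidden body of `bulk_scancel.py` may not be), and the `dry_run` switch is a hypothetical safety addition so the command can be inspected before anything is cancelled.

```python
import subprocess


def scancel_command(start_id: int, end_id: int) -> list:
    """Build one scancel invocation covering start_id..end_id (inclusive)."""
    if end_id < start_id:
        raise ValueError("end_id must not be smaller than start_id")
    return ["scancel"] + [str(job_id) for job_id in range(start_id, end_id + 1)]


def cancel_range(start_id: int, end_id: int, dry_run: bool = True) -> list:
    """Return the command and, unless dry_run is set, execute it."""
    cmd = scancel_command(start_id, end_id)
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires SLURM's scancel on PATH
    return cmd
```

Passing a list to `subprocess.run` (no `shell=True`) avoids shell quoting issues, and `check=True` surfaces a non-zero exit from `scancel` as an exception.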
13 changes: 13 additions & 0 deletions code/ddm/fitted.py
@@ -3,6 +3,19 @@
from datetime import datetime
import sys

"""
Script to aggregate DDM fitting results into a single CSV file.

This script searches for output files from DDM fitting, concatenates them,
and saves a summary file listing the subjects that have been successfully fitted.

Usage:
python fitted.py <session_id>

Arguments:
session_id (str): The session identifier.
"""

session = sys.argv[1]
current_datetime = datetime.now()
formatted_date = current_datetime.strftime("%Y_%m_%d_%H_%M_%S")
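The aggregation described in the `fitted.py` docstring (find per-subject fit outputs, concatenate them, list who was fitted) can be sketched with `glob` and `pd.concat`. The `*_fit.csv` pattern and the `subject` column are hypothetical placeholders; the actual output names written by `SSP_DDM_fitting.R` are not visible in this diff.

```python
import glob
import os
import tempfile

import pandas as pd


def aggregate_fits(fit_dir: str, pattern: str = "*_fit.csv"):
    """Concatenate per-subject fit CSVs and list the fitted subjects."""
    paths = sorted(glob.glob(os.path.join(fit_dir, pattern)))
    if not paths:
        return pd.DataFrame(), []
    combined = pd.concat((pd.read_csv(path) for path in paths), ignore_index=True)
    fitted = sorted(combined["subject"].unique().tolist())  # assumes a 'subject' column
    return combined, fitted


if __name__ == "__main__":
    # Demo with throwaway files; values are made up for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        for sub, a in [(2, 1.1), (1, 0.9)]:
            pd.DataFrame({"subject": [sub], "a": [a]}).to_csv(
                os.path.join(tmp, f"sub-{sub}_fit.csv"), index=False)
        combined, fitted = aggregate_fits(tmp)
        print(fitted)
```

The real script would then write `combined` and the subject list to timestamped CSVs, as the `formatted_date` variable above suggests.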
13 changes: 13 additions & 0 deletions code/ddm/run_ddm_batch.py
@@ -6,6 +6,19 @@
from pathlib import Path
import pandas as pd

"""
Script to submit DDM fitting jobs to the SLURM scheduler.

This script identifies subjects that have not yet been fitted for the DDM model
and submits batch jobs for them. It allows for submitting jobs in batches.

Usage:
python run_ddm_batch.py <session_id>

Arguments:
session_id (str): The session identifier.
"""

session = sys.argv[1]
#session = "s2_r1"
data_dir = "/home/data/NDClab/analyses/thrive-theta-ddm/"
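`run_ddm_batch.py` is described as finding unfitted subjects and submitting them in batches, while `SSP_DDM_fitting.R` takes `<startIdx> <endIdx> <session>`. A minimal sketch of that hand-off follows, assuming 1-based inclusive indices and submission through `sbatch --wrap`; `pending_subjects`, `batch_commands`, and the batch size are hypothetical, and the actual script may write job files instead.

```python
import shlex


def pending_subjects(all_subjects, fitted_subjects):
    """Subjects not yet fitted, in their original order."""
    fitted = set(fitted_subjects)
    return [sub for sub in all_subjects if sub not in fitted]


def batch_commands(n_subjects, batch_size, session, script="SSP_DDM_fitting.R"):
    """One sbatch command per block of 1-based, inclusive subject indices."""
    commands = []
    for start in range(1, n_subjects + 1, batch_size):
        end = min(start + batch_size - 1, n_subjects)
        r_call = shlex.join(["Rscript", script, str(start), str(end), session])
        commands.append(["sbatch", "--wrap", r_call])
    return commands
```

Each returned list could be passed to `subprocess.run`; `shlex.join` quotes the wrapped `Rscript` call safely if a path or session ID ever contains spaces.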