Machine learning pipeline for studying covert-awareness-related fMRI functional connectivity during anesthesia.
Disclosure: This software was developed with AI assistance under human supervision. The public release path is verified with the pinned lockfile and the automated test suite documented below.
This repository contains a Python research pipeline built on the Michigan Human Anesthesia fMRI Dataset (OpenNeuro ds006623).
It does not reproduce the 2018 paper's main task-fMRI analysis. Instead, it trains a separate connectivity-based classifier on the public derivatives.
The ISD feature is a Python approximation of the linked MATLAB reference. It keeps the same basic quantity (efficiency minus clustering), but the graph calculations are simplified, so the values will not exactly match the MATLAB/BCT output.
This release requires Python 3.11+. It was verified with Python 3.12.3 on Ubuntu 24.04 WSL.
If python3 --version reports an older interpreter, install Python 3.11+ before creating .venv.
# Clone and set up
git clone https://github.com/byteshiftlabs/covert-awareness-detector.git
cd covert-awareness-detector
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements-lock.txt
# Local verification: same pytest entrypoint used in CI
pytest
# Quick smoke run: downloads only the first 5 supported subjects
./run_quick_training.sh
# Full run: downloads the supported dataset and trains the default model
./run_full_training.sh
If you are new to this repository, use this order:
- Create the pinned virtual environment and run pytest before touching the training scripts.
- Run ./run_quick_training.sh before the full dataset path so you can confirm the public smoke workflow on a small supported subject set.
- Read src/train.py, src/data_loader.py, and src/features.py in that order to understand the pipeline from orchestration to raw feature construction.
- Use the Testing and Default Model sections below to separate the verified current release contract from follow-up research work.
The steps below describe the current Python classifier, not the original paper's activation-based method.
The pipeline processes fMRI brain scans through several stages to classify consciousness states:
We start with preprocessed brain scans from the Michigan Human Anesthesia fMRI Dataset. Each scan captures brain activity across hundreds of brain regions over time—think of it as a recording of which brain areas are "lighting up" together at each moment. We work with data from people who were scanned while transitioning between conscious and unconscious states under controlled anesthesia.
- Source: XCP-D preprocessed fMRI timeseries from OpenNeuro ds006623
- What we have: Brain activity measurements across 446 regions, recorded over time for 25 people in different states of consciousness
This is where we transform raw brain signals into meaningful patterns that distinguish consciousness from unconsciousness.
Connectivity Matrices: We measure how synchronized different brain regions are with each other. If two regions consistently activate together, they're "connected." This creates a map of functional connections across the entire brain—essentially a snapshot of how the brain's regions communicate.
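A minimal sketch of this step, using Pearson correlation on a toy timeseries (the shapes mirror the 446-region atlas described above, but the random data and variable names are illustrative, not the repository's actual loader):

```python
import numpy as np

# Toy stand-in for one XCP-D timeseries: 200 timepoints x 446 regions.
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((200, 446))

# Pearson correlation between every pair of regional timeseries.
conn = np.corrcoef(timeseries, rowvar=False)   # shape (446, 446)

# The matrix is symmetric with a unit diagonal, so only the upper
# triangle carries unique edges: 446 * 445 / 2 = 99235 connections.
iu = np.triu_indices_from(conn, k=1)
edge_vector = conn[iu]
```

The 99,235 unique edges per scan are what the "nearly 100,000 individual connections" figure below refers to.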
ISD (Integration-Segregation Difference): This measures the balance between integration (how efficiently information flows across the entire brain network) and segregation (how well distinct brain regions maintain specialized, local processing). ISD quantifies this by computing the difference between these two properties (efficiency minus clustering).
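A hedged sketch of that "efficiency minus clustering" idea on a small thresholded graph, using networkx; the threshold value and the function name `approximate_isd` are illustrative assumptions, and src/features.py may binarize and weight differently:

```python
import numpy as np
import networkx as nx

def approximate_isd(conn, threshold=0.1):
    """Sketch of the integration-segregation difference: binarize the
    connectivity matrix, then take global efficiency (integration)
    minus average clustering (segregation)."""
    adj = (np.abs(conn) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    g = nx.from_numpy_array(adj)
    return nx.global_efficiency(g) - nx.average_clustering(g)

# Toy 40-region network; real scans use the full 446-region atlas.
rng = np.random.default_rng(0)
conn = np.corrcoef(rng.standard_normal((200, 40)), rowvar=False)
isd = approximate_isd(conn)
```

Both efficiency and clustering lie in [0, 1], so the difference is bounded in [-1, 1], with higher values indicating an integration-dominated network.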
Network Summary Statistics: We also compute basic graph properties from the connectivity matrix—mean degree, strength, and density—to capture the overall topology of the brain network at each state.
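These summary features can be sketched as follows; the threshold and the exact definitions (absolute-value strength, binary density) are assumptions for illustration and may not match features.py exactly:

```python
import numpy as np

def network_summary(conn, threshold=0.1):
    """Mean degree, mean strength, and density from a thresholded
    connectivity matrix (illustrative definitions)."""
    n = conn.shape[0]
    w = conn.copy()
    np.fill_diagonal(w, 0.0)
    adj = (np.abs(w) > threshold).astype(int)
    degree = adj.sum(axis=1)                # edges incident to each region
    strength = np.abs(w).sum(axis=1)        # summed edge weights per region
    density = adj.sum() / (n * (n - 1))     # fraction of possible edges
    return degree.mean(), strength.mean(), density

rng = np.random.default_rng(0)
conn = np.corrcoef(rng.standard_normal((200, 40)), rowvar=False)
mean_degree, mean_strength, density = network_summary(conn)
```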
Raw connectivity data is massive—nearly 100,000 individual connections between brain regions. Most of this information is redundant or noisy. Principal Component Analysis (PCA) is like finding the "essence" of the data: it identifies the main patterns that explain most of the variation, compressing the data down to the most important features while throwing away the noise. This prevents the model from overfitting to irrelevant details.
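In scikit-learn terms, this compression step looks roughly like the following; the component count and the toy 25-row feature matrix are illustrative, not the pipeline's tuned values:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy feature matrix: 25 rows (one per subject/state sample) by 99235
# connectivity edges. With only 25 rows, at most 25 components exist.
rng = np.random.default_rng(0)
X = rng.standard_normal((25, 99235))

# Keep the leading components that explain most of the variance.
pca = PCA(n_components=20, svd_solver="randomized", random_state=0)
X_reduced = pca.fit_transform(X)
```

The fitted PCA must be learned on training folds only and then applied to held-out subjects, otherwise the cross-validation below would leak information.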
We use XGBoost, a powerful machine learning algorithm that builds an "ensemble" of decision trees. Think of it as training many simple classifiers that each learn different aspects of the data, then combining their votes for a final prediction.
Handling Class Imbalance with SMOTE: Our dataset has more unconscious examples than conscious ones (people spend more time sedated). SMOTE (Synthetic Minority Oversampling) creates synthetic examples of the underrepresented class, ensuring the model learns to recognize both states equally well rather than just guessing "unconscious" most of the time.
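The core idea behind SMOTE can be shown in a few lines of numpy (the pipeline itself would use imbalanced-learn's SMOTE; this toy `smote_like_oversample` helper is only an illustration of the interpolation step):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=None):
    """Each synthetic point lies on the segment between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                        # position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(0)
X_conscious = rng.standard_normal((10, 4))        # minority class samples
X_new = smote_like_oversample(X_conscious, n_new=15, seed=1)
```

Because every synthetic sample is a convex combination of two real minority samples, the new points stay inside the minority class's feature range rather than being arbitrary noise.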
Leave-One-Subject-Out Cross-Validation: We train the model on data from all subjects except one, then test it on the left-out subject. We repeat this for every subject. This ensures the model learns general patterns about consciousness, not just memorizing specific individuals' brain signatures.
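This scheme maps directly onto scikit-learn's LeaveOneGroupOut splitter. The sketch below uses a logistic regression as a dependency-light stand-in for the XGBoost model, and random data in place of the real features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Toy data: 25 subjects, 4 scans each, 10 features per scan.
rng = np.random.default_rng(0)
n_subjects, scans_per_subject, n_features = 25, 4, 10
X = rng.standard_normal((n_subjects * scans_per_subject, n_features))
y = rng.integers(0, 2, size=len(X))
groups = np.repeat(np.arange(n_subjects), scans_per_subject)

# Each fold holds out every scan from exactly one subject, so no
# subject appears in both the training and test sets.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
```

With 25 subjects this yields 25 folds, one held-out subject per fold; on the random labels above the accuracies should hover near chance.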
Give the trained model a new brain scan, and it outputs a probability: how likely is this person to be conscious or unconscious? The model draws on the full set of features it learned during training—connectivity patterns compressed via PCA, ISD metrics, and network summary statistics—to make its prediction.
src/
config.py # Dataset paths, subject list, scan parameters
data_loader.py # Load timeseries, motion filtering, connectivity matrices
download_dataset.py # OpenNeuro dataset downloader
features.py # Approximate ISD, graph metrics, connectivity feature extraction
train.py # Full training pipeline: XGBoost + PCA + SMOTE
validate_model.py # Overfitting checks and permutation tests
tests/ # pytest test suite
docs/ # Sphinx documentation
run_full_training.sh # Automated training pipeline (START HERE)
pyproject.toml # Project metadata and dependencies
requirements.txt # Minimum-spec dependencies
requirements-lock.txt # Release-verified lockfile
Public CI runs on GitHub Actions for every push and pull request using Python 3.12 and the pinned requirements-lock.txt environment.
It verifies both public entrypoints:
pytest
./run_quick_training.sh
Run the same entrypoint locally:
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements-lock.txt
pytest
# Documented quick smoke path, also exercised in CI
./run_quick_training.sh
The reproducible public-release contract is the pinned lockfile path above: create a clean virtual environment, install requirements-lock.txt, run pytest, and then run ./run_quick_training.sh. GitHub Actions verifies that exact sequence.
- Python 3.11+
- Bash-compatible shell environment for the helper scripts
- Roughly 350 MB of free disk space for the quick run
- Roughly 1.8 GB of free disk space for the full download
The default training pipeline (src/train.py / ./run_full_training.sh) trains and validates the XGBoost classifier only (full connectivity + PCA + SMOTE + threshold tuning).
For installation details and longer documentation, see docs/ and docs/installation.rst.
Original Research: Huang, Hudetz, Mashour et al. — University of Michigan
Dataset: OpenNeuro ds006623 (CC0 Public Domain)
MATLAB Reference: Jang et al.
This Implementation: Independent Python ML pipeline by byteshiftlabs, built with AI assistance
@article{huang2018covert,
title = {Brain imaging reveals covert consciousness during behavioral unresponsiveness},
author = {Huang, Zirui and others},
journal = {Scientific Reports},
volume = {8},
pages = {13195},
year = {2018},
doi = {10.1038/s41598-018-31436-z}
}
MIT License — see LICENSE.
Dataset: CC0 (Public Domain).
Documentation: docs/