NEST: Neural EEG Sequence Transducer

Open-Vocabulary EEG-to-Text Decoding with Deep Learning

Translating EEG brain signals into natural language text

Quick Start | Architecture | Training | Paper

Overview

NEST decodes EEG brain signals recorded during reading into natural language text. It uses pre-processed frequency-domain EEG features from the ZuCo corpus and a Transformer + BART architecture to achieve open-vocabulary decoding.

Input:  Word-level EEG (105 channels × 8 frequency bands = 840 features per word)
        Brain activity recorded while a subject reads each word

Output: "presents a good case while failing to provide a reason..."
        Full sentence decoded from EEG alone

Applications:

Silent speech interfaces for communication-impaired individuals
Brain-computer interface (BCI) research
Cognitive neuroscience (reading comprehension, semantic processing)
Assistive technology for ALS, locked-in syndrome

Dataset

ZuCo (Zurich Cognitive Language Processing Corpus):

11-12 subjects, 3 reading tasks (SR, NR, TSR)
~12,884 sentence-subject pairs
EEG: 105 channels, pre-processed into 8 frequency bands
Feature format: 840-dim vector per fixated word (105 ch × 8 bands)

Download from OSF or use the provided download script.

Architecture

NEST v2 uses frequency-domain EEG features — not raw time-series — following the approach of Wang et al. (2022, ACL):

Word-level EEG features: (batch, max_words, 840)
  105 channels × 8 frequency bands (θ1/θ2, α1/α2, β1/β2, γ1/γ2)
         |
         v
EEG Projection: Linear(840 → 768) + LayerNorm + GELU
         |
         v
Positional Encoding (sinusoidal)
         |
         v
Transformer Encoder
  6 layers, d_model=768, 8 heads, pre-norm, GELU FFN
         |
         v
BART Decoder (facebook/bart-base)
  Cross-attention to EEG encoder output
  First 4 layers frozen, last 2 + cross-attention fine-tuned
         |
         v
Text output (beam search, beam_size=4)

Component	Config
EEG Input dim	840 (105ch × 8 bands)
d_model	768
Encoder layers	6
Attention heads	8
BART model	facebook/bart-base
Trainable params	~30M

Quick Start

# 1. Clone and setup
git clone https://github.com/wazder/NEST.git
cd NEST
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Verify dataset
python src/data/zuco_pickle_dataset.py ZuCo_Dataset/ZuCo

# 3. Quick sanity check (2 epochs, 50 samples)
python scripts/train_nest_v2.py --quick-test --model ctc

# 4. Full training — CTC baseline (fast, CPU/GPU)
python scripts/train_nest_v2.py --model ctc --epochs 200

# 5. Full training — BART (best quality, requires GPU)
python scripts/train_nest_v2.py --model bart --epochs 200 --fp16

Training

Local (GPU recommended)

python scripts/train_nest_v2.py \
    --model bart \
    --epochs 200 \
    --batch-size 16 \
    --fp16 \
    --d-model 768 \
    --num-layers 6 \
    --tasks task1-SR task2-NR task3-TSR

Cloud (Google Colab / Lambda / Vast.ai)

See notebooks/NEST_CloudTraining.ipynb for a ready-to-run Colab notebook with Google Drive integration.

Recommended cloud config: A100 40GB, ~8-12 hours for 200 epochs.

Subject-Independent Evaluation

By default, NEST trains on 8 subjects and evaluates on 2 held-out subjects (ZMG, ZPH) that are never seen during training or validation. This is the strictest and most realistic evaluation protocol.

Evaluation

# Evaluate a trained model
python scripts/train_nest_v2.py \
    --resume results/nest_v2_bart_*/best_model.pt \
    --epochs 0  # eval only

Results are saved to results/nest_v2_*/results.json:

{
  "model": "bart",
  "evaluation": "subject-independent",
  "test_wer": 0.XXX,
  "test_cer": 0.XXX,
  "note": "Real experimental results, honest evaluation"
}

Upload to Hugging Face

python scripts/upload_to_huggingface.py \
    --model-path results/best_model.pt \
    --repo-id your-username/NEST-EEG-to-Text \
    --token $HF_TOKEN

Project Structure

NEST/
├── src/
│   ├── data/
│   │   ├── zuco_pickle_dataset.py   # ZuCo dataset loader (pickle-based)
│   │   └── zuco_dataset.py          # Legacy MATLAB loader
│   ├── models/
│   │   ├── nest_v2.py               # NEST v2 (primary: CTC + BART)
│   │   ├── nest_bart.py             # NEST-Conformer-BART (raw EEG)
│   │   └── nest.py                  # Legacy NEST models
│   └── training/
│       └── trainer.py               # Training utilities
├── scripts/
│   ├── train_nest_v2.py             # Primary training script
│   ├── train_nest_bart.py           # BART training (raw EEG version)
│   └── upload_to_huggingface.py     # HF Hub upload
├── notebooks/
│   └── NEST_CloudTraining.ipynb     # Google Colab notebook
├── papers/
│   └── NEST_manuscript.md           # Research paper draft
├── configs/                         # YAML model configurations
├── results/                         # Training outputs
└── ZuCo_Dataset/                    # Dataset (not in git)

Citation

@article{nest2026,
  title     = {NEST: Open-Vocabulary EEG-to-Text Decoding via Sequence Transduction},
  author    = {},
  journal   = {arXiv preprint},
  year      = {2026},
  url       = {https://github.com/wazder/NEST}
}

License

MIT — see LICENSE.

Contributing

See CONTRIBUTING.md.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github		.github
.phase-work		.phase-work
COPILOT_REVIEW		COPILOT_REVIEW
configs		configs
docs		docs
examples		examples
experiments/user_study		experiments/user_study
notebooks		notebooks
papers		papers
scripts		scripts
src		src
tests		tests
website		website
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CITATION.md		CITATION.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
M2_OPTIMIZATION_INFO.sh		M2_OPTIMIZATION_INFO.sh
Makefile		Makefile
NEST-web.pen		NEST-web.pen
PHASE1_SUMMARY.md		PHASE1_SUMMARY.md
PHASE3_SUMMARY.md		PHASE3_SUMMARY.md
README.md		README.md
RELEASE_NOTES.md		RELEASE_NOTES.md
ROADMAP.md		ROADMAP.md
STRUCTURE.md		STRUCTURE.md
Screenshot 2026-02-20 at 18.04.52.png		Screenshot 2026-02-20 at 18.04.52.png
TECHNICAL_AUDIT_REPORT.md		TECHNICAL_AUDIT_REPORT.md
VERSION		VERSION
check_training.py		check_training.py
detailed_analysis.py		detailed_analysis.py
evaluate_results.py		evaluate_results.py
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.cfg		setup.cfg
start_full_training.sh		start_full_training.sh
test_m2_gpu.py		test_m2_gpu.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NEST: Neural EEG Sequence Transducer

Overview

Dataset

Architecture

Quick Start

Training

Local (GPU recommended)

Cloud (Google Colab / Lambda / Vast.ai)

Subject-Independent Evaluation

Evaluation

Upload to Hugging Face

Project Structure

Citation

License

Contributing

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NEST: Neural EEG Sequence Transducer

Overview

Dataset

Architecture

Quick Start

Training

Local (GPU recommended)

Cloud (Google Colab / Lambda / Vast.ai)

Subject-Independent Evaluation

Evaluation

Upload to Hugging Face

Project Structure

Citation

License

Contributing

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages