
NuCLR

NuCLR Logo

Official codebase for NuCLR as presented in "Know Thyself by Knowing Others: Learning Neuron Identity from Population Context"

[ Project Page ] [ Paper ] [ Poster ] [ OpenReview ] [ Tweet Thread ]

NuCLR Architecture Diagram

Usage

This project was developed on Python 3.10, with environments managed using uv. To set up the environment, run:

uv venv venv -p 3.10
source venv/bin/activate
uv pip install -r requirements.txt

Follow the steps below to train and evaluate your own NuCLR model.

1. Preprocessing Data

  1. To preprocess datasets, please follow the steps in preprocess/README.md.

  2. Download neuron metadata (csv files) for all four datasets from this link and unzip it into ./neuron_metadata.

2. Training

To train on electrophysiology data (IBL, Allen, Steinmetz et al.):

python train.py --config-name train_ephys data=<data-config> num_epochs=<num_epochs>

To train on calcium imaging data (Bugeon et al.):

python train.py --config-name train_ca data=<data-config> num_epochs=<num_epochs>

  • We use Hydra for managing configs.
  • Options for <data-config> can be found in configs/data/*.yaml, e.g. data=ibl_bwm_probes_dev.
  • Set num_epochs such that the total number of training steps is roughly 50,000.
  • Checkpoints are stored in ../ckpt by default.
  • Other available configuration options can be found in configs/train_ephys.yaml and configs/train_ca.yaml.
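The num_epochs rule of thumb above can be sketched as a small helper. This is a hypothetical calculation, assuming one optimizer step per batch; check how the actual dataloader handles the final partial batch:

```python
import math

def epochs_for_target_steps(num_samples, batch_size, target_steps=50_000):
    """Pick num_epochs so total training steps land near target_steps.

    Assumes one optimizer step per batch and that the loader keeps the
    final partial batch (an assumption, not NuCLR's documented behavior).
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return max(1, round(target_steps / steps_per_epoch))

# e.g. 120,000 samples at batch size 64 -> 1,875 steps/epoch -> 27 epochs
```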

3. Evaluation

A final forward pass over the entire dataset is needed to extract embeddings from a particular checkpoint. The training script prints a "run_id" for the corresponding run; use it in the following command:

bash utils/forward_all_epochs.sh <run_id> <data-config-name> [batch_size] [epoch_stride]

This stores the embeddings in ../embs/<run_id>/embs_epoch_<num>.pt. In most cases, you should use the "transductive" version of each dataset when gathering these embeddings, since embeddings should be computed for all neurons.
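When picking the most recent epoch programmatically, sort the epoch files numerically (lexicographically, embs_epoch_10.pt sorts before embs_epoch_2.pt). A stdlib-only sketch, assuming the ../embs/<run_id>/embs_epoch_<num>.pt layout described above:

```python
import re
from pathlib import Path

def latest_embedding_path(run_id, emb_root="../embs"):
    """Return the embs_epoch_<num>.pt file with the highest epoch number."""
    epoch = lambda p: int(re.search(r"embs_epoch_(\d+)\.pt", p.name).group(1))
    files = sorted(Path(emb_root, run_id).glob("embs_epoch_*.pt"), key=epoch)
    return files[-1] if files else None

# embs = torch.load(latest_embedding_path("<run_id>"))  # then load with PyTorch
```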

Once you have the embeddings, you can follow the instructions in eval_scripts/README.md to run our evaluation scripts.

Citation

If you find this repository useful in your research, please consider giving it a star ⭐ and a citation:

@inproceedings{
    arora2025nuclr,
    title={Know Thyself by Knowing Others: Learning Neuron Identity from Population Context},
    author={Vinam Arora and Divyansha Lachi and Ian J Knight and Mehdi Azabou and Blake Richards and Cole Hurwitz and Joshua H Siegle and Eva L Dyer},
    booktitle={Thirty-ninth Conference on Neural Information Processing Systems},
    year={2025},
    url={https://arxiv.org/abs/2512.01199}
}