hurlab/SCALE-ST


SCALE-ST Visium Multiscale Pipeline

This repository contains a four-stage pipeline for training and evaluating a multiscale histology-to-expression model on paired Visium HD and Visium V2 data.

Maintained by Hur Lab, University of North Dakota, under Dr. Junguk Hur.

Repository Contents

| File | Purpose |
| --- | --- |
| run_pipeline.py | Wrapper that runs the full pipeline stage by stage. |
| model_defination.py | Model architecture and shared helper functions. Imported by Stages 1-3; not run as a pipeline stage. |
| Stage_1_FinalMultiscale_Training.py | Trains the background-aware HD/V2 model and writes the Stage 2 checkpoint. |
| Stage_2_Inference_V2_Correlation.py | Runs V2 multiscale inference and writes spot/gene correlation outputs. |
| Stage_3_GeneLevel_Analysis.py | Computes additional gene-level metrics such as RMSE and mutual information. |
| Stage_4_Plotting_Figures.py | Generates publication-style figures from Stage 2 outputs. |
| requirements.txt | Python package list for pip-based setup. |
| environment.yml | Conda environment template. |
| docs/DATA_LAYOUT.md | Expected data and model directory layout. |
| LICENSE | MIT License under Hur Lab, University of North Dakota, Dr. Junguk Hur. |
| CITATION.cff | Citation metadata for GitHub and citation managers. |
| AUTHORS.md | Maintainer and authorship information. |
| examples/ | Example commands for full-pipeline and inference-only runs. |
| input/ | Placeholder for local datasets and UNI weights. Contents are ignored by Git. |
| output/ | Placeholder for generated checkpoints, CSVs, and figures. Contents are ignored by Git. |
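
Stage 3 is described as computing gene-level metrics such as RMSE and mutual information. The sketch below illustrates what those two quantities measure for a single gene, given observed and predicted expression values per spot. It is not taken from Stage_3_GeneLevel_Analysis.py: the function names are hypothetical, and the equal-width histogram estimator for mutual information is an assumption (the stage script may bin or estimate differently).

```python
import math
from collections import Counter


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and equal in length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def _bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0  # constant input: everything falls into one bin
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value belongs to the last bin


def mutual_information(observed, predicted, bins=20):
    """Histogram-based mutual information estimate, in nats."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("inputs must be non-empty and equal in length")
    o_lo, o_hi = min(observed), max(observed)
    p_lo, p_hi = min(predicted), max(predicted)
    pairs = [
        (_bin_index(o, o_lo, o_hi, bins), _bin_index(p, p_lo, p_hi, bins))
        for o, p in zip(observed, predicted)
    ]
    joint = Counter(pairs)
    marg_o = Counter(i for i, _ in pairs)
    marg_p = Counter(j for _, j in pairs)
    # Sum p(i, j) * log(p(i, j) / (p(i) * p(j))) over occupied bins.
    return sum(
        (c / n) * math.log(c * n / (marg_o[i] * marg_p[j]))
        for (i, j), c in joint.items()
    )
```

A lower RMSE means predictions sit closer to the measured values, while a higher mutual information means the predicted values carry more information about the observed ones, including nonlinear dependence that a correlation coefficient can miss.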

Important External Files

The stage scripts import model_defination.py for the neural network architecture, model constants, masks, checkpoint loading, and image patch helpers; it is not a stage to run on its own. A copy ships with this repository, and the stage scripts use that local copy by default.

If you want to use a different model_defination.py, pass the folder that contains it:

--training-script-dir /path/to/folder/containing/model_defination.py
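
For readers who want to see how such an override can work, here is a minimal sketch of resolving and loading model_defination.py from an explicit folder. It assumes a lookup order of command-line value, then the HE2HD_TRAINING_SCRIPT_DIR environment variable, then the local folder. The function names and that precedence are illustrative assumptions, not the stage scripts' actual import code.

```python
import importlib.util
import os
from pathlib import Path

MODULE_FILENAME = "model_defination.py"  # spelling matches the repository file


def resolve_model_dir(cli_dir=None, default_dir="."):
    """Pick the folder that should contain model_defination.py.

    Assumed precedence: explicit argument, then the
    HE2HD_TRAINING_SCRIPT_DIR environment variable, then default_dir.
    """
    candidate = cli_dir or os.environ.get("HE2HD_TRAINING_SCRIPT_DIR") or default_dir
    folder = Path(candidate).expanduser().resolve()
    if not (folder / MODULE_FILENAME).is_file():
        raise FileNotFoundError(f"{MODULE_FILENAME} not found in {folder}")
    return folder


def load_model_module(folder):
    """Import model_defination.py from a specific folder by file path."""
    path = Path(folder) / MODULE_FILENAME
    spec = importlib.util.spec_from_file_location("model_defination", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Loading by file path avoids editing sys.path, so an alternative copy cannot be shadowed by the one in the repository folder.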

The pipeline expects all raw inputs under input/. UNI weights should be placed at:

<root>/input/UNI/pytorch_model.bin

Large datasets, model weights, checkpoints, and generated outputs should not be committed to GitHub. See .gitignore.

Setup

Create an environment with Python 3.10 or newer.

Using conda:

conda env create -f environment.yml
conda activate he2hd-visium

Using pip:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For GPU training, check that the installed PyTorch build matches your CUDA version. If the default package resolver selects the wrong build, install the matching one manually.

Data Layout

By default, the project root is:

/home/sayed.asaduzzaman/Project-Gene_ImgtoExp

You can override it with:

--root /path/to/Project-Gene_ImgtoExp

The expected layout is documented in docs/DATA_LAYOUT.md.

The wrapper creates input/ and output/ automatically if they do not exist. Put datasets and weights in input/; generated checkpoints, CSV files, and figures go to output/.
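
The layout convention above can be sketched as a small preflight helper. This is an illustration only, assuming the input/, output/, and input/UNI/pytorch_model.bin locations described in this README; prepare_layout is a hypothetical name and is not part of run_pipeline.py.

```python
from pathlib import Path


def prepare_layout(root):
    """Create input/ and output/ under root and report on the UNI weights."""
    root = Path(root).expanduser()
    input_dir = root / "input"
    output_dir = root / "output"
    for folder in (input_dir, output_dir):
        # exist_ok keeps the call safe to repeat on an existing project root.
        folder.mkdir(parents=True, exist_ok=True)
    uni_weights = input_dir / "UNI" / "pytorch_model.bin"
    return {
        "input_dir": input_dir,
        "output_dir": output_dir,
        "uni_weights": uni_weights,
        "uni_weights_present": uni_weights.is_file(),
    }
```

Calling prepare_layout("/path/to/Project-Gene_ImgtoExp") before a run and checking uni_weights_present catches a missing weights file before any GPU time is spent.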

Run the Full Pipeline

Preview the commands without running them:

python run_pipeline.py --dry-run

Run training, inference, analysis, and plotting:

python run_pipeline.py \
  --root /home/sayed.asaduzzaman/Project-Gene_ImgtoExp \
  --input-dir /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/input \
  --output-dir /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/output \
  --train-launcher torchrun \
  --nproc-per-node 3 \
  --cuda-visible-devices 0,2,3

The same command is available as examples/run_full_pipeline.sh.
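
In the command above, --nproc-per-node 3 pairs with three entries in --cuda-visible-devices. The sketch below checks that pairing before launch. It assumes one training process per visible GPU, which is the usual torchrun convention but is not confirmed by this README, and the function names are hypothetical.

```python
def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES-style string such as "0,2,3" into IDs."""
    devices = [item.strip() for item in value.split(",") if item.strip()]
    if len(set(devices)) != len(devices):
        raise ValueError(f"duplicate device IDs in {value!r}")
    return devices


def check_launch(nproc_per_node, visible_devices):
    """Verify one process per visible GPU and return local rank -> physical ID."""
    devices = parse_visible_devices(visible_devices)
    if nproc_per_node != len(devices):
        raise ValueError(
            f"--nproc-per-node is {nproc_per_node} but "
            f"{len(devices)} device(s) are visible: {devices}"
        )
    # CUDA renumbers the visible devices from zero inside each process,
    # so local rank k sees the k-th listed physical GPU as cuda:k.
    return dict(enumerate(devices))
```

For the example command, check_launch(3, "0,2,3") maps local ranks 0, 1, 2 to physical GPUs 0, 2, 3.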

Skip training and use an existing checkpoint:

python run_pipeline.py \
  --skip-training \
  --ckpt /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/output/HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED/stage2_all_genes/best_model.pt

The same pattern is available as examples/run_inference_only.sh.

Run only selected stages:

python run_pipeline.py --only inference analysis plotting

or:

python run_pipeline.py --start-at inference --stop-after plotting
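
The two selection styles can be read as filters over one ordered stage list. The sketch below shows one plausible interpretation, assuming the stage names training, inference, analysis, and plotting used in the commands above. The precedence between --only and --start-at/--stop-after is an assumption here, so check python run_pipeline.py --dry-run for the wrapper's actual behavior.

```python
STAGES = ("training", "inference", "analysis", "plotting")


def select_stages(only=None, start_at=None, stop_after=None, skip_training=False):
    """Return the stages to run, in pipeline order."""
    for name in [*(only or []), start_at, stop_after]:
        if name is not None and name not in STAGES:
            raise ValueError(f"unknown stage: {name!r}")
    if only:
        # Assumed behavior: --only takes precedence over the range options.
        selected = [stage for stage in STAGES if stage in only]
    else:
        first = STAGES.index(start_at) if start_at else 0
        last = STAGES.index(stop_after) if stop_after else len(STAGES) - 1
        if first > last:
            raise ValueError("--start-at must not come after --stop-after")
        selected = list(STAGES[first:last + 1])
    if skip_training:
        selected = [stage for stage in selected if stage != "training"]
    return selected
```

Under this reading, the two commands above select the same three stages, and --skip-training with no other option does as well.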

Common Parameters

Control the inference sample count (--n_spots), the multiscale bin sizes (--scales), and the bin count passed as --mi_bins:

python run_pipeline.py \
  --skip-training \
  --n_spots 5000 \
  --scales 16 8 2 \
  --mi_bins 20

Control training size and schedule:

python run_pipeline.py \
  --stage1-epochs 12 \
  --stage2-epochs 20 \
  --stage1-batch-size 12 \
  --stage2-batch-size 8 \
  --max-hd-16-samples 100000 \
  --max-hd-8-samples 400000 \
  --max-hd-2-samples 6400000 \
  --max-v2-samples 5000

Control patch selection and blank suppression behavior:

python run_pipeline.py \
  --skip-training \
  --min_overlap 0.35 \
  --support_scale 2.5 \
  --grid_shift_fraction 0.0 \
  --use_gate_weight \
  --disable_graph_smooth

Default Outputs

With the default root, outputs are written to:

<root>/output/HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED/stage2_all_genes/best_model.pt
<root>/output/V2_STRICTBLANK_V3_ALL/
<root>/output/V2_STRICTBLANK_V3_ALL/gene_level_metrics/
<root>/output/V2_STRICTBLANK_V3_ALL/figures/
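
To check after a run that each expected location exists, the paths listed above can be rebuilt from the root. This is a minimal sketch: default_outputs and its dictionary keys are hypothetical labels, and which stage writes which folder is inferred from the folder names rather than stated by the scripts.

```python
from pathlib import Path


def default_outputs(root):
    """Build the default output locations listed in this README."""
    out = Path(root) / "output"
    inference = out / "V2_STRICTBLANK_V3_ALL"
    return {
        "checkpoint": out
        / "HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED"
        / "stage2_all_genes"
        / "best_model.pt",
        "inference": inference,
        "gene_level_metrics": inference / "gene_level_metrics",
        "figures": inference / "figures",
    }


def missing_outputs(root):
    """Return the labels of expected outputs that do not exist yet."""
    return [label for label, path in default_outputs(root).items() if not path.exists()]
```

An empty list from missing_outputs(root) indicates that all four locations are present.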

Reproducibility Notes

  • The wrapper sets shared paths through HE2HD_ROOT, HE2HD_INPUT_DIR, HE2HD_OUTPUT_DIR, and HE2HD_TRAINING_SCRIPT_DIR.
  • Stage 1 uses fixed random seeds internally and supports environment overrides exposed by run_pipeline.py.
  • The exact model architecture and helper functions are provided by model_defination.py; keep this file versioned with the stage scripts.
  • Keep a record of the command printed by run_pipeline.py, the git commit hash, and package versions for each reported experiment.
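
The record-keeping advice in the last note can be automated with a small snapshot written next to each experiment's outputs. This sketch uses only the Python standard library. The function names are hypothetical, and the package names passed in are the caller's choice (this README does not list the contents of requirements.txt).

```python
import json
import platform
import shlex
import subprocess
from importlib import metadata


def git_commit_hash(repo_dir="."):
    """Return the current commit hash, or None if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_record(argv, packages, repo_dir="."):
    """Collect the command, commit hash, and package versions for one run."""
    return {
        "command": shlex.join(argv),
        "git_commit": git_commit_hash(repo_dir),
        "python": platform.python_version(),
        "packages": package_versions(packages),
    }


def write_run_record(path, record):
    """Save the record as JSON so it can be archived with the outputs."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
```

A typical use is to call build_run_record with the exact run_pipeline.py command and the packages that matter for the experiment, then write the result into the run's output folder.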

License

This repository is released under the MIT License. Copyright belongs to Hur Lab, University of North Dakota, Dr. Junguk Hur. See LICENSE.

Minimal Sanity Check

Before launching an expensive run:

python run_pipeline.py --dry-run --skip-training

Then verify that the printed paths point to the intended root, checkpoint, V2 data, and output folders.

About

Multi-Scale Learning for Histology-Based Prediction of Spatial Gene Expression
