SCALE-ST Visium Multiscale Pipeline

This repository contains a four-stage pipeline for training and evaluating a multiscale histology-to-expression model on paired Visium HD and Visium V2 data.

Maintained by Hur Lab, University of North Dakota, under Dr. Junguk Hur.

Repository Contents

File	Purpose
`run_pipeline.py`	Wrapper that runs the full pipeline stage by stage.
`model_defination.py`	Model architecture and shared helper functions. This is imported by Stages 1-3, not run as a pipeline stage.
`Stage_1_FinalMultiscale_Training.py`	Trains the background-aware HD/V2 model and writes the Stage 2 checkpoint.
`Stage_2_Inference_V2_Correlation.py`	Runs V2 multiscale inference and writes spot/gene correlation outputs.
`Stage_3_GeneLevel_Analysis.py`	Computes additional gene-level metrics such as RMSE and mutual information.
`Stage_4_Plotting_Figures.py`	Generates publication-style figures from Stage 2 outputs.
`requirements.txt`	Python package list for pip-based setup.
`environment.yml`	Conda environment template.
`docs/DATA_LAYOUT.md`	Expected data and model directory layout.
`LICENSE`	MIT License; copyright held by Hur Lab, University of North Dakota (Dr. Junguk Hur).
`CITATION.cff`	Citation metadata for GitHub and citation managers.
`AUTHORS.md`	Maintainer and authorship information.
`examples/`	Example commands for full-pipeline and inference-only runs.
`input/`	Placeholder for local datasets and UNI weights. Contents are ignored by Git.
`output/`	Placeholder for generated checkpoints, CSVs, and figures. Contents are ignored by Git.

Important External Files

The stage scripts import model_defination.py for the neural network architecture, model constants, masks, checkpoint loading, and image patch helpers. It is not a separate stage to run. This repository includes model_defination.py, and the stage scripts use the local copy by default.

If you want to use a different model_defination.py, pass the folder that contains it:

--training-script-dir /path/to/folder/containing/model_defination.py

The pipeline expects all raw inputs under input/. UNI weights should be placed at:

<root>/input/UNI/pytorch_model.bin

Large datasets, model weights, checkpoints, and generated outputs should not be committed to GitHub. See .gitignore.
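A quick pre-flight check can catch a missing weights file before a long run starts. The following is a minimal sketch that assumes only the UNI path stated above; `uni_weights_path` and `check_uni_weights` are hypothetical helper names, not functions from this repository.

```python
from pathlib import Path


def uni_weights_path(root):
    """Return the expected UNI weights location: <root>/input/UNI/pytorch_model.bin."""
    return Path(root) / "input" / "UNI" / "pytorch_model.bin"


def check_uni_weights(root):
    """Return True if the UNI weights file exists and is non-empty."""
    path = uni_weights_path(root)
    return path.is_file() and path.stat().st_size > 0


# Hypothetical usage: replace the placeholder with your own project root.
root = "/path/to/Project-Gene_ImgtoExp"
status = "found" if check_uni_weights(root) else "missing"
print(f"{uni_weights_path(root)}: {status}")
```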

Setup

Create an environment with Python 3.10 or newer.

Using conda:

conda env create -f environment.yml
conda activate he2hd-visium

Using pip:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

For GPU training, install the PyTorch build that matches your CUDA version if the default package resolver does not select the right build.

Data Layout

By default, the project root is:

/home/sayed.asaduzzaman/Project-Gene_ImgtoExp

This default is a user-specific path, so on any other machine you will most likely need to override it with:

--root /path/to/Project-Gene_ImgtoExp

The expected layout is documented in docs/DATA_LAYOUT.md.

The wrapper creates input/ and output/ automatically if they do not exist. Put datasets and weights in input/; generated checkpoints, CSV files, and figures go to output/.

Run the Full Pipeline

Preview the commands without running them:

python run_pipeline.py --dry-run

Run training, inference, analysis, and plotting:

python run_pipeline.py \
  --root /home/sayed.asaduzzaman/Project-Gene_ImgtoExp \
  --input-dir /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/input \
  --output-dir /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/output \
  --train-launcher torchrun \
  --nproc-per-node 3 \
  --cuda-visible-devices 0,2,3

The same command is available as examples/run_full_pipeline.sh.
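With torchrun, each worker process typically drives one GPU, so the value passed to --nproc-per-node should match the number of device IDs passed to --cuda-visible-devices (three in the full-pipeline command above). A minimal sketch of that consistency check, assuming one GPU per process; both function names are hypothetical and are not part of run_pipeline.py.

```python
def visible_device_count(cuda_visible_devices):
    """Count the device IDs in a CUDA_VISIBLE_DEVICES-style string such as "0,2,3"."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def launch_is_consistent(nproc_per_node, cuda_visible_devices):
    """True when there is exactly one visible GPU per worker process (an assumption)."""
    return nproc_per_node == visible_device_count(cuda_visible_devices)


# Values taken from the full-pipeline command above.
print(launch_is_consistent(3, "0,2,3"))  # True
```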

Skip training and use an existing checkpoint:

python run_pipeline.py \
  --skip-training \
  --ckpt /home/sayed.asaduzzaman/Project-Gene_ImgtoExp/output/HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED/stage2_all_genes/best_model.pt

The same pattern is available as examples/run_inference_only.sh.

Run only selected stages:

python run_pipeline.py --only inference analysis plotting

or:

python run_pipeline.py --start-at inference --stop-after plotting
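The two selection styles above appear to pick the same three stages. The sketch below illustrates one plausible way such selection could resolve; the stage name `training`, the inclusive-range semantics, and the `select_stages` function are assumptions for illustration and have not been checked against run_pipeline.py.

```python
# Stage names in pipeline order. "training" is an assumed name; the other
# three appear in the commands above.
STAGES = ["training", "inference", "analysis", "plotting"]


def select_stages(only=None, start_at=None, stop_after=None):
    """Return the stages to run, in pipeline order.

    Assumed semantics: `only` keeps exactly the named stages; otherwise the
    inclusive range from `start_at` to `stop_after` is kept.
    """
    if only:
        unknown = set(only) - set(STAGES)
        if unknown:
            raise ValueError(f"unknown stages: {sorted(unknown)}")
        return [s for s in STAGES if s in only]
    first = STAGES.index(start_at) if start_at else 0
    last = STAGES.index(stop_after) if stop_after else len(STAGES) - 1
    if first > last:
        raise ValueError("start_at must not come after stop_after")
    return STAGES[first:last + 1]


a = select_stages(only=["inference", "analysis", "plotting"])
b = select_stages(start_at="inference", stop_after="plotting")
print(a == b)  # True under the assumed semantics
```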

Common Parameters

Control the inference spot count, the multiscale bin sizes, and the number of mutual-information bins:

python run_pipeline.py \
  --skip-training \
  --n_spots 5000 \
  --scales 16 8 2 \
  --mi_bins 20
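Stage 3 reports gene-level RMSE and mutual information, and --mi_bins presumably sets the number of histogram bins used for the latter. The README does not describe the estimator, so the following is only a generic sketch of per-gene RMSE and a histogram (plug-in) mutual-information estimate in pure Python; the function names are hypothetical and the repository's implementation may differ.

```python
import math
from collections import Counter


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def _bin_index(value, lo, hi, bins):
    """Map a value to one of `bins` equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls in the last bin


def mutual_information(x, y, bins=20):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    n = len(x)
    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    bx = [_bin_index(v, x_lo, x_hi, bins) for v in x]
    by = [_bin_index(v, y_lo, y_hi, bins) for v in y]
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) * p(j))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


# Toy values for one gene across five spots (made up for illustration).
observed = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.1, 0.9, 2.2, 2.8, 4.1]
print(rmse(observed, predicted), mutual_information(observed, predicted, bins=5))
```

Histogram estimates depend on the bin count, so values computed with different bin settings are not directly comparable.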

Control training size and schedule:

python run_pipeline.py \
  --stage1-epochs 12 \
  --stage2-epochs 20 \
  --stage1-batch-size 12 \
  --stage2-batch-size 8 \
  --max-hd-16-samples 100000 \
  --max-hd-8-samples 400000 \
  --max-hd-2-samples 6400000 \
  --max-v2-samples 5000

Control patch selection and blank suppression behavior:

python run_pipeline.py \
  --skip-training \
  --min_overlap 0.35 \
  --support_scale 2.5 \
  --grid_shift_fraction 0.0 \
  --use_gate_weight \
  --disable_graph_smooth

Default Outputs

With the default settings, outputs are written to the following locations under the project root:

<root>/output/HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED/stage2_all_genes/best_model.pt
<root>/output/V2_STRICTBLANK_V3_ALL/
<root>/output/V2_STRICTBLANK_V3_ALL/gene_level_metrics/
<root>/output/V2_STRICTBLANK_V3_ALL/figures/
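To check which of these locations already exist after a run, the paths can be assembled from the root. This is a small sketch that uses only the directory names listed above; `default_outputs` is a hypothetical helper, not part of the pipeline.

```python
from pathlib import Path

# Directory names copied from the default output listing above.
TRAIN_RUN = "HE2HD_HDV2_BG_AWARE_TRAIN_ALL_PAIRED"
INFER_RUN = "V2_STRICTBLANK_V3_ALL"


def default_outputs(root):
    """Build the default output locations under <root>/output/."""
    out = Path(root) / "output"
    infer = out / INFER_RUN
    return {
        "checkpoint": out / TRAIN_RUN / "stage2_all_genes" / "best_model.pt",
        "inference": infer,
        "gene_level_metrics": infer / "gene_level_metrics",
        "figures": infer / "figures",
    }


# Replace the placeholder with your own project root.
for name, path in default_outputs("/path/to/Project-Gene_ImgtoExp").items():
    marker = "ok" if path.exists() else "missing"
    print(f"{name:20s} {path} [{marker}]")
```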

Reproducibility Notes

The wrapper sets shared paths through HE2HD_ROOT, HE2HD_INPUT_DIR, HE2HD_OUTPUT_DIR, and HE2HD_TRAINING_SCRIPT_DIR.
Stage 1 uses fixed random seeds internally and supports environment overrides exposed by run_pipeline.py.
The exact model architecture and helper functions are provided by model_defination.py; keep this file versioned with the stage scripts.
Keep a record of the command printed by run_pipeline.py, the git commit hash, and package versions for each reported experiment.
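One way to keep such a record is to write a small JSON file next to each experiment. The sketch below is an illustration, not part of the pipeline: `build_run_record` is a hypothetical helper, the package names `torch` and `numpy` are assumed examples (substitute the entries from requirements.txt), and the HE2HD_* variables are captured only if they are set in the current environment.

```python
import json
import os
import subprocess
import sys
from datetime import datetime, timezone
from importlib import metadata


def git_commit():
    """Return the current commit hash, or None outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_record(command, packages=("torch", "numpy")):
    """Collect the command, commit, versions, and HE2HD_* variables for one run."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "git_commit": git_commit(),
        "python": sys.version.split()[0],
        "packages": package_versions(packages),
        "env": {k: v for k, v in os.environ.items() if k.startswith("HE2HD_")},
    }


record = build_run_record("python run_pipeline.py --dry-run --skip-training")
print(json.dumps(record, indent=2))
```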

License

This repository is released under the MIT License. Copyright is held by Hur Lab, University of North Dakota (Dr. Junguk Hur). See LICENSE.

Minimal Sanity Check

Before launching an expensive run:

python run_pipeline.py --dry-run --skip-training

Then verify that the printed paths point to the intended root, checkpoint, V2 data, and output folders.
