DRIFT - Diffusion-based Representation Integration for Foundation models in spatial Transcriptomics

DRIFT is a scalable diffusion framework that denoises expression profiles and integrates the spatial topology of spatial transcriptomics (ST) data into existing pretrained scRNA-seq and ST foundation models without additional retraining. Foundation models that do not explicitly model spatial information benefit from both the denoising and the spatial integration, while models that already encode spatial information still benefit from DRIFT's denoised output. DRIFT constructs a spatial adjacency graph among tissue spots and applies a heat-kernel diffusion process that propagates gene-expression signals across local neighborhoods while preserving tissue boundaries. This produces spatially coherent yet biologically meaningful representations that can be passed directly to pretrained foundation models, which makes the approach more computationally scalable and accessible than alternatives that require retraining.
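To make the graph-diffusion idea concrete, the following is a minimal sketch of heat-kernel smoothing on a spatial neighbor graph. It is not the implementation shipped in this repository: the function names, the k-nearest-neighbor graph construction, and the parameters `k` and `t` are assumptions chosen for illustration, and the sketch does nothing special to preserve tissue boundaries. It only uses NumPy and SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import expm_multiply
from scipy.spatial import cKDTree


def knn_adjacency(coords, k=6):
    """Symmetric, unweighted k-nearest-neighbor graph over spot coordinates.

    Hypothetical helper: assumes spot coordinates are distinct, so that each
    spot is returned as its own nearest neighbor and can be dropped.
    """
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Query k + 1 neighbors because the closest match is the spot itself.
    _, idx = tree.query(coords, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    adj = csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbor.
    return adj.maximum(adj.T)


def heat_diffuse(expr, adj, t=1.0):
    """Apply the heat kernel exp(-t * L) to a spots-by-genes matrix.

    L is the combinatorial graph Laplacian D - A. Larger t (an assumed,
    user-chosen diffusion time) smooths more strongly.
    """
    lap = laplacian(adj, normed=False).tocsc()
    # expm_multiply computes exp(-t * L) @ expr without forming the dense kernel.
    return expm_multiply(-t * lap, expr)


# Toy usage with random data standing in for real spots and genes.
rng = np.random.default_rng(0)
toy_coords = rng.random((100, 2))   # 100 spots in 2D
toy_expr = rng.random((100, 20))    # 100 spots x 20 genes
toy_smoothed = heat_diffuse(toy_expr, knn_adjacency(toy_coords, k=6), t=0.5)
```

Because the combinatorial Laplacian has zero row and column sums, this form of diffusion conserves each gene's total signal across spots and leaves a spatially constant signal unchanged; it only redistributes expression between neighboring spots.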

Requirements

Running the DRIFT step requires the following libraries:

scanpy >= 1.9.1

numpy < 2.0.0

scipy

networkx

Python Optimal Transport (POT) >= 0.9.1

We suggest creating a dedicated environment (for example, with conda) to run the code. You can create the required conda environment by running the following commands sequentially in a shell.

conda create --name <env_name> python==3.11
conda activate <env_name>
pip install scanpy
pip install POT
pip install "numpy<2"
pip install scipy
pip install networkx
pip install pycpd
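After installation, it can be useful to confirm that the installed versions satisfy the constraints listed under Requirements. The snippet below is a small sketch that uses only the Python standard library; the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical and not part of this repository, and the version parser is deliberately simple (it compares leading integer components only).

```python
import operator
import re
from importlib import metadata

# Constraints copied from the Requirements list; keys are PyPI distribution
# names (Python Optimal Transport is published as "POT").
REQUIREMENTS = {
    "scanpy": (">=", "1.9.1"),
    "numpy": ("<", "2.0.0"),
    "POT": (">=", "0.9.1"),
    "scipy": None,
    "networkx": None,
}
OPS = {">=": operator.ge, "<": operator.lt}


def parse_version(text):
    """Turn '1.26.4' or '1.10.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return OPS[op](a, b)


def check_environment(requirements=REQUIREMENTS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, constraint in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if constraint is None or satisfies(installed, *constraint):
            report[name] = "ok"
        else:
            report[name] = f"{installed} violates {constraint[0]} {constraint[1]}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Run it inside the activated environment; any entry that is not reported as `ok` points to a package that should be installed or pinned again.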

Foundation Model Requirements

You will need additional libraries depending on the foundation model you use. Please refer to each model's own repository for its requirements.

Run Code

To see how to run the code, please check our notebook tutorials.

run_DRIFT.ipynb for running DRIFT to obtain diffused inputs.

run_annotation.ipynb for running annotation code.

run_alignment.ipynb for running alignment code.

For clustering, you need the embeddings from your foundation model. The embeddings can then be passed to any clustering algorithm. In our work, we used the mclust library in R.
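For readers who prefer to stay in Python, the sketch below clusters an embedding matrix with a Gaussian mixture model from scikit-learn. This is a stand-in for the mclust workflow used in the manuscript, not a reproduction of it: mclust selects among several covariance models, whereas this sketch fixes `covariance_type="full"`, so results may differ. The function name `cluster_embeddings` and its parameters are hypothetical, and scikit-learn is assumed to be installed (it is a dependency of scanpy).

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def cluster_embeddings(embeddings, n_clusters, seed=0):
    """Assign each spot (row) to one of n_clusters mixture components.

    embeddings: array of shape (n_spots, embedding_dim) produced by the
    foundation model of your choice.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="full",  # assumption; mclust may choose another model
        random_state=seed,       # fixed seed for reproducible labels
    )
    return gmm.fit_predict(embeddings)
```

The returned integer labels can be stored alongside the spots, for example in `adata.obs` when working with an AnnData object, and compared against reference annotations.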

Data Availability (Dataset names reference the manuscript)

Datasets MERMB and MERMBA are available at https://cellxgene.cziscience.com. Dataset MERHH is available at https://cells.ucsc.edu. Datasets 10xHPC and MERMPH are available on a site hosted by Yuan et al. at http://sdmbench.drai.cn/. Datasets 10xHSI and 10xHOC are available on Chen et al.'s tutorial website: https://guangyuwanglab2021.github.io/Loki/. Dataset StereoME is available at https://db.cngb.org/stomics/mosta/.
