Exploratory analysis using Arc Institute's Stack foundation model on the Perturb-Sapiens single-cell perturbation dataset.
This notebook demonstrates:
- Loading the Stack-Large model (~2.6GB) from HuggingFace
- Generating cell embeddings for 500K+ cells from Perturb-Sapiens
- UMAP visualization of embeddings colored by cell type and tissue
- Comparing ADSF (cytokine) vs PBS (control) perturbation conditions

Requirements:
- GPU: A100 80GB recommended (tested on HPC)
- Storage: ~10GB temp space for model + data downloads
- Python: 3.10+
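
Before downloading anything, it can help to confirm that the Python version and free disk space match the figures above. The following is a minimal sketch using only the standard library; the helper names `python_ok` and `free_space_gb` are hypothetical and not part of any Stack tooling, and the GPU check is left out because it depends on the deep-learning framework in use.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # "Python: 3.10+" from the requirements
MIN_FREE_GB = 10      # "~10GB temp space" from the requirements


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def free_space_gb(path="."):
    """Return free disk space at `path` in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


print(f"Python >= 3.10: {python_ok()}")
print(f"Free space: {free_space_gb():.1f} GB (want about {MIN_FREE_GB} GB)")
```

Point `free_space_gb` at the directory where the HuggingFace cache and the dataset will actually be written, since that is often a different filesystem on HPC systems.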
pip install arc-stack scanpy huggingface_hub umap-learn matplotlib

from huggingface_hub import hf_hub_download
from stack.model import load_model_from_checkpoint
# Download model
checkpoint = hf_hub_download("arcinstitute/Stack-Large", "bc_large.ckpt")
genelist = hf_hub_download("arcinstitute/Stack-Large", "basecount_1000per_15000max.pkl")
# Load model
model = load_model_from_checkpoint(checkpoint, device="cuda")
# Generate embeddings
embeddings, indices = model.get_latent_representation(
    adata_path="path/to/data.h5ad",
    genelist_path=genelist,
    batch_size=32,
)

| Dataset | Cells | Genes | Embedding Dim |
|---|---|---|---|
| ADSF (cytokine) | 513,870 | 15,012 | 1,600 |
| PBS (control) | 460,848 | 15,012 | 1,600 |
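
The cell counts and embedding dimension in the table give a quick estimate of how much memory the embedding matrices need. The sketch below works through that arithmetic. It assumes the embeddings are stored as float32 (4 bytes per value), which the table does not state; adjust `BYTES_PER_VALUE` if they are stored differently.

```python
# Back-of-the-envelope size of the embedding matrices listed in the table.
# ASSUMPTION: float32 storage (4 bytes per value); not stated in the source.
BYTES_PER_VALUE = 4
EMBEDDING_DIM = 1_600
CELL_COUNTS = {"ADSF": 513_870, "PBS": 460_848}


def embedding_bytes(n_cells, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed for a dense (n_cells x dim) embedding matrix."""
    return n_cells * dim * bytes_per_value


sizes = {name: embedding_bytes(n) for name, n in CELL_COUNTS.items()}
total_bytes = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size / 1e9:.2f} GB")
print(f"Total: {total_bytes / 1e9:.2f} GB")
```

Under that assumption the two matrices come to roughly 3.3 GB and 2.9 GB, about 6.2 GB together, so both fit in host memory on a typical HPC node but are worth writing to disk rather than recomputing.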
Stack embeddings by cell type and tissue:
ADSF vs PBS comparison:
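
One simple way to put a number on such a comparison is the cosine distance between the mean embedding of each cell type under the two conditions. The sketch below illustrates that idea on synthetic data. It is an assumed approach, not necessarily the analysis used in the notebook, and the function name `centroid_cosine_distance` is hypothetical.

```python
import numpy as np


def centroid_cosine_distance(emb_a, labels_a, emb_b, labels_b):
    """Cosine distance between per-label mean embeddings of two conditions.

    Only labels present in both conditions are compared.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    distances = {}
    for label in sorted(set(labels_a) & set(labels_b)):
        mean_a = emb_a[labels_a == label].mean(axis=0)
        mean_b = emb_b[labels_b == label].mean(axis=0)
        cos = mean_a @ mean_b / (np.linalg.norm(mean_a) * np.linalg.norm(mean_b))
        distances[str(label)] = float(1.0 - cos)
    return distances


# Synthetic example: "T cell" embeddings are shifted in the treated
# condition, "B cell" embeddings are left unchanged.
rng = np.random.default_rng(0)
labels = np.array(["T cell", "B cell"] * 50)
emb_pbs = rng.normal(size=(100, 8)) + 5.0
emb_adsf = emb_pbs.copy()
emb_adsf[labels == "T cell", 0] += 3.0

distances = centroid_cosine_distance(emb_adsf, labels, emb_pbs, labels)
for cell_type, dist in distances.items():
    print(f"{cell_type}: {dist:.4f}")
```

Centroid distances ignore within-group spread, so a cell type with a large distance is a candidate for closer inspection rather than evidence of a perturbation effect on its own.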
This work uses the Stack foundation model and Perturb-Sapiens dataset developed by Arc Institute. Thank you to the Arc Institute team for making these resources publicly available for research.
- Stack Model - Arc Institute foundation model trained on 150M single-cell samples
- Perturb-Sapiens - Large-scale human perturbation atlas
Analysis code: MIT
Stack model weights: Arc Research Institute Non-Commercial License
Perturb-Sapiens data: CC BY-NC-SA 4.0

