
SarderLab/prop-fusion-gene-expression-overlay

FUSION 2.0 — Cell-Level Gene Expression Overlay

This repository contains script(s) to prepare and visualize per-cell gene expression data from Xenium Ranger output overlaid on co-registered H&E whole slide images (WSI) in the FUSION 2.0 pipeline.

This codebase is unlikely to be used directly in a future plug-in. It is maintained as a record of the required input files, where they come from, and how they are processed into data structured for visualization.

Overview

The goal is to visualize per-cell gene expression overlaid on the co-registered H&E WSI, providing a tissue-level view of spatial expression patterns for user-selected genes — analogous to the cell-level aggregated expression view in Xenium Explorer.

Input Files

Both files come directly from Xenium Ranger output:

File                     Description
cell_feature_matrix.h5   Cell × gene expression matrix (raw transcript counts)
cells.parquet            Cell centroid coordinates in Xenium space (x, y)

Expression Processing

  1. CPK normalization: each cell's raw counts are divided by that cell's total count and multiplied by 10,000 (counts per 10,000)
  2. log1p transformation: applied to the CPK-normalized counts
  3. Percentile clipping: per gene, values are clipped to the 1st–99th percentile range so that outlier cells do not collapse the color range
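
The three steps above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's R script: the function names (`normalize_cell`, `percentile`, `clip_gene`) and the toy matrix are assumptions made for the example, and the percentile uses linear interpolation, which may differ slightly from the quantile method used in the actual preprocessing.

```python
import math

def normalize_cell(counts, scale=10_000):
    """Scale one cell's raw counts to counts per `scale`, then apply log1p."""
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays at zero rather than dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(sorted_values) - 1) * q / 100
    below = math.floor(pos)
    above = math.ceil(pos)
    frac = pos - below
    return sorted_values[below] * (1 - frac) + sorted_values[above] * frac

def clip_gene(values, low_q=1, high_q=99):
    """Clip one gene's per-cell values to its own [low_q, high_q] percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, low_q)
    high = percentile(ordered, high_q)
    return [min(max(v, low), high) for v in values]

# Toy matrix (hypothetical values): rows are cells, columns are genes.
raw = [[5, 0, 15], [0, 0, 0], [1, 1, 2]]
normalized = [normalize_cell(row) for row in raw]

# Clipping is per gene, so work on columns and transpose back to cells x genes.
clipped_columns = [clip_gene(list(col)) for col in zip(*normalized)]
processed = [list(row) for row in zip(*clipped_columns)]
```

Normalization runs per cell (rows) while clipping runs per gene (columns), which is why the sketch transposes between the two steps.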

Color Mapping

Normalized, clipped expression values are mapped to the Inferno color scale (black = lowest expression, yellow/white = highest), consistent with the Xenium Explorer aesthetic. The resulting colors are rendered as an overlay on the H&E WSI with adjustable opacity.
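
As a rough sketch of the mapping, the Python below rescales one gene's values to [0, 1] and interpolates between ten anchor colors. The anchor hex values are approximate samples of the Inferno scale, and the per-gene min-max rescaling, the function names, and the default opacity of 0.7 are assumptions for illustration. A real renderer would more likely use a plotting library's built-in Inferno colormap.

```python
# Approximate anchor colors sampled at even intervals along the Inferno scale.
INFERNO_ANCHORS = [
    "#000004", "#1b0c42", "#4b0c6b", "#781c6d", "#a52c60",
    "#cf4446", "#ed6925", "#fb9a06", "#f7d03c", "#fcffa4",
]

def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

ANCHORS_RGB = [hex_to_rgb(c) for c in INFERNO_ANCHORS]

def rescale(values):
    """Min-max rescale one gene's clipped values to [0, 1] (assumed step)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def inferno_rgba(t, opacity=0.7):
    """Map t in [0, 1] to (r, g, b, opacity) by linear interpolation."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(ANCHORS_RGB) - 1)
    i = min(int(pos), len(ANCHORS_RGB) - 2)  # index of the lower anchor
    frac = pos - i
    lower, upper = ANCHORS_RGB[i], ANCHORS_RGB[i + 1]
    rgb = tuple(round(a + (b - a) * frac) for a, b in zip(lower, upper))
    return rgb + (opacity,)

colors = [inferno_rgba(t) for t in rescale([2.0, 4.0, 6.0])]
```

Keeping opacity as a separate argument means the overlay transparency can change without recomputing the expression-to-color mapping.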

Gene Selection

For the initial implementation, users select a single gene at a time for visualization. A planned future feature will support module scores across up to 20 genes, where per-cell expression values are aggregated into a single composite score before color mapping.
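
Because module scores are only planned, the aggregation method is not yet fixed. One possible form, sketched here in Python, assumes the composite score is the per-cell mean of the selected genes' processed values. The function name `module_scores`, the dictionary layout, and the choice of the mean are all assumptions.

```python
MAX_MODULE_GENES = 20  # upper bound on module size stated in the text

def module_scores(expression, genes):
    """Average the selected genes' values per cell into one composite score.

    expression: dict mapping gene name -> list of per-cell values, where
    every list has the same length and the same cell order.
    """
    if not genes or len(genes) > MAX_MODULE_GENES:
        raise ValueError(f"select between 1 and {MAX_MODULE_GENES} genes")
    missing = [g for g in genes if g not in expression]
    if missing:
        raise KeyError(f"genes not present in the expression table: {missing}")
    columns = [expression[g] for g in genes]
    # zip(*columns) yields, for each cell, that cell's value in every gene.
    return [sum(values) / len(values) for values in zip(*columns)]

# Hypothetical processed values for two cells.
example = {"AQP1": [0.0, 1.0], "AQP2": [1.0, 1.0]}
scores = module_scores(example, ["AQP1", "AQP2"])
```

The composite score can then go through the same color mapping as a single gene.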

Scalability

The test dataset used the 5K Prime Gene Panel + 100. At that panel size a full cell × gene CSV is impractical: the dense matrix exceeds memory limits even on high-memory compute nodes. Two approaches to limiting the exported genes are supported:

  • Curated gene list — a hardcoded set of biologically relevant genes (see below)
  • Top N genes by coefficient of variation (CV) — recommended for automated selection; high-CV genes capture the most spatially variable expression patterns across the tissue. A reasonable default is top 100–500 genes by CV computed after CPK normalization, with a configurable maximum.
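
The CV-based selection can be sketched as below, taking CV as the standard deviation divided by the mean of each gene's CPK-normalized values across cells. The function names, the dictionary layout, and the use of the population standard deviation are assumptions for illustration; the repository's R script may compute this differently.

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean; defined as 0.0 when the mean is 0."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean

def top_genes_by_cv(expression, n=100, max_genes=500):
    """Return up to n gene names ranked by descending CV.

    expression: dict mapping gene name -> list of CPK-normalized values.
    max_genes is the configurable ceiling mentioned in the text.
    """
    n = min(n, max_genes, len(expression))
    ranked = sorted(
        expression,
        key=lambda gene: coefficient_of_variation(expression[gene]),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical CPK values for three genes across four cells.
cpk = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],  # constant, CV = 0
    "GENE_B": [0.0, 0.0, 0.0, 4.0],  # expressed in one cell, highest CV
    "GENE_C": [1.0, 2.0, 1.0, 2.0],  # moderate variation
}
selected = top_genes_by_cv(cpk, n=2)
```

Only the selected genes then need to be written to the dense output, which keeps the CSV small regardless of panel size.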

Coordinate Transformation

Cell centroids from cells.parquet are in native Xenium space. The existing tf_mat transformation matrix from the FUSION 2.0 co-registration pipeline is applied at render time to map cell positions into WSI space. This is handled by the FUSION 2.0 pipeline and not by the scripts in this repository.
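
For orientation, applying such a matrix to one centroid might look like the sketch below. The layout of tf_mat is not specified here, so the example assumes a 3 × 3 homogeneous affine matrix applied to the column vector (x, y, 1). The matrix values and the function name `apply_affine` are hypothetical.

```python
def apply_affine(matrix, x, y):
    """Map (x, y) through a 3x3 homogeneous affine matrix (assumed layout).

    The first two rows hold the linear part and the translation.
    The last row is expected to be [0, 0, 1] and is not used.
    """
    (a, b, tx), (c, d, ty), _ = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)

# Hypothetical transform: scale by 2, then shift by (+10, -5).
example_tf_mat = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, -5.0],
    [0.0, 0.0, 1.0],
]
wsi_x, wsi_y = apply_affine(example_tf_mat, 3.0, 4.0)
```

Applying the transform at render time means the exported CSV keeps native Xenium coordinates and does not need to be regenerated when the registration changes.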

Test File

A test CSV was generated from Xenium Ranger output for sample D450 (pediatric kidney) using a curated set of 13 genes:

Kidney cell type markers: SLC12A1, AQP1, AQP2, PECAM1, UMOD, LRP2, CUBN, VCAM1

Housekeeping genes: HPRT1, SDHA, TBP, YWHAZ, PGK1

The output CSV is structured as cells × genes with columns cell_id, x_centroid, y_centroid, followed by one column per gene containing normalized, clipped expression values.
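
The layout described above can be illustrated with a short Python sketch that writes the three fixed columns followed by one column per gene. The cell IDs, coordinates, and expression values are made up for the example, and `write_expression_csv` is a hypothetical helper, not part of the repository.

```python
import csv
import io

def write_expression_csv(handle, cells, genes, expression):
    """Write one row per cell: cell_id, x_centroid, y_centroid, then genes.

    cells: list of (cell_id, x_centroid, y_centroid) tuples.
    expression: dict mapping gene name -> list of per-cell values, in the
    same order as `cells`.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["cell_id", "x_centroid", "y_centroid", *genes])
    for i, (cell_id, x, y) in enumerate(cells):
        writer.writerow([cell_id, x, y, *(expression[g][i] for g in genes)])

# Hypothetical cells and values, written to an in-memory buffer.
buffer = io.StringIO()
write_expression_csv(
    buffer,
    cells=[("cell_1", 10.5, 20.0), ("cell_2", 11.0, 21.5)],
    genes=["AQP1", "UMOD"],
    expression={"AQP1": [0.0, 1.25], "UMOD": [2.5, 0.0]},
)
csv_text = buffer.getvalue()
```

Passing a file opened with `open(path, "w", newline="")` in place of the buffer would write the same rows to disk.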

Repository Structure

.
├── data/               # Input and output data files (not tracked by git)
└── scripts/
    └── xenium-output-to-gene-expr-csv.R   # Preprocessing script

Usage

Rscript scripts/xenium-output-to-gene-expr-csv.R

Requirements

  • R 4.x
  • hdf5r, arrow, Matrix, dplyr
