This repository contains script(s) to prepare and visualize per-cell gene expression data from Xenium Ranger output overlaid on co-registered H&E whole slide images (WSI) in the FUSION 2.0 pipeline.
This codebase is unlikely to be used directly in a future plug-in; it is maintained to document the required files, where they come from, and how they are processed into data structured for visualization.
The goal is to visualize per-cell gene expression overlaid on the co-registered H&E WSI, providing a tissue-level view of spatial expression patterns for user-selected genes — analogous to the cell-level aggregated expression view in Xenium Explorer.
Both files come directly from Xenium Ranger output:

| File | Description |
|---|---|
| cell_feature_matrix.h5 | Cell × gene expression matrix (raw UMI counts) |
| cells.parquet | Cell centroid coordinates in Xenium space (x, y) |
- CPK normalization: each cell's raw counts are scaled to counts per 10,000
- log1p transformation: applied to the normalized counts
- Percentile clipping: each gene is clipped to its 1st–99th percentile range so that outlier cells do not collapse the color range
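The three steps above can be sketched in a few lines. This is an illustrative Python version using only the standard library, not the repository's R script; the function names (`normalize_counts`, `percentile`, `clip_per_gene`), the linear-interpolation percentile, and the handling of zero-count cells are assumptions made for the example.

```python
import math

def normalize_counts(counts, scale=10_000):
    """Scale each cell (row) to `scale` total counts, then apply log1p."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: cells with no counts are kept as all zeros.
            normalized.append([0.0] * len(row))
        else:
            normalized.append([math.log1p(c / total * scale) for c in row])
    return normalized

def percentile(sorted_vals, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def clip_per_gene(matrix, low=1, high=99):
    """Clip each gene (column) to its own [low, high] percentile range."""
    if not matrix:
        return []
    clipped = [row[:] for row in matrix]
    for g in range(len(matrix[0])):
        col = sorted(row[g] for row in matrix)
        lo_v, hi_v = percentile(col, low), percentile(col, high)
        for row in clipped:
            row[g] = min(max(row[g], lo_v), hi_v)
    return clipped
```

Clipping per gene rather than globally keeps a highly expressed gene from setting the color range for every other gene.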
Normalized, clipped expression values are mapped to the Inferno color scale (black = lowest expression, yellow/white = highest), consistent with the Xenium Explorer aesthetic, rendered as an overlay on the H&E WSI with adjustable opacity.
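As a rough illustration of the color mapping, the Python sketch below rescales a value into [0, 1] and interpolates between five anchor colors sampled along Inferno. The five-stop approximation, the function name `inferno_rgba`, and the default opacity are assumptions for the example; a real renderer would use the full colormap from its plotting library.

```python
# Five anchor colors sampled along Inferno (approximation for illustration).
INFERNO_STOPS = [
    (0, 0, 4),        # near black: lowest expression
    (86, 16, 110),    # purple
    (187, 55, 84),    # magenta-red
    (249, 140, 10),   # orange
    (252, 255, 164),  # pale yellow: highest expression
]

def inferno_rgba(value, vmin, vmax, opacity=0.7):
    """Map a clipped expression value to an (R, G, B, alpha) tuple."""
    if vmax <= vmin:
        t = 0.0  # degenerate range: fall back to the lowest color
    else:
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    pos = t * (len(INFERNO_STOPS) - 1)
    i = min(int(pos), len(INFERNO_STOPS) - 2)
    frac = pos - i
    c0, c1 = INFERNO_STOPS[i], INFERNO_STOPS[i + 1]
    rgb = tuple(round(a + (b - a) * frac) for a, b in zip(c0, c1))
    return rgb + (opacity,)
```

Here `vmin` and `vmax` would be the per-gene clipping bounds, so each gene uses the full color range.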
For the initial implementation, users select a single gene at a time for visualization. A planned future feature will support module scores across up to 20 genes, where per-cell expression values are aggregated into a single composite score before color mapping.
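The aggregation method for module scores is not yet specified. One simple possibility, shown here purely as an assumption, is the per-cell mean of the selected genes' normalized values. The function name `module_scores` and its parameters are hypothetical.

```python
def module_scores(matrix, gene_names, module_genes, max_genes=20):
    """Per-cell mean expression across the genes in a module.

    matrix: cells x genes values, with columns ordered as in gene_names.
    Assumption: the composite score is a plain mean; other schemes
    (for example, background-corrected scores) are equally plausible.
    """
    if not module_genes:
        raise ValueError("module_genes must not be empty")
    if len(module_genes) > max_genes:
        raise ValueError(f"a module may contain at most {max_genes} genes")
    idx = [gene_names.index(g) for g in module_genes]
    return [sum(row[i] for i in idx) / len(idx) for row in matrix]
```

The resulting score per cell could then be color-mapped in the same way as a single gene.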
The test dataset used the 5K Prime Gene Panel + 100, which makes a full cell × gene CSV impractical: the dense matrix exceeds memory limits even on high-memory compute nodes. The output is therefore restricted to a subset of genes, and two selection approaches are supported:
- Curated gene list — a hardcoded set of biologically relevant genes (see below)
- Top N genes by coefficient of variation (CV) — recommended for automated selection; high-CV genes capture the most spatially variable expression patterns across the tissue. A reasonable default is top 100–500 genes by CV computed after CPK normalization, with a configurable maximum.
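A minimal Python sketch of the CV-based selection follows, assuming the input is a cells × genes matrix of normalized values. The function name `top_genes_by_cv`, its parameters, and the use of the population standard deviation are assumptions for the example, not the repository's implementation.

```python
import statistics

def top_genes_by_cv(matrix, gene_names, n=100, max_genes=500):
    """Return up to n gene names ranked by coefficient of variation (sd / mean)."""
    n = min(n, max_genes)  # enforce the configurable maximum
    scored = []
    for g, name in enumerate(gene_names):
        col = [row[g] for row in matrix]
        mean = statistics.fmean(col)
        if mean == 0:
            continue  # gene never detected: CV is undefined, so skip it
        scored.append((statistics.pstdev(col) / mean, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]
```

Genes with zero mean are skipped instead of being assigned an arbitrary CV, so they can never be selected.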
Cell centroids from cells.parquet are in native Xenium space. The existing tf_mat transformation matrix from the FUSION 2.0 co-registration pipeline is applied at render time to map cell positions into WSI space. This is handled by the FUSION 2.0 pipeline and not by the scripts in this repository.
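For orientation only, the sketch below shows what applying such a matrix to a centroid could look like. It assumes tf_mat is a 3 × 3 homogeneous affine matrix in row-major order, which this document does not confirm; the function name `apply_affine` is hypothetical.

```python
def apply_affine(tf_mat, x, y):
    """Map a point (x, y) through a 3x3 homogeneous affine matrix.

    Assumption: tf_mat has the form
        [[a, b, tx],
         [c, d, ty],
         [0, 0, 1]]
    The actual layout used by FUSION 2.0 may differ.
    """
    a, b, tx = tf_mat[0]
    c, d, ty = tf_mat[1]
    return (a * x + b * y + tx, c * x + d * y + ty)

# Example: scale by 2 in both axes, then shift by (100, 50).
example_tf = [[2.0, 0.0, 100.0], [0.0, 2.0, 50.0], [0.0, 0.0, 1.0]]
wsi_x, wsi_y = apply_affine(example_tf, 10.0, 20.0)
```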
A test CSV was generated from Xenium Ranger output for sample D450 (pediatric kidney) using a curated set of 13 genes:
- Kidney cell type markers: SLC12A1, AQP1, AQP2, PECAM1, UMOD, LRP2, CUBN, VCAM1
- Housekeeping genes: HPRT1, SDHA, TBP, YWHAZ, PGK1
The output CSV is structured as cells × genes with columns cell_id, x_centroid, y_centroid, followed by one column per gene containing normalized, clipped expression values.
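The layout described above can be illustrated with a short Python sketch that writes the header and one row per cell. The function name `write_expression_csv` and the example values are made up for illustration; only the column names cell_id, x_centroid, and y_centroid come from the description above.

```python
import csv
import io

def write_expression_csv(handle, cells, gene_names, expression):
    """Write one row per cell: id, centroid, then one value per gene.

    cells: iterable of (cell_id, x_centroid, y_centroid) tuples.
    expression: per-cell lists of values, ordered as in gene_names.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["cell_id", "x_centroid", "y_centroid", *gene_names])
    for (cell_id, x, y), values in zip(cells, expression):
        writer.writerow([cell_id, x, y, *values])

# Example with made-up values for two cells and two genes.
buffer = io.StringIO()
write_expression_csv(
    buffer,
    cells=[("cell-1", 10.5, 20.0), ("cell-2", 11.0, 22.5)],
    gene_names=["UMOD", "AQP1"],
    expression=[[0.0, 1.25], [2.5, 0.0]],
)
```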
.
├── data/                                   # Input and output data files (not tracked by git)
└── scripts/
    └── xenium-output-to-gene-expr-csv.R    # Preprocessing script
Run the preprocessing script with `Rscript scripts/xenium-output-to-gene-expr-csv.R`.

Requirements:
- R 4.x
- R packages: hdf5r, arrow, Matrix, dplyr