Python bindings for ACTIONet (Action-based Cell-Type Identification and Organism Niche Extraction Tool), a single-cell multi-resolution data analysis toolkit.
This package wraps the C++ backend libactionet without modifications, providing a Python interface with AnnData as the core data container, designed to integrate seamlessly with the scanpy ecosystem.
- Full C++ backend: Leverages the high-performance libactionet C++ library
- OpenMP parallelism: Multi-threaded C++ execution via OpenMP for all compute-intensive operations
- AnnData integration: Native support for AnnData objects used throughout the Python single-cell ecosystem
- Scanpy compatibility: Works alongside standard scanpy workflows
- Multi-resolution analysis: ACTION decomposition for multi-scale archetype discovery
- Network-based analysis: Build and analyze cell-cell interaction networks
- Out-of-core computation: Backed (HDF5-streamed) operator paths for datasets larger than RAM
- Cross-platform: Supports macOS (Intel & Apple Silicon) and Linux
A C++17 compiler, CMake (≥ 3.19), BLAS/LAPACK, HDF5 (C library), and OpenMP are required. OpenMP is a hard build requirement — the build will fail without it.
macOS:

```shell
# Install Xcode Command Line Tools
xcode-select --install

# Install dependencies via Homebrew
brew install cmake openblas lapack libomp hdf5
```

Linux (Debian/Ubuntu):

```shell
sudo apt-get update
sudo apt-get install build-essential cmake libopenblas-dev liblapack-dev libhdf5-dev libgomp1
```

Conda (rootless / HPC):

```shell
conda install -c conda-forge cmake compilers openblas lapack hdf5
```

```shell
# Clone repository
git clone https://github.com/KellisLab/actionet-python.git
cd actionet-python

# Initialize the libactionet submodule
git submodule update --init --recursive

# Install in development mode
pip install -e .

# Or build and install
pip install .
```

Intel MKL provides highly optimized BLAS/LAPACK implementations and can significantly improve performance, especially for large matrix operations. ACTIONet automatically detects and uses MKL if it is available.
```shell
# Create a new environment with MKL
conda create -n actionet-mkl python=3.12
conda activate actionet-mkl

# Install Intel MKL and build dependencies
conda install -c conda-forge cmake compilers numpy scipy mkl mkl-include

# Clone and build ACTIONet
git clone https://github.com/KellisLab/actionet-python.git
cd actionet-python
git submodule update --init --recursive

# Build with MKL (automatically detected)
pip install -e .
```

For maximum performance, use Intel's oneAPI toolkit with the Intel C++ compiler:
```shell
# Download and install the Intel oneAPI Base Toolkit:
# https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html

# Source the Intel environment
source /opt/intel/oneapi/setvars.sh

# Set environment variables for the Intel compiler and MKL
export CC=icx
export CXX=icpx
export MKLROOT=/opt/intel/oneapi/mkl/latest

# Clone and build
git clone https://github.com/KellisLab/actionet-python.git
cd actionet-python
git submodule update --init --recursive

# Build with the Intel compiler and MKL
pip install -e .
```

Alternatively, install MKL from Intel's apt repository (Ubuntu/Debian):

```shell
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-oneapi-mkl intel-oneapi-mkl-devel

# Source the MKL environment
source /opt/intel/oneapi/mkl/latest/env/vars.sh

# Build ACTIONet
cd actionet-python
pip install -e .
```

After installation, verify that MKL is being used:

```python
import numpy as np

np.show_config()  # the BLAS/LAPACK section should mention MKL
```

For optimal performance, set these environment variables:
```shell
# Use all available cores
export MKL_NUM_THREADS=$(nproc)

# For the GNU OpenMP runtime (conda default)
export MKL_THREADING_LAYER=GNU
export OMP_NUM_THREADS=$(nproc)

# For the Intel OpenMP runtime (Intel compiler)
export MKL_THREADING_LAYER=INTEL
export OMP_NUM_THREADS=$(nproc)

# Disable MKL's internal threading (if using external parallelization)
export MKL_NUM_THREADS=1
```

With Intel MKL, you should see:

- 2-4x faster matrix operations (SVD, matrix multiplication)
- Especially large gains for big datasets (>100K cells)
- Better multi-threading efficiency
- Lower memory usage for some operations
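A quick, backend-agnostic way to check whether a fast, threaded BLAS is actually wired up is to time a dense matrix product from Python (an illustrative benchmark only; absolute timings depend on hardware and thread settings):

```python
import time
import numpy as np

# With a threaded MKL or OpenBLAS build, a 1000x1000 dense matmul
# completes in a few milliseconds; a reference BLAS takes far longer.
rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 1000))
b = rng.standard_normal((1000, 1000))

t0 = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - t0
print(f"1000x1000 GEMM: {elapsed * 1000:.1f} ms")
```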
MKL not detected:

- Ensure the `MKLROOT` environment variable is set
- Verify `mkl-include` is installed (conda) or that headers are in `/opt/intel/oneapi/mkl/latest/include`

Mixed OpenMP runtime warnings:

- Set `MKL_THREADING_LAYER=GNU` for conda environments
- Set `MKL_THREADING_LAYER=INTEL` for Intel oneAPI builds
- OpenMP is required and cannot be disabled; ensure a compatible runtime is installed

Link errors with the Intel compiler:

- Ensure `setvars.sh` is sourced before building
- Try: `export LDFLAGS="-L${MKLROOT}/lib/intel64"`
```python
import actionet as an
import anndata as ad

# Load data
adata = ad.read_h5ad("your_data.h5ad")

# Preprocess (if not already done)
an.normalize_anndata(adata)

# ACTIONet pipeline
an.reduce_kernel(adata, n_components=50)  # Kernel reduction
an.run_action(adata, k_min=2, k_max=30)   # ACTION decomposition
an.build_network(adata)                   # Build cell network
an.layout_network(adata)                  # UMAP layout

# Feature specificity
an.compute_feature_specificity(adata, labels="assigned_archetype")

# Or run the full pipeline in one call
an.run_actionet(adata, k_min=2, k_max=30)
```

ACTIONet includes a lets-plot based QC violin helper for per-cell metrics stored in `adata.obs`.
```python
import actionet as an

an.plot_qc_violin(
    adata,
    keys=["n_counts", "n_genes"],
    groupby="CellLabel",
    log_trans="log10",
    title="QC metrics",
)
```

```python
an.normalize_anndata(adata, target_sum=1e4, pseudocount=0.5, log_base=None)
```

Normalize and log-transform count data. Stores normalized counts in `adata.layers['logcounts']`.
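Conceptually, this is standard library-size scaling followed by a log transform. A minimal NumPy sketch of that idea (parameter names mirror the signature above, but the backend's exact arithmetic may differ):

```python
import numpy as np

def normalize_counts(X, target_sum=1e4, pseudocount=0.5):
    """Scale each cell to target_sum total counts, then log-transform."""
    X = np.asarray(X, dtype=float)
    size = X.sum(axis=1, keepdims=True)   # per-cell library size
    scaled = X / size * target_sum        # counts-per-target_sum
    return np.log(scaled + pseudocount)   # natural log when log_base is None

counts = np.array([[10, 0, 90], [5, 5, 0]], dtype=float)
logcounts = normalize_counts(counts)
```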
```python
an.filter_anndata(adata, min_genes=200, min_cells=3)
```

Filter cells and genes by minimum thresholds.
```python
an.reduce_kernel(adata, n_components=50, layer=None, key_added='action')
```

Compute the reduced kernel matrix using SVD. Automatically selects the optimal SVD algorithm based on matrix properties (sparse vs dense, size, sparsity) with negligible overhead (~1-2 microseconds).

New in OOM v1:

- Backed sparse AnnData `.X` is executed through an out-of-memory operator path (Halko by default; IRLB, PRIMME, and Feng also supported).
- You can reuse an external SVD via `precomputed_svd`: `an.reduce_kernel(adata, precomputed_svd=an.run_svd(adata.X, n_components=50))`
- Explicit helper for this workflow: `an.reduce_kernel_from_svd(...)`.
- Backed SVD/reduction paths expose `backed_n_threads`: `an.run_svd(adata_backed, n_components=50, backed_n_threads=8)`, `an.reduce_kernel(adata_backed, n_components=50, backed_n_threads=8)` (`0` = auto, `1` = serial debug path).

Available algorithms:

- IRLB (default for in-memory sparse): Implicitly Restarted Lanczos Bidiagonalization
- Halko (default for dense and backed): Randomized SVD (fastest for dense; predictable I/O cost for backed)
- PRIMME (auto-selected for large in-memory sparse): Memory-efficient for huge sparse matrices
- Feng: Alternative randomized method
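For intuition, the Halko method can be sketched in a few lines of NumPy: project onto a random subspace, sharpen with power iterations, then solve a small exact SVD. This is an illustrative reimplementation, not the C++ backend's code path:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=2, seed=0):
    """Halko-style randomized SVD of a dense matrix A."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + n_oversample))
    Y = A @ Omega                          # sample the range of A
    for _ in range(n_iter):                # power iterations sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
    B = Q.T @ A                            # small (k + oversample) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

A = np.random.default_rng(1).standard_normal((200, 80))
U, s, Vt = randomized_svd(A, k=10)
```

The same structure carries over to the backed case: only matrix-vector products with `A` and `A.T` are needed, which is why randomized SVD has a predictable I/O cost for HDF5-streamed data.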
```python
an.run_action(adata, k_min=2, k_max=30, reduction_key='action')
```

Multi-resolution archetypal analysis to identify cell states.
```python
an.build_network(adata, obsm_key='H_stacked', algorithm='k*nn', distance_metric='jsd')
```

Build a cell-cell interaction network from archetype footprints.
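The default `jsd` metric is the Jensen-Shannon divergence between per-cell archetype footprints (rows of `H`, which behave like discrete distributions). A minimal NumPy sketch of the quantity being compared (illustrative, not the backend's implementation):

```python
import numpy as np

def jensen_shannon_divergence(p, q, eps=1e-12):
    """JSD between two discrete distributions (base-2 log, so JSD is in [0, 1])."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                      # mixture distribution
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

d_same = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])  # identical footprints
d_far = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])   # disjoint footprints
```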
```python
an.compute_network_diffusion(adata, scores='H_merged', network_key='actionet', alpha=0.85)
```

Smooth scores over the network topology.
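Smoothing of this form is essentially personalized PageRank over the cell graph: iterate `x ← α·Pᵀx + (1 − α)·s` with a row-stochastic transition matrix `P` and restart scores `s`. A dense NumPy sketch of the idea (the backend operates on the sparse graph; convention details may differ):

```python
import numpy as np

def network_diffusion(W, scores, alpha=0.85, n_iter=100):
    """Personalized-PageRank-style smoothing of per-cell scores over graph W."""
    W = np.asarray(W, float)
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    x = np.asarray(scores, float).copy()
    for _ in range(n_iter):
        x = alpha * (P.T @ x) + (1 - alpha) * scores
    return x

# Tiny 3-node chain 0-1-2: smoothing spreads mass from node 0 to its neighbors
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
s = np.array([1.0, 0.0, 0.0])
smoothed = network_diffusion(W, s, alpha=0.85)
```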
```python
an.compute_feature_specificity(adata, labels='assigned_archetype')
```

Compute archetype/cluster-specific marker genes.
```python
an.layout_network(adata, network_key='actionet', method='umap', n_components=2)
```

Compute a 2D/3D layout using UMAP or t-SNE.
```python
an.correct_batch_effect(adata, batch_key='batch', reduction_key='action',
                        corrected_key='action_corrected')
```

Remove batch effects from the reduced representation using orthogonalization.
```python
an.correct_basal_expression(adata, basal_genes=['ACTB', 'GAPDH'], reduction_key='action',
                            corrected_key='action_basal_corrected')
```

Correct for basal/housekeeping gene expression.
```python
an.impute_features(adata, features=['GENE1', 'GENE2'], network_key='actionet', alpha=0.85)
```

Impute gene expression using network diffusion.
```python
an.impute_features_from_archetypes(adata, features=['GENE1', 'GENE2'], H_key='H_merged')
```

Impute expression from archetype profiles.
```python
an.smooth_kernel(adata, reduction_key='action', smoothed_key='action_smoothed', alpha=0.85)
```

Smooth the reduced representation using network diffusion.
```python
an.find_markers(adata, labels='assigned_archetype', layer='logcounts')
```

Identify marker genes for each group.
```python
an.annotate_cells(adata, marker_dict={'CellType': ['GENE1', 'GENE2']})
```

Annotate cells using marker gene dictionaries.
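Marker-dictionary annotation boils down to scoring each cell against each marker gene set and taking the best-scoring label. An illustrative NumPy sketch of that idea (`an.annotate_cells` may score differently, e.g. using network-smoothed expression):

```python
import numpy as np

def annotate_by_markers(expr, gene_names, marker_dict):
    """Assign each cell the label whose markers have the highest mean z-score."""
    expr = np.asarray(expr, float)
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-9)  # per-gene z-score
    idx = {g: i for i, g in enumerate(gene_names)}
    labels = list(marker_dict)
    scores = np.column_stack([
        z[:, [idx[g] for g in genes]].mean(axis=1)
        for genes in marker_dict.values()
    ])
    return [labels[i] for i in scores.argmax(axis=1)]

expr = np.array([[9.0, 0.1, 0.2], [0.2, 8.0, 7.5]])  # 2 cells x 3 genes
calls = annotate_by_markers(expr, ["CD3D", "CD19", "MS4A1"],
                            {"T": ["CD3D"], "B": ["CD19", "MS4A1"]})
```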
```python
an.cluster_network(adata, network_key='actionet', resolution=1.0)
```

Leiden clustering on the ACTIONet graph.
ACTIONet stores results in standard AnnData slots:

- `adata.obsm`: Dimensionality reductions and archetype matrices
  - `action`: Reduced kernel representation
  - `H_stacked`: Stacked archetype matrix (all scales)
  - `H_merged`: Merged archetype matrix (unified)
  - `X_umap`: UMAP coordinates
- `adata.obsp`: Cell-cell networks
  - `actionet`: ACTIONet graph adjacency matrix
- `adata.obs`: Cell metadata
  - `assigned_archetype`: Discrete archetype assignments
- `adata.varm`: Gene/feature metadata
  - `specificity_upper`: Marker gene scores (upper-tail)
  - `specificity_lower`: Marker gene scores (lower-tail)
- `adata.uns`: Parameters and auxiliary data
  - `action_params`: Kernel reduction parameters
  - `action_results`: ACTION decomposition results (C matrices)
| R Function | Python Function | Notes |
|---|---|---|
| `reduce()` | `an.reduce_kernel()` | Kernel reduction |
| `runACTION()` | `an.run_action()` | ACTION decomposition |
| `buildNetwork()` | `an.build_network()` | Network construction |
| `computeNetworkDiffusion()` | `an.compute_network_diffusion()` | Network smoothing |
| `compute_archetype_feature_specificity()` | `an.compute_feature_specificity()` | Marker genes |
| `compute_archetype_feature_specificity()` | `an.compute_archetype_feature_specificity()` | Per-archetype specificity |
| `layoutNetwork()` | `an.layout_network()` | UMAP/t-SNE layout |
| `runSVD()` | `an.run_svd()` | SVD decomposition |
| `orthogonalizeBatchEffect()` | `an.correct_batch_effect()` | Batch correction |
| `orthogonalizeBasal()` | `an.correct_basal_expression()` | Basal correction |
| `imputeFeatures()` | `an.impute_features()` | Network diffusion imputation |
| `imputeFromArchetypes()` | `an.impute_features_from_archetypes()` | Archetype-based imputation |
| `smoothKernel()` | `an.smooth_kernel()` | Kernel smoothing |
| `findMarkers()` | `an.find_markers()` | Marker detection |
| `annotateCells()` | `an.annotate_cells()` | Cell annotation |
| `annotateClusters()` | `an.annotate_clusters()` | Cluster annotation |
| `clusterNetwork()` | `an.cluster_network()` | Leiden clustering |
| `run_SPA()` | `an.run_spa()` | Sparse archetypal analysis |
| `run_simplex_regression()` | `an.run_simplex_regression()` | Simplex regression |
| `run_label_propagation()` | `an.run_label_propagation()` | Label propagation |
| `colMaps(ace)` | `adata.obsm` | Cell-level embeddings |
| `colNets(ace)` | `adata.obsp` | Cell-level networks |
| `metadata(ace)` | `adata.obs` | Cell annotations |
| `rowMaps(ace)` | `adata.varm` | Gene-level annotations |
| R (ACTIONetExperiment) | Python (AnnData) |
|---|---|
| `assays(ace)$counts` | `adata.X` or `adata.layers['counts']` |
| `assays(ace)$logcounts` | `adata.layers['logcounts']` |
| `colMaps(ace)$ACTION` | `adata.obsm['action']` |
| `colMaps(ace)$H_stacked` | `adata.obsm['H_stacked']` |
| `colNets(ace)$ACTIONet` | `adata.obsp['actionet']` |
| `metadata(ace)$assigned_archetype` | `adata.obs['assigned_archetype']` |
| `rowMaps(ace)$specificity` | `adata.varm['specificity_upper']` |
The build system uses scikit-build-core with CMake. Key options:
```shell
# Standard build
pip install .

# Enable architecture-specific optimizations (Linux only)
pip install . -C cmake.define.ACTIONET_ENABLE_OPTIMIZED=ON

# Verbose build output
pip install . -v
```

macOS:

- Default deployment target: macOS 11.0
- Builds for the native architecture (x86_64 or arm64)
- Uses the Accelerate framework for BLAS/LAPACK
- OpenMP via Homebrew `libomp` (required: `brew install libomp`)
- Set `CMAKE_OSX_ARCHITECTURES` for cross-compilation

Linux:

- Targets glibc ≥ 2.17
- OpenMP via `libgomp` (GCC) by default; override with `-C cmake.define.LIBACTIONET_OPENMP_RUNTIME=GNU|INTEL|LLVM`
- For best performance, consider building with Intel MKL (see the installation section above)
Submodule not initialized:

```shell
git submodule update --init --recursive
```

Missing Armadillo: Armadillo headers are bundled in `libactionet/include/extern`. If CMake can't find them, check the submodule status.

OpenMP errors: OpenMP is a hard requirement; the build will fail if no OpenMP runtime is found.

- Linux: Install `libgomp` (`apt install libgomp1` or `yum install libgomp`).
- macOS: Install `libomp` via Homebrew (`brew install libomp`).
- Conda: Install compilers (`conda install -c conda-forge compilers`).

If using MKL (e.g., conda numpy), avoid mixing OpenMP runtimes: set `MKL_THREADING_LAYER=GNU` when using GNU OpenMP, or select Intel OpenMP with an Intel toolchain.

Examples:

```shell
# Force GNU OpenMP and align MKL
MKL_THREADING_LAYER=GNU pip install . -C cmake.define.LIBACTIONET_OPENMP_RUNTIME=GNU

# Use Intel OpenMP (Intel/IntelLLVM toolchains)
pip install . -C cmake.define.LIBACTIONET_OPENMP_RUNTIME=INTEL
```

Linking errors on macOS: Ensure Xcode Command Line Tools are installed and up to date.
```shell
pip install -e ".[dev]"
```

Run tests:

```shell
pytest tests/ -v
```

Format and lint:

```shell
black src/actionet examples tests
ruff check src/actionet examples tests
```

The following R package components are not implemented in this Python translation:
- R-specific visualization helpers (`plots.R`, `r_visualization.R`, `utils_plotting_*.R`)
  - Reason: Python ecosystem has mature alternatives.
  - Alternative: Use `scanpy.pl.*` functions for visualization.
- R-specific data I/O (`data.R`)
  - Reason: AnnData provides native I/O; scanpy handles format conversion.
  - Alternative: `scanpy.read_*()` and `adata.write_h5ad()`.
- Parallel backend utilities (`utils_parallel.R`)
  - Reason: Different parallelization approach; C++ threading via OpenMP is preserved.
  - Alternative: Use the `n_threads` parameter in functions.
- Enrichment database utilities (`enrichment.R`)
  - Reason: Python has dedicated packages.
  - Alternative: Use `gseapy`, `gprofiler-official`, or `decoupler`.
- Projection (`projection.R`)
  - Reason: Requires R-specific reference dataset handling.
  - Alternative: Future work, or use `scvi-tools` for reference mapping.
- Pseudobulk DGE (`pseudobulk_DGE.R`)
  - Reason: Wrapper around R statistics packages.
  - Alternative: Use `scanpy.tl.rank_genes_groups()`, `pydeseq2`, or call R via `rpy2`.
- Marker detection helpers (`marker_detection.R`)
  - Reason: High-level R wrappers.
  - Alternative: `an.compute_feature_specificity()` plus post-processing.
- Alignment (`alignment.R`)
  - Reason: Multi-dataset alignment utilities specific to R workflows.
  - Alternative: Use scanpy integration tools or scvi-tools.
- Filter ACE (`filter_ace.R`)
  - Reason: ACTIONetExperiment-specific filtering.
  - Alternative: Standard AnnData filtering: `adata[adata.obs['column'] > threshold, :]`.
- Autocorrelation statistics (Moran's I, Geary's C)
  - Reason: Low usage; available via squidpy for spatial data.
  - Alternative: `squidpy.gr.spatial_autocorr()`.
- Maximum-weight matching (MWM)
  - Reason: Utility for batch alignment (an omitted feature).
  - Alternative: `scipy.optimize.linear_sum_assignment()`.
- XICOR correlation
  - Reason: Specialized rank-based correlation; niche use case.
  - Alternative: `scipy.stats.spearmanr()` or pandas correlation methods.
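As a concrete example of one suggested replacement: maximum-weight bipartite matching (e.g., matching clusters across two batches by overlap) maps directly onto `scipy.optimize.linear_sum_assignment` with `maximize=True`. The weight matrix below is a toy illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy weight matrix: weight[i, j] = similarity between cluster i in batch 1
# and cluster j in batch 2 (e.g., Jaccard overlap of their cell sets).
weight = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.3, 0.7],
])

# Hungarian algorithm; maximize=True turns min-cost into max-weight matching.
rows, cols = linear_sum_assignment(weight, maximize=True)
matching = list(zip(rows.tolist(), cols.tolist()))
total = weight[rows, cols].sum()
```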
✅ ACTION decomposition: All archetypal analysis functions
✅ Network construction: Full graph building pipeline
✅ Network diffusion: Smoothing and propagation
✅ Feature specificity: Marker gene identification
✅ SVD/Kernel reduction: Dimensionality reduction (in-memory and backed/out-of-core)
✅ Visualization layouts: UMAP/t-SNE via C++ backend
✅ Matrix operations: Aggregation, normalization, transforms
✅ Batch correction: Orthogonalization-based batch effect and basal expression correction
✅ Imputation: Network diffusion and archetype-based gene expression imputation
✅ Clustering: Leiden clustering via igraph
✅ Annotation: Marker detection and cell/cluster annotation
✅ Preprocessing: Filtering, normalization, backed decompression
✅ Plotting: UMAP, feature expression overlays, QC violin plots (lets-plot, matplotlib, Plotly)
✅ Pipeline: End-to-end run_actionet() orchestration
GPL-3.0 (same as R package)
- C++ Backend: https://github.com/KellisLab/libactionet
- R Package: https://github.com/KellisLab/actionet-r
- R Data Container: https://github.com/shmohammadi86/ACTIONetExperiment
- AnnData: https://anndata.readthedocs.io
- scanpy: https://scanpy.readthedocs.io
UMAP plots use lets-plot for static rendering and Plotly WebGL for interactive views. Install the plotting extras first.

```shell
pip install -e ".[plotting]"
```

```python
import actionet as act
from lets_plot import LetsPlot

# Prefer static notebook rendering for lets-plot UMAPs.
LetsPlot.setup_html(no_js=True)

# adata.obsm["X_umap"] must exist
p_static = act.plot_umap(adata, color="cluster")
p_png = p_static.to_png()
fig_raster = act.plot_umap_raster(adata, color="cluster")
fig = act.plot_umap_interactive(adata, color="gene1")

# Feature expression overlays
plots = act.plot_feature_expression(
    adata,
    features=["GeneA", "GeneB"],
    features_use=None,
    alpha=0,
    layer="logcounts",
)
plots_raster = act.plot_feature_expression_raster(
    adata,
    features=["GeneA", "GeneB"],
    alpha=0,
    layer="logcounts",
)
```

`plot_umap()` now disables lets-plot sampling and tooltips by default for fidelity-first static rendering. For very large notebook outputs, prefer `PlotSpec.to_png()` / `to_pdf()` first; move to `plot_umap_raster()` when full lets-plot specs become too heavy.