Linking advanced TCR Python pipelines and Hugging Face models in R
immLynx provides a unified R interface for running multiple state-of-the-art TCR analysis pipelines on single-cell TCR sequencing data. The package seamlessly integrates with SingleCellExperiment and scRepertoire workflows, wrapping popular Python-based tools to enable:
- tcrdist3: Calculate pairwise distances between T-cell receptors using runTCRdist
- OLGA: Compute the generation probability of CDR3 sequences with runOLGA, or generate new sequences with generateOLGA
- soNNia: Infer selection pressures on TCRs using runSoNNia
- clusTCR: Cluster large sets of CDR3 sequences with runClustTCR
- metaclonotypist: Identify TCR metaclones with runMetaclonotypist
- ESM-2: Generate protein language model embeddings with runEmbeddings
For more details on each function, please refer to the R documentation (e.g., ?runTCRdist).
immLynx has been tested on R >= 4.3 under OS X and Linux. See the DESCRIPTION file for the full list of required R packages. immLynx is specifically designed to work with single-cell objects that have had TCR/BCR data added using scRepertoire.
Install immLynx:
```r
# Install from GitHub
remotes::install_github("BorchLab/immLynx")
```

The first time you use immLynx, it will automatically install the required Python packages in an isolated environment via basilisk. This may take several minutes.
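Before installing, a quick check against the tested configuration stated above can save a failed attempt. The sketch below is a convenience in base R and is not part of immLynx; the version threshold and platform names simply mirror the tested setups (R >= 4.3 on OS X, reported as "Darwin", or Linux).

```r
# Pre-flight check (a convenience sketch, not part of immLynx)

# immLynx has been tested on R >= 4.3
r_ok <- getRversion() >= "4.3.0"
if (!r_ok) {
  warning("immLynx has been tested on R >= 4.3; this session runs R ",
          as.character(getRversion()))
}

# immLynx has been tested on OS X (sysname "Darwin") and Linux
tested_os <- Sys.info()[["sysname"]] %in% c("Darwin", "Linux")
if (!tested_os) {
  message("immLynx has been tested on OS X and Linux only")
}

# remotes::install_github() needs the 'remotes' package
has_remotes <- requireNamespace("remotes", quietly = TRUE)
if (!has_remotes) {
  message("Install 'remotes' first with install.packages(\"remotes\")")
}
```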
A minimal workflow on the bundled example data:

```r
library(immLynx)
library(scater)

# Load example data
data("immLynx_example")

# Summarize TCR repertoire
repertoire_summary <- summarizeTCRrepertoire(immLynx_example)
print(repertoire_summary)

# Cluster TCRs
sce <- runClustTCR(immLynx_example, chains = "TRB", method = "mcl")

# Calculate generation probability
sce <- runOLGA(sce, chains = "TRB")

# Generate protein embeddings
sce <- runEmbeddings(sce, chains = "TRB")

# Visualize embeddings
sce <- scater::runUMAP(sce, dimred = "tcr_esm")
scater::plotReducedDim(sce, dimred = "UMAP")
```

Extract and validate TCR data:
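The exact output of summarizeTCRrepertoire() is documented in ?summarizeTCRrepertoire. As a rough illustration of the kind of statistics a repertoire summary involves (an assumption for illustration, not the package's implementation), clone sizes and Shannon diversity can be computed by hand from a few hypothetical CDR3 sequences:

```r
# Hypothetical TRB CDR3 amino-acid sequences, one per cell
cdr3 <- c("CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASSPDRGNTEAFF",
          "CASSLGQAYEQYF", "CASSIRSSYEQYF", "CASSPDRGNTEAFF")

# Clone sizes: number of cells sharing each CDR3, largest first
clone_sizes <- sort(table(cdr3), decreasing = TRUE)

n_cells  <- length(cdr3)
n_clones <- length(clone_sizes)

# Shannon entropy of the clone-size distribution (natural log)
p <- as.numeric(clone_sizes) / n_cells
shannon <- -sum(p * log(p))

print(clone_sizes)
cat("cells:", n_cells, "clonotypes:", n_clones,
    "Shannon:", round(shannon, 3), "\n")
```

Higher entropy means cells are spread more evenly across clonotypes; a repertoire dominated by one expanded clone scores low.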
```r
# Extract TCR data
tcr_data <- extractTCRdata(sce, chains = "TRB")

# Validate data format
validation <- validateTCRdata(tcr_data)

# Convert to tcrdist3 format
tcrdist_format <- convertToTcrdist(tcr_data)

# Generate repertoire summary
repertoire_summary <- summarizeTCRrepertoire(sce)
```

Cluster TCRs based on sequence similarity using clusTCR:
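clusTCR groups CDR3 sequences that are nearly identical. To illustrate the idea only, the base-R sketch below links same-length CDR3s that differ by at most one substitution (Hamming distance <= 1) and reports the connected components of that graph. It is a simplified stand-in, not clusTCR's algorithm (immLynx exposes MCL and DBSCAN as clustering methods), and the sequences are hypothetical.

```r
# Hypothetical TRB CDR3 sequences
cdr3 <- c("CASSLGQAYEQYF", "CASSLGQSYEQYF", "CASSLGESYEQYF",
          "CASSPDRGNTEAFF", "CASSPDRGNTEVFF", "CASRQGAGELFF")

# Hamming distance: number of mismatched positions, defined only for
# equal-length sequences (Inf otherwise)
hamming <- function(a, b) {
  if (nchar(a) != nchar(b)) return(Inf)
  as.numeric(sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]]))
}

n <- length(cdr3)
dist_mat <- outer(seq_len(n), seq_len(n),
                  Vectorize(function(i, j) hamming(cdr3[i], cdr3[j])))

# Link sequences that differ by at most one substitution
adj <- dist_mat <= 1

# Connected components of the resulting graph (breadth-first search)
cluster <- rep(NA_integer_, n)
n_clusters <- 0L
for (start in seq_len(n)) {
  if (!is.na(cluster[start])) next
  n_clusters <- n_clusters + 1L
  cluster[start] <- n_clusters
  queue <- start
  while (length(queue) > 0) {
    i <- queue[1]
    queue <- queue[-1]
    nbrs <- which(adj[i, ] & is.na(cluster))
    cluster[nbrs] <- n_clusters
    queue <- c(queue, nbrs)
  }
}

print(data.frame(cdr3, cluster))
```

Note that the first and third sequences differ at two positions yet end up in the same component because both are one substitution away from the second; graph-partitioning methods such as MCL exist partly to control this chaining.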
```r
# MCL clustering (default)
sce <- runClustTCR(sce,
                   chains = "TRB",
                   method = "mcl",
                   inflation = 2.0)

# DBSCAN clustering
sce <- runClustTCR(sce,
                   chains = "TRB",
                   method = "dbscan",
                   eps = 0.5)
```

Identify metaclones using metaclonotypist:
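Metaclones group TCRs whose sequences lie within a small distance of each other. Assuming that max_edits bounds the number of CDR3 edits allowed between two TCRs (check ?runMetaclonotypist for the exact semantics), the sketch below shows what such a budget means using base R's utils::adist(), which computes Levenshtein distances counting substitutions, insertions and deletions. Unlike Hamming distance, it also compares CDR3s of different lengths. The sequences are hypothetical.

```r
# Hypothetical CDR3 sequences
cdr3 <- c("CASSLGQAYEQYF", "CASSLGAYEQYF", "CASSLGQSYEQYF",
          "CASSPDRGNTEAFF", "CASRQGAGELFF")

# Levenshtein distance: substitutions, insertions and deletions each cost 1
edits <- utils::adist(cdr3)

# Keep each unordered pair once (upper triangle) within the edit budget
max_edits <- 2
idx <- which(edits <= max_edits & upper.tri(edits), arr.ind = TRUE)
pairs <- data.frame(a = cdr3[idx[, "row"]],
                    b = cdr3[idx[, "col"]],
                    edits = edits[idx])
print(pairs)
```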
```r
# Run metaclonotypist with TCRdist
sce <- runMetaclonotypist(sce,
                          chains = "beta",
                          method = "tcrdist",
                          max_edits = 2,
                          max_dist = 20)

# Use the SCEPTR distance metric instead
sce <- runMetaclonotypist(sce,
                          method = "sceptr",
                          max_dist = 1.0)
```

Calculate the generation probability (Pgen) of each CDR3 sequence with OLGA, i.e. how likely V(D)J recombination is to produce it:
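Pgen values are typically tiny and span many orders of magnitude, so they are usually analyzed on a log10 scale. The sketch below uses hypothetical Pgen values in a plain named vector (the column name immLynx uses to store Pgen is not assumed here). A Pgen of 0, meaning the sequence cannot be generated under the model, is mapped to NA rather than -Inf, and the 1e-10 cutoff for "rare" is arbitrary.

```r
# Hypothetical Pgen values for five CDR3 sequences (illustrative only)
pgen <- c(seq1 = 3.2e-08, seq2 = 1.5e-10, seq3 = 0,
          seq4 = 7.9e-07, seq5 = 4.1e-12)

# log10 scale; Pgen = 0 (not generable under the model) becomes NA
log10_pgen <- ifelse(pgen > 0, log10(pgen), NA_real_)

# Flag low-probability sequences with an arbitrary cutoff of 1e-10
rare <- !is.na(log10_pgen) & log10_pgen < -10

print(round(log10_pgen, 2))
print(names(pgen)[rare])
```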
```r
# Calculate Pgen for TRB sequences
sce <- runOLGA(sce,
               chains = "TRB",
               model = "humanTRB")

# Generate random TCR sequences
random_tcrs <- generateOLGA(n = 1000, model = "humanTRB")
```

Generate dense vector representations using ESM-2:
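With pool = "mean", each sequence's per-residue embeddings are presumably averaged into a single vector (the usual meaning of mean pooling; see ?runEmbeddings to confirm). The sketch below shows that step on random stand-in matrices and then compares two pooled vectors with cosine similarity. The default 35M-parameter ESM-2 model produces 480-dimensional residue embeddings; 8 dimensions keep the toy small.

```r
set.seed(42)

# Stand-in per-residue embeddings: one row per amino acid, one column per dimension
embed_dim <- 8
seq_a <- "CASSLGQAYEQYF"
seq_b <- "CASSPDRGNTEAFF"
res_a <- matrix(rnorm(nchar(seq_a) * embed_dim), nrow = nchar(seq_a))
res_b <- matrix(rnorm(nchar(seq_b) * embed_dim), nrow = nchar(seq_b))

# Mean pooling: average over residues, giving one fixed-length vector
# per sequence regardless of CDR3 length
vec_a <- colMeans(res_a)
vec_b <- colMeans(res_b)

# Cosine similarity between the pooled vectors (1 = same direction)
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
sim_ab <- cosine(vec_a, vec_b)
print(sim_ab)
```

Pooling is what makes CDR3s of different lengths comparable in one reduced-dimension matrix, which is what the UMAP step operates on.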
```r
# Default: ESM-2 35M model
sce <- runEmbeddings(sce,
                     chains = "TRB",
                     pool = "mean")

# Larger 650M-parameter model: slower, but may give more informative embeddings
sce <- runEmbeddings(sce,
                     model_name = "facebook/esm2_t33_650M_UR50D")

# Visualize in UMAP space
sce <- scater::runUMAP(sce, dimred = "tcr_esm")
scater::plotReducedDim(sce, dimred = "UMAP")
```

Compute pairwise distances between TCRs with tcrdist3:
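A common use of a pairwise TCRdist matrix, such as beta_dist in the code that follows, is to find the TCRs within a fixed radius of each TCR and its single nearest neighbor. The sketch below does both on a small hand-made symmetric matrix; the values and the radius of 24 are illustrative only, and it assumes a plain numeric matrix with TCR identifiers as dimnames.

```r
# Hypothetical symmetric TCRdist matrix for four TCRs (illustrative values)
ids <- c("tcr1", "tcr2", "tcr3", "tcr4")
d <- matrix(c( 0, 12, 48, 96,
              12,  0, 36, 90,
              48, 36,  0, 60,
              96, 90, 60,  0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Neighbors within a fixed radius (excluding the TCR itself)
radius <- 24
neighbors <- lapply(ids, function(id) setdiff(ids[d[id, ] <= radius], id))
names(neighbors) <- ids

# Nearest neighbor of each TCR: mask the zero diagonal with Inf first
masked <- d
diag(masked) <- Inf
nearest <- apply(masked, 1, function(row) names(which.min(row)))

print(neighbors)
print(nearest)
```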
```r
# Calculate TRB (beta-chain) distances
dist_results <- runTCRdist(sce,
                           chains = "beta",
                           organism = "human")

# Access distance matrices
beta_dist <- dist_results$distances$pw_beta
```

Infer selection pressures with soNNia:
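soNNia infers selection by comparing sequences observed in a repertoire with a background of generated sequences (here produced by generateOLGA()): features over-represented in the data relative to the background are inferred to be selected for. soNNia fits this jointly over many sequence features with a neural network. For a single feature such as CDR3 length, the idea reduces to a ratio of frequencies, Q(L) = P_data(L) / P_background(L), sketched below on hypothetical sequences. This is an illustration of the concept, not soNNia's model.

```r
# Hypothetical CDR3s from the sample and from a generated background
observed   <- c("CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CASSLGQSYEQYF",
                "CASSIRSSYEQYF", "CASSPGQGYEQYF", "CASRQGAGELFF")
background <- c("CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CASSQDRGTEAFF",
                "CASSLGGNTEAFF", "CASRQGAGELFF", "CASSLDRGNYGYTF")

# Frequency of each CDR3 length in the data and in the background
lens  <- sort(unique(c(nchar(observed), nchar(background))))
p_obs <- as.numeric(table(factor(nchar(observed), levels = lens))) / length(observed)
p_bg  <- as.numeric(table(factor(nchar(background), levels = lens))) / length(background)

# Selection factor per length: > 1 enriched in the data, < 1 depleted.
# (A length absent from the background would give Inf or NaN here.)
q_len <- setNames(p_obs / p_bg, lens)
print(round(q_len, 2))
```

A large background, such as the 10,000 generated sequences used below, keeps these ratios from being dominated by sampling noise.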
```r
# 1. Generate a background set of sequences with OLGA
background <- generateOLGA(n = 10000, model = "humanTRB")
write.csv(background, "background.csv", row.names = FALSE)

# 2. Run soNNia against that background
sce <- runSoNNia(sce,
                 background_file = "background.csv")
```

immLynx expects SingleCellExperiment objects with scRepertoire TCR data in the cell-level metadata (colData). The data should include columns such as:
- CTgene: Gene information (V/J genes)
- CTaa: CDR3 amino acid sequences
- CTnt: CDR3 nucleotide sequences
- CTstrict: Combined TCR information
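For paired data, scRepertoire stores both chains in a single string. Assuming the common layout in which the alpha and beta entries of CTaa are joined by "_" and a missing chain is written as "NA" (check your own object, e.g. with head(sce$CTaa)), the chains can be separated in base R as sketched below. immLynx performs its own extraction, so this sketch only makes the format concrete.

```r
# Hypothetical CTaa values: alpha and beta CDR3s joined by "_" (assumed layout)
ctaa <- c("CAVRDSNYQLIW_CASSLGQAYEQYF",
          "NA_CASSPDRGNTEAFF",
          "CALSGGSNYKLTF_NA")

# Split each entry into its alpha and beta parts
parts <- strsplit(ctaa, "_", fixed = TRUE)
tra <- vapply(parts, `[`, character(1), 1)
trb <- vapply(parts, `[`, character(1), 2)

# Turn the literal "NA" placeholders into real missing values
tra[tra == "NA"] <- NA
trb[trb == "NA"] <- NA

print(data.frame(ctaa, tra, trb))
```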
The package uses immApex::getIR() internally to extract:
- barcode: Cell identifier
- cdr3_aa: CDR3 amino acid sequence
- v, d, j, c: Gene segments
- chain: TRA or TRB
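To make that layout concrete, the sketch below builds a toy per-chain table with those columns (all values hypothetical) and applies a minimal, hypothetical sanity check: the CDR3 must start with C, contain only the 20 standard amino acids, and come from a TRA or TRB chain. This is not necessarily what validateTCRdata() checks; see ?validateTCRdata.

```r
# Toy per-chain table with the columns listed above (hypothetical values)
tcr_df <- data.frame(
  barcode = c("AAACCTG-1", "AAACCTG-1", "AAAGATG-1"),
  cdr3_aa = c("CAVRDSNYQLIW", "CASSLGQAYEQYF", "CASS*GQAYEQY"),
  v = c("TRAV12-2", "TRBV7-9", "TRBV7-9"),
  d = c(NA, "TRBD1", NA),
  j = c("TRAJ33", "TRBJ2-7", "TRBJ2-7"),
  c = c("TRAC", "TRBC2", "TRBC2"),
  chain = c("TRA", "TRB", "TRB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: CDR3 starts with C and uses only standard amino acids
is_valid_cdr3 <- function(x) {
  !is.na(x) & grepl("^C[ACDEFGHIKLMNPQRSTVWY]+$", x)
}

tcr_df$valid <- is_valid_cdr3(tcr_df$cdr3_aa) &
  tcr_df$chain %in% c("TRA", "TRB")
print(tcr_df[, c("barcode", "chain", "cdr3_aa", "valid")])
```

The third row fails because "*" (a stop codon in translated output) is not a standard amino acid.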
If you use immLynx in your research, please cite the underlying tools:
- clusTCR: Valkiers et al. (2021)
- tcrdist3: Mayer-Blackwell et al. (2021)
- OLGA: Sethna et al. (2019)
- soNNia: Isacchini et al. (2021)
- metaclonotypist: qimmuno
- ESM-2: Lin et al. (2023)
If you are interested in GLIPH2 specificity group analysis, please see immGLIPH, a dedicated R package for running GLIPH:
```r
remotes::install_github("BorchLab/immGLIPH")
```

If you run into any issues or bugs, please submit a GitHub issue with details of the problem.
- If possible, please include a reproducible example.
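Along with a reproducible example, version information helps narrow down problems, especially because immLynx runs Python tools through basilisk. A minimal sketch using base R utilities:

```r
# Session details: R version, platform and attached package versions
info <- utils::sessionInfo()
print(info)

# Versions of immLynx and basilisk, if installed
for (pkg in c("immLynx", "basilisk")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(utils::packageVersion(pkg)), "\n")
  }
}
```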
