immLynx

Linking advanced TCR Python pipelines and Hugging Face models in R

immLynx provides a unified R interface for running several state-of-the-art TCR analysis pipelines on single-cell TCR sequencing data. It integrates with SingleCellExperiment and scRepertoire workflows and wraps the following Python-based tools:

tcrdist3: Calculate pairwise distances between T-cell receptors using runTCRdist
OLGA: Compute the generation probability of CDR3 sequences or generate new sequences with runOLGA
soNNia: Infer selection pressures on TCRs using runSoNNia
clusTCR: Cluster large sets of CDR3 sequences with runClustTCR
metaclonotypist: Identify TCR metaclones with runMetaclonotypist
ESM-2: Generate protein language model embeddings with runEmbeddings

For more details on each function, please refer to the R documentation (e.g., ?runTCRdist).

System Requirements

immLynx has been tested with R >= 4.3 on OS X and Linux. It is designed to work with single-cell objects to which BCR/TCR data have been added using scRepertoire. See the DESCRIPTION file for the full list of required R packages.

Installation

Install immLynx:

# Install from GitHub
remotes::install_github("BorchLab/immLynx")

The first time you use immLynx, it will automatically install the required Python packages in an isolated environment via basilisk. This may take several minutes.

Quick Start

library(immLynx)
library(scater)

# Load example data
data("immLynx_example")

# Summarize TCR repertoire
rep_summary <- summarizeTCRrepertoire(immLynx_example)
print(rep_summary)

# Cluster TCRs
sce <- runClustTCR(immLynx_example, chains = "TRB", method = "mcl")

# Calculate generation probability
sce <- runOLGA(sce, chains = "TRB")

# Generate protein embeddings
sce <- runEmbeddings(sce, chains = "TRB")

# Visualize embeddings
sce <- scater::runUMAP(sce, dimred = "tcr_esm")
scater::plotReducedDim(sce, dimred = "UMAP")

Main Functions

Utility Functions

Extract and validate TCR data:

# Extract TCR data
tcr_data <- extractTCRdata(sce, chains = "TRB")

# Validate data format
validation <- validateTCRdata(tcr_data)

# Convert to tcrdist3 format
tcrdist_format <- convertToTcrdist(tcr_data)

# Generate repertoire summary
rep_summary <- summarizeTCRrepertoire(sce)

TCR Clustering

Cluster TCRs based on sequence similarity using clusTCR:

# MCL clustering (default)
sce <- runClustTCR(sce,
                   chains = "TRB",
                   method = "mcl",
                   inflation = 2.0)

# DBSCAN clustering
sce <- runClustTCR(sce,
                   chains = "TRB",
                   method = "dbscan",
                   eps = 0.5)
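To make sequence-similarity clustering concrete, the base-R sketch below groups a few made-up CDR3 sequences that are linked by chains of at most one edit, using `utils::adist()` for edit distances and single-linkage `stats::hclust()`. It does not call clusTCR and is not clusTCR's algorithm; the sequences, cell names, and one-edit threshold are illustrative assumptions.

```r
# Hypothetical CDR3 beta sequences keyed by hypothetical cell names
cdr3 <- c(cell1 = "CASSLGQAYEQYF", cell2 = "CASSLGQSYEQYF",
          cell3 = "CASSPDRGNTEAFF", cell4 = "CASSLGQAYEQY")

# Pairwise Levenshtein (edit) distances between all sequences
d <- utils::adist(cdr3)
dimnames(d) <- list(names(cdr3), names(cdr3))

# Single-linkage tree: groups merge at their closest pairwise distance
tree <- stats::hclust(stats::as.dist(d), method = "single")

# Cutting at height 1 keeps together sequences linked by chains of <= 1 edit
clusters <- stats::cutree(tree, h = 1)
print(clusters)
```

Here `cell1`, `cell2`, and `cell4` share a cluster while `cell3` stands alone. Because single linkage is transitive, two sequences more than one edit apart can still end up together through an intermediate sequence.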

Metaclone Discovery

Identify metaclones using metaclonotypist:

# Run metaclonotypist with TCRdist
sce <- runMetaclonotypist(sce,
                          chains = "beta",
                          method = "tcrdist",
                          max_edits = 2,
                          max_dist = 20)

# Use SCEPTR distance metric
sce <- runMetaclonotypist(sce,
                          method = "sceptr",
                          max_dist = 1.0)

Generation Probability

Calculate the generation probability (Pgen) of each TCR sequence, i.e. how likely it is to be produced by V(D)J recombination before selection:

# Calculate Pgen for TRB sequences
sce <- runOLGA(sce,
               chains = "TRB",
               model = "humanTRB")

# Generate random TCR sequences
random_tcrs <- generateOLGA(n = 1000, model = "humanTRB")

Protein Embeddings

Generate dense vector representations using ESM-2:

# Default: ESM-2 35M model
sce <- runEmbeddings(sce,
                     chains = "TRB",
                     pool = "mean")

# Use the larger 650M-parameter model (slower, but may give richer embeddings)
sce <- runEmbeddings(sce,
                     model_name = "facebook/esm2_t33_650M_UR50D")

# Visualize in UMAP space
sce <- scater::runUMAP(sce, dimred = "tcr_esm")
scater::plotReducedDim(sce, dimred = "UMAP")
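The `pool = "mean"` option suggests that per-residue ESM-2 vectors are averaged into one vector per sequence. The base-R sketch below illustrates mean pooling with random numbers standing in for real embeddings; the 480 columns match the hidden size of the ESM-2 35M model, but the cell names and sequence lengths are hypothetical, and whether immLynx drops special tokens before pooling is an assumption not covered here.

```r
set.seed(1)

# Hypothetical CDR3 lengths for three cells
cdr3_len <- c(cell1 = 13, cell2 = 15, cell3 = 11)

# Stand-in per-residue embeddings: one (length x 480) matrix per CDR3
residue_emb <- lapply(cdr3_len, function(n) matrix(rnorm(n * 480), nrow = n))

# Mean pooling: average over residues, giving one 480-length vector per cell,
# then transpose so rows are cells and columns are embedding dimensions
pooled <- t(vapply(residue_emb, colMeans, numeric(480)))

print(dim(pooled))  # 3 cells by 480 dimensions
```

Pooling makes sequences of different lengths comparable, and the result is a cells-by-dimensions matrix, the same layout SingleCellExperiment uses for reduced dimensions such as the one passed to `runUMAP()` above.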

TCR Distance Calculation

Compute pairwise distances between TCRs:

# Calculate TRB distances
dist_results <- runTCRdist(sce,
                           chains = "beta",
                           organism = "human")

# Access distance matrices
beta_dist <- dist_results$distances$pw_beta
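For intuition about what a pairwise distance matrix looks like, the sketch below computes plain Levenshtein (edit) distances between a few made-up CDR3 sequences with base R's `utils::adist()`. This is a simplification, not tcrdist3: tcrdist3 uses a BLOSUM62-based substitution distance with gap penalties and also considers the germline-encoded CDR loops, so its values will differ.

```r
# Hypothetical CDR3 beta sequences keyed by hypothetical cell names
cdr3 <- c(cell1 = "CASSLGQAYEQYF", cell2 = "CASSLGQSYEQYF",
          cell3 = "CASSPDRGNTEAFF", cell4 = "CASSLGQAYEQY")

# Symmetric matrix of edit distances (substitutions, insertions, deletions)
d <- utils::adist(cdr3)
dimnames(d) <- list(names(cdr3), names(cdr3))
print(d)
```

`cell1` and `cell2` differ by one substitution, `cell1` and `cell4` by one deletion, and `cell2` and `cell4` by both, giving distances of 1, 1, and 2.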

Selection Inference

Infer selection with soNNia, using OLGA-generated sequences as the background:

# 1. Generate a background set of TRB sequences with OLGA
background <- generateOLGA(n = 10000, model = "humanTRB")
write.csv(background, "background.csv", row.names = FALSE)

# 2. Run soNNia against the background
sce <- runSoNNia(sce,
                 background_file = "background.csv")

Data Format

immLynx expects a SingleCellExperiment object whose cell-level metadata (colData) contains TCR data added by scRepertoire, including columns such as:

CTgene: Gene information (V/J genes)
CTaa: CDR3 amino acid sequences
CTnt: CDR3 nucleotide sequences
CTstrict: Combined TCR information

The package uses immApex::getIR() internally to extract:

barcode: Cell identifier
cdr3_aa: CDR3 amino acid sequence
v, d, j, c: Gene segments
chain: TRA or TRB
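The base-R sketch below illustrates the kind of reshaping this extraction implies: scRepertoire-style `CTaa` strings, assumed here to hold the TRA and TRB CDR3s joined by an underscore with "NA" marking a missing chain, are split into one row per cell and chain. The barcodes and sequences are hypothetical, and this is not the implementation of `immApex::getIR()`.

```r
# Hypothetical scRepertoire-style values: one row per cell, chains joined as "TRA_TRB"
ct <- data.frame(
  barcode = c("AAACCTG-1", "AAACGGG-1", "AAAGATG-1"),
  CTaa = c("CAVRDSNYQLIW_CASSLGQAYEQYF", "NA_CASSPDRGNTEAFF", "CAGGGSQGNLIF_NA"),
  stringsAsFactors = FALSE
)

# Split each paired string into its TRA and TRB parts
parts <- strsplit(ct$CTaa, "_", fixed = TRUE)
tra <- vapply(parts, `[`, character(1), 1)
trb <- vapply(parts, `[`, character(1), 2)

# Reshape to one row per cell and chain, then drop chains recorded as "NA"
long <- data.frame(
  barcode = rep(ct$barcode, times = 2),
  chain = rep(c("TRA", "TRB"), each = nrow(ct)),
  cdr3_aa = c(tra, trb),
  stringsAsFactors = FALSE
)
long <- long[long$cdr3_aa != "NA", ]
rownames(long) <- NULL
print(long)
```

A long table like this, keyed by barcode and chain, makes it straightforward to select a single chain (for example TRB) before passing sequences to a downstream tool.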

Citation

If you use immLynx in your research, please cite the underlying tools:

clusTCR: Valkiers et al. (2021)
tcrdist3: Mayer-Blackwell et al. (2021)
OLGA: Sethna et al. (2019)
soNNia: Isacchini et al. (2021)
metaclonotypist: qimmuno
ESM-2: Lin et al. (2023)

Related Packages

If you are interested in GLIPH2 specificity group analysis, please see immGLIPH, a dedicated R package for running GLIPH:

remotes::install_github("BorchLab/immGLIPH")

Bug Reports/New Features

If you run into any issues or bugs, please submit a GitHub issue with details of the problem.

If possible, please include a reproducible example.

Any requests for new features or enhancements can also be submitted as GitHub issues.

Pull requests are welcome for bug fixes, new features, or enhancements.