PLM-Phylo

This repository contains the scripts used in the paper "Multiple versus pairwise sequence alignments for protein phylogenetics using deep learning models", submitted to ISMB 2026 as a conference proceeding. The paper introduces two methods for inferring phylogenetic trees from the attention matrices of protein foundation models: one based on MSA Transformer and one based on ESM-2.

MSA Transformer sequence attention matrix phylogenetic inference

(a) A target sequence is combined with all of its matching BLAST hits to build a multiple sequence alignment (MSA).
(b) The resulting MSA (c) is then passed into the pretrained msa-pFM.
(d) Model inference then yields a set of 144 sequence attention matrices (SAMs) of dimensions N x N x L (e), where N is the number of sequences in the input MSA and L is the aligned sequence length.
(f) Each SAM is averaged across L to derive a single SAM of dimensions N x N, which is then inverted to create a distance matrix representing the predicted evolutionary distance between any two sequences.
(g) The distance matrix is passed to the neighbor-joining algorithm to infer a phylogenetic tree. This process yields a single tree for every attention layer/attention head combination.
The resulting trees are combined using ASTRAL to yield a final consensus tree.
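
Steps (f) and (g) can be illustrated with a minimal NumPy sketch. It is not the code from the Scripts folder. The function names `sam_to_distance` and `neighbor_joining` are hypothetical, and the symmetrization, the min-max scaling, and the reading of "inverted" as `1 - similarity` are assumptions, since the text above does not specify them. The neighbor-joining routine is the textbook algorithm and returns a Newick string.

```python
import numpy as np


def sam_to_distance(sam):
    """Collapse one N x N x L attention tensor into an N x N distance matrix."""
    sam = np.asarray(sam, dtype=float)
    sim = sam.mean(axis=-1)        # step (f): average across the L columns
    sim = 0.5 * (sim + sim.T)      # ASSUMPTION: symmetrize the averaged attention
    span = sim.max() - sim.min()
    # ASSUMPTION: min-max scale to [0, 1] so that 1 - sim is a valid distance
    sim = (sim - sim.min()) / span if span > 0 else np.zeros_like(sim)
    dist = 1.0 - sim               # ASSUMPTION: "inverted" read as 1 - similarity
    np.fill_diagonal(dist, 0.0)    # a sequence is at distance 0 from itself
    return dist


def neighbor_joining(dist, names):
    """Textbook neighbor joining; returns the inferred tree as a Newick string."""
    D = np.array(dist, dtype=float)
    nodes = list(names)            # current subtrees, stored as Newick fragments
    while len(nodes) > 2:
        n = len(nodes)
        r = D.sum(axis=1)
        # Q-matrix: the pair with the smallest entry is joined next
        Q = (n - 2) * D - r[:, None] - r[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)
        # Branch lengths from the new internal node to i and j
        li = 0.5 * D[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i, j] - li
        # Distances from the new internal node to every current node
        d_new = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        D_next = np.zeros((n - 1, n - 1))
        D_next[:-1, :-1] = D[np.ix_(keep, keep)]
        D_next[-1, :-1] = d_new[keep]
        D_next[:-1, -1] = d_new[keep]
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        nodes = [nodes[k] for k in keep] + [joined]
        D = D_next
    # The tree is unrooted; the last edge is split evenly only for display
    half = D[0, 1] / 2
    return f"({nodes[0]}:{half:.4f},{nodes[1]}:{half:.4f});"
```

Applied to every layer/head combination, a loop such as `[neighbor_joining(sam_to_distance(s), names) for s in sams]` would produce one Newick tree per SAM, which is the kind of input a consensus step like ASTRAL expects.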

ESM-2 residue attention matrix phylogenetic inference

(a) Two or more unaligned sequences are passed into the pretrained esm-pFM model.
(b) Model inference yields (c) N attention matrices of dimensions L x L, where N is the number of sequences and L is the length of each sequence.
Distance metrics are calculated for pairwise comparisons, producing a distance matrix (d) that represents the dissimilarity between the attention matrices of any two sequences.
The distance matrix is passed to the neighbor-joining algorithm to infer a phylogenetic tree (e). This process yields a single tree for every attention layer/attention head combination.
The resulting trees are combined using ASTRAL to yield a final consensus tree.
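
The pairwise comparison step can be sketched as follows. The distance metric is not named above, so the Frobenius norm is used here purely as a stand-in, and the sketch assumes the attention matrices being compared have the same shape; how sequences of different lengths are handled is not covered. The function names `attention_distance` and `pairwise_distance_matrix` are hypothetical.

```python
from itertools import combinations

import numpy as np


def attention_distance(a, b):
    """Dissimilarity between two L x L attention matrices.

    ASSUMPTION: Frobenius norm of the difference, used only as a placeholder
    for whichever distance metric is actually applied.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("this sketch only compares matrices of equal shape")
    return float(np.linalg.norm(a - b))  # Frobenius norm for 2-D input


def pairwise_distance_matrix(attentions):
    """Build the symmetric N x N distance matrix from N attention matrices."""
    n = len(attentions)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = attention_distance(attentions[i], attentions[j])
    return dist
```

The resulting matrix has a zero diagonal and is symmetric, so it can be passed directly to a neighbor-joining implementation, once per layer/head combination.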
