PyHeteroMap: A Companion Package for Resolving Local and Global Conformational Heterogeneity of the Human Intrinsically Disordered Proteome
PyHeteroMap is a companion to a manuscript currently under review for publication.
PyHeteroMap helps analyze the local and global conformational landscapes of a single polypeptide, polymer, protein, or macromolecule directly from its trajectory. It uses two main shape metrics: the instantaneous shape ratio (Rs) and the relative shape anisotropy (RSA). These shape parameters were previously defined in Biophysical Journal (2024). PyHeteroMap generates a scatter plot of Rs against RSA as a simple map of the conformational landscape of a polymer or protein. Tesei et al. (2024), Nature recently published simulations of all intrinsically disordered regions (IDRs) in the human proteome, an expansive dataset of 28,058 IDRs. We use these IDRs as a test case for PyHeteroMap.
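As a rough illustration of the two shape metrics, the sketch below computes per-frame Rs and RSA from raw coordinates with NumPy. It assumes the commonly used definitions Rs = Ree²/Rg² and RSA = 1.5·Σλ²/(Σλ)² − 0.5, where λ are the eigenvalues of the gyration tensor and all beads have equal mass; the authoritative definitions are those in the cited Biophysical Journal (2024) paper. The function name `shape_metrics` is hypothetical and is not part of the PyHeteroMap API.

```python
import numpy as np


def shape_metrics(xyz):
    """Return (Rs, RSA) per frame.

    xyz: array of shape (n_frames, n_beads, 3).
    Assumed definitions (not taken from the PyHeteroMap source):
      Rs  = Ree^2 / Rg^2
      RSA = 1.5 * sum(lambda^2) / (sum(lambda))^2 - 0.5
    """
    xyz = np.asarray(xyz, dtype=float)
    n_beads = xyz.shape[1]
    # Center every frame on its geometric center (equal masses assumed).
    centered = xyz - xyz.mean(axis=1, keepdims=True)
    # Gyration tensor per frame: S = (1/N) * sum_k r_k r_k^T
    tensor = np.einsum("fni,fnj->fij", centered, centered) / n_beads
    eig = np.linalg.eigvalsh(tensor)  # shape (n_frames, 3)
    rg2 = eig.sum(axis=1)  # Rg^2 is the trace of the gyration tensor
    # Squared end-to-end distance between the first and last bead.
    ree2 = ((xyz[:, -1] - xyz[:, 0]) ** 2).sum(axis=1)
    rs = ree2 / rg2
    rsa = 1.5 * (eig ** 2).sum(axis=1) / rg2 ** 2 - 0.5
    return rs, rsa
```

Under these definitions, a perfectly straight rod gives RSA = 1 and a fully symmetric arrangement of beads gives RSA = 0, which is a convenient sanity check before applying the function to real trajectory coordinates.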
For a given intrinsically disordered protein (IDP) or region (IDR) of a protein:
- Global conformational ensembles: can be examined by generating a scatter plot of (instantaneous) Rs against RSA of the full chain.
- Local conformational ensembles: can be examined by using a moving/sliding window across the chain and monitoring ⟨Rs⟩, ⟨RSA⟩ and other polymer properties of each subchain. Local ensembles can also be examined by generating a scatter plot of (instantaneous) Rs against RSA of one or more subchain trajectories.
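The sliding-window idea can be sketched as follows. This is a minimal NumPy illustration, not PyHeteroMap's implementation: the function name `sliding_window_rg` is hypothetical, the window of 30 beads is borrowed from the `initialize_30mer_subchain` method name used later in this README, and equal bead masses are assumed.

```python
import numpy as np


def sliding_window_rg(xyz, window=30):
    """Ensemble-averaged Rg of every contiguous subchain of `window` beads.

    xyz: array of shape (n_frames, n_beads, 3).
    Returns an array of length n_beads - window + 1 (one value per window start).
    """
    xyz = np.asarray(xyz, dtype=float)
    n_beads = xyz.shape[1]
    if window > n_beads:
        raise ValueError("window is longer than the chain")
    means = []
    for start in range(n_beads - window + 1):
        sub = xyz[:, start:start + window]
        centered = sub - sub.mean(axis=1, keepdims=True)
        # Per-frame Rg of this subchain, then averaged over all frames.
        rg = np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))
        means.append(rg.mean())
    return np.array(means)
```

Plotting the returned values against the window start index gives a per-subchain profile; the same loop could collect ⟨Rs⟩ or ⟨RSA⟩ instead of ⟨Rg⟩.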
The Gaussian Walk (GW) polymer model, which is not restricted by excluded volume or other types of interactions, can provide a reference ensemble for the conformational ensembles of other proteins and polymers, as was previously demonstrated (Biophysical Journal (2024)).
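For orientation, a Gaussian Walk ensemble can be generated in a few lines: each bond vector is drawn independently from a 3D normal distribution and the bead positions are the cumulative sum of the bonds. The sketch below is an assumption-labeled illustration with hypothetical names (`simulate_gaussian_walk`, `mean_square_ratio`, `bond_std`); it is not PyHeteroMap's own GW generator.

```python
import numpy as np


def simulate_gaussian_walk(n_beads, n_frames, bond_std=1.0, seed=0):
    """Return GW coordinates of shape (n_frames, n_beads, 3).

    Each bond component is drawn from N(0, bond_std^2); there is no
    excluded volume and no other interaction between beads.
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, bond_std, size=(n_frames, n_beads - 1, 3))
    xyz = np.zeros((n_frames, n_beads, 3))
    xyz[:, 1:] = np.cumsum(steps, axis=1)  # first bead stays at the origin
    return xyz


def mean_square_ratio(xyz):
    """Return <Ree^2> / <Rg^2> over the ensemble (equal bead masses)."""
    ree2 = ((xyz[:, -1] - xyz[:, 0]) ** 2).sum(axis=1)
    centered = xyz - xyz.mean(axis=1, keepdims=True)
    rg2 = (centered ** 2).sum(axis=2).mean(axis=1)
    return ree2.mean() / rg2.mean()
```

For an ideal chain, ⟨Ree²⟩/⟨Rg²⟩ approaches 6 as the chain gets long, so `mean_square_ratio(simulate_gaussian_walk(100, 5000))` should land close to that value up to sampling noise.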
For a given IDR trajectory, PyHeteroMap can generate:
- Global (RSA, Rs) scatter plots
  - Compare an IDR/peptide conformational landscape against that of a Gaussian Walk (GW) reference.
  - Compute metrics such as the fC_shape score that quantify its conformational diversity.
  - Compute ν (the Flory scaling exponent) (Tesei et al. (2024), Nature).
- Local polymer property plots
  - Display how polymer properties such as ⟨RSA⟩, ⟨Rs⟩, and others vary at the subchain level.
- Local (RSA, Rs) plots
  - For one or more selected subchains of an IDR, (RSA, Rs) scatter plots can be generated (see examples).
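To make the Flory scaling exponent concrete, the sketch below estimates ν from the scaling of root-mean-square internal distances with sequence separation, sqrt(⟨R²ij⟩) ≈ R0·|i − j|^ν, using a least-squares fit in log-log space. This is a generic illustration under stated assumptions: the function name `fit_flory_exponent` and the `min_separation` cutoff are hypothetical, and the fitting procedure used by Tesei et al. (2024) or by PyHeteroMap may differ in detail.

```python
import numpy as np


def fit_flory_exponent(xyz, min_separation=5):
    """Estimate (nu, R0) from internal-distance scaling.

    xyz: array of shape (n_frames, n_beads, 3).
    Only separations |i - j| > min_separation enter the fit.
    """
    xyz = np.asarray(xyz, dtype=float)
    n_beads = xyz.shape[1]
    seps = np.arange(min_separation + 1, n_beads)
    rms = np.empty(len(seps))
    for k, s in enumerate(seps):
        # Squared distances between all bead pairs that are s apart in sequence.
        d2 = ((xyz[:, s:] - xyz[:, :-s]) ** 2).sum(axis=2)
        rms[k] = np.sqrt(d2.mean())
    # Linear fit of log(rms) = nu * log(sep) + log(R0).
    nu, log_r0 = np.polyfit(np.log(seps), np.log(rms), 1)
    return nu, np.exp(log_r0)
```

A straight rod with unit spacing returns ν = 1 and R0 = 1, while an ideal Gaussian Walk should give ν close to 0.5.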
PyHeteroMap can additionally simulate new Gaussian Walk (GW) chains of any chain length and any number of snapshots.
(RSA, Rs) scatter plots can be generated directly from a csv file containing (RSA, Rs) data (no trajectory needed) (see examples).
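A trajectory-free plot of this kind can be sketched with the standard `csv` module and Matplotlib. The column names `RSA` and `Rs`, the function name `plot_rsa_rs_from_csv`, and the three demo rows are all made up for illustration; the actual CSV layout expected by PyHeteroMap is shown in the examples folder.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe without a display
import matplotlib.pyplot as plt


def plot_rsa_rs_from_csv(csv_file, rsa_col="RSA", rs_col="Rs"):
    """Scatter Rs (y axis) against RSA (x axis) from an open CSV file object."""
    rsa, rs = [], []
    for row in csv.DictReader(csv_file):
        rsa.append(float(row[rsa_col]))
        rs.append(float(row[rs_col]))
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(rsa, rs, s=4, alpha=0.5)
    ax.set_xlabel("RSA")
    ax.set_ylabel("Rs")
    return fig, ax


# Made-up demo data standing in for a real (RSA, Rs) CSV file.
demo = io.StringIO("RSA,Rs\n0.10,4.2\n0.35,6.8\n0.60,9.1\n")
fig, ax = plot_rsa_rs_from_csv(demo)
```

With a real file, pass `open("my_data.csv", newline="")` instead of the in-memory demo and call `fig.savefig(...)` to write the figure to disk.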
PyHeteroMap mainly targets the ~28,000 human IDR simulations published by Tesei et al. (2024), Nature.
- Two such human IDR simulations are included as examples in the examples/ folder.
- Each IDR has a unique identifier or seq_name.
- (RSA, Rs) plots can be generated without needing a trajectory, if a csv file containing (RSA, Rs) data is provided (see examples).
- PyHeteroMap should also work for other types of trajectories.
- Trajectory analysis is performed using MDTraj.
- PyHeteroMap can also simulate new Gaussian Walk (GW) chains of any chain length and number of snapshots.
- Tesei_2024_IDR-ome_fasta_sequences.csv
  Provides FASTA sequences for all human IDRs published by Tesei et al. (2024).
- reference_GW_chainlength_100.csv
  Gaussian Walk (GW) reference ensemble used in the examples.
  Located at: src/pyheteromap/reference_GW_chainlength_100.csv
Clone and install locally:
# Ensure pip is up-to-date (requires version 21.3+)
python -m pip install -U "pip>=21.3"
# or run python -m pip install --upgrade pip
git clone https://github.com/hshadman/IDP_Global_Local_Conformational_Landscapes.git
cd IDP_Global_Local_Conformational_Landscapes
python -m pip install -e .
Example usage inside Python or Jupyter is shown in the examples folder.
NOTE: If the afrc setup fails, please double-check the afrc documentation. The version of afrc used in PyHeteroMap works with Python 3.9.18 but does not work with Python >= 3.12. All dependencies are listed in pyproject.toml.
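Before installing, it can help to check the interpreter version against the compatibility note above. The helper below is a hypothetical convenience, not part of PyHeteroMap; it only encodes what this README states (3.9 reported working, 3.12 and later reported failing) and marks everything else as untested.

```python
import sys


def afrc_python_status(version_info=None):
    """Classify a Python version against the afrc note in this README."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    if major_minor >= (3, 12):
        return "reported failing"
    if major_minor == (3, 9):
        return "reported working"
    # Versions not mentioned in the README are neither confirmed nor ruled out.
    return "untested"


print("afrc compatibility for this interpreter:", afrc_python_status())
```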
Example usage in a headless environment (no graphical display) is shown below:
import os
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless use
import matplotlib.pyplot as plt
import pyheteromap
from pyheteromap import PyHeteroMap

# Path to the FASTA sequences CSV inside the installed package directory
FASTA_CSV = os.path.join(os.path.dirname(pyheteromap.__file__), "Tesei_2024_IDR-ome_fasta_sequences.csv")

h = PyHeteroMap("IDR_Example")
h.set_trajectory("traj1.xtc", "top1.pdb")
h.initialize_30mer_subchain(FASTA_CSV)

# Save each figure to disk and close it, since nothing can be shown on screen
h.plot_subchain_RSA(6, 4)
plt.savefig("subchain_RSA.png", dpi=300, bbox_inches="tight")
plt.close()
h.plot_subchain_Rg(6, 4)
plt.savefig("subchain_Rg.png", dpi=300, bbox_inches="tight")
plt.close()
h.mod_RSA_Rs_compute_3dplot_from_seq_name("magenta")
plt.savefig("RSA_Rs_vs_GW.png", dpi=300, bbox_inches="tight")
plt.close()
Please feel free to email me at hossain.shadman17@gmail.com.