Visualization vignette #46

@lgeistlinger

A couple of issues:

Section 1 Setup (currently empty)

Section 2 Intro

  • This vignette demonstrates how to explore and visualize DMS and model scores
    from the ProteinGym database v1.2 -> experimental deep mutational scanning (DMS) fitness scores and variant effect prediction model scores from the ProteinGym database (I would remove specific version numbers from the vignette, as those become stale pretty fast)
  • all pairwise DMS substitution mutants -> DMS scores of all possible amino acid substitutions
  • demonstrates how to contrast models with DMS experiment scores -> demonstrates how to contrast variant effect predictions with experimental measurements

Section 3 Visualize DMS scores along a protein -> along a protein sequence

  • Explore the “ACE2_HUMAN” assay from Chan et al. 2020 -> Here, we explore the “ACE2_HUMAN” DMS assay; Chan et al. 2020 -> provide a link to the paper and include it in the References
  • use the arguments start_pos() and end_pos() -> start_pos and end_pos
  • The heatmap shows the DMS score at each position along the given protein (x-axis) -> given protein sequence (bottom x-axis)
  • alternate amino acid on displayed on the y-axis and the reference allele at the position is shown on top -> y-axis: substituted amino acid, top x-axis: reference amino acid
  • For this demonstration -> For demonstration
  • See [here][physiochem] for more information -> broken link
  • As a note, not all positions -> Note that not all positions
  • we can see that at positions 90 and 92, fitness remained high despite across amino acid changes -> virtually all possible amino acid changes at positions 90 and 92 lead to higher fitness
  • read the function documentation with ?plot_dms_heatmap() -> refer to the documentation of the plot_dms_heatmap function
  • In this region of the SHOC2_HUMAN protein, mutating to a K (y-axis) seemed to have the most benign affect across all mutations. -> For example, in this region of the SHOC2_HUMAN protein, mutating to lysine (K, y-axis) more frequently resulted in higher fitness.

Section 4 Visualize model scores along a protein -> along a protein sequence

  • ProteinGymR Bioc 3.21 -> ProteinGymR
  • supervised_available_models() -> available_semi_supervised_models()
  • default model scores from ProteinGym will be loaded in from ProteinGymR::zeroshot_substitutions() -> default model scores from zeroshot_substitutions()
  • GEMME heatmap: there are yellow points in the heatmap corresponding to higher scores that don't seem to be captured in the color scale on the right
  • increase the height of the figure to avoid the single-letter amino acid codes being squeezed on the y-axis
  • quite pathogenic across amino acid substitutions.Note ->
  • as with DMS scores -> as for DMS scores
  • Also note, -> Note further that
  • comparison across of raw model predictions should be cautioned in this context -> comparison of the predicted scores between models is thus not straightforward
  • For more information on model scores and how to interpret them, consult the original ProteinGym publication -> See the ProteinGym publication for more information on model scores and how to interpret them.
  • This can also be done with any output of class ComplexHeatmap::Heatmap() -> This functionality is available for all Heatmap objects generated with the ComplexHeatmap package.

Section Reference -> References

General note: always run a spell check using your editor's spell-check functionality
