
Add GDT and TM-score structural similarity metrics #189

Open
diegozea wants to merge 2 commits into master from codex/implement-gdt-and-tm-score-functions


@diegozea (Owner) commented Dec 3, 2025

Summary

  • add fragment-based GDT (TS/HA) and TM-score calculations built on MIToS PDB superimposition utilities
  • expose the new structure similarity functions through the PDB API and test them on representative alignments
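
For readers unfamiliar with the two metrics, here is a minimal sketch of how GDT_TS, GDT_HA and TM-score are conventionally computed from per-residue distances after a single, fixed superposition. It is not the code added by this PR: the fragment-based superposition search is left out, all function names are hypothetical, every target residue is assumed to have an aligned counterpart, and GDT is reported on a 0-100 scale by assumption.

```julia
using Statistics: mean

# Hypothetical helpers for illustration only; these are not the PR's functions.

# Fraction of aligned residues whose distance (in Å) is within `cutoff`.
fraction_within(distances, cutoff) = count(<=(cutoff), distances) / length(distances)

# GDT averages that fraction over a fixed set of distance cutoffs.
gdt_sketch(distances, cutoffs) = 100 * mean(fraction_within(distances, c) for c in cutoffs)

# Conventional cutoffs: 1, 2, 4, 8 Å for GDT_TS and 0.5, 1, 2, 4 Å for GDT_HA.
gdt_ts_sketch(distances) = gdt_sketch(distances, (1.0, 2.0, 4.0, 8.0))
gdt_ha_sketch(distances) = gdt_sketch(distances, (0.5, 1.0, 2.0, 4.0))

# Length-dependent distance scale d0, clamped at 0.5 Å.
# `cbrt` is used so that targets shorter than 15 residues do not raise an error.
tm_d0_sketch(L::Integer) = max(0.5, 1.24 * cbrt(L - 15) - 1.8)

# TM-score for one fixed superposition, normalized by the target length.
function tm_score_sketch(distances, Ltarget::Integer)
    d0 = tm_d0_sketch(Ltarget)
    return sum(1 / (1 + (d / d0)^2) for d in distances) / Ltarget
end

# Made-up distances (Å) between matched Cα atoms.
distances = [0.4, 0.9, 1.8, 3.5, 7.0, 12.0]
gdt_ts = gdt_ts_sketch(distances)
gdt_ha = gdt_ha_sketch(distances)
tm = tm_score_sketch(distances, length(distances))
```

Both scores are normally maximized over many candidate superpositions; the sketch only evaluates the scoring step for one of them.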

Testing

  • julia --project -e 'push!(LOAD_PATH,"test"); using MIToSTests; MIToSTests.retest("PDB"); MIToSTests.retest("Structure similarity (GDT/TM-score)")'

Codex Task

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return mean(values(scores))
end

tm_d0(L::Integer) = max(0.5, 1.24 * (L - 15)^(1 / 3) - 1.8)


P1: Avoid DomainError in tm_score for short targets

tm_d0 currently computes 1.24 * (L - 15)^(1 / 3) - 1.8; when Ltarget is below 15 (short peptides), raising a negative base to the fractional exponent throws a DomainError before the max(0.5, …) guard can run. That means tm_score crashes instead of returning a score for small structures. Handle the Ltarget < 15 case explicitly (e.g., real cube root or early clamp) so short targets are supported.
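
A minimal sketch of the two remedies named above, using hypothetical function names so they can sit side by side with the original form. `cbrt` is Julia's real cube root from Base and accepts negative arguments; the early clamp relies on the fact that the unclamped expression is already below 0.5 for any L up to 15.

```julia
# Original form from the diff: Int ^ Float64 with a negative base throws DomainError.
tm_d0_original(L::Integer) = max(0.5, 1.24 * (L - 15)^(1 / 3) - 1.8)

# Option 1: real cube root. cbrt is defined for negative inputs, so the
# max(0.5, ...) guard always gets a chance to run.
tm_d0_cbrt(L::Integer) = max(0.5, 1.24 * cbrt(L - 15) - 1.8)

# Option 2: early clamp. For L <= 15 the cube-root term is non-positive, so the
# result is 0.5 anyway; return it before the fractional power is evaluated.
tm_d0_clamped(L::Integer) = L <= 15 ? 0.5 : max(0.5, 1.24 * (L - 15)^(1 / 3) - 1.8)

# Check whether the original form fails for a short peptide.
original_throws = try
    tm_d0_original(10)
    false
catch err
    err isa DomainError
end
```

Both options agree with the original form wherever the original is defined, so either one leaves longer targets unaffected.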


@diegozea (Owner, Author) commented

@codex address that feedback


Summary

  • Updated TM-score normalization to use a real cube root for d0, avoiding DomainErrors for short targets while preserving the clamp behavior (src/PDB/Kabsch.jl, L485).

Testing

  • ⚠️ not run (not requested)
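
Since the change was not exercised, a regression test along the following lines could cover the short-target case. This is only a sketch: `tm_d0_sketch` is a hypothetical stand-in mirroring the described fix, not the helper in src/PDB/Kabsch.jl, and the test set name is made up.

```julia
using Test

# Hypothetical stand-in for the patched helper (real cube root plus clamp).
tm_d0_sketch(L::Integer) = max(0.5, 1.24 * cbrt(L - 15) - 1.8)

@testset "tm_d0 for short targets (sketch)" begin
    # Lengths up to 15 must not throw and must hit the 0.5 clamp.
    for L in 1:15
        @test tm_d0_sketch(L) == 0.5
    end
    # Longer targets should match the original formula.
    @test tm_d0_sketch(100) ≈ 1.24 * 85^(1 / 3) - 1.8
    # d0 should never decrease as the target gets longer.
    @test issorted(tm_d0_sketch.(1:500))
end
```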


codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 96.55172% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.96%. Comparing base (4a26a3d) to head (2d684e8).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/PDB/Kabsch.jl | 96.55% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #189      +/-   ##
==========================================
- Coverage   96.97%   96.96%   -0.01%     
==========================================
  Files          64       64              
  Lines        4861     4948      +87     
==========================================
+ Hits         4714     4798      +84     
- Misses        147      150       +3     


coveralls commented Dec 3, 2025

Coverage Status

coverage: 97.145% (-0.01%) from 97.156%
when pulling 2d684e8 on codex/implement-gdt-and-tm-score-functions
into 4a26a3d on master.

@github-actions (Contributor)

Benchmark Results (Julia v1)

Time benchmarks

| Benchmark | master | 2d684e8... | master / 2d684e8... |
| --- | --- | --- | --- |
| Information/CorrectedMutualInformation/buslje09/msa | 0.882 ± 0.0099 s | 0.885 ± 0.008 s | 0.997 ± 0.014 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 0.0362 ± 0.0012 s | 0.0363 ± 0.0013 s | 0.999 ± 0.049 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.796 ± 0.0067 s | 0.788 ± 0.0074 s | 1.01 ± 0.013 |
| Information/MIp/PF09645 | 9.46 ± 0.29 ms | 9.48 ± 0.29 ms | 0.997 ± 0.043 |
| Information/frequencies!/1 | 0.301 ± 0.02 μs | 0.3 ± 0.021 μs | 1 ± 0.097 |
| Information/frequencies!/2 | 1.44 ± 0.029 μs | 1.44 ± 0.02 μs | 1 ± 0.024 |
| Information/highlevel/BLMI | 0.0644 ± 0.00025 s | 0.063 ± 0.00024 s | 1.02 ± 0.0056 |
| Information/highlevel/buslje09 | 11.4 ± 0.13 ms | 11.3 ± 0.11 ms | 1 ± 0.015 |
| Information/shannon_entropy/PF09645 | 19.3 ± 0.62 μs | 20.3 ± 0.7 μs | 0.954 ± 0.045 |
| MSA/Annotations/filtercolumns/boolean mask | 10.3 ± 0.23 μs | 10.1 ± 0.28 μs | 1.03 ± 0.037 |
| MSA/Annotations/filtercolumns/index array | 3.42 ± 0.14 μs | 3.42 ± 0.17 μs | 1 ± 0.064 |
| MSA/Base.vcat/annotated | 4.44 ± 0.33 μs | 4.45 ± 0.33 μs | 0.998 ± 0.1 |
| MSA/Base.vcat/unannotated | 1.48 ± 0.14 μs | 1.5 ± 0.13 μs | 0.987 ± 0.13 |
| MSA/Residue conversions/char2res | 0.353 ± 0.89 ms | 0.354 ± 0.029 ms | 0.997 ± 2.5 |
| MSA/Residue conversions/int2res | 0.219 ± 0.84 ms | 0.243 ± 0.81 ms | 0.904 ± 4.6 |
| MSA/Residue conversions/res2char | 0.26 ± 0.011 ms | 0.263 ± 0.011 ms | 0.99 ± 0.058 |
| MSA/Residue conversions/res2int | 0.232 ± 0.82 ms | 0.287 ± 0.77 ms | 0.81 ± 3.6 |
| MSA/hobohmI/pid20 | 0.451 ± 0.13 μs | 0.46 ± 0.13 μs | 0.98 ± 0.4 |
| MSA/hobohmI/pid62 | 0.501 ± 0.12 μs | 0.501 ± 0.12 μs | 1 ± 0.34 |
| MSA/hobohmI/pid80 | 0.501 ± 0.13 μs | 0.501 ± 0.15 μs | 1 ± 0.4 |
| MSA/hobohmI/pid99 | 0.581 ± 0.13 μs | 0.581 ± 0.12 μs | 1 ± 0.3 |
| MSA/identity/matrix_Float64 | 17.5 ± 0.49 μs | 17.6 ± 0.48 μs | 0.995 ± 0.039 |
| MSA/identity/mean | 0.0898 ± 0.022 ms | 0.0914 ± 0.022 ms | 0.982 ± 0.34 |
| MSA/read/Clustal | 29.7 ± 5.1 μs | 29.8 ± 5.5 μs | 0.998 ± 0.25 |
| MSA/read/Clustal_num | 29.6 ± 5 μs | 29.8 ± 5.3 μs | 0.992 ± 0.24 |
| MSA/read/FASTA | 0.0463 ± 0.0077 ms | 0.0448 ± 0.0073 ms | 1.03 ± 0.24 |
| MSA/read/FASTA.gz | 0.0506 ± 0.014 ms | 0.0483 ± 0.014 ms | 1.05 ± 0.42 |
| MSA/read/FASTA.gz_annotated | 0.0499 ± 0.0059 ms | 0.0505 ± 0.011 ms | 0.988 ± 0.25 |
| MSA/read/FASTA_deletefullgaps | 6.37 ± 2.1 ms | 7.05 ± 1.6 ms | 0.903 ± 0.36 |
| MSA/read/FASTA_deletefullgaps_mapping | 0.0952 ± 0.0043 s | 0.0946 ± 0.0059 s | 1.01 ± 0.077 |
| MSA/read/Stockholm | 0.0343 ± 0.0084 ms | 0.0341 ± 0.0081 ms | 1 ± 0.34 |
| MSA/read/Stockholm.gz | 0.0601 ± 0.0048 ms | 0.0595 ± 0.0045 ms | 1.01 ± 0.11 |
| MSA/read/Stockholm_annotated | 0.0464 ± 0.011 ms | 0.0453 ± 0.011 ms | 1.02 ± 0.34 |
| MSA/read/Stockholm_mapping | 0.198 ± 0.041 ms | 0.198 ± 0.042 ms | 1 ± 0.3 |
| MSA/read/Stockholm_mapping_coords | 0.12 ± 0.03 ms | 0.119 ± 0.03 ms | 1.01 ± 0.36 |
| MSA/write/FASTA | 0.22 ± 0.032 ms | 0.215 ± 0.029 ms | 1.02 ± 0.2 |
| PDB/_generate_interaction_keys/defaults | 0.0446 ± 0.017 ms | 0.0441 ± 0.017 ms | 1.01 ± 0.54 |
| PDB/_get_matched_Cαs/hemoglobin | 0.0381 ± 0.0079 ms | 0.0388 ± 0.0081 ms | 0.984 ± 0.29 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 0.621 ± 0.024 ms | 0.599 ± 0.022 ms | 1.04 ± 0.055 |
| PDB/contact/1CBN_20_30_CB | 0.19 ± 0.01 μs | 0.2 ± 0.001 μs | 0.95 ± 0.05 |
| PDB/contact/1CBN_20_30_heavy | 0.251 ± 0.01 μs | 0.26 ± 0.001 μs | 0.965 ± 0.039 |
| PDB/count_alanine/1CBN | 0.341 ± 0.001 μs | 0.321 ± 0.01 μs | 1.06 ± 0.033 |
| PDB/distance/1CBN_20_30 | 0.14 ± 0.009 μs | 0.14 ± 0.009 μs | 1 ± 0.091 |
| PDB/read/MMCIFFile | 2.93 ± 0.054 ms | 2.94 ± 0.049 ms | 0.996 ± 0.025 |
| PDB/squared_distance/1CBN_20_30_CB | 0.21 ± 0.001 μs | 0.2 ± 0.001 μs | 1.05 ± 0.0073 |
| PDB/squared_distance/1CBN_20_30_heavy | 0.27 ± 0.001 μs | 0.26 ± 0.01 μs | 1.04 ± 0.04 |
| Pfam/accession mapping/acc2seqnames | 0.194 ± 0.01 ms | 0.196 ± 0.011 ms | 0.991 ± 0.074 |
| SIFTS/ResidueDetails/_get_details | 2.69 ± 0.78 μs | 3.92 ± 1.1 μs | 0.685 ± 0.27 |
| SIFTS/ResidueDetails/_is_missing | 3.31 ± 1.3 μs | 2.6 ± 0.98 μs | 1.27 ± 0.68 |
| SIFTS/SIFTSResidue/18gs | 0.1 ± 0 μs | 0.1 ± 0.001 μs | 1 ± 0.01 |
| SIFTS/siftsmapping/2vqc | 2.34 ± 0.059 ms | 2.33 ± 0.08 ms | 1 ± 0.043 |
| Utils/get_n_words/ascii | 0.12 ± 0.001 μs | 0.12 ± 0.001 μs | 1 ± 0.012 |
| Utils/get_n_words/utf8 | 0.11 ± 0.001 μs | 0.11 ± 0.001 μs | 1 ± 0.013 |
| Utils/hascoordinates/invalid | 0.09 ± 0.01 μs | 0.081 ± 0.01 μs | 1.11 ± 0.18 |
| Utils/hascoordinates/valid | 0.13 ± 0.001 μs | 0.13 ± 0.01 μs | 1 ± 0.077 |
| Utils/list2matrix/upper | 0.223 ± 0.047 ms | 0.241 ± 0.026 ms | 0.923 ± 0.22 |
| Utils/list2matrix/upper_diagonal | 0.281 ± 0.04 ms | 0.325 ± 0.035 ms | 0.864 ± 0.15 |
| Utils/matrix2list/upper | 0.0692 ± 0.029 ms | 0.0877 ± 0.0058 ms | 0.789 ± 0.33 |
| Utils/matrix2list/upper_diagonal | 0.0692 ± 0.027 ms | 0.088 ± 0.0061 ms | 0.786 ± 0.31 |
| time_to_load | 0.798 ± 0.011 s | 0.794 ± 0.011 s | 1 ± 0.02 |

Memory benchmarks

| Benchmark | master | 2d684e8... | master / 2d684e8... |
| --- | --- | --- | --- |
| Information/CorrectedMutualInformation/buslje09/msa | 0.766 M allocs: 0.032 GB | 0.766 M allocs: 0.032 GB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 0.0901 M allocs: 5.03 MB | 0.0901 M allocs: 5.03 MB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.742 M allocs: 30.3 MB | 0.742 M allocs: 30.3 MB | 1 |
| Information/MIp/PF09645 | 20.3 k allocs: 0.819 MB | 20.3 k allocs: 0.819 MB | 1 |
| Information/frequencies!/1 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/frequencies!/2 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/highlevel/BLMI | 19.9 k allocs: 1.19 MB | 19.9 k allocs: 1.19 MB | 0.999 |
| Information/highlevel/buslje09 | 0.0377 M allocs: 2.3 MB | 0.0377 M allocs: 2.3 MB | 1 |
| Information/shannon_entropy/PF09645 | 0.047 k allocs: 12.2 kB | 0.047 k allocs: 12.2 kB | 1 |
| MSA/Annotations/filtercolumns/boolean mask | 18 allocs: 5.22 kB | 18 allocs: 5.22 kB | 1 |
| MSA/Annotations/filtercolumns/index array | 16 allocs: 1.62 kB | 16 allocs: 1.62 kB | 1 |
| MSA/Base.vcat/annotated | 0.143 k allocs: 6.58 kB | 0.143 k allocs: 6.58 kB | 1 |
| MSA/Base.vcat/unannotated | 0.064 k allocs: 2.7 kB | 0.064 k allocs: 2.7 kB | 1 |
| MSA/Residue conversions/char2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/int2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/res2char | 3 allocs: 2.05 MB | 3 allocs: 2.05 MB | 1 |
| MSA/Residue conversions/res2int | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/hobohmI/pid20 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid62 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid80 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/hobohmI/pid99 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/identity/matrix_Float64 | 0.249 k allocs: 11.8 kB | 0.249 k allocs: 11.8 kB | 1 |
| MSA/identity/mean | 1.23 k allocs: 0.0517 MB | 1.23 k allocs: 0.0517 MB | 1 |
| MSA/read/Clustal | 0.394 k allocs: 24.3 kB | 0.394 k allocs: 24.3 kB | 1 |
| MSA/read/Clustal_num | 0.394 k allocs: 24.3 kB | 0.394 k allocs: 24.3 kB | 1 |
| MSA/read/FASTA | 0.406 k allocs: 0.044 MB | 0.406 k allocs: 0.044 MB | 1 |
| MSA/read/FASTA.gz | 0.443 k allocs: 0.0752 MB | 0.443 k allocs: 0.0752 MB | 1 |
| MSA/read/FASTA.gz_annotated | 0.533 k allocs: 0.0794 MB | 0.533 k allocs: 0.0793 MB | 1 |
| MSA/read/FASTA_deletefullgaps | 13.6 k allocs: 17.4 MB | 13.6 k allocs: 17.4 MB | 1 |
| MSA/read/FASTA_deletefullgaps_mapping | 1.64 M allocs: 0.0795 GB | 1.64 M allocs: 0.0795 GB | 1 |
| MSA/read/Stockholm | 0.402 k allocs: 0.033 MB | 0.402 k allocs: 0.033 MB | 1 |
| MSA/read/Stockholm.gz | 0.479 k allocs: 0.0754 MB | 0.479 k allocs: 0.0754 MB | 1 |
| MSA/read/Stockholm_annotated | 0.562 k allocs: 0.0413 MB | 0.562 k allocs: 0.0413 MB | 1 |
| MSA/read/Stockholm_mapping | 2.08 k allocs: 0.104 MB | 2.08 k allocs: 0.104 MB | 1 |
| MSA/read/Stockholm_mapping_coords | 1.64 k allocs: 0.0812 MB | 1.64 k allocs: 0.0812 MB | 1 |
| MSA/write/FASTA | 0.303 k allocs: 14.1 kB | 0.303 k allocs: 14.1 kB | 1 |
| PDB/_generate_interaction_keys/defaults | 0.497 k allocs: 0.0581 MB | 0.497 k allocs: 0.0581 MB | 1 |
| PDB/_get_matched_Cαs/hemoglobin | 0.584 k allocs: 0.0438 MB | 0.584 k allocs: 0.0438 MB | 1 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 8.56 k allocs: 1.12 MB | 8.56 k allocs: 1.12 MB | 1 |
| PDB/contact/1CBN_20_30_CB | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/contact/1CBN_20_30_heavy | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/count_alanine/1CBN | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/distance/1CBN_20_30 | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/read/MMCIFFile | 0.039 M allocs: 2.9 MB | 0.039 M allocs: 2.9 MB | 1 |
| PDB/squared_distance/1CBN_20_30_CB | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| PDB/squared_distance/1CBN_20_30_heavy | 4 allocs: 0.281 kB | 4 allocs: 0.281 kB | 1 |
| Pfam/accession mapping/acc2seqnames | 4.32 k allocs: 0.319 MB | 4.32 k allocs: 0.319 MB | 1 |
| SIFTS/ResidueDetails/_get_details | 25 allocs: 1.45 kB | 25 allocs: 1.45 kB | 1 |
| SIFTS/ResidueDetails/_is_missing | 25 allocs: 1.45 kB | 25 allocs: 1.45 kB | 1 |
| SIFTS/SIFTSResidue/18gs | 4 allocs: 0.125 kB | 4 allocs: 0.125 kB | 1 |
| SIFTS/siftsmapping/2vqc | 5.94 k allocs: 0.88 MB | 5.94 k allocs: 0.88 MB | 1 |
| Utils/get_n_words/ascii | 5 allocs: 0.203 kB | 5 allocs: 0.203 kB | 1 |
| Utils/get_n_words/utf8 | 5 allocs: 0.219 kB | 5 allocs: 0.219 kB | 1 |
| Utils/hascoordinates/invalid | 0 allocs: 0 B | 0 allocs: 0 B | |
| Utils/hascoordinates/valid | 0 allocs: 0 B | 0 allocs: 0 B | |
| Utils/list2matrix/upper | 3 allocs: 1.91 MB | 3 allocs: 1.91 MB | 1 |
| Utils/list2matrix/upper_diagonal | 6 allocs: 2.86 MB | 6 allocs: 2.86 MB | 1 |
| Utils/matrix2list/upper | 3 allocs: 0.952 MB | 3 allocs: 0.952 MB | 1 |
| Utils/matrix2list/upper_diagonal | 3 allocs: 0.956 MB | 3 allocs: 0.956 MB | 1 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |
