Revised domain detection and feature preparation pipeline #31

Open
segef wants to merge 13 commits into revision_data-preparation from revision_domain-detection
@segef (Contributor) commented Mar 11, 2026

The domain detection pipeline is revised as follows:

  • domains are now detected iteratively, with one iteration per domain
  • relaxed the domain detection thresholds (TM-score to 0.2, number of aligned residues to 50-70)
  • cleaned up unused methods and deprecated files
  • introduced a new script that calculates the structural features used by the classifiers in later stages (the final pkl structure is unchanged to keep compatibility with the classifiers)
  • added a new notebook for clustering domains
  • added support for config files in the domain detection pipeline
  • added optional prefiltering of sequences by Foldseek alignments to domain templates (sequences with no detected alignment are ignored)
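
For reviewers, here is a rough sketch of how the iterative detection and the relaxed thresholds could fit together. It is not the code in this PR: `DetectionConfig`, `Hit`, `passes`, and `detect_domains` are hypothetical names, the non-overlap rule between iterations is an assumption, and the default of 50 aligned residues is simply the lower end of the 50-70 range above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionConfig:
    # Hypothetical config fields; values taken from the thresholds listed above.
    min_tmscore: float = 0.2
    min_aligned_residues: int = 50  # lower end of the 50-70 range (assumption)


@dataclass(frozen=True)
class Hit:
    # Hypothetical record for one structural alignment to a domain template.
    domain: str
    start: int  # first aligned residue index (inclusive)
    end: int    # last aligned residue index (exclusive)
    tmscore: float
    aligned_residues: int


def passes(hit: Hit, cfg: DetectionConfig) -> bool:
    """Return True if the hit meets both detection thresholds."""
    return (
        hit.tmscore >= cfg.min_tmscore
        and hit.aligned_residues >= cfg.min_aligned_residues
    )


def detect_domains(hits_by_domain: dict[str, list[Hit]], cfg: DetectionConfig) -> list[Hit]:
    """Run one iteration per domain, keeping the best hit that passes the
    thresholds and does not overlap residues assigned in earlier iterations.

    A sequence with no hits at all yields an empty list, which mirrors the
    optional prefiltering step (no alignment detected, sequence ignored).
    """
    assigned: set[int] = set()
    detected: list[Hit] = []
    for _domain, hits in hits_by_domain.items():
        candidates = [
            h for h in hits
            if passes(h, cfg) and assigned.isdisjoint(range(h.start, h.end))
        ]
        if not candidates:
            continue
        best = max(candidates, key=lambda h: h.tmscore)
        detected.append(best)
        assigned.update(range(best.start, best.end))
    return detected
```

Keeping the thresholds in a small config object means they can be loaded from a config file without touching the detection loop.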

structural data

@segef segef requested a review from SamusRam March 11, 2026 17:29