Fix clean ec to substrate mapping #34

Open
segef wants to merge 9 commits into main from bugfix/fix_clean_ec_to_substrate_mapping
segef (Contributor) commented Mar 28, 2026

  • Adds a new script that saves the EC-to-TPS-substrate mapping to a JSON file
  • Updates the CLEAN evaluation logic to read the mapping from that JSON file
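
For reference, a minimal sketch of how such a mapping file could be written once and read back at evaluation time. The function names, the file layout (EC number mapped to a list of substrate labels), and the example entries are assumptions for illustration only; they are not taken from the code in this PR.

```python
import json
import os
import tempfile
from pathlib import Path


def save_ec_to_substrate_mapping(mapping, path):
    """Write {EC number: substrate labels} to a JSON file.

    Sets are not JSON-serializable, so each value is stored as a sorted list.
    """
    serializable = {ec: sorted(substrates) for ec, substrates in mapping.items()}
    Path(path).write_text(json.dumps(serializable, indent=2, sort_keys=True))


def load_ec_to_substrate_mapping(path):
    """Read the mapping back, restoring each value to a set."""
    raw = json.loads(Path(path).read_text())
    return {ec: set(substrates) for ec, substrates in raw.items()}


# Placeholder entries (hypothetical, not real EC numbers or substrates).
example_mapping = {
    "0.0.0.1": {"substrate_a"},
    "0.0.0.2": {"substrate_a", "substrate_b"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    mapping_path = os.path.join(tmp_dir, "ec_to_substrate.json")
    save_ec_to_substrate_mapping(example_mapping, mapping_path)
    reloaded_mapping = load_ec_to_substrate_mapping(mapping_path)
```

Precomputing the file means evaluation only needs a dictionary lookup per predicted EC number instead of an external query.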

segef requested a review from SamusRam on Mar 28, 2026, 11:06

SamusRam added a commit that referenced this pull request on Apr 7, 2026
Merge PR #34 (bugfix/fix_clean_ec_to_substrate_mapping):
- Replace Rhea-based EC lookup with pre-computed JSON mapping
- Add fold-specific pretrained model support (fold_idx parameter)
- Add weighted majority voting for isTPS/substrate prediction
- Add ec_utils.py, clean_dataset_prep.py, get_ec_to_substrate_mapping.py
- Add fold_idx parameter to all predict_proba signatures
- Update all CLEAN/CLEANEcDetection/CLEANBetterDetection configs
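
The weighted majority voting mentioned above could look roughly like the sketch below. This is a generic illustration, assuming each candidate contributes a (label, weight) pair; the function name, the source of the weights, and the tie-breaking rule are assumptions and may differ from what the PR implements.

```python
from collections import defaultdict


def weighted_majority_vote(votes):
    """Return the label with the largest total weight.

    `votes` is an iterable of (label, weight) pairs. Ties are broken
    alphabetically so the result is deterministic.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    if not totals:
        raise ValueError("no votes given")
    # Highest total weight first; alphabetical order among equal totals.
    return min(totals, key=lambda label: (-totals[label], label))


# Hypothetical votes: two weaker votes for "b" outweigh one stronger vote for "a".
winner = weighted_majority_vote([("a", 0.4), ("b", 0.35), ("b", 0.3)])
```

With unit weights this reduces to a plain majority vote.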

Fix _NON_TPS_LABELS:
- Remove 'other' from the set so proteins with 'other' substrate are
  correctly labeled as TPS-positive (isTPS=True)
- Update tests accordingly

Made-with: Cursor