Problem
When a counter-assay dataset is available (measuring off-target activity for a subset of training compounds), it provides selectivity information that is largely uncorrelated with primary potency (empirically r ≈ 0.10). Primary pEC50 alone therefore cannot predict whether a compound will be selective or show off-target liability.
Without selectivity scoring, ranked predictions from the primary model do not distinguish selective PXR agonists from compounds with equivalent or higher counter-assay activity, which may reflect assay interference, cytotoxicity, or true off-target pharmacology.
Proposed Solution
Add a SelectivityScorer class that trains a lightweight auxiliary model on the counter-assay data and predicts the selectivity delta (Δ = primary_pEC50 − counter_pEC50) for unlabeled test compounds:
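# moal/selectivity.py (new file)
import numpy as np


class SelectivityScorer:
    """
    Trains on (SMILES, Δpotency) pairs from primary + counter-assay overlap
    and predicts selectivity risk for new compounds.
    """

    # Fit the auxiliary model on Δ = primary_pEC50 − counter_pEC50.
    def fit(self, smiles: list[str], delta_pec50: list[float]): ...

    # Predicted Δ per compound; negative means counter-assay is more potent.
    def predict(self, smiles: list[str]) -> np.ndarray: ...

    # True where predicted Δ falls below the threshold.
    def flag_anti_selective(self, smiles: list[str], threshold: float = -1.0) -> list[bool]: ...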
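The training pairs passed to fit() come from compounds measured in both assays. A minimal sketch of that join follows. It assumes each assay is available as a mapping from SMILES to pEC50 and that the SMILES strings are already canonicalized, so identical compounds compare equal. The helper name build_delta_pairs and the dict inputs are illustrative and not part of the existing codebase.

```python
def build_delta_pairs(
    primary: dict[str, float],
    counter: dict[str, float],
) -> tuple[list[str], list[float]]:
    """Return (smiles, delta) for compounds present in both assays.

    delta = primary_pEC50 - counter_pEC50, so a negative value means the
    counter-assay is more potent than the primary assay.
    """
    # Hypothetical helper: assumes SMILES keys are already canonicalized.
    shared = sorted(primary.keys() & counter.keys())
    deltas = [primary[s] - counter[s] for s in shared]
    return shared, deltas


# Toy values for illustration only, not real assay data.
primary = {"CCO": 6.5, "c1ccccc1": 5.0, "CCN": 7.2}
counter = {"CCO": 5.0, "c1ccccc1": 6.5}

smiles, delta = build_delta_pairs(primary, counter)
print(smiles)  # ['CCO', 'c1ccccc1']
print(delta)   # [1.5, -1.5]
```

Compounds measured in only one assay ("CCN" here) are dropped, so the auxiliary model sees only the overlap subset.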
Backend options (configurable):
- Random Forest on ECFP4 fingerprints (fast, interpretable)
- A second ChemPropLightningModule (higher capacity, slower)
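One way to keep the backend configurable is a small registry keyed by name. The sketch below is an assumption about how that could look, not the planned implementation: DeltaBackend, MeanBaseline, BACKENDS, and make_backend are all hypothetical names, and the mean-baseline class only stands in for the Random Forest or ChemProp backends so the selection logic runs without extra dependencies.

```python
from typing import Protocol, Sequence


class DeltaBackend(Protocol):
    """Hypothetical interface that each backend would implement."""

    def fit(self, smiles: Sequence[str], delta: Sequence[float]) -> None: ...
    def predict(self, smiles: Sequence[str]) -> list[float]: ...


class MeanBaseline:
    """Placeholder backend: always predicts the training-set mean delta."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, smiles: Sequence[str], delta: Sequence[float]) -> None:
        self.mean_ = sum(delta) / len(delta)

    def predict(self, smiles: Sequence[str]) -> list[float]:
        return [self.mean_ for _ in smiles]


# Hypothetical registry; real entries might be named "rf_ecfp4" and "chemprop".
BACKENDS = {"mean_baseline": MeanBaseline}


def make_backend(name: str) -> DeltaBackend:
    """Instantiate a backend by its configured name."""
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[name]()


backend = make_backend("mean_baseline")
backend.fit(["CCO", "c1ccccc1"], [1.5, -1.5])
print(backend.predict(["CCN"]))  # [0.0]
```

With this shape, SelectivityScorer would only depend on the fit/predict interface, so swapping backends would be a configuration change.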
A multi-task head would be the architectural solution; this issue covers only the standalone post-hoc use case, which requires no changes to the primary training pipeline.
Integration point: predict_smiles() optionally returns a selectivity_flag column alongside primary pEC50 predictions.
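A rough sketch of the optional column, under stated assumptions: the real predict_smiles() signature is not shown in this issue, so the wrapper below (predict_with_selectivity, a hypothetical name) takes the primary predictions as an input and accepts any callable that returns predicted Δ values. Rows are plain dicts here rather than whatever table type the pipeline actually returns.

```python
from typing import Callable, Optional, Sequence


def predict_with_selectivity(
    smiles: Sequence[str],
    primary_pec50: Sequence[float],
    predict_delta: Optional[Callable[[Sequence[str]], Sequence[float]]] = None,
    threshold: float = -1.0,
) -> list[dict]:
    """Attach an optional selectivity_flag to primary predictions.

    Hypothetical wrapper: no rows are dropped, and the primary pEC50
    values are passed through unchanged.
    """
    rows = [{"smiles": s, "pEC50": p} for s, p in zip(smiles, primary_pec50)]
    if predict_delta is not None:
        for row, d in zip(rows, predict_delta(smiles)):
            # Flag when the counter-assay is predicted to be more potent
            # than the primary assay by more than |threshold| log units.
            row["selectivity_flag"] = d < threshold
    return rows


# Toy predicted deltas for illustration only, not real model output.
toy_delta = {"CCO": 0.8, "c1ccccc1": -1.6}
batch = ["CCO", "c1ccccc1"]

plain = predict_with_selectivity(batch, [6.5, 7.1])
flagged = predict_with_selectivity(
    batch, [6.5, 7.1], predict_delta=lambda b: [toy_delta[s] for s in b]
)
print(flagged[1])  # {'smiles': 'c1ccccc1', 'pEC50': 7.1, 'selectivity_flag': True}
```

Without a scorer the output has no selectivity_flag key, so existing callers would see no change.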
Files
- moal/selectivity.py (new)
- moal/model.py (optional: hook into predict_smiles return value)
Notes
For test compounds with predicted Δ < −1.0 (counter-assay more than 1 log unit, i.e. over 10-fold, more potent than primary), consider flagging rather than filtering: the primary pEC50 prediction remains valid, but the compound may not be a suitable lead without selectivity optimization.