
Post-hoc selectivity scoring using auxiliary assay data #27

@smcolby

Description


Problem

When a counter-assay dataset is available (measuring off-target activity for a subset of training compounds), it provides selectivity information that is largely uncorrelated with primary potency (empirically r ≈ 0.10). Primary pEC50 alone therefore cannot predict whether a compound will be selective or carry an off-target liability.

Without selectivity scoring, ranked predictions from the primary model cannot distinguish selective PXR agonists from compounds with equal or higher counter-assay activity, which may reflect assay interference, cytotoxicity, or genuine off-target pharmacology.

Proposed Solution

Add a SelectivityScorer class that trains a lightweight auxiliary model on the counter-assay data and predicts the selectivity delta (Δ = primary_pEC50 − counter_pEC50) for unlabeled test compounds:

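# moal/selectivity.py (new file)
import numpy as np


class SelectivityScorer:
    """
    Trains on (SMILES, Δpotency) pairs from the primary + counter-assay overlap
    and predicts selectivity risk for new compounds.
    """

    # Fit the auxiliary model on Δ = primary_pEC50 − counter_pEC50
    def fit(self, smiles: list[str], delta_pec50: list[float]): ...

    # Predicted Δ per SMILES (higher = more selective for the primary target)
    def predict(self, smiles: list[str]) -> np.ndarray: ...

    # True where predicted Δ < threshold (at -1.0: counter-assay >10-fold more potent)
    def flag_anti_selective(self, smiles: list[str], threshold: float = -1.0) -> list[bool]: ...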
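The training pairs come from compounds measured in both assays. A minimal sketch of assembling them, assuming each assay is available as a mapping from SMILES to pEC50 (the input format and the helper names build_delta_pairs and potency_correlation are hypothetical). The correlation helper also assumes the r ≈ 0.10 figure above is the Pearson correlation between primary and counter pEC50 on that overlap.

```python
from __future__ import annotations

import numpy as np


def build_delta_pairs(
    primary: dict[str, float], counter: dict[str, float]
) -> tuple[list[str], list[float]]:
    """Return (smiles, delta) for compounds measured in both assays.

    delta = primary_pEC50 - counter_pEC50, so a positive value means the
    compound is more potent on the primary target than in the counter-assay.
    """
    shared = sorted(primary.keys() & counter.keys())  # primary/counter overlap only
    return shared, [primary[s] - counter[s] for s in shared]


def potency_correlation(primary: dict[str, float], counter: dict[str, float]) -> float:
    """Pearson r between primary and counter pEC50 over the overlapping compounds."""
    shared = sorted(primary.keys() & counter.keys())
    if len(shared) < 3:
        raise ValueError("need at least 3 overlapping compounds for a meaningful r")
    x = np.array([primary[s] for s in shared])
    y = np.array([counter[s] for s in shared])
    return float(np.corrcoef(x, y)[0, 1])
```

Compounds present in only one assay are dropped, so the size of the overlap bounds how much selectivity signal the auxiliary model can learn.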

Backend options (configurable):

  • Random Forest on ECFP4 fingerprints (fast, interpretable)
  • A second ChemPropLightningModule (higher capacity, slower)
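A minimal sketch of the Random Forest backend, assuming scikit-learn's RandomForestRegressor and an RDKit release that provides rdFingerprintGenerator with GetFingerprintAsNumPy (Morgan radius 2 corresponds to ECFP4). The ecfp4 helper, the featurize hook, and the n_estimators default are illustrative choices, not part of the proposal; the hook makes it easy to swap in another featurizer. The ChemPropLightningModule backend is not shown.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def ecfp4(smiles: Sequence[str], n_bits: int = 2048) -> np.ndarray:
    """ECFP4-style Morgan fingerprints (radius 2) as a 0/1 matrix; needs RDKit."""
    # Imported lazily so the scorer can be used with other featurizers without RDKit
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=n_bits)
    rows = []
    for s in smiles:
        mol = Chem.MolFromSmiles(s)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {s!r}")
        rows.append(gen.GetFingerprintAsNumPy(mol))
    return np.vstack(rows).astype(np.uint8)


class SelectivityScorer:
    """Random Forest backend: regresses delta = primary_pEC50 - counter_pEC50."""

    def __init__(
        self,
        featurize: Callable[[Sequence[str]], np.ndarray] = ecfp4,
        n_estimators: int = 500,
        random_state: Optional[int] = 0,
    ):
        self.featurize = featurize
        self.model = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        )

    def fit(self, smiles: list[str], delta_pec50: list[float]) -> "SelectivityScorer":
        self.model.fit(self.featurize(smiles), np.asarray(delta_pec50, dtype=float))
        return self

    def predict(self, smiles: list[str]) -> np.ndarray:
        # Predicted delta for each SMILES
        return self.model.predict(self.featurize(smiles))

    def flag_anti_selective(self, smiles: list[str], threshold: float = -1.0) -> list[bool]:
        # True where predicted delta < threshold; at -1.0 the counter-assay is
        # predicted to be more than 10-fold more potent than the primary target
        return (self.predict(smiles) < threshold).tolist()
```

Tree ensembles on fingerprints need no feature scaling and retrain quickly, which suits the post-hoc use case; model.feature_importances_ gives a rough, per-bit view of which substructures drive predicted Δ.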

The multi-task head approach is the architectural solution; this issue covers the standalone post-hoc use case, which requires no changes to the primary training pipeline.

Integration point: predict_smiles() optionally returns a selectivity_flag column alongside primary pEC50 predictions.
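The issue does not show predict_smiles()'s return type, so the following is only a sketch assuming a column-oriented result (a mapping of column name to list); a DataFrame would follow the same pattern. The helper attach_selectivity, the DeltaPredictor protocol, and the extra selectivity_delta column are hypothetical; only selectivity_flag comes from the proposal. Passing no scorer leaves the primary output unchanged, which keeps the hook optional.

```python
from __future__ import annotations

from typing import Protocol, Sequence

import numpy as np


class DeltaPredictor(Protocol):
    """Anything with predict(smiles) -> delta array, e.g. a fitted SelectivityScorer."""

    def predict(self, smiles: list[str]) -> np.ndarray: ...


def attach_selectivity(
    results: dict[str, list],
    smiles: Sequence[str],
    scorer: DeltaPredictor | None = None,
    threshold: float = -1.0,
) -> dict[str, list]:
    """Return a copy of the primary results with selectivity columns appended."""
    out = dict(results)  # shallow copy; the caller's results are not modified
    if scorer is None:
        return out  # selectivity scoring disabled: primary predictions only
    delta = np.asarray(scorer.predict(list(smiles)), dtype=float)
    out["selectivity_delta"] = delta.tolist()
    out["selectivity_flag"] = (delta < threshold).tolist()
    return out
```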

Files

  • moal/selectivity.py (new)
  • moal/model.py (optional: hook into predict_smiles return value)

Notes

For test compounds with predicted Δ < −1.0 (counter-assay potency more than 10-fold higher than primary), consider flagging rather than filtering: the primary pEC50 prediction remains valid, but the compound may not be a suitable lead without selectivity optimization.
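Because pEC50 = −log10(EC50), Δ converts directly to a fold difference in potency, and flagging can be done without dropping rows. A minimal sketch; the helpers counter_fold_potency and annotate_ranking and the (smiles, primary_pEC50, predicted_delta) row layout are hypothetical.

```python
from __future__ import annotations


def counter_fold_potency(delta: float) -> float:
    """EC50_primary / EC50_counter implied by delta = primary_pEC50 - counter_pEC50.

    pEC50 = -log10(EC50), so the ratio is 10**(-delta); values above 1 mean the
    counter-assay responds at a lower concentration than the primary target.
    """
    return 10 ** (-delta)


def annotate_ranking(
    rows: list[tuple[str, float, float]], threshold: float = -1.0
) -> list[dict]:
    """Rank (smiles, primary_pEC50, predicted_delta) rows by primary pEC50.

    Every compound is kept; anti-selective ones are flagged, not filtered out.
    """
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    return [
        {"smiles": smi, "pEC50": p, "delta": d, "selectivity_flag": d < threshold}
        for smi, p, d in ranked
    ]
```

For example, a compound with primary pEC50 6.0 and counter pEC50 7.3 has Δ = −1.3, so the counter-assay is roughly 20-fold more potent (10^1.3 ≈ 20); it is flagged but keeps its place in the primary ranking.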
