Post-hoc selectivity scoring using auxiliary assay data

## Problem
 
 When a counter-assay dataset is available (measuring off-target activity for a subset of training compounds), it provides selectivity information that is essentially uncorrelated with primary potency (empirically r ≈ 0.10). Primary pEC50 alone therefore cannot predict whether a compound will be selective or show off-target liability.
 
 Without selectivity scoring, ranked predictions from the primary model do not distinguish between selective PXR agonists and compounds with equivalent or higher counter-assay activity — which may represent assay interference, cytotoxicity, or true off-target pharmacology.
 
 ## Proposed Solution
 
 Add a `SelectivityScorer` class that trains a lightweight auxiliary model on the counter-assay data and predicts the selectivity delta (Δ = primary_pEC50 − counter_pEC50) for unlabeled test compounds:
 
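```python
# moal/selectivity.py (new file)
import numpy as np


class SelectivityScorer:
    """
    Trains on (SMILES, Δpotency) pairs from the primary + counter-assay overlap
    and predicts selectivity risk for new compounds.
    """

    def fit(self, smiles: list[str], delta_pec50: list[float]):
        """Fit the auxiliary model on Δ = primary_pEC50 − counter_pEC50."""

    def predict(self, smiles: list[str]) -> np.ndarray:
        """Return the predicted Δ for each SMILES."""

    def flag_anti_selective(self, smiles: list[str], threshold: float = -1.0) -> list[bool]:
        """Return True where the predicted Δ falls below `threshold`."""
```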
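
The `(SMILES, Δpotency)` training pairs come from the compounds measured in both assays. A minimal sketch of that step, assuming each assay is available as a SMILES-to-pEC50 mapping with consistently canonicalized SMILES; `build_delta_pairs` and the example values are hypothetical and not part of `moal`:

```python
def build_delta_pairs(
    primary: dict[str, float],
    counter: dict[str, float],
) -> tuple[list[str], list[float]]:
    """Return (smiles, delta) for compounds measured in both assays.

    delta = primary_pEC50 - counter_pEC50, so a negative value means the
    compound is more potent in the counter-assay than in the primary assay.
    """
    # Only the overlap carries a selectivity label; sort for reproducibility.
    shared = sorted(primary.keys() & counter.keys())
    deltas = [primary[s] - counter[s] for s in shared]
    return shared, deltas


# Illustrative values only (not real assay data).
primary_pec50 = {"CCO": 6.5, "c1ccccc1": 5.0, "CCN": 7.0}
counter_pec50 = {"CCO": 5.0, "c1ccccc1": 6.5}

smiles, delta_pec50 = build_delta_pairs(primary_pec50, counter_pec50)
```

The resulting lists could be passed directly to `SelectivityScorer.fit`. Compounds without a counter-assay measurement (here `"CCN"`) are dropped from the auxiliary training set but remain usable for the primary model.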

**Backend options (configurable):**

 - Random Forest on ECFP4 fingerprints (fast, interpretable)
 - A second ChemPropLightningModule (higher capacity, slower)

A multi-task head is the architectural solution; this issue covers only the standalone post-hoc use case, which requires no changes to the primary training pipeline.

**Integration point:** `predict_smiles()` optionally returns a `selectivity_flag` column alongside primary pEC50 predictions.
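
One way the optional column could be wired in is sketched below. This is an assumption about the shape of the hook, not the existing `predict_smiles()` signature: `predict_with_selectivity` is a hypothetical wrapper, and the two predictors are passed in as plain callables so the sketch stays independent of the model classes.

```python
from typing import Callable, Optional, Sequence


def predict_with_selectivity(
    smiles: Sequence[str],
    predict_primary: Callable[[Sequence[str]], list[float]],
    predict_delta: Optional[Callable[[Sequence[str]], list[float]]] = None,
    threshold: float = -1.0,
) -> list[dict]:
    """Return one row per compound; add selectivity_flag only if a scorer is given."""
    rows = [
        {"smiles": s, "pEC50": p}
        for s, p in zip(smiles, predict_primary(smiles))
    ]
    if predict_delta is not None:
        for row, delta in zip(rows, predict_delta(smiles)):
            # True marks predicted anti-selectivity (delta below the threshold).
            row["selectivity_flag"] = delta < threshold
    return rows


# Stub predictors with made-up outputs, standing in for the real models.
def primary_stub(batch):
    return [6.2, 5.8]


def delta_stub(batch):
    return [0.4, -1.7]


plain = predict_with_selectivity(["CCO", "CCN"], primary_stub)
flagged = predict_with_selectivity(["CCO", "CCN"], primary_stub, delta_stub)
```

Without a scorer the output is unchanged, so existing callers are unaffected. With a scorer, every compound is still returned and only the extra column is added.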

## Files

 - `moal/selectivity.py` (new)
 - `moal/model.py` (optional: hook into `predict_smiles` return value)

## Notes

For test compounds with predicted Δ < −1.0 (counter-assay more potent than primary), consider flagging rather than filtering: the primary pEC50 prediction remains valid, but the compound may not be a suitable lead without selectivity optimization.
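
Because pEC50 is the negative base-10 logarithm of EC50, Δ converts directly to a potency ratio, which may help when choosing the threshold: Δ = −1.0 corresponds to the counter-assay being 10-fold more potent. A small sketch of that arithmetic (`counter_fold_advantage` is a hypothetical helper name):

```python
def counter_fold_advantage(delta_pec50: float) -> float:
    """Fold by which the counter-assay is more potent than the primary assay.

    With delta = primary_pEC50 - counter_pEC50 and pEC50 = -log10(EC50),
    EC50_primary / EC50_counter = 10 ** (-delta).
    """
    return 10.0 ** (-delta_pec50)


at_threshold = counter_fold_advantage(-1.0)  # default flag threshold
equipotent = counter_fold_advantage(0.0)     # same potency in both assays
selective = counter_fold_advantage(1.0)      # primary 10-fold more potent
```

Values above 1 indicate that the counter-assay is the more potent readout; values below 1 indicate selectivity for the primary target.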
