Problem
When a counter-assay dataset is available (e.g. a parallel orthogonal assay measuring off-target activity), it currently goes unused: moal enforces at most one DOSE_RESPONSE record per compound, so counter-assay pEC50 values cannot be added as a second == label for the same compound.
Counter-assay activity is often nearly uncorrelated with primary activity (empirically, Pearson r ≈ 0.10 in tested datasets), so an encoder trained on the primary assay alone never receives gradient signal about selectivity-relevant structural features. The counter-assay data therefore carries information the model cannot currently use.
Proposed Solution
Extend ChemPropLightningModule with an optional second FFN head for auxiliary assay prediction:
ChemPropLightningModule(
    ...
    aux_head: bool = False,  # enable the second (auxiliary) FFN head
    w_aux: float = 0.3,      # weight of the auxiliary loss term
)
Architecture:
- Shared ChemProp/CheMeleon encoder → primary_head (existing) + aux_head (new, same FFN structure)
- aux_head outputs a scalar (auxiliary assay pEC50 or Δ selectivity)
- Loss: L = L_primary(Tobit) + w_aux * L_aux(Tobit)
- L_aux is computed only for records that carry an auxiliary label; records without aux labels contribute 0 to L_aux (masking, not skipping)
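The masked loss described above can be sketched without any framework. This is a minimal illustration under stated assumptions, not moal's implementation: it assumes a Tobit likelihood with a fixed noise scale sigma, censoring markers "==", "<" and ">" (where ">" means the true value lies above the reported one), and normalisation of L_aux by the number of aux-labelled records. The names tobit_nll and combined_loss are hypothetical. In the Lightning module the same logic would be written as tensor operations with a boolean mask.

```python
import math
from typing import List, Optional, Tuple

# (prediction, observed value, censoring marker); all names here are hypothetical.
Triple = Tuple[float, float, str]


def _log_norm_cdf(z: float) -> float:
    """log Phi(z) for the standard normal, clamped to avoid log(0)."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-12))


def tobit_nll(pred: float, target: float, censor: str, sigma: float = 1.0) -> float:
    """Negative log-likelihood of one record under a Tobit model (fixed sigma assumed)."""
    z = (target - pred) / sigma
    if censor == "==":  # exact measurement: Gaussian density
        return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    if censor == ">":   # true value above target: P(y > target) = Phi(-z)
        return -_log_norm_cdf(-z)
    if censor == "<":   # true value below target: P(y < target) = Phi(z)
        return -_log_norm_cdf(z)
    raise ValueError(f"unknown censoring marker: {censor!r}")


def combined_loss(
    primary: List[Triple],
    aux: List[Optional[Triple]],
    w_aux: float = 0.3,
) -> float:
    """L = L_primary + w_aux * L_aux, with aux-unlabelled records masked to 0."""
    if len(primary) != len(aux):
        raise ValueError("primary and aux must be aligned record by record")
    l_primary = sum(tobit_nll(*r) for r in primary) / len(primary)
    # Masking: every record keeps its slot, unlabelled ones contribute 0.
    aux_terms = [tobit_nll(*r) if r is not None else 0.0 for r in aux]
    n_labelled = sum(r is not None for r in aux)
    # Assumption: average over labelled records only; guard against 0 labels.
    l_aux = sum(aux_terms) / max(n_labelled, 1)
    return l_primary + w_aux * l_aux


# Two compounds, only the first has a counter-assay label.
loss = combined_loss(
    primary=[(6.1, 6.4, "=="), (5.0, 5.0, "<")],
    aux=[(4.7, 5.0, "=="), None],
)
```

With this normalisation, a batch containing no aux labels reduces to the primary loss alone, so existing primary-only training runs would be unaffected.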
Data contract change: Add an optional aux_value / aux_censoring_type pair to LabelRecord, or accept a separate list of aux LabelRecords keyed by canonical SMILES.
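Both variants of the data contract can be illustrated together. The sketch below is hypothetical: the LabelRecord here is a stand-in with assumed field names (smiles, value, censoring_type), not the actual class in moal/types.py, and the SMILES strings are assumed to be canonical already. It shows the optional aux pair on the record and a helper, attach_aux, that merges a separate aux mapping keyed by SMILES.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical stand-in for moal's LabelRecord with optional aux fields."""

    smiles: str                               # assumed to be canonical already
    value: float                              # primary-assay pEC50
    censoring_type: str = "=="
    aux_value: Optional[float] = None         # counter-assay pEC50, if measured
    aux_censoring_type: Optional[str] = None

    def __post_init__(self) -> None:
        # The aux pair must be given together or not at all.
        if (self.aux_value is None) != (self.aux_censoring_type is None):
            raise ValueError("aux_value and aux_censoring_type must be set together")


def attach_aux(
    records: List[LabelRecord],
    aux_by_smiles: Dict[str, Tuple[float, str]],
) -> List[LabelRecord]:
    """Return copies of records with aux labels filled in where a match exists."""
    merged = []
    for rec in records:
        if rec.smiles in aux_by_smiles:
            aux_value, aux_censor = aux_by_smiles[rec.smiles]
            rec = replace(rec, aux_value=aux_value, aux_censoring_type=aux_censor)
        merged.append(rec)
    return merged


records = [LabelRecord("CCO", 6.2), LabelRecord("c1ccccc1", 5.1, "<")]
merged = attach_aux(records, {"CCO": (4.8, "==")})
```

Keeping the aux fields optional means existing callers that construct records without them would continue to work unchanged.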
predict_smiles() change: Return a tuple (primary_preds, aux_preds) when aux_head=True; aux predictions are informational (selectivity risk flag) and not part of primary output.
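One way a caller might consume the tuple return value is sketched below. Everything here is an assumption for illustration: the aux head is taken to predict counter-assay pEC50 (not Δ selectivity), the helper selectivity_risk_flags is hypothetical, and the 1.0 log-unit window is an arbitrary placeholder rather than a recommended threshold. The literal lists stand in for the output of predict_smiles() with aux_head=True.

```python
from typing import List, Sequence


def selectivity_risk_flags(
    primary_preds: Sequence[float],
    aux_preds: Sequence[float],
    min_window: float = 1.0,  # placeholder threshold in log units (assumption)
) -> List[bool]:
    """Flag compounds whose predicted primary pEC50 exceeds the predicted
    counter-assay pEC50 by less than min_window."""
    if len(primary_preds) != len(aux_preds):
        raise ValueError("prediction lists must have the same length")
    return [(p - a) < min_window for p, a in zip(primary_preds, aux_preds)]


# Stand-in for: primary_preds, aux_preds = module.predict_smiles(smiles_list)
primary_preds, aux_preds = [7.0, 6.0], [5.0, 5.8]
flags = selectivity_risk_flags(primary_preds, aux_preds)
```

Because the aux predictions are returned separately, code that only uses the primary predictions can ignore the second element of the tuple.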
Files
- moal/types.py — optional aux fields on LabelRecord
- moal/model.py — second head, masked aux loss, updated forward/predict
- moal/dataset.py — pass aux labels through DataModule
- moal/planning.py — extend validate_training_records() to allow aux-only labels
Impact
Allows the shared encoder to learn selectivity-relevant structural features, which is expected to improve primary predictions for active compounds. Also enables direct selectivity risk scoring on unlabeled test compounds.