Multi-task auxiliary head for counter-assay / selectivity prediction #16

@smcolby

Problem

When a counter-assay dataset is available (e.g. a parallel orthogonal assay measuring off-target activity), the data currently goes unused: moal enforces at most one DOSE_RESPONSE record per compound, so counter-assay pEC50 values cannot be added as a second == label for the same compound.

Counter-assay activity is often nearly uncorrelated with primary activity (empirically, Pearson r ≈ 0.10 in tested datasets), so an encoder trained only on the primary assay never receives gradient signal about selectivity-relevant structural features. This is a missed opportunity.

Proposed Solution

Extend ChemPropLightningModule with an optional second FFN head for auxiliary assay prediction:

ChemPropLightningModule(
    ...
    aux_head: bool = False,
    w_aux: float = 0.3,
)

Architecture:

  • Shared ChemProp/CheMeleon encoder → primary_head (existing) + aux_head (new, same FFN structure)
  • aux_head outputs a scalar (auxiliary assay pEC50 or Δ selectivity)
  • Loss: L = L_primary(Tobit) + w_aux * L_aux(Tobit)
  • L_aux is computed only for records that carry an auxiliary label; records without aux labels contribute 0 to L_aux (masking, not skipping)
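The combined loss above can be sketched in plain Python to make the masking behaviour concrete. This is a minimal illustration, not moal's implementation: the function names, the fixed `sigma`, the `"<"` / `">"` censoring markers, and the choice to average each term over labeled records only are all assumptions. A real version would operate on batched tensors inside ChemPropLightningModule.

```python
import math
from typing import Optional, Sequence, Tuple

# One label is (value, censoring). "==" is an exact measurement; "<" and ">"
# are assumed markers for left- and right-censored values.
Label = Tuple[float, str]


def tobit_nll(pred: float, label: Label, sigma: float = 1.0) -> float:
    """Negative log-likelihood of one (possibly censored) label under N(pred, sigma^2)."""
    value, censoring = label
    z = (value - pred) / sigma
    if censoring == "==":  # exact value: Gaussian log-density
        return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * z * z
    if censoring == ">":  # true value lies above `value`: upper-tail mass
        tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    elif censoring == "<":  # true value lies below `value`: lower-tail mass
        tail = 0.5 * math.erfc(-z / math.sqrt(2.0))
    else:
        raise ValueError(f"unknown censoring type: {censoring!r}")
    return -math.log(max(tail, 1e-12))  # clamp to avoid log(0)


def masked_mean_nll(
    preds: Sequence[float],
    labels: Sequence[Optional[Label]],
    sigma: float = 1.0,
) -> float:
    """Mean Tobit NLL; records with no label (None) contribute exactly 0."""
    terms = [
        0.0 if label is None else tobit_nll(pred, label, sigma)
        for pred, label in zip(preds, labels)
    ]
    n_labeled = sum(label is not None for label in labels)
    # Assumption: normalise by the number of labeled records, 0 if there are none.
    return sum(terms) / n_labeled if n_labeled else 0.0


def combined_loss(
    primary_preds: Sequence[float],
    primary_labels: Sequence[Optional[Label]],
    aux_preds: Sequence[float],
    aux_labels: Sequence[Optional[Label]],
    w_aux: float = 0.3,
) -> float:
    """L = L_primary + w_aux * L_aux, with each term masked independently."""
    l_primary = masked_mean_nll(primary_preds, primary_labels)
    l_aux = masked_mean_nll(aux_preds, aux_labels)
    return l_primary + w_aux * l_aux
```

Because both terms are masked the same way, a batch with no aux labels reduces to the primary loss alone, and an aux-only record adds nothing to L_primary.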

Data contract change: Add an optional aux_value / aux_censoring_type pair to LabelRecord, or accept a separate list of aux LabelRecords keyed by canonical SMILES.
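A sketch of how the two data-contract options could fit together: aux fields live on the record, and a separate aux list is merged in by SMILES. The `LabelRecord` below is a hypothetical stand-in (only `aux_value` and `aux_censoring_type` come from this proposal; the other fields and the `attach_aux` helper are assumptions), and SMILES strings are assumed to be canonicalized already.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LabelRecord:
    # Hypothetical stand-in; the real moal LabelRecord may carry other fields.
    smiles: str
    value: Optional[float] = None
    censoring_type: Optional[str] = None
    aux_value: Optional[float] = None           # proposed optional field
    aux_censoring_type: Optional[str] = None    # proposed optional field


def attach_aux(primary: List[LabelRecord], aux: List[LabelRecord]) -> List[LabelRecord]:
    """Copy aux labels onto primary records by SMILES; keep aux-only compounds."""
    # Assumption: at most one aux record per compound (later duplicates win).
    aux_by_smiles: Dict[str, LabelRecord] = {rec.smiles: rec for rec in aux}
    merged: List[LabelRecord] = []
    seen = set()
    for rec in primary:
        seen.add(rec.smiles)
        match = aux_by_smiles.get(rec.smiles)
        if match is None:
            merged.append(rec)  # no aux label: aux fields stay None
        else:
            merged.append(
                replace(rec, aux_value=match.value, aux_censoring_type=match.censoring_type)
            )
    # Compounds measured only in the counter-assay become aux-only records.
    for smiles, match in aux_by_smiles.items():
        if smiles not in seen:
            merged.append(
                LabelRecord(
                    smiles=smiles,
                    aux_value=match.value,
                    aux_censoring_type=match.censoring_type,
                )
            )
    return merged
```

Keeping the aux fields optional means existing primary-only records remain valid without changes.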

predict_smiles() change: Return a tuple (primary_preds, aux_preds) when aux_head=True; aux predictions are informational (selectivity risk flag) and not part of primary output.
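One way a caller might turn the returned tuple into a selectivity risk flag is sketched below. The helper name, the example prediction values, and the 1.0 log-unit window are illustrative assumptions, not part of moal or of this proposal.

```python
from typing import List, Sequence


def selectivity_flags(
    primary_preds: Sequence[float],
    aux_preds: Sequence[float],
    min_window: float = 1.0,  # hypothetical threshold, in pEC50 log units
) -> List[bool]:
    """Flag compounds whose predicted primary pEC50 exceeds the aux pEC50 by less than min_window."""
    if len(primary_preds) != len(aux_preds):
        raise ValueError("primary and aux predictions must have the same length")
    return [(p - a) < min_window for p, a in zip(primary_preds, aux_preds)]


# Stand-in for the tuple predict_smiles() would return when aux_head=True.
primary_preds, aux_preds = [6.5, 5.2], [4.0, 5.0]
flags = selectivity_flags(primary_preds, aux_preds)
```

The primary predictions are left untouched; the flag is derived purely from the difference between the two heads.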

Files

  • moal/types.py — optional aux fields on LabelRecord
  • moal/model.py — second head, masked aux loss, updated forward/predict
  • moal/dataset.py — pass aux labels through DataModule
  • moal/planning.py — extend validate_training_records() to allow aux-only labels

Impact

Lets the shared encoder learn selectivity-relevant structural features, which is expected to improve primary predictions for active compounds. It also enables direct selectivity risk scoring on unlabeled test compounds.
