Problem
All LabelRecords of the same fidelity type (DOSE_RESPONSE or PRIMARY_SCREEN) contribute equally to the Tobit loss, governed only by the global w_drc / w_ps scalars. This ignores substantial variation in DRC measurement reliability.
In a representative primary screen dataset, ~8–9% of DRC records have pEC50 standard errors > 0.5 (vs. a median of ~0.15). These noisy measurements — predominantly weak/inactive compounds — contribute the same gradient signal as high-confidence actives. Treating them equally adds noise to the loss landscape.
Proposed Solution
- moal/types.py: add a weight: float = 1.0 field to LabelRecord:
    from dataclasses import dataclass

    @dataclass
    class LabelRecord:
        ...
        weight: float = 1.0
- moal/model.py: multiply each record's per-sample loss by its weight in _tobit_loss():
    loss = (w_drc * exact_loss * record.weight + w_ps * censored_loss * record.weight).mean()
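To make the intended behavior concrete, here is a minimal pure-Python sketch of a per-record weighted Tobit loss. It is not the actual _tobit_loss() implementation. The Record class, the normal_cdf helper, the fixed sigma, and the assumption that PRIMARY_SCREEN labels are left-censored (true pEC50 below a threshold) are all hypothetical stand-ins chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for LabelRecord (illustration only)."""
    fidelity: str        # "DOSE_RESPONSE" or "PRIMARY_SCREEN"
    value: float         # measured pEC50 (DRC) or censoring threshold (PS)
    weight: float = 1.0  # proposed per-record loss weight


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weighted_tobit_loss(records, preds, sigma=1.0, w_drc=1.0, w_ps=1.0):
    """Mean of per-record losses, each scaled by its class scalar and its weight."""
    total = 0.0
    for rec, mu in zip(records, preds):
        if rec.fidelity == "DOSE_RESPONSE":
            # Exact observation: Gaussian negative log-likelihood.
            z = (rec.value - mu) / sigma
            nll = 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
            total += w_drc * rec.weight * nll
        else:
            # Assumed left-censored observation: P(true value < threshold).
            p = max(normal_cdf((rec.value - mu) / sigma), 1e-12)
            total += w_ps * rec.weight * -math.log(p)
    return total / len(records)
```

With the default weight of 1.0 on every record, this reduces to the unweighted loss, so existing callers would see no change in behavior.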
Weights should be normalized to mean=1.0 within each fidelity class before training to preserve the scale relationship between w_drc and w_ps.
Recommended usage — for DRC records, set weight = 1 / std_error**2 (inverse variance), normalized; for PS records, weight by concentration-tier reliability.
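The weighting and normalization steps could look roughly like the sketch below. The function names and the floor parameter are assumptions introduced here, not part of the existing codebase. The floor only guards against division by near-zero standard errors.

```python
from collections import defaultdict


def inverse_variance_weights(std_errors, floor=0.05):
    """Raw weights 1 / std_error**2; `floor` (hypothetical) clamps tiny errors."""
    return [1.0 / max(se, floor) ** 2 for se in std_errors]


def normalize_to_unit_mean(weights, fidelities):
    """Rescale weights so that they average to 1.0 within each fidelity class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for w, f in zip(weights, fidelities):
        sums[f] += w
        counts[f] += 1
    # Dividing by the class mean (sum / count) leaves each class with mean 1.0.
    return [w * counts[f] / sums[f] for w, f in zip(weights, fidelities)]


# Three DRC records: two near the median standard error, one noisy.
drc_weights = inverse_variance_weights([0.15, 0.15, 0.6])
fidelities = ["DOSE_RESPONSE"] * 3
normalized = normalize_to_unit_mean(drc_weights, fidelities)
```

In this example the record with standard error 0.6 ends up with 1/16 of the weight of a record at 0.15, since (0.6 / 0.15)**2 = 16, while the DRC class mean stays at 1.0 and the balance set by w_drc and w_ps is unchanged.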
Files
- moal/types.py
- moal/model.py (_tobit_loss)
Notes
The existing cost field on LabelRecord represents experimental cost for active learning acquisition decisions and should not be repurposed for loss weighting — these are semantically distinct.