Add per-sample loss weight to LabelRecord and apply in Tobit loss #15

@smcolby

Problem

All LabelRecords of the same fidelity type (DOSE_RESPONSE or PRIMARY_SCREEN) contribute equally to the Tobit loss, governed only by the global w_drc / w_ps scalars. This ignores substantial variation in DRC measurement reliability.

In a representative primary screen dataset, ~8–9% of DRC records have pEC50 standard errors > 0.5 (vs. a median of ~0.15). These noisy measurements — predominantly weak/inactive compounds — contribute the same gradient signal as high-confidence actives. Treating them equally adds noise to the loss landscape.

Proposed Solution

  1. In moal/types.py, add a weight: float = 1.0 field to LabelRecord:
    @dataclass
    class LabelRecord:
        ...
        weight: float = 1.0
  2. In moal/model.py, multiply each record's per-sample loss by its weight in _tobit_loss():
    loss = (w_drc * exact_loss * record.weight + w_ps * censored_loss * record.weight).mean()
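
To make the effect of the weight concrete, here is a minimal plain-Python sketch of a weighted Tobit loss. It is not the code in moal/model.py: the LabelRecord stand-in, its value and censored fields, the tobit_loss name, the fixed sigma, and the left-censoring direction are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LabelRecord:
    # Hypothetical minimal stand-in; the real class in moal/types.py has more fields.
    value: float          # measured pEC50 (exact) or censoring threshold
    censored: bool        # True for censored (primary-screen style) labels, assumed
    weight: float = 1.0   # per-sample loss weight proposed in this issue


def _log_norm_cdf(z: float) -> float:
    # Log of the standard normal CDF, floored to avoid log(0) for very negative z.
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_loss(records, preds, sigma=1.0, w_drc=1.0, w_ps=1.0):
    """Mean weighted Tobit negative log-likelihood over all records."""
    losses = []
    for rec, mu in zip(records, preds):
        z = (rec.value - mu) / sigma
        if rec.censored:
            # Assumed left-censoring: the true value lies below rec.value.
            nll = -_log_norm_cdf(z)
            losses.append(w_ps * rec.weight * nll)
        else:
            # Exact observation: Gaussian negative log-likelihood.
            nll = 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
            losses.append(w_drc * rec.weight * nll)
    return sum(losses) / len(losses)
```

With every weight left at its default of 1.0, this reduces to the unweighted loss, so existing behavior would be unchanged unless weights are set explicitly.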

Weights should be normalized to mean=1.0 within each fidelity class before training to preserve the scale relationship between w_drc and w_ps.
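
One possible way to do this normalization is sketched below. The normalize_weights helper and the string-valued fidelity field are hypothetical; the real LabelRecord may represent fidelity differently.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass
class LabelRecord:
    # Hypothetical minimal stand-in for the class in moal/types.py.
    fidelity: str         # "DOSE_RESPONSE" or "PRIMARY_SCREEN"
    weight: float = 1.0


def normalize_weights(records):
    """Return copies of records with weights rescaled to mean 1.0 per fidelity class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec.fidelity] += rec.weight
        counts[rec.fidelity] += 1

    normalized = []
    for rec in records:
        mean = totals[rec.fidelity] / counts[rec.fidelity]
        if mean <= 0.0:
            raise ValueError(f"non-positive mean weight for {rec.fidelity}")
        # dataclasses.replace copies the record, leaving the input untouched.
        normalized.append(replace(rec, weight=rec.weight / mean))
    return normalized
```

Because each class is rescaled independently, the total contribution of the DRC records relative to the PS records is still controlled only by w_drc and w_ps.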

Recommended usage: for DRC records, set weight = 1 / std_error**2 (inverse variance), then normalize to mean 1.0; for PS records, weight by concentration-tier reliability.
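
As a rough illustration of the DRC case, the sketch below computes inverse-variance weights and rescales them to mean 1.0. The inverse_variance_weights name and the min_se floor are assumptions, not part of the proposal; the floor is there only so that a near-zero reported standard error cannot dominate the loss.

```python
def inverse_variance_weights(std_errors, min_se=0.05):
    """Inverse-variance weights, rescaled so their mean is 1.0."""
    # min_se is a hypothetical floor on the standard error (an assumption).
    raw = [1.0 / max(se, min_se) ** 2 for se in std_errors]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Two records near the median standard error and one noisy record.
weights = inverse_variance_weights([0.15, 0.15, 0.6])
```

Here the record with a standard error of 0.6 receives 1/16 of the weight of a record at 0.15, since the weight scales with the inverse square of the standard error.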

Files

  • moal/types.py
  • moal/model.py (_tobit_loss)

Notes

The existing cost field on LabelRecord represents experimental cost for active learning acquisition decisions and should not be repurposed for loss weighting — these are semantically distinct.
