Problem
ChemPropLightningModule currently uses torch.optim.Adam with no weight decay and no dropout in the FFN head. For datasets in the range of ~4,000–12,000 training records, this risks overfitting during fine-tuning, particularly for the encoder parameters.
Proposed Solution
1. Switch to AdamW:
    # moal/model.py, in configure_optimizers()
    optimizer = torch.optim.AdamW(
        [
            {"params": encoder_params, "lr": self.lr_encoder, "weight_decay": self.weight_decay_encoder},
            {"params": head_params, "lr": self.lr_head, "weight_decay": self.weight_decay_head},
        ]
    )
2. Expose constructor parameters:
    class ChemPropLightningModule(pl.LightningModule):
        def __init__(
            self,
            ...
            dropout: float = 0.0,
            weight_decay_encoder: float = 1e-5,
            weight_decay_head: float = 1e-4,
        ):
3. Pass dropout to the ChemProp MPNN (already supported in the chemprop API). The defaults are conservative: dropout=0.0 leaves the network unchanged and existing call sites keep working, while weight_decay_encoder=1e-5 and weight_decay_head=1e-4 add only light regularization.
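The AdamW snippet in step 1 uses encoder_params and head_params without showing where they come from. One possible way to build them is sketched below, assuming the encoder lives under a submodule whose parameter names start with "encoder." (that prefix, the helper name build_param_groups, and the learning rates in the demo are all hypothetical, not taken from moal/model.py). The helper only looks at names, so it is demonstrated here with placeholder strings instead of real tensors.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    lr_encoder: float,
    lr_head: float,
    weight_decay_encoder: float,
    weight_decay_head: float,
    encoder_prefix: str = "encoder.",  # assumption: how the encoder submodule is named
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into an encoder group and a head group."""
    encoder_params: List[Any] = []
    head_params: List[Any] = []
    for name, param in named_params:
        # Anything not under the encoder prefix is treated as part of the head.
        if name.startswith(encoder_prefix):
            encoder_params.append(param)
        else:
            head_params.append(param)
    return [
        {"params": encoder_params, "lr": lr_encoder, "weight_decay": weight_decay_encoder},
        {"params": head_params, "lr": lr_head, "weight_decay": weight_decay_head},
    ]


# Placeholder names and values standing in for module.named_parameters().
named = [
    ("encoder.layer.weight", "w_enc"),
    ("head.layer.weight", "w_head"),
]
groups = build_param_groups(
    named,
    lr_encoder=1e-4,  # hypothetical learning rates, for illustration only
    lr_head=1e-3,
    weight_decay_encoder=1e-5,
    weight_decay_head=1e-4,
)
```

Inside configure_optimizers(), the same helper could be fed self.named_parameters() and its result passed directly to torch.optim.AdamW, which accepts a list of per-group dicts with "params", "lr", and "weight_decay" keys.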
Files
Notes
Weight decay for the encoder should be smaller than for the head (or zero) to avoid disrupting the pretrained CheMeleon features.
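To get a feel for how strongly each setting pulls on the weights, the following back-of-the-envelope sketch may help. AdamW applies decoupled weight decay, scaling each weight by (1 - lr * weight_decay) per step before the Adam update, so with no gradient signal a weight shrinks by that factor raised to the number of steps. The learning rates and step count below are assumptions chosen for illustration (a constant learning rate is also assumed); only the weight decay values come from the proposal.

```python
def decay_only_shrinkage(lr: float, weight_decay: float, steps: int) -> float:
    """Factor a weight is scaled by after `steps` AdamW updates if gradients are zero."""
    return (1.0 - lr * weight_decay) ** steps


STEPS = 10_000  # hypothetical number of optimizer steps

# Proposed weight decay defaults, paired with hypothetical learning rates.
encoder_factor = decay_only_shrinkage(lr=1e-4, weight_decay=1e-5, steps=STEPS)
head_factor = decay_only_shrinkage(lr=1e-3, weight_decay=1e-4, steps=STEPS)

print(f"encoder weights scaled by {encoder_factor:.6f}")
print(f"head weights scaled by    {head_factor:.6f}")
```

Under these assumptions the encoder is pulled toward zero far more gently than the head (on the order of 0.001% versus 0.1% over the run), which is consistent with keeping the pretrained features largely intact.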