Hesam Asadollahzadeh, Feng Liu, Christopher Leckie, Sarah M. Erfani · University of Melbourne · OpenReview
Corresponding author: h.asadollahzadeh@unimelb.edu.au
Official PyTorch code for TRACER — Trajectory-Robust Anchoring for Contrastive Encoder Regularization.
TRACER couples multimodal contrastive learning with self-distillation from a weighted moving-average (WMA) teacher trained along the trajectory.
The student is optimized with $\mathcal{L}_{\text{MMCL}}$; the teacher supplies $\mathcal{L}_{\text{SD-WMA}}$ to preserve orthogonal pretrained structure while adapting in the task subspace (see Algorithm 1 in the paper).
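As a rough illustration of how a contrastive term and a teacher-driven self-distillation term might be combined, here is a minimal pure-Python sketch. It is not the repository's implementation (that lives in src/models/tracer_loss.py). It assumes a CLIP-style symmetric InfoNCE loss for the contrastive term and a row-wise KL divergence for the distillation term. The function names and the weight `lam` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(sim):
    """Symmetric InfoNCE over an N x N image-text similarity matrix.

    Matched pairs are assumed to sit on the diagonal (an assumption of this sketch).
    """
    n = len(sim)
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image view
    i2t = -sum(math.log(softmax(sim[i])[i]) for i in range(n)) / n
    t2i = -sum(math.log(softmax(sim_t[i])[i]) for i in range(n)) / n
    return 0.5 * (i2t + t2i)

def distill_loss(student_sim, teacher_sim):
    """Mean row-wise KL(teacher || student) between similarity distributions."""
    total = 0.0
    for s_row, t_row in zip(student_sim, teacher_sim):
        p_t, p_s = softmax(t_row), softmax(s_row)
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return total / len(student_sim)

def total_loss(student_sim, teacher_sim, lam=1.0):
    """Contrastive term plus lam times the distillation term (lam is hypothetical)."""
    return contrastive_loss(student_sim) + lam * distill_loss(student_sim, teacher_sim)
```

When the student and teacher produce identical similarities, the distillation term vanishes and only the contrastive term remains. The term grows as the student drifts away from the teacher.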
Fine-tuning pretrained multimodal models improves in-distribution (ID) accuracy but often erodes out-of-distribution (OOD) robustness—a hallmark of catastrophic forgetting.
We study contrastive fine-tuning through the contrastive target matrix, a reformulation that turns the linearized objective into a matrix least-squares problem and makes the geometry explicit: adaptation in the task subspace versus preservation along orthogonal directions.
Classical EMA teachers progressively weaken their regularizing gap to the student. Weighted moving-average (WMA) teachers integrate the optimization trajectory and retain meaningful regularization over finite horizons. TRACER combines multimodal contrastive learning with WMA-guided, multi-perspective distillation. On CLIP fine-tuning, TRACER yields consistent OOD accuracy and calibration gains across architectures, backed by thorough ablations over distillation components, regularization strength, teacher update schedules, and kernel shape.
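The contrast between EMA and WMA teachers can be seen on a toy scalar trajectory. The sketch below is illustrative only: it assumes a uniform averaging kernel for the WMA teacher and a synthetic student that converges geometrically. The kernel shape and update schedule used in the paper may differ. The function names and the `kernel` argument are hypothetical.

```python
def ema_teacher(trajectory, decay=0.9):
    """Classical EMA: teacher <- decay * teacher + (1 - decay) * student."""
    teacher = trajectory[0]
    history = []
    for theta in trajectory:
        teacher = decay * teacher + (1.0 - decay) * theta
        history.append(teacher)
    return history

def wma_teacher(trajectory, kernel=lambda t: 1.0):
    """Weighted average of all checkpoints seen so far.

    kernel(t) is the unnormalized weight of checkpoint t; the default
    (uniform) kernel is an assumption of this sketch.
    """
    history = []
    weighted_sum = 0.0
    weight_total = 0.0
    for t, theta in enumerate(trajectory):
        w = kernel(t)
        weighted_sum += w * theta
        weight_total += w
        history.append(weighted_sum / weight_total)
    return history

# Synthetic student parameter that converges from 0 toward 1.
trajectory = [1.0 - 0.9 ** t for t in range(200)]

ema = ema_teacher(trajectory, decay=0.9)
wma = wma_teacher(trajectory)

# Teacher-student gap at the final step.
ema_gap = abs(trajectory[-1] - ema[-1])
wma_gap = abs(trajectory[-1] - wma[-1])
```

In this toy setting the EMA teacher catches up with the student and its gap shrinks geometrically, while the uniform WMA teacher still carries the early trajectory and keeps a non-negligible gap at step 200.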
Keywords
Multi-modal Contrastive Learning · Robust Fine-tuning · Distributional Robustness · Self-distillation
- Contrastive target matrix — A least-squares view of linearized contrastive finetuning with closed-form insight into common recipes.
- Task vs. orthogonal geometry — A decomposition that localizes forgetting and motivates dynamic teachers.
- Trajectory regularization — We highlight EMA collapse of the teacher–student signal and show how WMA keeps a usable anchor; TRACER translates this into practice with strong empirical robustness gains.
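The task-versus-orthogonal split mentioned above can be illustrated with plain linear algebra. The sketch below is a generic orthogonal projection, not the paper's derivation: it assumes the task subspace is given as the span of a few vectors, and splits an update into the part inside that span and the remainder orthogonal to it. All names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def split_update(delta, task_vectors):
    """Return (task_part, orth_part) with delta = task_part + orth_part."""
    basis = orthonormalize(task_vectors)
    task_part = [0.0] * len(delta)
    for b in basis:
        c = dot(delta, b)
        task_part = [t + c * bi for t, bi in zip(task_part, b)]
    orth_part = [d - t for d, t in zip(delta, task_part)]
    return task_part, orth_part

# Task subspace: the plane spanned by the first two coordinates.
task_part, orth_part = split_update(
    [1.0, 2.0, 3.0],
    [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
)
```

Here `task_part` is the component of the update that adapts within the task subspace, and `orth_part` is the component along orthogonal directions, which is where a regularizer would aim to preserve pretrained structure.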
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
# Adjust dataset paths inside the script first:
bash example_scripts/tracer.sh
python src/main.py --help

For all training flags, see `src/args.py`.
| Path | Role |
|---|---|
| `src/models/tracer_loss.py` | TRACER loss and teacher / distillation logic |
| `src/main.py` | Training entry point |
| `src/args.py` | CLI / hyperparameters |
| `example_scripts/tracer.sh` | Example launch script (edit paths for your machine) |
If you use this code or the paper, please cite:
@inproceedings{
asadollahzadeh2026tracer,
title={{TRACER}: Persistent Regularization for Robust Multimodal Finetuning},
author={Asadollahzadeh, Hesam and Liu, Feng and Leckie, Christopher and Erfani, Sarah M.},
booktitle={Forty-third International Conference on Machine Learning},
year={2026},
url={https://openreview.net/forum?id=XOYXLQRlj8}
}