Paper link: [OpenReview](https://openreview.net/forum?id=WoMMSVZHfP)
How can we efficiently estimate the normalization term in the contrastive loss?
- We study the problem of estimating the normalization term in the contrastive loss (i.e., the sum of exponentiated similarities over all negative samples).
- We reformulate the contrastive loss for each sample, via convex analysis, into a minimization problem with an auxiliary variable representing its log-normalizer.
- We then leverage a compact neural network to predict the log-normalizers, a design justified by variational analysis.
- We design an alternating optimization algorithm, named NeuCLIP, that jointly trains the CLIP model and the auxiliary network.
- We conduct extensive experiments on various datasets to validate the effectiveness of NeuCLIP.
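The reformulation above rests on a standard variational identity for the log-normalizer: for any Z > 0, log Z = min over u of { u + Z·e^(−u) − 1 }, with the minimum attained at u = log Z. Below is a minimal NumPy sketch verifying this identity numerically; the random similarities, temperature value, and grid search over u are illustrative assumptions only (NeuCLIP predicts the auxiliary variable with a compact neural network, and the paper's exact objective may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative similarities of one anchor against 128 negatives, and a temperature.
sims = rng.normal(size=128)
tau = 0.07

# Direct log-normalizer: log sum_i exp(s_i / tau), computed stably.
log_Z = np.logaddexp.reduce(sims / tau)

# Variational form: f(u) = u + Z * exp(-u) - 1 is convex in u and
# minimized at u = log Z. Here we minimize over a dense grid of u values;
# NeuCLIP instead predicts u with an auxiliary network.
u_grid = np.linspace(log_Z - 5.0, log_Z + 5.0, 10001)
z_times_exp_neg_u = np.exp(sims[None, :] / tau - u_grid[:, None]).sum(axis=1)
objective = u_grid + z_times_exp_neg_u - 1.0

u_star = u_grid[np.argmin(objective)]
# The grid minimizer recovers the true log-normalizer up to grid resolution.
```

Replacing the exact log-normalizer with this auxiliary variable is what allows the normalization term to be optimized jointly with the model instead of recomputed over all negatives at each step.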
Comparison with baselines: In the following figure, we present the Datacomp Average performance (left), ImageNet & Variants performance (middle), and Retrieval performance (right) of different methods trained on DFN-14M.
To set up the environment for training, please:

- Download this repository:

```shell
git clone https://github.com/Optimization-AI/NeuCLIP.git
cd NeuCLIP
```

- Create a new environment:

```shell
conda create -n fastclip python=3.11
conda activate fastclip
pip install -r requirements-training.txt
```
We present sample Slurm scripts to run NeuCLIP on DFN-14M.

Sample script to run NeuCLIP on DFN-14M using 8 GPUs (2 nodes, 4 GPUs per node):
```shell
#!/bin/bash
#SBATCH --time=2-00:00:00
#SBATCH --mem=120G
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6
#SBATCH --job-name=neuclip
#SBATCH --partition=gpu
#SBATCH --output=./job_output/%x_%j.log

source ~/.bashrc
conda activate fastclip

master_addr=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$master_addr
export MASTER_PORT=12805
export CUDA_VISIBLE_DEVICES='0,1,2,3'
export PYTHONPATH="$PYTHONPATH:$PWD/src"
export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'

srun python -u src/training/main.py \
    --save-frequency 1 \
    --train-data './datasets/dfn2b/medium/shards/0000{0000..1926}.tar' \
    --train-num-samples 13710637 --data_size 19270000 \
    --warmup 500 \
    --batch-size 512 \
    --epochs 24 \
    --workers 6 \
    --model ViT-B-32 \
    --name neuclip \
    --seed 2026 \
    --wd 0.2 \
    --local-loss \
    --fastclip --multiply_tau --temperature_scheme global_learnable --temperature 0.07 \
    --lr 5e-4 --lr_tau 6.25e-5 --lr_tau_scheduler step_thresh --rho 11.0 --fastclip_eps 1e-6 \
    --gamma 0.42 --gamma_schedule cosine --gamma_decay_epochs 24 \
    --npn --npn_lr 1.0 --npn_num_protos 4096 --npn_repetition 10 --npn_restart_iter 500
```

We leverage the DataComp benchmark to evaluate the performance of trained models. We refer users to the DataComp GitHub repository for detailed instructions on how to run the evaluation. Alternatively, we provide a modified fork that simplifies the evaluation process. To run the evaluation, first prepare the environment, clone the repository, and download the evaluation datasets:
```shell
# create the evaluation environment
env_name='fastclip_eval'
conda create -n "$env_name" python=3.11
conda activate "$env_name"
pip install -r requirements-eval.txt

# clone the datacomp repository
git clone -b project git@github.com:xywei00/datacomp.git

# download the evaluation datasets to `./datasets/datacomp`
python ./datacomp/download_evalsets.py ./datasets/datacomp
```

To evaluate a trained CLIP ViT-B/32 model at epoch 24, run the following command:
```shell
# train_output_dir should be the one containing 'checkpoints', 'out.log', etc.
train_output_dir='./logs/name'
data_dir='./datasets/datacomp'
arch='ViT-B-32'
epoch=24
python ./datacomp/evaluate.py --train_output_dir "${train_output_dir}" --data_dir "${data_dir}" --epoch "${epoch}" --arch "${arch}"
```

If you find NeuCLIP useful in your research, please consider citing the following paper:
```bibtex
@inproceedings{wei2026neuclip,
  title={Neu{CLIP}: Efficient Large-Scale {CLIP} Training with Neural Normalizer Optimization},
  author={Xiyuan Wei and Chih-Jen Lin and Tianbao Yang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=WoMMSVZHfP}
}
```