Paper link: [OpenReview](https://openreview.net/forum?id=WoMMSVZHfP)
How can we efficiently estimate the normalization term in the contrastive loss?
- We study the problem of estimating the normalization term in the contrastive loss (i.e., the sum of exponentiated similarities over all negative samples).
- We reformulate the contrastive loss for each sample, via convex analysis, into a minimization problem with an auxiliary variable representing its log-normalizer.
- We then leverage a compact neural network to predict the log-normalizers, a design justified by variational analysis.
- We design an alternating optimization algorithm, named NeuCLIP, that jointly trains the CLIP model and the auxiliary network.
- We conduct extensive experiments on various datasets to validate the effectiveness of NeuCLIP.
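The reformulation above rests on a standard variational identity for the log-normalizer: for any Z > 0, log Z = min over u of { u + Z·e^(−u) − 1 }, with the minimum attained at u = log Z. Below is a minimal NumPy sketch verifying this identity numerically; the random similarities, temperature value, and grid search over u are illustrative assumptions only (NeuCLIP predicts the auxiliary variable with a compact neural network, and the paper's exact objective may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative similarities of one anchor against 128 negatives, and a temperature.
sims = rng.normal(size=128)
tau = 0.07

# Direct log-normalizer: log sum_i exp(s_i / tau), computed stably.
log_Z = np.logaddexp.reduce(sims / tau)

# Variational form: f(u) = u + Z * exp(-u) - 1 is convex in u and
# minimized at u = log Z. Here we minimize over a dense grid of u values;
# NeuCLIP instead predicts u with an auxiliary network.
u_grid = np.linspace(log_Z - 5.0, log_Z + 5.0, 10001)
z_times_exp_neg_u = np.exp(sims[None, :] / tau - u_grid[:, None]).sum(axis=1)
objective = u_grid + z_times_exp_neg_u - 1.0

u_star = u_grid[np.argmin(objective)]
# The grid minimizer recovers the true log-normalizer up to grid resolution.
```

Replacing the exact log-normalizer with this auxiliary variable is what allows the normalization term to be optimized jointly with the model instead of recomputed over all negatives at each step.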
Comparison with baselines: In the following figure, we present the Datacomp Average performance (left), ImageNet & Variants performance (middle), and Retrieval performance (right) of different methods trained on DFN-14M.
To set up the environment for training, please:

- Download this repository:

```shell
git clone https://github.com/Optimization-AI/NeuCLIP.git
cd NeuCLIP
```

- Create a new environment:

```shell
conda create -n fastclip python=3.11
conda activate fastclip
pip install -r requirements-training.txt
```
We present sample Slurm scripts to run NeuCLIP on DFN-14M.

Sample script to run NeuCLIP on DFN-14M using 8 GPUs (2 nodes, 4 GPUs per node):
```shell
#!/bin/bash
#SBATCH --time=2-00:00:00
#SBATCH --mem=120G
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6
#SBATCH --job-name=neuclip
#SBATCH --partition=gpu
#SBATCH --output=./job_output/%x_%j.log

source ~/.bashrc
conda activate fastclip

master_addr=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$master_addr
export MASTER_PORT=12805
export CUDA_VISIBLE_DEVICES='0,1,2,3'
export PYTHONPATH="$PYTHONPATH:$PWD/src"
export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'

srun python -u src/training/main.py \
    --save-frequency 1 \
    --train-data './datasets/dfn2b/medium/shards/0000{0000..1926}.tar' \
    --train-num-samples 13710637 --data_size 19270000 \
    --warmup 500 \
    --batch-size 512 \
    --epochs 24 \
    --workers 6 \
    --model ViT-B-32 \
    --name neuclip \
    --seed 2026 \
    --wd 0.2 \
    --local-loss \
    --fastclip --multiply_tau --temperature_scheme global_learnable --temperature 0.07 \
    --lr 5e-4 --lr_tau 6.25e-5 --lr_tau_scheduler step_thresh --rho 11.0 --fastclip_eps 1e-6 \
    --gamma 0.42 --gamma_schedule cosine --gamma_decay_epochs 24 \
    --npn --npn_lr 1.0 --npn_num_protos 4096 --npn_repetition 10 --npn_restart_iter 500
```

We leverage the DataComp benchmark to evaluate the performance of trained models. We refer users to the DataComp GitHub repository for detailed instructions on how to run the evaluation. Alternatively, we provide a modified fork that simplifies the evaluation process. To run the evaluation, first prepare the environment, clone the repository, and download the evaluation datasets:
```shell
# create the evaluation environment
env_name='fastclip_eval'
conda create -n "$env_name" python=3.11
conda activate "$env_name"
pip install -r requirements-eval.txt

# clone the datacomp repository
git clone -b project git@github.com:xywei00/datacomp.git

# download the evaluation datasets to `./datasets/datacomp`
python ./datacomp/download_evalsets.py ./datasets/datacomp
```

To evaluate a trained CLIP ViT-B/32 model at epoch 24, run the following command:
```shell
# train_output_dir should be the one containing 'checkpoints', 'out.log', etc.
train_output_dir='./logs/name'
data_dir='./datasets/datacomp'
arch='ViT-B-32'
epoch=24
python ./datacomp/evaluate.py --train_output_dir "${train_output_dir}" --data_dir "${data_dir}" --epoch "${epoch}" --arch "${arch}"
```

If you find NeuCLIP useful in your research, please consider citing the following paper:
```bibtex
@inproceedings{wei2026neuclip,
  title={Neu{CLIP}: Efficient Large-Scale {CLIP} Training with Neural Normalizer Optimization},
  author={Xiyuan Wei and Chih-Jen Lin and Tianbao Yang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=WoMMSVZHfP}
}
```