This is the official repository for the paper:
Weight Weaving: Parameter Pooling for Data-Free Model Merging
Levy Chaves, Eduardo Valle, Sandra Avila
Accepted at the 3rd UniReps Workshop @ NeurIPS 2025.
Abstract: Model merging provides a cost-effective and data-efficient way to combine specialized deep neural networks through parameter integration. This technique leverages expert models across downstream tasks without requiring retraining. Most model merging approaches critically depend on scaling hyper-parameters $\lambda$, which weight each model's contribution globally or individually. Principled approaches for setting scaling factors without access to any data (data-free) are scarce, often leading researchers to tune $\lambda$ using privileged data from the evaluation set, which is infeasible in practice. To address this limitation, we introduce Weight Weaving, a plug-and-play technique that pools model weights across the $\lambda$ search space using user-defined pooling functions, such as averaging, random selection, or even existing model merging methods. Our method is highly modular and imposes minimal constraints on the search space. It operates orthogonally to existing model merging methods and eliminates the need for evaluation data. We validate Weight Weaving across three ViT variants in three experimental setups: vision multi-task learning, vision continual learning, and domain generalization. Our method consistently improves the performance of several model merging methods, achieving average accuracy gains of up to 15.9 percentage points in a data-free setting.
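As a rough illustration of the idea (not the repository's actual implementation), the sketch below uses task arithmetic as the base merging method: merge once for each candidate $\lambda$ in the search space, then pool the resulting weight vectors with a user-defined pooling function. All names here (`task_arithmetic`, `weight_weaving`, the toy vectors) are hypothetical.

```python
import numpy as np

def task_arithmetic(base, task_vectors, lam):
    """Merge by adding the scaled sum of task vectors to the base weights."""
    return base + lam * sum(task_vectors)

def weight_weaving(base, task_vectors, lambdas, pool=np.mean):
    """Pool the merged weights obtained across the lambda search space.

    `pool` is any user-defined pooling function over the stacked candidates
    (here: elementwise averaging).
    """
    candidates = [task_arithmetic(base, task_vectors, lam) for lam in lambdas]
    return pool(np.stack(candidates), axis=0)

# Toy example: two "experts" fine-tuned from a shared base model.
base = np.zeros(4)
task_vectors = [np.ones(4), 2 * np.ones(4)]
merged = weight_weaving(base, task_vectors, lambdas=[0.1, 0.3, 0.5])
```

Because the pooling happens purely in weight space over a fixed $\lambda$ grid, no evaluation data is needed to pick a single scaling factor.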
For a quick installation, use the following command:

```shell
conda env create -f environment.yaml
```

The code is separated into two parts:
- training: the `finetune_*` scripts
- merging: the `merge_*` scripts
First, we need to fine-tune the models for each task.
To perform sequential fine-tuning for CIFAR-100 with 10 epochs per task, 20 splits, and random seed 5:
```shell
bash scripts/CIL/finetune_seq.sh ViT-B-16 CIFAR100 10 20 5
```

The scripts' arguments are ordered as follows:

```shell
script_name.sh <model> <dataset> <epochs> <splits> <seed>
```

To fine-tune all CIL models, simply run:

```shell
bash scripts/CIL/all.sh
```

After fine-tuning all models, collect the results for each merging method for a given model (ViT-B-16):
```shell
bash scripts/CIL/sota/run_sota_all_seq.sh ViT-B-16
```

For a single run of our proposed parameter pooling for ViT-B-16, with randmix as the pooling function:

```shell
bash scripts/CIL/augment/run_plus_seq.sh ViT-B-16 randmix
```
- `*_plus_*` files pool the parameters over the set $A^{*}$ (Algorithm 1, line 10 in the paper).
- `*_new_*` files pool the parameters over the set `augmented_weights` (Algorithm 1, lines 6-9 in the paper).
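As a loose sketch of what such pooling functions might look like: averaging combines candidate weights elementwise, while random selection picks each parameter from a randomly chosen candidate. Reading `randmix` as per-parameter random selection is our assumption; consult the repository code for the exact definitions.

```python
import numpy as np

def avg_pool(stacked):
    """Elementwise average of the stacked candidate weights."""
    return stacked.mean(axis=0)

def randmix_pool(stacked, seed=None):
    """For each parameter, pick the value from a randomly chosen candidate.

    This is our reading of 'randmix'; the repository's definition may differ.
    """
    rng = np.random.default_rng(seed)
    # One candidate index per parameter position.
    idx = rng.integers(stacked.shape[0], size=stacked.shape[1:])
    return np.take_along_axis(stacked, idx[None, ...], axis=0)[0]

# Three candidate weight vectors, e.g. merges at three lambda values.
candidates = np.stack([np.full(4, 0.3), np.full(4, 0.9), np.full(4, 1.5)])
```

Either function can be passed to the pooling step, since both map a stack of candidate weights to a single weight vector of the same shape.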
To fine-tune a single model across the eight datasets:
```shell
bash scripts/8ds/finetune_8ds.sh ViT-B-16
```

After fine-tuning all models, collect the results for each merging method for a given model (ViT-B-16):

```shell
bash scripts/8ds/mtl/run_all_sota.sh ViT-B-16
```

To evaluate our pooling method on ViT-B-16 with randmix pooling:

```shell
bash scripts/8ds/augment_mtl/run_all_aug.sh ViT-B-16 randmix
```

First, fine-tune on the eight datasets (as described above).
To obtain results for all evaluated model merging methods with the ViT-B-16 model:

```shell
bash scripts/8ds/ood/sota-ood.sh ViT-B-16
```

To run parameter pooling with the ViT-B-16 model and randmix as the pooling function:

```shell
bash scripts/8ds/augment_ood/run_all_aug.sh ViT-B-16 randmix
```

By default, the code runs on all available GPUs. Use `CUDA_VISIBLE_DEVICES=X` to restrict GPU usage.
We provide scripts to help download and prepare the datasets used in our experiments.
- `scripts/download_datasets_task_vector.sh` downloads the datasets and creates the folder structure expected by the loading classes defined in `src/datasets/`. If a given dataset is not handled there, Torchvision will download it automatically.
- Once the download is complete, run `scripts/split_datasets_ilharco.py` to generate the train and test splits, as described in mlfoundations/task_vectors#1.
- The CUB200 dataset is a particular case and requires further preprocessing after downloading from the torchvision library. After the download, run `scripts/process_cub200_dataset.py`.
- If a script fails with a CUDA error, verify your PyTorch + CUDA compatibility and available GPU memory.
- If a script cannot find a dataset, double-check that you ran the dataset download/preparation scripts and that the environment variables pointing to dataset locations match those used by the scripts.
This repository builds upon and integrates components from the following open-source projects:
We thank the authors of these works for their open-source contributions.
L. Chaves is partially funded by FAPESP (2024/16685-7), CAPES, Becas Santander/UNICAMP – HUB 2022, and Google LARA 2021. S. Avila is also partially funded by FAPESP (2023/12086-9, 2023/12865-8, 2020/09838-0, 2013/08293-7), H.IAAC 01245.003479/2024-10, CNPq 316489/2023-9, and Google AIR 2022.
```bibtex
@article{chaves2025weight,
  title={Weight Weaving: Parameter Pooling for Data-Free Model Merging},
  author={Chaves, Levy and Valle, Eduardo and Avila, Sandra},
  journal={arXiv preprint arXiv:2510.13921},
  year={2025}
}
```