
Weight Weaving: Parameter Pooling for Data-Free Model Merging

This is the official repository for the paper:

Weight Weaving: Parameter Pooling for Data-Free Model Merging
Levy Chaves, Eduardo Valle, Sandra Avila
3rd UniReps Workshop @ NeurIPS 2025.

Abstract: Model merging provides a cost-effective and data-efficient combination of specialized deep neural networks through parameter integration. This technique leverages expert models across downstream tasks without requiring retraining. Most model merging approaches critically depend on scaling hyper-parameters $\lambda$, which weight each model's contribution globally or individually. Principled approaches for setting scaling factors without accessing any data (data-free) are scarce, often leading researchers to tune $\lambda$ using privileged data from the evaluation set, which is infeasible in practice. To address this limitation, we introduce Weight Weaving, a plug-and-play technique that pools model weights across the $\lambda$ search space using user-defined pooling functions, such as averaging, random selection, or even existing model merging methods. Our method demonstrates high modularity, imposing minimal constraints on the search space. It operates orthogonally to existing model merging methods and eliminates evaluation data requirements. We validate Weight Weaving across three ViT variants in three experimental setups: vision multi-task learning, vision continual learning, and domain generalization. Our method consistently improves the performance of several model merging methods, achieving average accuracy gains of up to 15.9 percentage points in a data-free setting.
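As a simplified, illustrative reading of the idea above (not the exact Algorithm 1 from the paper): given a merging method that produces merged weights $\theta(\lambda)$ for each scaling factor $\lambda$ in a search grid $\Lambda$, Weight Weaving with averaging as the pooling function would return

$$\theta_{\text{pooled}} = \frac{1}{|\Lambda|} \sum_{\lambda \in \Lambda} \theta(\lambda),$$

whereas a pooling function based on random selection (such as randmix) would instead assemble $\theta_{\text{pooled}}$ by picking parameters from the candidate set $\{\theta(\lambda) : \lambda \in \Lambda\}$.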

Installation

For a quick installation, use the following command:

conda env create -f environment.yaml
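Then activate the environment before running any scripts. The environment name below is an assumption; use the name field declared in environment.yaml if it differs:

conda activate weight_weaving  # assumed environment name, check environment.yaml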

Usage

The code is separated into two parts:

  • training: the finetune_* scripts,
  • merging: the merge_* scripts.

First, we need to fine-tune the models for each task.

Class-Incremental Learning (CIL)

To perform sequential fine-tuning for CIFAR-100 with 10 epochs per task, 20 splits, and random seed 5:

bash scripts/CIL/finetune_seq.sh ViT-B-16 CIFAR100 10 20 5

The script arguments are ordered as follows:

script_name.sh  <model> <dataset> <epochs> <splits> <seed>
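For example, a hypothetical run on CIFAR-100 with 5 epochs per task, 10 splits, and seed 1 would be:

bash scripts/CIL/finetune_seq.sh ViT-B-16 CIFAR100 5 10 1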

To fine-tune all CIL models, simply run:

bash scripts/CIL/all.sh

After fine-tuning all models, collect the results for each merging method for a given model (ViT-B-16):

bash scripts/CIL/sota/run_sota_all_seq.sh ViT-B-16

For a single run of our proposed parameter pooling with ViT-B-16 and randmix as the pooling function:

bash scripts/CIL/augment/run_plus_seq.sh ViT-B-16 randmix

*_plus_* files pool the parameters over the set $A^{*}$ (Algorithm 1, line 10 in the paper).

*_new_* files pool the parameters over the augmented_weights set (Algorithm 1, lines 6-9 in the paper).
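Assuming the *_new_* scripts follow the same naming pattern as their *_plus_* counterparts (the path below is a guess; check scripts/CIL/augment/ for the actual file names), the corresponding run would be:

bash scripts/CIL/augment/run_new_seq.sh ViT-B-16 randmix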

Multi-Task Learning (8 Datasets)

To fine-tune a single model on each of the eight datasets:

bash scripts/8ds/finetune_8ds.sh ViT-B-16

After fine-tuning all models, collect the results for each merging method for a given model (ViT-B-16):

bash scripts/8ds/mtl/run_all_sota.sh ViT-B-16

To evaluate our pooling method on ViT-B-16 with randmix pooling:

bash scripts/8ds/augment_mtl/run_all_aug.sh ViT-B-16 randmix 

8 Datasets (Domain Generalization, Leave-One-Out)

First, fine-tune the models on the eight datasets (as described above).

To obtain results for all evaluated model merging methods with the ViT-B-16 model, run:

bash scripts/8ds/ood/sota-ood.sh ViT-B-16

To obtain results with parameter pooling for the ViT-B-16 model and randmix as the pooling function, run:

bash scripts/8ds/augment_ood/run_all_aug.sh ViT-B-16 randmix

Tips

  • By default, the code runs on all available GPUs. Use CUDA_VISIBLE_DEVICES=X to restrict GPU usage (see the example below).
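For example, to restrict the sequential fine-tuning run shown earlier to GPU 0:

CUDA_VISIBLE_DEVICES=0 bash scripts/CIL/finetune_seq.sh ViT-B-16 CIFAR100 10 20 5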

Datasets

We provide scripts to help download and prepare the datasets used in our experiments; a full example sequence is shown after the list below.

  • scripts/download_datasets_task_vector.sh downloads the datasets and creates the folder structure expected by the data-loading classes defined in src/datasets/. If a dataset is not handled by this script, Torchvision downloads it automatically.

  • Once the download is complete, run scripts/split_datasets_ilharco.py to generate the train and test splits, as described in mlfoundations/task_vectors#1.

  • The CUB200 dataset is a special case and requires further preprocessing after being downloaded via torchvision. After the download completes, run scripts/process_cub200_dataset.py.
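Putting the steps above together, a typical preparation sequence might look as follows (assuming the Python scripts take no extra arguments; check their argument parsers if they do):

bash scripts/download_datasets_task_vector.sh
python scripts/split_datasets_ilharco.py
python scripts/process_cub200_dataset.py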

Troubleshooting

  • If a script fails with a CUDA error, verify your PyTorch + CUDA compatibility and available GPU memory.

  • If a script cannot find a dataset, double-check that you ran the dataset download/preparation scripts and that the environment variables pointing to dataset locations match those used by the scripts.

Credits

This repository builds upon and integrates components from other open-source projects.

We thank the authors of these works for their open-source contributions.

Acknowledgments

L. Chaves is partially funded by FAPESP (2024/16685-7), CAPES, Becas Santander/UNICAMP – HUB 2022, and Google LARA 2021. S. Avila is also partially funded by FAPESP (2023/12086-9, 2023/12865-8, 2020/09838-0, 2013/08293-7), H.IAAC 01245.003479/2024-10, CNPq 316489/2023-9, and Google AIR 2022.

Citation

@article{chaves2025weight,
  title={Weight Weaving: Parameter Pooling for Data-Free Model Merging},
  author={Chaves, Levy and Valle, Eduardo and Avila, Sandra},
  journal={arXiv preprint arXiv:2510.13921},
  year={2025}
}
