Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

This repository contains the empirical pipeline for the experiments on pre-trained models (Section 5 of the paper).

Setup

uv sync
source .venv/bin/activate

Data

Dataset paths

Dataset paths are configured in src/complinearity/_config.py. Update these to match your local setup before running any experiments.
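
Before launching a long run, it can help to confirm that every configured path exists. The snippet below is a minimal sketch of such a check. The DATASET_PATHS mapping and its example paths are hypothetical placeholders, not the actual contents of _config.py, so adapt the names to whatever that file defines.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; the real variable names and
# layout in src/complinearity/_config.py may differ.
DATASET_PATHS = {
    "clean_dsprites": Path("/data/clean_dsprites"),
    "mpi3d": Path("/data/mpi3d/real.npz"),
    "pug": Path("/data/pug_animals"),
}


def missing_datasets(paths):
    """Return the sorted names of datasets whose configured path does not exist."""
    return sorted(name for name, p in paths.items() if not Path(p).exists())


if __name__ == "__main__":
    missing = missing_datasets(DATASET_PATHS)
    if missing:
        print("Missing dataset paths:", ", ".join(missing))
    else:
        print("All dataset paths found.")
```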

Datasets

dSprites: Download our cleaned variant here.
MPI3D: Download real.npz from the MPI3D repository.
PUG Animals: See the PUG benchmark.
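
For MPI3D, a flat image index can be mapped to its factor values once the factor grid is known. The sketch below assumes the factor names, sizes, and ordering documented in the MPI3D repository (6 x 6 x 2 x 3 x 3 x 40 x 40 = 1,036,800 images), and that real.npz stores the images under an "images" key. Please verify both against your download; how this repository's own loader indexes the data is not shown here.

```python
import numpy as np

# Factor names and sizes, assumed to follow the MPI3D repository's documentation.
MPI3D_FACTORS = (
    ("object_color", 6),
    ("object_shape", 6),
    ("object_size", 2),
    ("camera_height", 3),
    ("background_color", 3),
    ("horizontal_axis", 40),
    ("vertical_axis", 40),
)
FACTOR_SIZES = tuple(size for _, size in MPI3D_FACTORS)


def index_to_factors(index):
    """Map a flat image index to a dict of factor values (last factor varies fastest)."""
    values = np.unravel_index(index, FACTOR_SIZES)
    return {name: int(v) for (name, _), v in zip(MPI3D_FACTORS, values)}


def factors_to_index(factors):
    """Inverse of index_to_factors: map a dict of factor values to a flat index."""
    coords = tuple(factors[name] for name, _ in MPI3D_FACTORS)
    return int(np.ravel_multi_index(coords, FACTOR_SIZES))


# Example usage (requires the downloaded file; the "images" key is an assumption):
# images = np.load("real.npz")["images"]
# print(images.shape, index_to_factors(12345))
```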

Pipeline

The main pipeline has three stages: generate embeddings, train linear probes, and run the factorization analysis. All commands assume you are at the repo root.
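
If you prefer to drive all three stages from Python, the following is one possible wrapper around the shell scripts named in this README. It is only a sketch: it runs each script once per dataset, passes OUT_DIR to the embedding stage and VAL_SPLITS to the probe stage as in the commands documented here, and assumes nothing else about the scripts. The function names build_plan and run_pipeline are illustrative and not part of the repository.

```python
import os
import subprocess

SCRIPT_DIR = "./src/complinearity"


def build_plan(datasets, out_dir=None, val_splits=None):
    """Return a list of (extra_env, argv) pairs, one per stage and dataset."""
    embed_env = {"OUT_DIR": out_dir} if out_dir is not None else {}
    probe_env = {"VAL_SPLITS": val_splits} if val_splits is not None else {}
    stages = (
        ("run_all_models.sh", embed_env),       # 1. generate embeddings
        ("run_all_probes.sh", probe_env),       # 2. train probes
        ("run_all_analyses_simple.sh", {}),     # 3. factorization analysis
    )
    return [
        (dict(env), [f"{SCRIPT_DIR}/{script}", dataset])
        for script, env in stages
        for dataset in datasets
    ]


def run_pipeline(datasets, out_dir=None, val_splits=None):
    """Run every planned command from the repo root, stopping at the first failure."""
    for extra_env, argv in build_plan(datasets, out_dir, val_splits):
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    for extra_env, argv in build_plan(["clean_dsprites"], val_splits="0.05"):
        print(extra_env, " ".join(argv))
```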

1. Generate embeddings

Extracts embeddings from all models (CLIP, OpenCLIP/SigLIP, DINOv2) for one or more datasets:

./src/complinearity/run_all_models.sh clean_dsprites

To run multiple datasets and/or override the output directory:

OUT_DIR="$PWD/outputs/clip_models_laion" \
  ./src/complinearity/run_all_models.sh clean_dsprites mpi3d pug

Supported datasets: clean_dsprites, mpi3d, pug.

2. Train probes

Trains per-concept linear probes on the extracted embeddings:

VAL_SPLITS="0.05" \
  ./src/complinearity/run_all_probes.sh clean_dsprites

Outputs are written to outputs/clip_models_laion/<dataset>/_probes_*.
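
To illustrate what a per-concept linear probe does, here is a small NumPy sketch that fits a ridge-regularized least-squares classifier on embeddings and scores it on a held-out split. It is not the probe implemented in run_all_probes.sh. The least-squares objective, the l2 value, and the reading of VAL_SPLITS="0.05" as a 5% validation fraction are all assumptions made for the example.

```python
import numpy as np


def train_val_split(n, val_fraction, seed=0):
    """Return (train_idx, val_idx) from a seeded random permutation of range(n)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_val = max(1, int(round(n * val_fraction)))
    return perm[n_val:], perm[:n_val]


def _with_bias(X):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_linear_probe(X, y, num_classes, l2=1e-3):
    """Ridge regression onto one-hot targets; returns weights of shape (d + 1, C)."""
    Xb = _with_bias(X)
    Y = np.eye(num_classes)[y]
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(_with_bias(X) @ W, axis=1) == y))


if __name__ == "__main__":
    # Synthetic stand-in for real embeddings: a sum of per-concept codes plus noise.
    rng = np.random.default_rng(0)
    shape_codes = rng.normal(size=(3, 16))
    color_codes = rng.normal(size=(4, 16))
    shape = rng.integers(0, 3, size=600)
    color = rng.integers(0, 4, size=600)
    X = shape_codes[shape] + color_codes[color] + 0.05 * rng.normal(size=(600, 16))
    train_idx, val_idx = train_val_split(len(X), 0.05)
    for name, y, k in (("shape", shape, 3), ("color", color, 4)):
        W = fit_linear_probe(X[train_idx], y[train_idx], k)
        print(name, "val accuracy:", probe_accuracy(W, X[val_idx], y[val_idx]))
```

One probe is fit per concept on the same embeddings, so each concept gets its own weight matrix.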

3. Run factorization analysis

Evaluates the linear factorization and orthogonality metrics from the paper, and extracts the rank of each concept:

./src/complinearity/run_all_analyses_simple.sh clean_dsprites
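
As a rough intuition for these quantities, the NumPy sketch below fits an additive (per-concept) decomposition of embeddings, reports how much variance it explains, measures the mean absolute cosine between the vectors of two concepts, and estimates a numerical rank per concept. These are simplified stand-ins, not the metrics computed by run_all_analyses_simple.sh or defined in the paper. The decomposition via per-value means is exact only for a balanced, full grid of concept combinations, and all function names are illustrative.

```python
import numpy as np


def concept_vectors(Z, labels, num_values):
    """Mean centered embedding for each value of one concept: shape (num_values, d)."""
    Zc = Z - Z.mean(axis=0)
    return np.stack([Zc[labels == v].mean(axis=0) for v in range(num_values)])


def additive_r2(Z, label_columns, sizes):
    """Variance explained by summing one vector per concept value (plus the mean)."""
    Zc = Z - Z.mean(axis=0)
    recon = np.zeros_like(Zc)
    for labels, n in zip(label_columns, sizes):
        recon += concept_vectors(Z, labels, n)[labels]
    resid = Zc - recon
    return 1.0 - float((resid ** 2).sum() / (Zc ** 2).sum())


def mean_abs_cosine(U, V):
    """Mean |cosine| between rows of U and rows of V; 0 means orthogonal concepts."""
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    return float(np.abs(Un @ Vn.T).mean())


def concept_rank(U, rel_tol=1e-6):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(U, compute_uv=False)
    return int((s > rel_tol * s[0]).sum())


if __name__ == "__main__":
    # Toy grid: 3 shapes x 4 colors, with the two concepts in disjoint coordinates.
    rng = np.random.default_rng(0)
    A = np.zeros((3, 8)); A[:, 0:3] = rng.normal(size=(3, 3))
    B = np.zeros((4, 8)); B[:, 3:7] = rng.normal(size=(4, 4))
    s, c = (g.ravel() for g in np.meshgrid(np.arange(3), np.arange(4), indexing="ij"))
    Z = A[s] + B[c]
    U, V = concept_vectors(Z, s, 3), concept_vectors(Z, c, 4)
    print("R^2:", additive_r2(Z, [s, c], [3, 4]))
    print("mean |cos|:", mean_abs_cosine(U, V))
    print("ranks:", concept_rank(U), concept_rank(V))
```

Because the concept vectors are centered, a concept with n values has rank at most n - 1 in this sketch.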

Models

The default model set includes:

Backend	Model	Pretrained weights
CLIP	ViT-B/32	OpenAI
CLIP	ViT-L/14	OpenAI
OpenCLIP	ViT-B-32	LAION-400M
OpenCLIP	ViT-B-16	LAION-400M
OpenCLIP	ViT-L-14	LAION-2B
OpenCLIP	SigLIP-Large-Patch16-256	WebLI
OpenCLIP	SigLIP2-Large-Patch16-384	WebLI
DINOv2	ViT-S/14	—
DINOv2	ViT-B/14	—
DINOv2	ViT-L/14	—

Citation

@misc{uselis2026compositionalgeneralizationrequireslinear,
      title={Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models}, 
      author={Arnas Uselis and Andrea Dittadi and Seong Joon Oh},
      year={2026},
      eprint={2602.24264},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.24264}, 
}
