For fine-tuning or linear probing the embedding models, the following datasets can be used:
| Dataset | Domain | Config key | Note |
|---|---|---|---|
| Products-10k | Packaged goods | `products_10k` | |
| Google Landmarks v2 | Landmarks | `gldv2` | A cleaned subset of GLDv2 is used. |
| DeepFashion (Consumer to Shop) | Apparel & Accessories | `deep_fashion` | |
| MET Artwork | Artwork | `met_art` | |
| Shopee | Packaged goods | `shopee` | |
| H&M Personalized Fashion | Apparel & Accessories | `hm` | |
| RP2K | Packaged goods | `rp2k` | |
| Stanford Online Products | Packaged goods | `sop` | |
| Fashion200k | Apparel & Accessories | `fashion200k` | Annotations in CSV format (see `data/fashion200k_train.csv`). |
| Food Recognition 2022 | Food & Dishes | `food_rec22` | Dataset must be preprocessed with `data/food_rec22_preprocess.py`. |
| Stanford Cars | Cars | `stanford_cars` | Annotations in CSV format here. |
| DeepFashion2 | Apparel & Accessories | `deep_fashion2` | Dataset must be preprocessed with `data/deep_fashion2_preprocess.py`. |
| Food101 | Food & Dishes | `food101` | Test images are used for training. |
| Furniture 180 | Furniture | `furniture180` | Annotations in CSV format (see `data/furniture180_train.csv`). |
| Storefronts 146 | Storefronts | `storefronts146` | Annotations in CSV format (see `data/storefronts146_train.csv`). |
Download the datasets and place them in a `<data_dir>` of your choice. The directory structure should look as follows:
```
<data_dir>/
├── m4d-35k_train.csv
├── products-10k/
│   ├── train
│   └── train.csv
├── google_landmark_recognition_2021/
│   ├── train
│   └── train.csv
├── deepfashion/
│   ├── train
│   └── deepfashion_train.json
├── met_dataset/
│   ├── MET
│   └── ground_truth/MET_database.json
├── shopee/
│   ├── train_images
│   └── train.csv
├── hm_personalized_fashion/
│   ├── images
│   └── articles.csv
├── rp2k/
│   ├── train
│   └── train.csv
├── stanford_online_products/
│   ├── <img_dirs>
│   └── Ebay_train.txt
├── fashion200k/
│   ├── women
│   └── fashion200k_train.csv
├── fr22_train_v2/
│   ├── images
│   ├── preprocessed_imgs
│   ├── annotations.json
│   └── train.csv
├── stanford_cars/
│   ├── cars_train
│   └── sc_train.csv
├── deep_fashion2/
│   ├── image
│   ├── annos
│   ├── preprocessed_imgs
│   └── train.csv
├── food-101/
│   ├── images
│   └── meta/test.json
├── furniture_180/
│   ├── <img_dirs>
│   └── furniture180_train.csv
└── storefronts_146/
    ├── <img_dirs>
    └── storefronts146_train.csv
```
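After downloading, it can be convenient to verify the layout before launching a run. The helper below is a sketch, not part of the repo; the `EXPECTED` mapping covers only a few datasets as an example and can be extended to mirror the full tree above.

```python
from pathlib import Path

# Expected top-level entries per dataset, mirroring (part of) the tree above.
# NOTE: this helper and the EXPECTED mapping are illustrative, not repo code.
EXPECTED = {
    "products-10k": ["train", "train.csv"],
    "shopee": ["train_images", "train.csv"],
    "fashion200k": ["women", "fashion200k_train.csv"],
}

def missing_entries(data_dir, expected=EXPECTED):
    """Return a list of 'dataset/entry' paths that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for dataset, entries in expected.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```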
The following parameters in the configuration file in `configs/` can be adjusted regarding the training data and data loading:

- `DATASET.names`: list of dataset names to be used for training. The dataset names are the config keys from the table above.
- `DATALOADER.batch_size`: batch size for training data loading.
- `DATALOADER.num_workers`: number of workers for data loading.
- `TRANSFORM.name`: name of the transformation to be used for data augmentation; supported are the training transforms from CLIP (`openai-clip`), OpenCLIP (`openclip`), and SigLIP (`siglip`).
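A minimal illustration of these fields in a config file (the exact nesting is an assumption; consult an existing file in `configs/` for the authoritative layout):

```yaml
DATASET:
  names: [products_10k, sop, fashion200k]  # config keys from the table above
DATALOADER:
  batch_size: 256
  num_workers: 8
TRANSFORM:
  name: openclip  # one of: openai-clip, openclip, siglip
```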
The image size and normalization statistics (mean and std of the pre-training dataset) are determined automatically from the pre-trained foundation model used.
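For reference, the per-channel statistics commonly associated with these transform families are shown below. The lookup helper is purely illustrative (the repo reads these values from the loaded model); OpenCLIP models typically reuse the OpenAI CLIP statistics, and SigLIP normalizes to [-1, 1] via mean = std = 0.5.

```python
# Commonly used (mean, std) per RGB channel for the supported transforms.
# Illustrative only: actual values are taken from the pre-trained model.
NORM_STATS = {
    "openai-clip": ((0.48145466, 0.4578275, 0.40821073),
                    (0.26862954, 0.26130258, 0.27577711)),
    "openclip":    ((0.48145466, 0.4578275, 0.40821073),
                    (0.26862954, 0.26130258, 0.27577711)),
    "siglip":      ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
}

def norm_stats(transform_name):
    """Return the (mean, std) tuple for a given TRANSFORM.name."""
    return NORM_STATS[transform_name]
```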