## Datasets & Data Preparation

For fine-tuning or linear probing the embedding models, the following datasets can be used:

| Dataset | Domain | Config key | Note |
| --- | --- | --- | --- |
| Products-10k | Packaged goods | `products_10k` | |
| Google Landmarks v2 | Landmarks | `gldv2` | A cleaned subset of GLDv2 is used. |
| DeepFashion (Consumer-to-Shop) | Apparel & Accessories | `deep_fashion` | |
| MET Artwork | Artwork | `met_art` | |
| Shopee | Packaged goods | `shopee` | |
| H&M Personalized Fashion | Apparel & Accessories | `hm` | |
| RP2K | Packaged goods | `rp2k` | |
| Stanford Online Products | Packaged goods | `sop` | |
| Fashion200k | Apparel & Accessories | `fashion200k` | Annotations in CSV format (see `data/fashion200k_train.csv`). |
| Food Recognition 2022 | Food & Dishes | `food_rec22` | Dataset must be preprocessed with `data/food_rec22_preprocess.py`. |
| Stanford Cars | Cars | `stanford_cars` | Annotations in CSV format here. |
| DeepFashion2 | Apparel & Accessories | `deep_fashion2` | Dataset must be preprocessed with `data/deep_fashion2_preprocess.py`. |
| Food101 | Food & Dishes | `food101` | Test images are used for training. |
| Furniture 180 | Furniture | `furniture180` | Annotations in CSV format (see `data/furniture180_train.csv`). |
| Storefronts 146 | Storefronts | `storefronts146` | Annotations in CSV format (see `data/storefronts146_train.csv`). |

Download the datasets and place them in a `<data_dir>` of your choice. The directory structure should look as follows:

```
<data_dir>/
├── m4d-35k_train.csv
├── products-10k/
│   ├── train
│   └── train.csv
├── google_landmark_recognition_2021/
│   ├── train
│   └── train.csv
├── deepfashion/
│   ├── train
│   └── deepfashion_train.json
├── met_dataset/
│   ├── MET
│   └── ground_truth/MET_database.json
├── shopee/
│   ├── train_images
│   └── train.csv
├── hm_personalized_fashion/
│   ├── images
│   └── articles.csv
├── rp2k/
│   ├── train
│   └── train.csv
├── stanford_online_products/
│   ├── <img_dirs>
│   └── Ebay_train.txt
├── fashion200k/
│   ├── women
│   └── fashion200k_train.csv
├── fr22_train_v2/
│   ├── images
│   ├── preprocessed_imgs
│   ├── annotations.json
│   └── train.csv
├── stanford_cars/
│   ├── cars_train
│   └── sc_train.csv
├── deep_fashion2/
│   ├── image
│   ├── annos
│   ├── preprocessed_imgs
│   └── train.csv
├── food-101/
│   ├── images
│   └── meta/test.json
├── furniture_180/
│   ├── <img_dirs>
│   └── furniture180_train.csv
└── storefronts_146/
    ├── <img_dirs>
    └── storefronts146_train.csv
```
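Before launching training, it can help to verify this layout programmatically. The sketch below is a hypothetical helper (not part of the repo) that checks whether the annotation files from the tree above are present under `<data_dir>`:

```python
from pathlib import Path

# Representative annotation files from the directory tree above; one entry
# per dataset is enough to catch a missing or misplaced download.
EXPECTED = [
    "m4d-35k_train.csv",
    "products-10k/train.csv",
    "google_landmark_recognition_2021/train.csv",
    "deepfashion/deepfashion_train.json",
    "met_dataset/ground_truth/MET_database.json",
    "shopee/train.csv",
    "hm_personalized_fashion/articles.csv",
    "rp2k/train.csv",
    "stanford_online_products/Ebay_train.txt",
    "fashion200k/fashion200k_train.csv",
    "fr22_train_v2/train.csv",
    "stanford_cars/sc_train.csv",
    "deep_fashion2/train.csv",
    "food-101/meta/test.json",
    "furniture_180/furniture180_train.csv",
    "storefronts_146/storefronts146_train.csv",
]

def missing_dataset_files(data_dir):
    """Return the expected annotation files that are absent under data_dir."""
    root = Path(data_dir)
    return [p for p in EXPECTED if not (root / p).exists()]
```

Running `missing_dataset_files("<data_dir>")` before training gives a quick list of datasets that still need to be downloaded or preprocessed.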

The following parameters in the configuration files in `configs/` control the training data and data loading:

- `DATASET.names`: list of dataset names used for training; the names are the config keys from the table above.
- `DATALOADER.batch_size`: batch size for the training data loader.
- `DATALOADER.num_workers`: number of worker processes for data loading.
- `TRANSFORM.name`: transformation used for data augmentation; supported are the training transforms of CLIP (`openai-clip`), OpenCLIP (`openclip`), and SigLIP (`siglip`).
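Putting these keys together, a data-related config section might look like the following. This is an illustrative sketch only — the exact file format and nesting depend on the repo's config system, and the chosen values are placeholders:

```yaml
# Illustrative fragment; key names follow the list above, values are examples.
DATASET:
  names: ["products_10k", "gldv2", "deep_fashion"]
DATALOADER:
  batch_size: 128
  num_workers: 8
TRANSFORM:
  name: "openclip"
```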

The image size and the normalization statistics (mean and standard deviation of the pre-training dataset) are determined automatically from the pre-trained foundation model in use.