Klustering Images for Subset Selection
Research on selecting a representative training subset for convolutional neural classifiers, using clustering and difficulty estimation to choose samples. The repository collects the techniques, experiments, and tools developed to address this problem.
In this project, our goal is to design and implement efficient methods for selecting a training subset from a larger image dataset. The selected subsets should be relatively small while preserving classification accuracy or reducing it only slightly.
To accomplish this, we use DINOv2 as the feature extractor, which provides vector representations of the images. We then use the FAISS library to run similarity searches and clustering over these vectors to determine the most representative subset of elements.
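The idea of picking representatives from clustered embeddings can be illustrated with a minimal sketch. It is not the repository's implementation: random vectors stand in for DINOv2 embeddings, a plain NumPy k-means stands in for FAISS clustering, and the function names `kmeans` and `select_representatives` are hypothetical. The selection rule shown (keep the sample nearest to each centroid) is one simple assumption about what "most representative" could mean.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for FAISS clustering)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def select_representatives(x, k, seed=0):
    """Return sorted indices of the samples closest to each centroid."""
    centroids = kmeans(x, k, seed=seed)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # For each centroid (column), take the index of its nearest sample.
    nearest = dists.argmin(axis=0)
    # Two centroids may share a nearest sample, so deduplicate.
    return np.unique(nearest)


# Hypothetical stand-in for image embeddings: 500 vectors of dimension 32.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(500, 32)).astype(np.float32)

subset = select_representatives(embeddings, k=10)
print("selected", len(subset), "of", len(embeddings), "samples")
```

With real data, `embeddings` would be replaced by features extracted from the images, and the resulting indices would define the training subset.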
The repository includes the methods together with experiments that validate their performance, so the project can serve both as a practical tool and as a research framework.
P.S. We are aware it's clustering, not klustering.
To use our project, follow these simple steps:
Visit the PyTorch site to select your system configuration. We recommend using conda and the Nightly build:
    conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly

Clone the repository:

    git clone https://github.com/Drske/KISS.git

Install the package from the root of the cloned repository:

    pip install .

For an editable installation, run:

    pip install -e .

Finally, run the kiss hello command to verify that the package has been installed correctly:

    kiss hello

This directory serves as a placeholder for any dataset that should be downloaded while using the repository.
All experiment configurations and results are stored here.
Contains the source code of the kiss project.
Explore useful notebooks, including examples or prototypes.
All pretrained model weights are stored here.
Our package has been released under the MIT License. Refer to LICENSE for more details.