This project introduces a custom loss function for Knowledge Distillation
(KD) that aims to improve the training of the student model.
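For context, the sketch below shows the standard Hinton-style KD objective that custom KD losses usually build on. It is not this project's loss. A student is trained to match the teacher's temperature-softened class distribution (a KL term scaled by T²) while also fitting the ground-truth labels (a cross-entropy term). The function names `softmax` and `kd_loss` and the default values `T=4.0` and `alpha=0.9` are illustrative assumptions, not settings taken from this project.

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Row-wise softmax of logits divided by temperature T (numerically stable)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_loss(student_logits: np.ndarray,
            teacher_logits: np.ndarray,
            labels: np.ndarray,
            T: float = 4.0,       # hypothetical temperature
            alpha: float = 0.9,   # hypothetical weight on the distillation term
            ) -> float:
    """Standard KD: alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, labels)."""
    eps = 1e-12
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL divergence between softened teacher and student distributions, averaged over the batch
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1).mean()
    # Ordinary cross-entropy of the student at temperature 1 against the hard labels
    p_s1 = softmax(student_logits, 1.0)
    ce = -np.log(p_s1[np.arange(len(labels)), labels] + eps).mean()
    # T^2 keeps the soft-target gradients on a comparable scale as T changes
    return alpha * T**2 * kl + (1 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains.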
The custom loss combines the standard cross-entropy loss with a confidence
penalty and is evaluated on datasets such as CIFAR-100, Tiny ImageNet, and CUB-200.
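A minimal sketch of a cross-entropy loss with a confidence penalty follows. It assumes the common formulation from the regularization literature, where the mean entropy H(p) of the predicted distribution is subtracted with weight beta (L = CE - beta * H(p)). Overconfident, low-entropy predictions therefore receive less of a reduction than spread-out ones. The function name `ce_with_confidence_penalty` and the default `beta=0.1` are hypothetical. This README does not specify how the project weights the penalty or combines it with a distillation term.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def ce_with_confidence_penalty(logits: np.ndarray,
                               labels: np.ndarray,
                               beta: float = 0.1,  # hypothetical penalty strength
                               ) -> float:
    """Cross-entropy minus beta times the mean entropy of the predicted distribution."""
    log_p = log_softmax(logits)
    # Standard cross-entropy against integer class labels
    ce = -log_p[np.arange(len(labels)), labels].mean()
    # Entropy H(p) = -sum_k p_k log p_k, averaged over the batch
    entropy = -(np.exp(log_p) * log_p).sum(axis=1).mean()
    # Subtracting entropy rewards less confident (higher-entropy) outputs
    return ce - beta * entropy
```

With `beta=0` this reduces to plain cross-entropy. For uniform logits over C classes, both CE and H(p) equal log C, so `beta=1` gives a loss of zero.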
Student Models Used: ShuffleNetV1, ResNet-18, ResNet-34, ResNet-50, EfficientNet-B0.
Teacher Models Used: VGG13-BN, ResNet-50, DeepLabV3+ResNet101 (for segmentation), EfficientNet-B0/B3.