This project focuses on building and optimizing deep convolutional neural networks (CNNs) for robust classification across both image and audio domains. By integrating hybrid architectural blocks into custom ResNet, VGG, and Inception models, the system effectively handles diverse datasets such as CIFAR-10 (image classification) and SpeechCommands V0.02 (audio classification).
- Custom CNN Architectures: Implemented enhanced ResNet, VGG, and Inception variants, augmented with hybrid layers for improved feature representation and robustness.
- Cross-Domain Generalization: Trained models to achieve high accuracy on both visual and auditory datasets, enabling effective multi-modal learning.
- Robust Training Pipeline: Developed a flexible pipeline with custom checkpointing, enabling reproducibility, modular training, and efficient evaluation.
- Benchmark-Driven Evaluation: Models were optimized using Cross-Entropy Loss and the Adam Optimizer, reaching competitive benchmarks for both image and audio tasks.
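
The points above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the project's actual code: the hybrid blocks are not shown in this README, so `ResidualBlock`, `TinyResNet`, `train_step`, `save_checkpoint`, `load_checkpoint`, the checkpoint dictionary keys, and the layer sizes are all hypothetical names and values chosen for the example. Only the use of Cross-Entropy Loss and the Adam optimizer comes from the description.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Plain residual block (hypothetical stand-in for the project's hybrid blocks)."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection: add the input back before the final activation.
        return self.act(x + self.body(x))


class TinyResNet(nn.Module):
    """Very small ResNet-style classifier (illustrative sizes only)."""

    def __init__(self, in_channels: int = 3, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)
        self.block = ResidualBlock(16)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.block(torch.relu(self.stem(x)))
        return self.head(self.pool(x).flatten(1))


def train_step(model, optimizer, criterion, inputs, labels):
    """Run one optimization step and return the batch loss as a float."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume training (assumed checkpoint layout)."""
    torch.save(
        {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer state in place; return the saved epoch."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"]
```

A typical call would pair `nn.CrossEntropyLoss()` with `torch.optim.Adam(model.parameters(), lr=1e-3)`; the learning rate here is an assumed value, not one taken from the project.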
.
├── train.py # Core logic: dataloading, model definitions, training & validation loops
├── __init__.py # Global variable and parameter initialization
├── main.py # Entry point of the pipeline; orchestrates training & evaluation
├── requirements.txt # Python dependencies

- main.py: Drives the overall execution. Calls utility functions from train.py, sets up configurations, and runs training and evaluation.
- train.py: Implements dataset loading, custom CNNs, training and validation loops, and accuracy tracking.
- __init__.py: Initializes global variables, constants, and hyperparameters.
- requirements.txt: Lists all required packages to replicate the environment.
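
As an illustration of the validation and accuracy-tracking role described for train.py, a validation pass might look like the sketch below. The function name `evaluate` and its signature are assumptions made for this example; the actual helpers in train.py may be organized differently.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return top-1 accuracy in [0, 1] of `model` over batches from `loader`."""
    model.eval()  # disable dropout, use running batch-norm statistics
    correct, total = 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)  # predicted class per sample
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    # Guard against an empty loader.
    return correct / max(total, 1)
```

`loader` can be a `torch.utils.data.DataLoader` or any iterable that yields `(inputs, labels)` batches.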
- CIFAR-10
  - 60,000 32×32 color images across 10 classes.
  - Used for evaluating image classification performance.
  - Link to dataset
- SpeechCommands V0.02
  - Over 100,000 one-second audio clips of 35 spoken words.
  - Used for evaluating audio classification robustness.
  - Link to dataset
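
Both datasets are available through the standard Torchvision and Torchaudio dataset classes, so a loading step can be sketched as follows. This is a hedged example rather than the code in train.py: the helper names (`load_cifar10`, `load_speechcommands`, `pad_or_trim`), the `./data` root, and the choice to pad clips to 16,000 samples (one second at the dataset's 16 kHz sample rate) are assumptions for illustration.

```python
import os

import torch
import torch.nn.functional as F


def load_cifar10(root="./data", train=True):
    """Download (if needed) and return CIFAR-10 as tensors in [0, 1]."""
    from torchvision import datasets, transforms  # imported lazily

    return datasets.CIFAR10(
        root=root, train=train, download=True, transform=transforms.ToTensor()
    )


def load_speechcommands(root="./data", subset="training"):
    """Download (if needed) and return the SpeechCommands v0.02 split."""
    from torchaudio.datasets import SPEECHCOMMANDS  # imported lazily

    os.makedirs(root, exist_ok=True)  # make sure the root directory exists
    return SPEECHCOMMANDS(
        root=root, url="speech_commands_v0.02", download=True, subset=subset
    )


def pad_or_trim(waveform, num_samples=16000):
    """Force a (channels, time) waveform to exactly `num_samples` samples.

    Clips shorter than the target are zero-padded on the right so that
    they can be stacked into a fixed-size batch; longer clips are cut.
    """
    length = waveform.shape[-1]
    if length >= num_samples:
        return waveform[..., :num_samples]
    return F.pad(waveform, (0, num_samples - length))
```

Each SpeechCommands item is a tuple whose first element is the waveform tensor, so `pad_or_trim(item[0])` could be applied inside a custom `collate_fn` before batching.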
- Languages: Python
- Frameworks: PyTorch, TensorFlow
- Libraries: Torchaudio, Torchvision, NumPy, Matplotlib
- Development: Jupyter Notebooks, Google Colab
- Concepts: Deep Learning, CNNs, Cross-domain learning, Training pipelines and checkpointing, State-of-the-Art image/audio classification
- Clone the Repository
  git clone https://github.com/imtanmay46/Deep-Learning.git
  cd Deep-Learning
- Install Dependencies
  pip install -r requirements.txt
- Download Datasets
  train.py handles dataset loading automatically. If you run into any issues, download CIFAR-10 and SpeechCommands V0.02 using the links above and make the necessary changes in train.py.
- Run the Training Pipeline
  python main.py