
Optimized Deep Convolutional Architectures for Cross-Domain Classification Tasks

Project Overview

This project focuses on building and optimizing deep convolutional neural networks (CNNs) for robust classification across both image and audio domains. By integrating hybrid architectural blocks into custom ResNet, VGG, and Inception models, the system effectively handles diverse datasets such as CIFAR-10 (image classification) and SpeechCommands V0.02 (audio classification).

Key Features

  • Custom CNN Architectures: Implemented enhanced ResNet, VGG, and Inception variants, augmented with hybrid layers for improved feature representation and robustness.
  • Cross-Domain Generalization: Trained models to achieve high accuracy on both visual and auditory datasets, enabling effective multi-modal learning.
  • Robust Training Pipeline: Developed a flexible pipeline with custom checkpointing, enabling reproducibility, modular training, and efficient evaluation.
  • Benchmark-Driven Evaluation: Models were optimized using Cross-Entropy Loss and the Adam Optimizer, reaching competitive benchmarks for both image and audio tasks.
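As a minimal sketch of the cross-entropy + Adam setup named above (the placeholder linear model stands in for the repo's ResNet/VGG/Inception variants, which are defined in train.py):

```python
import torch
import torch.nn as nn

# Placeholder model: the repo's actual architectures are custom
# ResNet/VGG/Inception variants; a tiny linear head is used here
# only to show the optimization loop shape.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy CIFAR-10-shaped batch: 8 RGB images of 32x32, labels in [0, 10).
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

# One training step: forward, loss, backward, parameter update.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```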

Project Structure

.
├── train.py            # Core logic: dataloading, model definitions, training & validation loops
├── __init__.py         # Global variable and parameter initialization
├── main.py             # Entry point of the pipeline; orchestrates training & evaluation
├── requirements.txt    # Python dependencies

File Descriptions

  • main.py: Drives the overall execution. Calls utility functions from train.py, sets up configurations, and runs training and evaluation.
  • train.py: Implements dataset loading, custom CNNs, training and validation loops, and accuracy tracking.
  • __init__.py: Initializes global variables, constants, and hyperparameters.
  • requirements.txt: Lists all required packages to replicate the environment.
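The custom checkpointing mentioned under Key Features typically boils down to saving and restoring model and optimizer state together. The helpers below are a hedged sketch of that idea; the function names, file path, and dictionary keys are illustrative assumptions, not the repo's actual API:

```python
import torch

# Illustrative checkpoint helpers (names and keys are assumptions,
# not the repo's actual interface).
def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    """Persist model + optimizer state so training can resume."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    """Restore model + optimizer state; returns the saved epoch."""
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]
```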

Datasets

  • CIFAR-10

    • 60,000 32×32 color images across 10 classes.
    • Used for evaluating image classification performance.
    • Link to dataset
  • SpeechCommands V0.02

    • Over 100,000 one-second audio clips of 35 spoken words.
    • Used for evaluating audio classification robustness.
    • Link to dataset

Technologies & Concepts

  • Languages: Python
  • Frameworks: PyTorch, TensorFlow
  • Libraries: Torchaudio, Torchvision, NumPy, Matplotlib
  • Development: Jupyter Notebooks, Google Colab
  • Concepts:
    • Deep Learning, CNNs
    • Cross-domain learning
    • Training pipelines and checkpointing
    • State-of-the-Art image/audio classification

Setup Instructions

  1. Clone the Repository

    git clone https://github.com/imtanmay46/Deep-Learning.git
    cd Deep-Learning
  2. Install Dependencies

    pip install -r requirements.txt
  3. Download Datasets

    train.py handles dataset downloading and loading automatically, but if you run into issues:

    • Download CIFAR-10 and SpeechCommands V0.02 manually using the links above and update the dataset paths in train.py accordingly.
  4. Run the Training Pipeline

    python main.py