A PyTorch-based project to classify music genres from audio clips using neural networks. Different models and feature extraction techniques are explored, starting with a simple Feedforward Neural Network (FNN) on Mel-Frequency Cepstral Coefficients (MFCCs) and progressing to an optimized Convolutional Neural Network (CNN) on Mel Spectrograms.
The model classifies audio into four distinct genres:
- Blues
- Classical
- Hiphop
- Rock/Metal/Hardrock
The first model attempts classification using MFCCs as input features.
- Data: Loads pre-processed files containing MFCC features. Each audio sample is represented by a 1D vector of 26 features.
- Model: A simple 3-layer Feedforward Neural Network built with Linear layers.
- Training: The model is trained using Stochastic Gradient Descent (SGD) for 30 epochs.
- Evaluation: Model performance is evaluated using Accuracy and F1-Score, and a confusion matrix is generated. The best-performing model from the training epochs is saved and evaluated on the test set.
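The FNN stage can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the hidden-layer sizes (128 and 64) and the learning rate are assumptions; only the 26 MFCC input features, 4 output classes, 3 Linear layers, and SGD optimizer come from the description above.

```python
import torch
import torch.nn as nn

class GenreFNN(nn.Module):
    """3-layer feedforward network over a 26-dim MFCC feature vector."""

    def __init__(self, n_features=26, n_classes=4):
        super().__init__()
        # Hidden sizes are illustrative, not the notebook's tuned values
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),  # raw logits; CrossEntropyLoss handles softmax
        )

    def forward(self, x):
        return self.net(x)

model = GenreFNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch of 8 MFCC vectors
x = torch.randn(8, 26)
y = torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

In the notebook this step runs inside the 30-epoch loop, with accuracy and F1 tracked per epoch to select the best checkpoint.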
The second model uses Mel Spectrograms ("melgrams"), which treat the audio data as a 2D image-like representation. This allows for the use of Convolutional Neural Networks.
- CNN Model: The architecture incorporates:
- Padding: Padding is added to the convolution layers to preserve feature map dimensions.
- Max Pooling: Max Pooling layers are added after each convolution to downsample the feature maps and reduce computational load.
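The effect of these two choices is easy to verify on a dummy tensor: with `kernel_size=3` and `padding=1` a convolution preserves the spatial dimensions, and a 2×2 max pool then halves them. The input shape below is illustrative, not the project's actual spectrogram size.

```python
import torch
import torch.nn as nn

# A stand-in for one mel spectrogram treated as a 1-channel image
x = torch.randn(1, 1, 128, 130)

conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

h = conv(x)
print(h.shape)  # torch.Size([1, 16, 128, 130]) -- padding preserved H and W
h = pool(h)
print(h.shape)  # torch.Size([1, 16, 64, 65])   -- pooling halved H and W (floor)
```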
- Optimizer & Activation Function Tuning:
- Optimizers: This model is used to benchmark various PyTorch optimizers, including `SGD`, `Adam`, `AdamW`, `Adadelta`, and others.
- Activation Functions: Experimentation with 11 different activation functions (e.g., `ReLU`, `LeakyReLU`, `GELU`, `SiLU`, `Mish`) to find the best one.
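A benchmarking loop of this kind might look like the sketch below. The structure, the tiny stand-in model, and the learning rates are assumptions for illustration; only the optimizer and activation names come from the list above.

```python
import torch
import torch.nn as nn

# Factories so each run gets a fresh optimizer bound to a fresh model
optimizers = {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.01),
    "Adam": lambda p: torch.optim.Adam(p, lr=0.001),
    "AdamW": lambda p: torch.optim.AdamW(p, lr=0.001),
    "Adadelta": lambda p: torch.optim.Adadelta(p),
}
activations = {
    "ReLU": nn.ReLU,
    "LeakyReLU": nn.LeakyReLU,
    "GELU": nn.GELU,
    "SiLU": nn.SiLU,
    "Mish": nn.Mish,
}

results = {}
for opt_name, make_opt in optimizers.items():
    for act_name, act in activations.items():
        # Tiny stand-in model; the notebook benchmarks its CNN here instead
        model = nn.Sequential(nn.Linear(26, 32), act(), nn.Linear(32, 4))
        optimizer = make_opt(model.parameters())
        # ... train for a few epochs and record validation accuracy here ...
        results[(opt_name, act_name)] = None  # placeholder for the measured score
```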
- Model: A 3-layer CNN with Max Pooling, followed by 4 fully-connected (dense) layers for classification.
- Key Components:
- Activation: Sigmoid-weighted Linear Unit ("SiLU") is used as the primary activation function.
- Optimizer: Adam is chosen, using Weight Decay for regularization.
- Scheduler: A Cyclic Learning Rate scheduler is used to dynamically adjust the learning rate during training.
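The key components above can be wired together as in this sketch. The channel counts, dense-layer widths, learning-rate bounds, weight decay, and the assumed 128×128 spectrogram input are all illustrative; the SiLU activation, Adam with weight decay, and the cyclic scheduler are the components named above.

```python
import torch
import torch.nn as nn

# 3 conv blocks (padded conv -> SiLU -> max pool) then 4 dense layers.
# With a 128x128 input, three 2x2 pools leave a 16x16 map: 64*16*16 = 16384.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 256), nn.SiLU(),
    nn.Linear(256, 128), nn.SiLU(),
    nn.Linear(128, 64), nn.SiLU(),
    nn.Linear(64, 4),  # 4 genre classes
)

# Adam with weight decay for regularization
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Cyclic learning rate oscillating between base_lr and max_lr.
# cycle_momentum=False is required because Adam has no `momentum` parameter.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=200, cycle_momentum=False
)

# During training, scheduler.step() is called after each batch to advance the cycle.
```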
- Download: Uses the `pytube` library to download a YouTube video's audio stream.
- Convert: Uses the `pydub` library to convert the downloaded audio into a `.wav` file.
- Extract Features: Uses `librosa` to load the `.wav` file, segment it into chunks, and generate Mel Spectrograms for each chunk.
- Classify: The best model is loaded and performs inference on these new spectrograms.
- Visualize: The results are plotted over time, showing the model's predicted genre for each segment of the song, along with a table showing the overall percentage breakdown.
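The segmentation step in this pipeline can be sketched without the download/convert machinery. The 3-second chunk length and the helper name below are assumptions, as is the use of `librosa`'s default 22050 Hz sample rate; the notebook's actual chunking parameters may differ.

```python
import numpy as np

def split_into_chunks(waveform, sample_rate, chunk_seconds=3.0):
    """Split a 1-D waveform into equal fixed-length chunks, dropping the remainder."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(waveform) // chunk_len
    return [waveform[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

sr = 22050                      # librosa's default sample rate
audio = np.zeros(10 * sr)       # stand-in for a 10-second decoded .wav clip
chunks = split_into_chunks(audio, sr)
print(len(chunks))              # 3 full 3-second chunks; the trailing second is dropped
# Each chunk would then be turned into a Mel Spectrogram, e.g. via
# librosa.feature.melspectrogram(y=chunk, sr=sr), and fed to the saved model.
```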
You must have Python installed, along with several libraries. You can install them via pip:
pip install torch torcheval-nightly
pip install pytube pydub
pip install scikit-learn matplotlib numpy librosa
- Environment: This notebook is designed to run in a Jupyter environment like Google Colab.
- Data: Ensure your pre-processed training, validation, and test datasets (MFCCs and Mel Spectrograms) are available and update the paths in the notebook to load them correctly.
- Execution: Open the notebook and execute the cells sequentially.
- The notebook will guide you through loading data, training the initial FNN model, and then building, training, and optimizing the final CNN model.
- The best-performing models will be saved as `.pt` files.
- Inference: The final sections of the notebook demonstrate how to use the saved models to classify new songs directly from a YouTube URL. You can change the provided URLs to test your own examples.