TomusD/Music-Genre-Classifier
Music Genre Classifier

A PyTorch-based project to classify music genres from audio clips using neural networks. Different models and feature extraction techniques are explored, starting with a simple Feedforward Neural Network (FNN) on Mel-Frequency Cepstral Coefficients (MFCCs) and progressing to an optimized Convolutional Neural Network (CNN) on Mel Spectrograms.

The model classifies audio into four distinct genres:

  • Blues
  • Classical
  • Hiphop
  • Rock/Metal/Hardrock

Models

1. FNN with MFCC Features

The first model attempts classification using MFCCs as input features.

  • Data: Loads pre-processed files containing MFCC features. Each audio sample is represented by a 1D vector of 26 features.
  • Model: A simple 3-layer Feedforward Neural Network built with Linear layers.
  • Training: The model is trained using Stochastic Gradient Descent (SGD) for 30 epochs.
  • Evaluation: Model performance is evaluated using Accuracy and F1-Score, and a confusion matrix is generated. The best-performing model from the training epochs is saved and evaluated on the test set.
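The FNN described above can be sketched in PyTorch as follows. Only the structure is taken from this description (26 MFCC inputs, three Linear layers, SGD, four output classes); the hidden widths, learning rate, and ReLU activation are illustrative assumptions, not the repository's exact values.

```python
import torch
from torch import nn

class GenreFNN(nn.Module):
    """Hypothetical 3-layer feedforward net: 26 MFCC features in, 4 genres out.
    Hidden sizes (128, 64) and ReLU are guesses for illustration."""

    def __init__(self, n_features: int = 26, n_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),  # raw logits; CrossEntropyLoss handles softmax
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = GenreFNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data standing in for an MFCC batch.
x = torch.randn(8, 26)
y = torch.randint(0, 4, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

In the notebook this step would run in a loop over the training set for 30 epochs, tracking Accuracy and F1-Score each epoch to pick the checkpoint to save.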

2. CNN with Mel Spectrogram Features

The second model uses Mel Spectrograms ("melgrams"), which treat the audio data as a 2D image-like representation. This allows for the use of Convolutional Neural Networks.

Model Development Process

  1. CNN Model: The model architecture incorporates:

    • Padding: Padding is added to the convolution layers to preserve feature map dimensions.
    • Max Pooling: Max Pooling layers are added after each convolution to downsample the feature maps and reduce computational load.
  2. Optimizer & Activation Function Tuning:

    • Optimizers: This model is used to benchmark various PyTorch optimizers, including SGD, Adam, AdamW, Adadelta, and others.
    • Activation Functions: Eleven different activation functions (e.g., ReLU, LeakyReLU, GELU, SiLU, Mish) are tested to find the best performer.
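The two architectural ideas above, and the activation sweep, can be illustrated with a minimal conv block whose activation is pluggable. Channel counts, the 3x3 kernel, and the 64x64 input are assumptions for illustration; the point is that padding=1 preserves the feature map size while 2x2 max pooling halves it.

```python
import torch
from torch import nn

def conv_block(in_ch: int, out_ch: int, activation: nn.Module) -> nn.Sequential:
    """One conv stage: padding=1 with a 3x3 kernel preserves H and W,
    then 2x2 max pooling halves them."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        activation,
        nn.MaxPool2d(2),
    )

x = torch.randn(1, 1, 64, 64)  # stand-in for a 1-channel mel spectrogram

# A few of the activation functions mentioned above, swapped into the same block.
for act in (nn.ReLU(), nn.LeakyReLU(), nn.GELU(), nn.SiLU(), nn.Mish()):
    out = conv_block(1, 16, act)(x)
    print(type(act).__name__, tuple(out.shape))  # spatial dims halved: (1, 16, 32, 32)
```

The same swap-one-component pattern applies to the optimizer benchmark: build the identical model for each candidate (SGD, Adam, AdamW, Adadelta, ...), train, and compare validation metrics.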

Final Optimized Model

  • Model: A 3-layer CNN with Max Pooling, followed by 4 fully connected (dense) layers for classification.
  • Key Components:
    • Activation: Sigmoid-weighted Linear Unit ("SiLU") is used as the primary activation function.
    • Optimizer: Adam is chosen, using Weight Decay for regularization.
    • Scheduler: A Cyclic Learning Rate scheduler is used to dynamically adjust the learning rate during training.
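A hedged sketch of this final setup: three conv stages with SiLU and max pooling, four dense layers, Adam with weight decay, and a cyclic learning-rate scheduler. Channel counts, layer widths, the 64x64 input size, and all hyperparameter values are illustrative assumptions; only the overall structure comes from the description above.

```python
import torch
from torch import nn

# Assumed input: 1-channel 64x64 mel spectrogram; sizes are illustrative.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),   # 64 -> 32
    nn.Conv2d(16, 32, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),  # 32 -> 16
    nn.Conv2d(32, 64, 3, padding=1), nn.SiLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 256), nn.SiLU(),
    nn.Linear(256, 128), nn.SiLU(),
    nn.Linear(128, 64), nn.SiLU(),
    nn.Linear(64, 4),  # 4 genres
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Adam has no `momentum` parameter, so CyclicLR needs cycle_momentum=False.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, cycle_momentum=False
)

# One illustrative step: forward, backward, optimizer step, then scheduler step.
x = torch.randn(4, 1, 64, 64)
y = torch.randint(0, 4, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()
```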

3. Inference (Testing on YouTube)

  1. Download: Uses the pytube library to download a YouTube video's audio stream.
  2. Convert: Uses the pydub library to convert the downloaded audio into a .wav file.
  3. Extract Features: Uses librosa to load the .wav file, segment it into chunks, and generate Mel Spectrograms for each chunk.
  4. Classify: The best model is loaded and performs inference on these new spectrograms.
  5. Visualize: The results are plotted over time, showing the model's predicted genre for each segment of the song, along with a table showing the overall percentage breakdown.
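The segmentation in step 3 can be sketched without the download dependencies. NumPy stands in for the librosa-loaded waveform, and the chunk length is an illustrative assumption; each chunk would then be passed to librosa.feature.melspectrogram before classification.

```python
import numpy as np

def split_into_chunks(signal: np.ndarray, sr: int, chunk_seconds: float) -> list:
    """Split a 1-D waveform into equal-length chunks, dropping the remainder."""
    chunk_len = int(sr * chunk_seconds)
    n_chunks = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

sr = 22050                        # librosa's default sample rate
audio = np.random.randn(sr * 10)  # stand-in for a 10-second loaded .wav
chunks = split_into_chunks(audio, sr, chunk_seconds=3.0)
# Each chunk would then become a mel spectrogram, e.g.:
#   mel = librosa.feature.melspectrogram(y=chunk, sr=sr)
print(len(chunks))  # 3 full 3-second chunks fit in 10 seconds
```

Plotting the per-chunk predictions over the song's timeline then gives the visualization described in step 5.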

How to Run It

1. Prerequisites

You must have Python installed along with several libraries, which you can install via pip:

pip install torch torcheval-nightly
pip install pytube pydub
pip install scikit-learn matplotlib numpy librosa

2. Running the Notebook

  1. Environment: This notebook is designed to run in a Jupyter environment like Google Colab.
  2. Data: Ensure your pre-processed training, validation, and test datasets (MFCCs and Mel Spectrograms) are available and update the paths in the notebook to load them correctly.
  3. Execution: Open the notebook and execute the cells sequentially.
    • The notebook will guide you through loading data, training the initial FNN model, and then building, training, and optimizing the final CNN model.
    • The best-performing models will be saved as .pt files.
  4. Inference: The final sections of the notebook demonstrate how to use the saved models to classify new songs directly from a YouTube URL. You can change the provided URLs to test your own examples.

About

Classifying music genres using Feedforward and Convolutional Neural Networks on MFCC and Mel Spectrogram features.
