This project aimed to develop a deep learning model for classifying bird species from audio recordings of their vocalizations. The dataset was obtained from the Kaggle BirdCLEF competition and pre-processed to filter out low-quality audio samples and to ensure a sufficient number of samples per bird species. The librosa library was used to extract log mel-spectrogram image representations from the audio files. These 2D spectrograms, which encode the time-frequency patterns of the bird vocalizations, were then normalized. The normalized spectrogram images served as input to a convolutional neural network (CNN) built with the TensorFlow framework. After training for multiple epochs, the validation accuracy was about 74% and the validation F1 score was 73%; the trained CNN demonstrates the feasibility of using deep learning on audio spectrograms for acoustic bird species classification. Potential improvements include data augmentation, regularization, and ensemble methods to better generalize the model's performance across diverse recording conditions.
A Mel-frequency spectrogram is a representation of the spectrum of a signal as it varies over time. It is derived from the traditional spectrogram, which displays the frequency content of a signal over time. However, instead of linearly spaced frequency bins, the mel spectrogram uses frequency bins that are spaced according to the mel scale, which is a perceptual scale of pitches based on human hearing. This scaling is designed to better represent how humans perceive differences in pitch.
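The mel scale described above can be expressed with the commonly used HTK-style formula, m = 2595 · log10(1 + f / 700). A minimal sketch of the conversion (the function names here are illustrative, not from the project code):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse mapping from mel back to Hz
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to roughly 1000 mel, and the spacing between mel bins grows wider at higher frequencies, mirroring how human pitch perception compresses high frequencies.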

Steps to compute a Mel spectrogram:
- Compute the Short-Time Fourier Transform (STFT) of the signal to obtain a power spectrogram.
- Choose the number of mel bands and construct the corresponding mel filter banks.
- Apply the filter banks to the spectrogram, mapping the linearly spaced frequency bins onto the mel scale.
- Convert the resulting amplitudes to decibels (log scale).

CNNs, or Convolutional Neural Networks, are deep learning architectures particularly effective for image processing tasks. They consist of layers that apply convolution operations to capture features like edges and textures, pooling layers to reduce spatial dimensions, activation functions for non-linearity, and fully connected layers for classification or regression. CNNs excel at automatically learning hierarchical representations from raw data, making them invaluable for tasks such as image classification, object detection, and segmentation, where they have achieved state-of-the-art performance.

The dataset consists of 40 bird species. The goal is to extract mel spectrograms from the audio recordings and pass them to the CNN.
The convolutional neural network has the following structure:
- 4 blocks, each consisting of:
  - Convolutional layer
  - Batch normalization
  - Max pooling
- Followed by fully connected layers with a softmax output for classification.

After parameter tuning, we obtain a test accuracy of 83.33% and a test loss of 0.4696.
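The block structure above can be sketched in TensorFlow/Keras as follows. This is an illustrative reconstruction, not the tuned model from the project: the filter counts, dense layer width, input shape, and optimizer are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1), n_classes=40):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Four blocks: Conv -> BatchNorm -> MaxPool (filter counts assumed)
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
    # Classification head: flatten + dense + softmax over the 40 species
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Each max-pooling step halves the spatial dimensions, so the four blocks reduce a 128×128 spectrogram to 8×8 feature maps before the dense layers.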

The deep learning model developed in this project successfully classified bird species from their vocalizations. Using a Kaggle dataset containing audio recordings of 40 bird species, we processed the audio with the Python library librosa, converting the recordings into log mel-spectrogram images that capture the time-frequency characteristics of the bird calls. The final model achieved a test accuracy of approximately 83.33%, demonstrating the effectiveness of deep learning with audio spectrograms for bird species classification.
- Aryan N Herur
- Vaibhav Santhosh
