ebertv/deep-audio-transcription

deep-audio-transcription

warning

This code is under construction. At this stage of the project it has not been cleaned up or well documented. It will be eventually, but for now it is not.

abstract

Transcription is the act of converting something into a symbolic format. For musical transcription, that format is MIDI. This project aims to use a recurrent neural network, specifically a bidirectional long short-term memory (BLSTM) network, to convert an audio file of a solo piano performance into a MIDI file. The initial goal is simply to produce a symbolic representation of piano audio, but the approach could also be applied to sheet music creation for any sound. The network was trained on twelve minutes of audio and a MIDI file of the same performance, aligned to within 3 milliseconds. It was trained on only this single song, as a proof of concept for the network architecture and for the plausibility of the project as a whole. Two versions of the file were run through the network: one with velocity and one without. The results show promise for using a BLSTM network to transcribe audio. Although neither version was perfect, both resembled the expected output in note location and general pitch, and the version with velocity resembled it in volume as well. The next step is to scale the network up to train on over 200 hours of data, and to increase the similarity between the network's output and the expected output.
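To make the "with velocity" and "without velocity" variants concrete, here is a minimal sketch of one common way to turn aligned MIDI notes into frame-wise training targets (a piano roll). It is not taken from this repository: the note tuple layout, the 10 ms frame length, the 88-key range, and the function name `notes_to_piano_roll` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical note event: (onset_seconds, offset_seconds, midi_pitch, velocity)
Note = Tuple[float, float, int, int]

FRAME_SECONDS = 0.01  # assumed frame length; not specified by the project
NUM_PITCHES = 88      # assumed range: piano keys A0 (MIDI 21) to C8 (MIDI 108)
LOWEST_PITCH = 21


def notes_to_piano_roll(
    notes: List[Note], duration: float, with_velocity: bool
) -> List[List[float]]:
    """Return a [frames][pitches] grid of note activity.

    With with_velocity=False each active cell is 1.0 (note on/off only).
    With with_velocity=True each active cell holds velocity scaled to (0, 1].
    """
    num_frames = int(round(duration / FRAME_SECONDS))
    roll = [[0.0] * NUM_PITCHES for _ in range(num_frames)]
    for onset, offset, pitch, velocity in notes:
        key = pitch - LOWEST_PITCH
        if not 0 <= key < NUM_PITCHES:
            continue  # skip pitches outside the assumed piano range
        start = max(0, int(round(onset / FRAME_SECONDS)))
        end = min(num_frames, int(round(offset / FRAME_SECONDS)))
        value = velocity / 127.0 if with_velocity else 1.0
        for frame in range(start, end):
            roll[frame][key] = value
    return roll


# Usage: middle C (MIDI 60) held for 50 ms at velocity 64.
example = [(0.0, 0.05, 60, 64)]
binary_roll = notes_to_piano_roll(example, duration=0.1, with_velocity=False)
velocity_roll = notes_to_piano_roll(example, duration=0.1, with_velocity=True)
```

Under these assumptions, a network trained against `binary_roll` could only learn note location and pitch, while one trained against `velocity_roll` would also have a volume target to match.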
