Accompanying code for the paper "Timbre Transfer Using Image-to-Image Denoising Diffusion Implicit Models" [1].
For any questions, please write to luca.comanducci@polimi.it.
Dependencies: TensorFlow (>2.11), librosa, pretty_midi, numpy, essentia, frechet_audio_distance. (The os module is part of the Python standard library and requires no installation.)
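A minimal setup sketch, assuming a pip-based environment (the exact TensorFlow build depends on your platform and GPU):

```
pip install "tensorflow>=2.11" librosa pretty_midi numpy essentia frechet_audio_distance
```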
The model is trained using the StarNet dataset, which is freely available on Zenodo.
- audio_utils.py --> Contains shared audio utilities and functions
- params.py --> Contains parameters shared across the scripts
- network_lib_attention.py --> Contains the Denoising Diffusion Implicit Model implementation (a minimal sketch of the DDIM update rule is given after this list)
- DiffTransfer.py --> Runs the training (an example invocation is shown after this list) and takes the following arguments:
- dataset_train_path: String, path to training data
- desired_instrument: String, name of desired output instrument
- conditioning_instrument: String, name of input instrument
- GPU: index of the GPU to use, in case you have multiple ones
- compute_eval_tracks_mixture.py --> Generates the evaluation tracks for the mixture setting
- compute_eval_tracks_separate.py --> Generates the evaluation tracks for the separate-instrument setting
- compute_frechet.py --> Computes the Fréchet Audio Distance (FAD) of the generated tracks
- compute_jaccard.py --> Computes the Jaccard metric used in the objective evaluation
- compute_listening_test_results.py --> Computes the results of the listening test
- preprocess_tracks_listening_test.py --> Preprocesses the tracks used in the listening test
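As referenced above, a training run might be launched as follows. This is a hypothetical invocation assuming argparse-style flags; the path and instrument names are placeholders:

```
python DiffTransfer.py --dataset_train_path ./starnet --desired_instrument strings --conditioning_instrument piano --GPU 0
```

For background, the deterministic sampling update that Denoising Diffusion Implicit Models use can be sketched as follows. This is a minimal NumPy illustration of the generic DDIM step (eta = 0), not the code from network_lib_attention.py:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t:            current noisy sample (e.g. a spectrogram treated as an image)
    eps_pred:       noise predicted by the network at the current step
    alpha_bar_t:    cumulative signal rate at the current step
    alpha_bar_prev: cumulative signal rate at the previous (less noisy) step
    """
    # Estimate the clean sample implied by the current noise prediction
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Move deterministically to the previous noise level (no stochastic term)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```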
If you use this code, please cite the following paper:
```
@inproceedings{comanducci2023timbre,
  author    = {Luca Comanducci and Fabio Antonacci and Augusto Sarti},
  title     = {Timbre Transfer Using Image-to-Image Denoising Diffusion Implicit Models},
  booktitle = {Proceedings of the 24th International Society for Music Information Retrieval Conference, {ISMIR} 2023, Milan, Italy, November 5-9, 2023},
  pages     = {257--263},
  year      = {2023},
}
```
[1] Comanducci, Luca, Fabio Antonacci, and Augusto Sarti. "Timbre Transfer Using Image-to-Image Denoising Diffusion Implicit Models." Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR), Milan, Italy, 2023, pp. 257-263.