This project explores the application of Variational Autoencoders (VAEs) to sequential data, such as time series or audio. Applying VAEs to sequential data presents a specific challenge: standard VAEs assume independent data points, whereas sequential data has strong temporal dependencies. To address this, Recurrent Neural Networks (RNNs) must be integrated into the VAE architecture to capture temporal context.
We replicate and experiment with two specific architectures presented in the paper A Recurrent Latent Variable Model for Sequential Data (arXiv:2008.12595):
- VRNN (Variational Recurrent Neural Network): A model where the VAE is applied at every time step, conditioned on the RNN state.
- SRNN (Stochastic Recurrent Neural Network): A hierarchical architecture designed to better separate deterministic and stochastic information.
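To make the VRNN idea concrete, here is a minimal sketch of one VRNN time step in PyTorch. It is an illustration only, not the project's actual implementation: the class name, layer sizes, and the use of simple linear layers and a GRU cell are our own simplifications, and the latents are assumed Gaussian.

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    """One VRNN time step: the VAE's prior, posterior, and decoder are all
    conditioned on the previous RNN state h_{t-1} (illustrative sketch)."""
    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)            # h_{t-1} -> prior mean, log-var
        self.encoder = nn.Linear(x_dim + h_dim, 2 * z_dim)  # (x_t, h_{t-1}) -> posterior
        self.decoder = nn.Linear(z_dim + h_dim, x_dim)      # (z_t, h_{t-1}) -> reconstruction
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)         # recurrence over (x_t, z_t)

    def forward(self, x_t, h):
        prior_mu, prior_logvar = self.prior(h).chunk(2, dim=-1)
        post_mu, post_logvar = self.encoder(torch.cat([x_t, h], dim=-1)).chunk(2, dim=-1)
        # Reparameterization trick: sample z_t from the approximate posterior.
        z_t = post_mu + torch.randn_like(post_mu) * (0.5 * post_logvar).exp()
        x_hat = self.decoder(torch.cat([z_t, h], dim=-1))
        h_next = self.rnn(torch.cat([x_t, z_t], dim=-1), h)
        return x_hat, h_next, (prior_mu, prior_logvar, post_mu, post_logvar)
```

The key difference from a standard VAE is visible in `forward`: both the prior and the posterior depend on `h`, so the latent variable at each step is informed by the temporal context accumulated by the RNN.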
- The code for our experiments is available in the Experiments with VRNN.ipynb notebook, within the "experiments" folder.
- Our experimental results are presented in the file "Poster (one page presentation of the project)".
- A comprehensive report of the project is also available under the name "Report DVAE project".
We conducted our experiments on a subset of the VCTK corpus, a speech dataset containing audio recordings from various English speakers. For this project, the raw waveform data (.wav) was converted into Mel-spectrograms, a standard time-frequency representation for audio processing.
The primary goals of this implementation were to:
- Re-implement Architectures: Build the VRNN and SRNN architectures from scratch using PyTorch.
- Hyperparameter Optimization: Train the models with carefully selected parameters, focusing on the choice of prior distribution (Gaussian vs. Student-t), the duration of the Kullback-Leibler (KL) divergence annealing phase, the number of epochs, and learning rate scheduling.
- Reconstruction Evaluation: Assess the quality of audio reconstruction across the different architectures and priors.
- Audio Generation: Experiment with generating audio in two modes:
  - Cold Start: Generating from scratch with no initial context.
  - Priming: Using a "warm-up" phase where the model is fed a short sequence of real data to initialize the hidden state.
- Refinement: Analyze and refine the models and the generation process to improve the intelligibility of the output audio.