
Vanishing gradients of GANs? #22

@MicPie

Description

I have been trying to wrap my head around the explanation of the vanishing gradients problem of GANs for quite some time:

The current solution PDF plots the loss function values against the input of the sigma function to explain the (non-)saturating behavior. However, I am asking myself whether that plot only captures the saturation of the sigma function itself, and not the saturation behavior of the G loss function.
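To spell out what I mean by the two kinds of saturation (my own notation, not the solution's: a is the input to the sigma function and x = σ(a) = D(G(z)) is the discriminator output on a fake), the chain rule gives

$$
\frac{\partial J_G}{\partial a} = \frac{\partial J_G}{\partial x}\,\sigma'(a),
\qquad \sigma'(a) = \sigma(a)\,\bigl(1 - \sigma(a)\bigr),
$$

so the factor σ'(a) saturates for large |a| no matter which generator loss is used, while the factor ∂J_G/∂x is where the saturating and non-saturating losses actually differ.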

The NIPS 2016 GAN tutorial shows in figure 16 (p. 26) an explanation of the saturating loss without taking the sigma function into account. With this explanation, I guess, the saturation behavior is explained through the gradients for G when G is not (yet) able to generate good fakes and D can easily identify them as fake (x = 0 or close to 0).
See a plot of the saturating and non-saturating loss functions and their derivatives. There, the saturating loss has a small gradient of around -1 at x = 0, while the gradient of the non-saturating loss goes to -infinity as x approaches 0.
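For reference, here is a minimal sketch of how I read those two derivatives (assuming x = D(G(z)), the saturating loss log(1 - x), and the non-saturating loss -log(x), as in the tutorial):

```python
import numpy as np

# x = D(G(z)): the discriminator's probability that the fake sample is real.
# Saturating generator loss:      J_sat(x)    = log(1 - x)
# Non-saturating generator loss:  J_nonsat(x) = -log(x)
# Analytic derivatives w.r.t. x:
#   dJ_sat/dx    = -1 / (1 - x)   -> roughly -1 while x is close to 0
#   dJ_nonsat/dx = -1 / x         -> -infinity as x -> 0+
x = np.array([1e-4, 1e-3, 1e-2, 0.1, 0.5, 0.9])
grad_sat = -1.0 / (1.0 - x)
grad_nonsat = -1.0 / x

for xi, gs, gn in zip(x, grad_sat, grad_nonsat):
    print(f"x={xi:8.4f}  dJ_sat/dx={gs:10.4f}  dJ_nonsat/dx={gn:12.2f}")
```

So as long as D confidently rejects the fakes (x close to 0), the derivative of the saturating loss stays bounded around -1, while the derivative of the non-saturating loss blows up.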
When I plot the gradients over training for both loss functions, I also get higher gradient means and higher standard deviations for the non-saturating loss compared to the saturating loss (see notebook).
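In case it helps, this is roughly how I measure the generator gradient norms for the two losses (a toy G and D, made up for this snippet, not the models from the notebook):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical tiny generator and discriminator, only used to compare gradient norms.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))  # D outputs a logit

def generator_grad_norm(loss_type):
    G.zero_grad()
    z = torch.randn(64, 8)
    d_fake = torch.sigmoid(D(G(z)))                   # D(G(z)) as a probability
    if loss_type == "saturating":
        loss = torch.log(1 - d_fake + 1e-8).mean()    # G minimizes log(1 - D(G(z)))
    else:
        loss = -torch.log(d_fake + 1e-8).mean()       # G minimizes -log(D(G(z)))
    loss.backward()
    return torch.cat([p.grad.flatten() for p in G.parameters()]).norm().item()

print("saturating    :", generator_grad_norm("saturating"))
print("non-saturating:", generator_grad_norm("non-saturating"))
```

The absolute numbers depend on how confidently this D rejects the fakes, so this is only meant to show the measurement, not to reproduce the notebook's values.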

Maybe I am missing something?

I would be happy if somebody could point me in the right direction.
