Hi Krzysztof,
When visualizing the distribution of weights and gradients of each tensor over training, I noticed that some of the weights don't seem to be updating, e.g. InvertibleConv1x1Layer's U_mat, L_mat, and log_S.

My first thought was that maybe the gradients are too small, but it doesn't look like that's the case:


Weights remain mostly constant:


But gradients are... pretty explosive 😔
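
A minimal sketch of how this could be checked numerically rather than by eye, assuming the weights and gradients can be dumped as NumPy arrays keyed by tensor name at two training steps. The helper names, the thresholds, and the `kernel` tensor are hypothetical and only for illustration; `U_mat` is the only name taken from the layer above, and the data is synthetic.

```python
import numpy as np


def update_report(before, after, grads, eps=1e-12):
    """For each tensor, return (relative weight change, gradient norm)."""
    report = {}
    for name, w0 in before.items():
        w1 = after[name]
        # Relative change of the weights between the two snapshots.
        rel_update = np.linalg.norm(w1 - w0) / (np.linalg.norm(w0) + eps)
        grad_norm = np.linalg.norm(grads[name])
        report[name] = (rel_update, grad_norm)
    return report


def suspicious(report, update_tol=1e-6, grad_tol=1.0):
    """Names of tensors with a large gradient but (almost) no weight change."""
    return sorted(
        name
        for name, (rel_update, grad_norm) in report.items()
        if rel_update < update_tol and grad_norm > grad_tol
    )


# Synthetic snapshots: "U_mat" gets a large gradient but is never updated,
# while the hypothetical "kernel" tensor takes a plain SGD step.
rng = np.random.default_rng(0)
before = {
    "U_mat": rng.normal(size=(4, 4)),
    "kernel": rng.normal(size=(4, 4)),
}
grads = {
    "U_mat": 100.0 * rng.normal(size=(4, 4)),
    "kernel": rng.normal(size=(4, 4)),
}
after = {
    "U_mat": before["U_mat"].copy(),
    "kernel": before["kernel"] - 0.01 * grads["kernel"],
}

report = update_report(before, after, grads)
flagged = suspicious(report)
print(flagged)
```

A tensor that shows up in `flagged` has a non-trivial gradient but is not being moved by the optimizer, which would point at the update step (e.g. the variable not being in the optimizer's variable list, or the update being clipped to nothing) rather than at vanishing gradients.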

I didn't change the core code and used the high-level API, but I trained it on a different task, and it is plugged into a larger model.
I will try running the original example you provided and report back with that, but in the meantime I was wondering if you (or anyone else) had any early ideas about this. Thanks!