Currently, the gradient with respect to the hyperparameters is computed along the "default axes", i.e. the coordinate directions. Instead, one can use the last gradient-descent direction to build an orthonormal basis and compute the next gradient along those basis directions.
This is referred to as the "SmartGradient" approach and is described here: https://arxiv.org/abs/2106.07313
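As a rough illustration of the basis change only (this is not the algorithm from the linked paper), here is a NumPy sketch. It assumes the gradient is estimated by central finite differences, and the names `orthonormal_basis`, `directional_derivatives` and `smart_gradient` are hypothetical. The sketch builds an orthonormal basis whose first vector points along the previous descent direction, takes directional derivatives along each basis vector, and maps them back to the original hyperparameter coordinates.

```python
import numpy as np


def orthonormal_basis(direction):
    """Orthonormal basis of R^n (as columns) whose first column is direction/|direction|."""
    d = np.asarray(direction, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0.0:
        # No usable previous direction (e.g. first iteration): use the default axes.
        return np.eye(d.size)
    # QR of [d | I] orthonormalizes the columns in order, so Q[:, 0] is parallel
    # to d and the remaining columns of the square Q complete the basis.
    q, _ = np.linalg.qr(np.column_stack([d / norm, np.eye(d.size)]))
    if q[:, 0] @ d < 0:
        q[:, 0] = -q[:, 0]  # QR may return the first column with a flipped sign
    return q


def directional_derivatives(f, x, basis, h=1e-6):
    """Central finite-difference derivative of f at x along each column of basis."""
    return np.array([(f(x + h * b) - f(x - h * b)) / (2.0 * h) for b in basis.T])


def smart_gradient(f, x, prev_direction, h=1e-6):
    """Gradient of f at x, estimated along a basis aligned with prev_direction."""
    q = orthonormal_basis(prev_direction)
    g_q = directional_derivatives(f, x, q, h)  # components Q^T grad f
    return q @ g_q  # back to original coordinates, since Q Q^T = I


if __name__ == "__main__":
    # Hypothetical objective: a simple quadratic standing in for the real one.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    f = lambda x: 0.5 * x @ A @ x
    x = np.array([1.0, -2.0])
    print(smart_gradient(f, x, np.array([0.5, 1.0])), A @ x)  # should closely match

    # Plain gradient descent that reuses the last step as the next basis direction.
    step = np.zeros(2)  # no previous direction yet -> default axes
    for _ in range(50):
        g = smart_gradient(f, x, step)
        step = -0.2 * g
        x = x + step
    print(x)  # close to the minimizer [0, 0]
```

In exact arithmetic the reconstructed vector equals the ordinary gradient, because the gradient does not depend on the choice of basis; what changes is where the finite-difference evaluations are taken, which is what the linked paper studies.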