I want to clarify in what domain this theory of batch normalization holds.
The evidence we have is mostly about batch normalization in the types of feedforward neural networks commonly used in 2019, right? So: residual CNNs, VGG-like CNNs, other CNNs, transformers. Maybe other types of feedforward networks too. But not RNNs.
Has anyone explored the applicability of batch normalization or similar techniques to non-neural-network functions that we optimize with gradient-descent-like algorithms? Perhaps to tensor networks?
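To make the question concrete, here is a rough sketch of the kind of thing I have in mind, in plain NumPy. The batch norm itself is the standard training-mode forward pass (no running statistics for inference); the "core" tensor and the contraction are hypothetical stand-ins, not any established tensor-network method:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension (axis 0),
    # then apply a learned affine transform. Nothing here is
    # specific to neural networks: any intermediate tensor in a
    # computation optimized by gradient descent could, in principle,
    # be normalized this way.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Hypothetical usage: normalize the intermediate result of a
# tensor-network-style contraction before the next contraction step.
batch = np.random.randn(32, 8)          # 32 samples, 8 features
core = np.random.randn(8, 8)            # a "core" tensor (hypothetical)
gamma, beta = np.ones(8), np.zeros(8)   # learnable scale and shift
h = batch @ core                        # intermediate contraction
h = batch_norm(h, gamma, beta)          # normalize over the batch
```

The question is whether this kind of insertion helps optimization in such settings the way it seems to in the feedforward networks above, and whether anyone has studied it.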