I think in feed-forward networks (i.e., ones that don't re-use the same neuron multiple times), having to learn all the k_ij inhibition coefficients is too much to ask. RNNs have gone in and out of fashion, and maybe they could use something like this (perhaps scaled down a little), but you could achieve similar inhibition effects with several different architectures; LSTMs already have multiplication built into them, just in a different way. There isn't a particularly deep technical reason behind these different choices.
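To make the contrast concrete, here is a minimal sketch (PyTorch, with hypothetical names like `PairwiseInhibition`) of what learning explicit k_ij inhibition coefficients would look like, versus the elementwise gating LSTMs already do; it's an illustration of the parameter cost, not anyone's actual implementation:

```python
import torch
import torch.nn as nn

class PairwiseInhibition(nn.Module):
    """Hypothetical layer where unit j is damped by every other unit i via a learned k_ij.

    y_j = x_j * sigmoid(-sum_i k_ij * x_i)

    This adds O(n^2) extra parameters per layer, which is the
    "too much to ask" part for wide feed-forward layers.
    """
    def __init__(self, n_units: int):
        super().__init__()
        self.k = nn.Parameter(torch.zeros(n_units, n_units))  # k[i, j]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_units); inhibition_j = sum_i k_ij * x_i
        inhibition = x @ self.k
        return x * torch.sigmoid(-inhibition)

# LSTM-style gating, by contrast, multiplies two activation vectors
# elementwise (gate * candidate), so the multiplicative interaction is
# mediated by ordinary weight matrices rather than a dedicated k_ij table.
if __name__ == "__main__":
    layer = PairwiseInhibition(8)
    out = layer(torch.randn(4, 8))
    print(out.shape)  # torch.Size([4, 8])
```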