Single neurons cannot represent two distinct kinds of quantities, as would be required to do backprop (the presence of features, and gradients for training).
I don’t understand why you can’t just have some neurons which represent the former, and some neurons which represent the latter.
The dropout algorithm (which has been very popular, though it recently seems to have been largely replaced by batch normalisation).
Do you have any particular source for dropout being replaced by batch normalisation, or is it an impression from the papers you’ve been reading?
I don’t understand why you can’t just have some neurons which represent the former, and some neurons which represent the latter.
Because people thought you needed the same weights to 1) transport the gradients back, and 2) send the activations forward. Having two distinct networks with the same topology and getting the weights to match was known as the “weight transport problem”. See Grossberg, S. 1987. Competitive learning: From interactive activation to adaptive resonance. Cognitive Science 11(1):23–63.
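To make the weight transport point concrete, here is a minimal numpy sketch (my own illustration, not part of the original exchange; the shapes and variable names are arbitrary): for a single linear layer, backprop sends the error back through the transpose of the same weight matrix W used on the forward pass, so a separate feedback network with its own weights B would only compute the true gradient if B were kept equal to W.

```python
# A minimal sketch (illustration only) of where the weight transport problem
# shows up in backprop for one linear layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a batch of 4 inputs with 3 features
W = rng.normal(size=(3, 2))        # forward weights of the layer

h = x @ W                          # forward pass: activations sent forward through W

grad_h = rng.normal(size=h.shape)  # stand-in for the gradient arriving from the layer above

# Exact backprop: the error signal is sent back through W.T, i.e. the feedback
# path must use the *same* weights as the forward path.
grad_x = grad_h @ W.T

# A separate feedback network with its own weights B only computes the true
# gradient when B matches W; keeping the two in sync is the weight transport problem.
B = rng.normal(size=(3, 2))
grad_x_separate = grad_h @ B.T     # not the true gradient unless B == W
```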
Do you have any particular source for dropout being replaced by batch normalisation, or is it an impression from the papers you’ve been reading?
The latter.
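For readers unfamiliar with the two techniques being compared, here is a minimal numpy sketch (my own hedged illustration, not from the discussion above) of training-time inverted dropout and batch normalisation; the dropout rate, the epsilon, and the omission of batch norm’s learned scale/shift parameters and running statistics are simplifications.

```python
# A minimal sketch (illustration only) of the two techniques compared above,
# applied to a batch of hidden activations.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 5))               # batch of 8 examples, 5 hidden units

# Inverted dropout (training time): zero each unit with probability p and rescale
# the survivors by 1/(1-p), so the test-time network needs no change.
p = 0.5
mask = (rng.random(h.shape) >= p) / (1.0 - p)
h_dropout = h * mask

# Batch normalisation (training time): standardise each unit over the batch.
# The learned scale/shift (gamma, beta) and running statistics are omitted here.
eps = 1e-5
h_bn = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)
```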