I think this is quite strong evidence that I was not taught the correct meaning of “vanishing gradients.”
I’m very confused. The way I’m reading the quote you provided, it says ReLU works better because it doesn’t have the vanishing-gradient effect that sigmoid and tanh have.
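To make that concrete, here is a minimal sketch (not from the quote; the pre-activation value 0.5 and the depth of 20 are arbitrary assumptions): sigmoid’s derivative never exceeds 0.25, so backprop multiplies in one such factor per layer and the gradient shrinks geometrically with depth, whereas ReLU’s derivative is exactly 1 on its active side.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 (when x = 0)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

depth = 20                 # hypothetical network depth
x = np.array(0.5)          # hypothetical pre-activation seen at each layer

# Ignore the weights and look only at the activation-derivative factor
# that backprop multiplies in once per layer.
print("sigmoid factor per layer:", sigmoid_grad(x))        # ~0.235
print("after 20 layers:", sigmoid_grad(x) ** depth)        # ~3e-13, effectively vanished
print("ReLU factor per layer:", relu_grad(x))               # 1.0
print("after 20 layers:", relu_grad(x) ** depth)             # still 1.0
```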
Interesting. I just re-read it and you are completely right. Well, I wonder how that interacts with what I said above.