Gradient descent doesn’t always work, even when the goal is just to find a good-enough answer; when it fails, you fiddle with something and try again. And you sure aren’t getting an optimal answer on large datasets: on many large problems, each piece of training data is only used once, so the first few steps are applied to an essentially random initialization, and the last few steps can only make a tiny change.
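To make that concrete, here is a minimal single-pass SGD sketch (the data, model, and learning rate are made-up stand-ins, not anyone's reference implementation): each example is touched exactly once, so the earliest updates act on a random init and the last ones barely move the weights.

```python
import numpy as np

# Minimal sketch: single-pass SGD on a linear model with squared error.
# X, y, and the learning rate are illustrative placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))        # "large" dataset, one pass only
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=10_000)

w = rng.normal(size=20)                  # early steps update a random init
lr = 0.01
for x_i, y_i in zip(X, y):               # each example used exactly once
    grad = 2 * (x_i @ w - y_i) * x_i     # gradient of the squared error on one example
    w -= lr * grad                       # by the end, each step barely moves w
```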
Actually, momentum methods, Adam, etc. are often used instead of plain gradient descent.
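For reference, a bare-bones sketch of the Adam update rule (using the usual default hyperparameters; `grad` would come from one example or a mini-batch), showing how it layers running first- and second-moment estimates on top of the raw gradient step:

```python
import numpy as np

# Sketch of one Adam step: keeps exponential moving averages of the gradient
# (momentum) and its square, with bias correction for the first few steps.
def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad               # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2          # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)                  # bias correction (t counts from 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```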