Empirically, the inductive bias you get when training with SGD (and similar optimisers) is quite similar to the inductive bias you would get by repeatedly re-initialising a neural network until you randomly land on a set of weights that yields a low loss. The choice of optimiser does have an effect too, but that effect is very small by comparison. See this paper.
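To make the "re-initialise until the loss is low" procedure concrete, here is a minimal sketch. It is a toy illustration under stated assumptions, not the setup used in the paper. The tiny dataset, the single-unit model, the N(0, 1) initialisation, the zero-one loss and the loss threshold are all hypothetical choices. Because every attempt draws fresh weights and is kept only if it reaches the threshold, each accepted weight vector is a sample from the initialisation distribution conditioned on the training loss being at most that threshold.

```python
import random

# Hypothetical toy dataset: 2-D points labelled +1 or -1 (linearly separable).
DATA = [
    ((1.0, 1.0), 1),
    ((2.0, 0.5), 1),
    ((-1.0, -1.0), -1),
    ((-0.5, -2.0), -1),
]


def predict(weights, x):
    """Hypothetical tiny model: a single linear unit with a sign output."""
    w0, w1, b = weights
    return 1 if w0 * x[0] + w1 * x[1] + b > 0 else -1


def zero_one_loss(weights, data):
    """Fraction of training points the model misclassifies."""
    return sum(predict(weights, x) != y for x, y in data) / len(data)


def resample_until_low_loss(data, max_loss=0.0, max_tries=100_000, seed=0):
    """Re-initialise the weights from N(0, 1) until the training loss is <= max_loss.

    Returns the accepted weights and how many initialisations it took.
    """
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        # A fresh random initialisation on every attempt; no gradient steps at all.
        weights = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if zero_one_loss(weights, data) <= max_loss:
            return weights, tries
    raise RuntimeError("no initialisation reached the target loss")


if __name__ == "__main__":
    weights, tries = resample_until_low_loss(DATA)
    print(f"accepted after {tries} tries: {weights}")
```

The empirical comparison would then check how often the functions found this way agree with those found by training the same architecture with SGD. This sketch only covers the sampling side, and brute-force resampling like this is only feasible for very small models and datasets.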
Empirically, the inductive bias that you get when you train with SGD, and similar optimisers, is in fact quite similar to the inductive bias that you would get, if you were to repeatedly re-initialise a neural network until you randomly get a set of weights that yield a low loss. Which optimiser you use does have an effect as well, but this is very small by comparison. See this paper.