It’s a good caution, but I do see more bumps with Adam than with SGD across a number of random initializations.
(with the caveat that this is still “I tried a few times” and not a quantitative study)