Apologies if it’s obvious, but why the focus on SGD? I’m assuming it’s not meant as shorthand for other types of optimization algorithms given the emphasis on SGD’s specific inductive bias, and the Deep Double Descent paper mentions that the phenomena hold across most natural choices in optimizers.
SGD is meant as a shorthand that includes other similar optimizers like Adam.