I’m not sure if I count as a skeptic, but at least for me the only part of this that I find confusing is SGD not making a difference over random search. The fact that simple functions take up a larger volume in parameter space seems obviously true to me and I can’t really imagine anyone disagreeing with that part (though I’m still quite glad to have actual analysis to back that up).
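For anyone who wants to poke at the volume claim directly, here is a minimal sketch, not the analysis from the post. It assumes a toy setup of my own choosing: a single perceptron on 3 boolean inputs with weights and bias drawn from a standard Gaussian. The names `sample_function` and `function_volumes` are made up for illustration. It samples parameters at random and counts how often each truth table shows up, which gives a rough estimate of how much parameter-space volume each function occupies under that prior.

```python
import random
from collections import Counter
from itertools import product


def sample_function(rng, n_inputs=3):
    """Draw random perceptron parameters and return the truth table they compute."""
    # Assumption: i.i.d. standard Gaussian weights and bias (a toy prior).
    w = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    b = rng.gauss(0.0, 1.0)
    # Evaluate on all 2**n_inputs boolean inputs and encode the outputs as a bit string.
    return "".join(
        "1" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "0"
        for x in product((0, 1), repeat=n_inputs)
    )


def function_volumes(n_samples=20000, n_inputs=3, seed=0):
    """Count how often each truth table appears under random parameter sampling."""
    rng = random.Random(seed)
    return Counter(sample_function(rng, n_inputs) for _ in range(n_samples))


if __name__ == "__main__":
    counts = function_volumes()
    total = sum(counts.values())
    # Print the five most frequently sampled truth tables with their estimated volume.
    for table, count in counts.most_common(5):
        print(table, count / total)
```

If the volume claim holds in this toy case, a handful of truth tables should account for most of the samples. This says nothing by itself about whether SGD ends up at the same distribution, which is the part I find confusing.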
Pinging you to see what your current thoughts are! I think that if “SGD is basically equivalent to random search” holds, then that has huge, huge implications.
I guess I would say something like: random search is clearly a pretty good first-order approximation, but there are also clearly second-order effects. I think that exactly how strong/important/relevant those second-order effects are is unclear, however, and I remain pretty uncertain there.