How much slower is E. coli-style optimization compared to gradient descent? What’s the cost of experimenting with random directions rather than moving in the “best” direction?
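For concreteness, here’s a minimal sketch of the two update rules being compared. The quadratic loss, step sizes, and accept-only-if-better rule are illustrative assumptions, not anyone’s actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    # Hypothetical objective: a simple quadratic bowl.
    return 0.5 * np.dot(x, x)

def grad(x):
    # Exact gradient of the quadratic above.
    return x

def gradient_step(x, lr=0.1):
    # "Best" direction: move along the exact gradient.
    return x - lr * grad(x)

def ecoli_step(x, step_size=0.1):
    # Random direction: try a random unit vector, keep the move only if it
    # lowers the loss (run-and-tumble style; this particular accept rule is
    # just one illustrative choice).
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    candidate = x + step_size * d
    return candidate if loss(candidate) < loss(x) else x

x = rng.standard_normal(50)
print(loss(x), loss(gradient_step(x)), loss(ecoli_step(x)))
```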
There was a post a while back claiming that evolutionary optimization is somehow equivalent to SGD, and I was going to respond: no, that can’t be. Evolution steps in mostly random directions, so at best it’s equivalent to a random forward-gradient method, which has completely different (worse) asymptotic convergence with respect to parameter dimension, as you discuss. There’s a reason SGD methods end up using large batches and momentum to smooth out gradient noise before stepping.
I do still expect that evolutionary optimization is basically similar to SGD in terms of what kinds of optima they find, and more generally in terms of what their trajectories look like at a coarse-grained scale. But algorithmically, yeah, SGD should follow that trajectory a lot faster, especially as dimensionality goes up.
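As a rough illustration of that dimension dependence (a toy experiment, not a careful benchmark; the quadratic loss, step size, and stopping threshold are all assumptions), compare exact gradient steps against steps along a random unit direction scaled by the directional derivative, i.e. a forward-gradient-style update:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    return 0.5 * np.dot(x, x)  # simple quadratic, chosen for illustration

def run_gd(dim, lr=0.2, tol=1e-3, max_iters=100_000):
    # Exact gradient descent; gradient of the quadratic is x itself.
    x = np.ones(dim)
    for t in range(max_iters):
        if loss(x) < tol:
            return t
        x -= lr * x
    return max_iters

def run_random_direction(dim, lr=0.2, tol=1e-3, max_iters=1_000_000):
    # Forward-gradient-style step: measure the directional derivative along a
    # random unit vector and step along that vector only.
    x = np.ones(dim)
    for t in range(max_iters):
        if loss(x) < tol:
            return t
        d = rng.standard_normal(dim)
        d /= np.linalg.norm(d)
        x -= lr * np.dot(x, d) * d
    return max_iters

# Iterations needed to reach the loss threshold as dimension grows.
for dim in (10, 100, 1000):
    print(dim, run_gd(dim), run_random_direction(dim))
```

On this quadratic the random-direction version needs on the order of d times more iterations, since a random unit vector captures only about 1/d of the gradient’s squared norm in expectation.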