One way to think about gradient descent is that if there’s an N-dimensional parameter space on which you build a grid of M^N points and do a grid search for the minimum, well, you can accomplish the same selection task with O(M) steps of gradient descent (at least if there’s no local minimum). So should we say that M steps of gradient descent give you O(N log M) bits of optimization? Or something like that? I’m not sure.
I like that intuition overall, but there’s a sense in which adaptive search gives far more resolution than grid search, so the analysis seems wrong: if I use gradient descent, my eventual accuracy is set by the step size near the end, which gives far more precise answers than “only” checking a grid of M^N equally spaced points.
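The resolution point is easy to see numerically in one dimension. Here's a minimal sketch, with the same budget (M evaluations) spent two ways: a grid of M equally spaced points on [0, 1], versus M gradient steps on a quadratic. The target location, step size, and starting point are illustrative assumptions, not anything from the discussion above.

```python
import math

TARGET = 0.337  # assumed minimum location of the toy objective
M = 16          # shared budget: grid points == gradient steps

def f(x):
    # Toy 1-D objective with a single global minimum at TARGET
    return (x - TARGET) ** 2

# Grid search: M equally spaced points on [0, 1]
grid = [i / (M - 1) for i in range(M)]
grid_best = min(grid, key=f)
grid_err = abs(grid_best - TARGET)

# Gradient descent: M steps from the middle of the interval
x, lr = 0.5, 0.3
for _ in range(M):
    x -= lr * 2 * (x - TARGET)  # gradient of (x - t)^2 is 2(x - t)
gd_err = abs(x - TARGET)

# "Bits of optimization" as log2 of how much each search
# narrows down the unit interval
grid_bits = math.log2(1 / grid_err)
gd_bits = math.log2(1 / gd_err)
print(f"grid search error: {grid_err:.2e} (~{grid_bits:.1f} bits)")
print(f"grad descent error: {gd_err:.2e} (~{gd_bits:.1f} bits)")
```

On this (very friendly, convex) objective the gradient-descent error shrinks geometrically with the number of steps, so its final accuracy is far below the grid spacing of roughly 1/M, which is the second comment's point: the adaptive search ends up with many more bits than log2(M) per dimension.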