Do you count gradient descent as a blackbox optimization method, and isn't backpropagation guided by the gradient (at least in ANNs)?
Gradient descent requires somehow computing the gradient. There are both blackbox and non-blackbox ways of doing that, and the blackbox methods are much more expensive.
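For concreteness, here is a minimal sketch of the gradient descent loop itself, assuming we already have some routine `grad_f` that returns the gradient; all names and parameters here are illustrative, not from the original.

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Repeatedly step downhill along the gradient returned by grad_f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)   # the update x <- x - lr * grad f(x)
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -2.0])
```

The loop only consumes gradients; everything below is about how those gradients get produced.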
Backpropagation is the main non-blackbox method for computing the gradient: it looks at the sequence of steps used to compute the function and propagates derivative information backward through those steps. If the function takes m steps to compute, backprop takes O(m) steps to compute the gradient.
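A minimal sketch of that backward propagation for a toy function; the function, weights, and names are assumptions chosen for illustration. The point is that the backward pass replays the same m steps once more in reverse, so the total cost stays O(m).

```python
import numpy as np

def f_and_grad(W, x):
    """Compute y = sum(tanh(W @ x)) and dy/dx with one forward and one backward pass."""
    # Forward pass: record the intermediate value of each step.
    z = W @ x          # step 1: linear map
    a = np.tanh(z)     # step 2: elementwise nonlinearity
    y = a.sum()        # step 3: reduce to a scalar
    # Backward pass: push dy/d(step) back through the same steps in reverse order.
    da = np.ones_like(a)        # dy/da
    dz = da * (1.0 - a ** 2)    # dy/dz, using tanh'(z) = 1 - tanh(z)^2
    dx = W.T @ dz               # dy/dx, via the transpose of the linear map
    return y, dx

rng = np.random.default_rng(0)
W, x = rng.normal(size=(3, 4)), rng.normal(size=4)
y, dx = f_and_grad(W, x)   # forward + backward: a constant factor times the forward cost
```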
The main blackbox method for computing gradients is finite differencing: evaluate the function itself, then evaluate it again with a small change in one coordinate, then with a small change in the next coordinate, and so forth, one extra evaluation per coordinate. If the function takes m steps to compute and has n inputs, then this takes O(mn) steps.
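A sketch of that blackbox approach, again on an illustrative toy function (every name here is an assumption, not the original's code). Each call to f costs O(m), and we make n + 1 of them, giving the O(mn) total.

```python
import numpy as np

def finite_difference_grad(f, x, eps=1e-6):
    """Blackbox gradient estimate: n + 1 evaluations of f for n inputs."""
    base = f(x)                      # evaluate once at the unchanged point
    grad = np.zeros_like(x)
    for i in range(x.size):          # one extra evaluation per coordinate
        bumped = x.copy()
        bumped[i] += eps
        grad[i] = (f(bumped) - base) / eps
    return grad

# Illustrative use on the same kind of toy function as above.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(3, 4)), rng.normal(size=4)
g = finite_difference_grad(lambda v: np.tanh(W @ v).sum(), x)
```

The contrast with backprop is just the call count: one forward plus one backward pass versus n + 1 full forward passes.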