Do you count gradient descent as a blackbox optimization method, and isn't backpropagation guided by the gradient (at least in ANNs)?
Gradient descent requires somehow computing the gradient. There are both blackbox and non-blackbox ways of doing that, and the blackbox methods are much more expensive.
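For concreteness, here is a minimal sketch of the gradient descent loop itself, assuming we already have some routine `grad_f` that returns the gradient; all names and parameters here are illustrative, not from the original.

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Repeatedly step downhill along the gradient returned by grad_f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)   # the update x <- x - lr * grad f(x)
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -2.0])
```

The loop only consumes gradients; everything below is about how those gradients get produced.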
Backpropagation is the main non-blackbox method for computing the gradient: it looks at the sequence of steps used to compute the function and propagates derivative information backward through those steps. If the function takes m steps to compute, backprop takes O(m) steps to compute the gradient.
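A minimal sketch of that backward propagation for a toy function; the function, weights, and names are assumptions chosen for illustration. The point is that the backward pass replays the same m steps once more in reverse, so the total cost stays O(m).

```python
import numpy as np

def f_and_grad(W, x):
    """Compute y = sum(tanh(W @ x)) and dy/dx with one forward and one backward pass."""
    # Forward pass: record the intermediate value of each step.
    z = W @ x          # step 1: linear map
    a = np.tanh(z)     # step 2: elementwise nonlinearity
    y = a.sum()        # step 3: reduce to a scalar
    # Backward pass: push dy/d(step) back through the same steps in reverse order.
    da = np.ones_like(a)        # dy/da
    dz = da * (1.0 - a ** 2)    # dy/dz, using tanh'(z) = 1 - tanh(z)^2
    dx = W.T @ dz               # dy/dx, via the transpose of the linear map
    return y, dx

rng = np.random.default_rng(0)
W, x = rng.normal(size=(3, 4)), rng.normal(size=4)
y, dx = f_and_grad(W, x)   # forward + backward: a constant factor times the forward cost
```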
The main blackbox method for computing gradients is finite differencing: evaluate the function itself, then evaluate it again with a small change in one coordinate, then with a small change in the next coordinate, and so forth, one extra evaluation per coordinate. If the function takes m steps to compute and has n inputs, then this takes O(mn) steps.
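A sketch of that blackbox approach, again on an illustrative toy function (every name here is an assumption, not the original's code). Each call to f costs O(m), and we make n + 1 of them, giving the O(mn) total.

```python
import numpy as np

def finite_difference_grad(f, x, eps=1e-6):
    """Blackbox gradient estimate: n + 1 evaluations of f for n inputs."""
    base = f(x)                      # evaluate once at the unchanged point
    grad = np.zeros_like(x)
    for i in range(x.size):          # one extra evaluation per coordinate
        bumped = x.copy()
        bumped[i] += eps
        grad[i] = (f(bumped) - base) / eps
    return grad

# Illustrative use on the same kind of toy function as above.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(3, 4)), rng.normal(size=4)
g = finite_difference_grad(lambda v: np.tanh(W @ v).sum(), x)
```

The contrast with backprop is just the call count: one forward plus one backward pass versus n + 1 full forward passes.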