Thinking about it a bit, the “global” vs “local” property is not really a property of an optimizer by itself; it is a combined property of the optimizer and the search space (aka the error landscape). For an entirely arbitrary search space, only exhaustive search and its equivalents will guarantee you the global optimum. For simple search spaces (e.g. quadratic optimization), easy “local” methods like hill-climbing will guarantee you the global optimum. And if your solution is bounded, you always know how close to the theoretical best you are and can adjust the search accordingly (e.g. terminate early), etc.
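A toy illustration of that optimizer-plus-landscape interaction (my own sketch, with made-up test functions): the very same greedy hill-climber acts as a “global” optimizer on a convex bowl and as a “local” one on a double-well landscape, purely because the landscape changed.

```python
import random

def hill_climb(f, x0, step=0.1, iters=3000):
    """Greedy local search: accept a random nearby point only if it lowers f."""
    x = x0
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        if f(cand) < f(x):
            x = cand
    return x

# Convex landscape: any local improvement method reaches the global minimum at x = 3.
convex = lambda x: (x - 3) ** 2

# Double-well landscape: the global minimum is near x = -1, but there is a
# shallower local minimum near x = +1 that a greedy search cannot escape.
double_well = lambda x: (x ** 2 - 1) ** 2 + 0.3 * x

print("convex, start at -5:      ", round(hill_climb(convex, -5.0), 2))       # ~3.0 regardless of start
print("double well, start at -2: ", round(hill_climb(double_well, -2.0), 2))  # finds the global min near -1
print("double well, start at +2: ", round(hill_climb(double_well, 2.0), 2))   # stuck in the local min near +1
```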
Yes. In some cases simple (convex) search spaces even permit exact, non-iterative analytic solutions that are faster than any iterative approach like SGD (although curiously, they are typically not dramatically faster: they still need to analyze each example once, and the number of full epochs/iterations that SGD-type approaches need appears to be sublinear anyway).
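A minimal sketch of that comparison, assuming an ordinary least-squares setup (the data sizes and hyperparameters here are just illustrative): the analytic normal-equations solution needs one pass over the data plus a solve, while plain SGD reaches essentially the same weights after a few epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression problem: a convex (quadratic) loss surface.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Exact analytic solution: one pass over the data to form X^T X and X^T y, then a solve.
w_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Plain per-example SGD on the same objective; a handful of epochs suffices here.
w_sgd = np.zeros(d)
lr = 0.01
for epoch in range(5):
    for i in rng.permutation(n):
        grad = (X[i] @ w_sgd - y[i]) * X[i]
        w_sgd -= lr * grad

print("exact solution error vs true weights:", np.linalg.norm(w_exact - w_true))
print("SGD solution error vs true weights:  ", np.linalg.norm(w_sgd - w_true))
```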
The more advanced methods (natural gradient, Hessian-free, etc.) expend extra computation to analyze the local search space and then use that analysis to make an improved nonlocal jump. That analysis always has extra cost, though, so these techniques tend to end up merely matching SGD in terms of net optimization speed.
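A toy sketch of that trade-off, using a plain Newton step on a quadratic as a stand-in for the fancier curvature methods (this is not how natural gradient or Hessian-free are actually implemented): the curvature-aware step costs much more per iteration, but it makes one big nonlocal jump instead of many small local ones.

```python
import numpy as np

# Toy ill-conditioned quadratic: f(w) = 0.5 * w^T A w, minimum at w = 0.
A = np.diag([1.0, 100.0])           # condition number 100

def grad(w):    return A @ w
def hessian(w): return A            # constant for a quadratic

w_gd = np.array([1.0, 1.0])
w_nt = np.array([1.0, 1.0])

# Plain gradient descent: each step is cheap (O(d) here), but the step size is
# limited by the largest curvature, so progress along the flat direction is slow.
lr = 1.0 / 100.0
for _ in range(100):
    w_gd -= lr * grad(w_gd)

# Newton step: analyze the local curvature (build and solve with the Hessian,
# O(d^3) in general) and jump straight to the minimum of the quadratic model.
w_nt -= np.linalg.solve(hessian(w_nt), grad(w_nt))

print("gradient descent after 100 cheap steps:", w_gd)
print("Newton after 1 expensive step:         ", w_nt)
```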
Looking at it another way, when a researcher uses something like a convex optimization technique to solve a problem more quickly, what actually happened is that the combined human+machine system used a global optimizer to solve the problem, with more of the algorithm running in the human’s brain than on the machine. ;)