It should be obvious that SGD over an appropriately general model, with appropriate random inits and continual restarts that keep the max-score solution found so far, will eventually converge on the global optimum, and will do so in expected time similar to or better than any naive brute-force search such as SI.
In particular, SGD is good at exploiting any local smoothness in solution space.
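To make the restart-and-keep-best idea concrete, here is a minimal sketch. It assumes a toy setup: the 1-D objective `loss`, its gradient `grad`, and the helpers `descend` and `restart_search` are all hypothetical names invented for illustration, and it uses plain gradient descent rather than minibatch SGD. Minimizing `loss` here plays the role of maximizing score.

```python
import math
import random


def loss(x):
    # Hypothetical multimodal objective: a shallow quadratic bowl plus ripples,
    # so there are many local minima but only one global minimum.
    return 0.1 * x * x + math.sin(3.0 * x)


def grad(x):
    # Analytic derivative of loss(x).
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def descend(x, lr=0.01, steps=500):
    # Plain gradient descent from a single starting point.
    # Each run only exploits local smoothness; it stops in whatever basin it started in.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def restart_search(restarts=200, lo=-10.0, hi=10.0, seed=0):
    # Random inits with restarts, keeping the best solution found so far.
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(restarts):
        x = descend(rng.uniform(lo, hi))
        current = loss(x)
        if current < best_loss:
            best_x, best_loss = x, current
    return best_x, best_loss


if __name__ == "__main__":
    x, value = restart_search()
    print(f"best x = {x:.4f}, loss = {value:.4f}")
```

Each individual descent is cheap because it follows the gradient instead of enumerating candidates; the restarts are what supply coverage of the search space. In this toy case a uniformly random init lands in the global basin roughly one time in ten, so a few hundred restarts make a miss very unlikely, but that ratio is a property of this particular objective and not a general guarantee.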