Ahhh my bad. I thought that SA explored multiple solution points at once. Of course, if the stochastic jumps are similar, it could end up being the same exploration path eventually, just serial vs parallel, although that seems to make vanilla SA incredibly non-useful in the modern era of parallel computing.
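For concreteness, here is a minimal sketch of vanilla SA in Python (the toy objective, Gaussian proposal, and geometric cooling schedule are illustrative choices, not canonical ones). Note that the walk really does advance one point at a time:

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, T0=1.0, cooling=0.995, n_iters=10_000):
    """Minimize f with a single Metropolis chain: one point at a time."""
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    T = T0
    for _ in range(n_iters):
        cand = x + random.gauss(0.0, step)  # stochastic jump
        fc = f(cand)                        # only f is evaluated, no gradients
        # Always accept improvements; accept uphill moves with prob exp(-dE/T).
        if fc < fx or random.random() < math.exp(-(fc - fx) / T):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        T *= cooling  # geometric cooling schedule
    return best_x, best_fx

# Toy non-convex objective: many local minima, trivial to evaluate.
f = lambda x: math.sin(3 * x) + 0.1 * x * x
print(simulated_annealing(f, x0=5.0))
```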
That is what parallel tempering is for.
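For anyone unfamiliar: parallel tempering runs several chains at different temperatures and occasionally swaps states between neighbouring temperatures, so the hot chains keep exploring new basins while the cold ones refine. A rough sketch (the temperature ladder and swap interval are made up, and the toy objective is the same as above):

```python
import math
import random

def parallel_tempering(f, x0, temps, step=0.5, n_iters=5_000, swap_every=10):
    """One Metropolis chain per temperature, plus occasional state swaps
    between neighbouring temperatures."""
    xs = [x0] * len(temps)
    fs = [f(x0)] * len(temps)
    for it in range(n_iters):
        # Independent Metropolis step in every chain (the parallelizable part).
        for i, T in enumerate(temps):
            cand = xs[i] + random.gauss(0.0, step)
            fc = f(cand)
            if fc < fs[i] or random.random() < math.exp(-(fc - fs[i]) / T):
                xs[i], fs[i] = cand, fc
        # Periodically propose swaps between adjacent temperatures.
        if it % swap_every == 0:
            for i in range(len(temps) - 1):
                log_a = (fs[i] - fs[i + 1]) * (1 / temps[i] - 1 / temps[i + 1])
                if log_a >= 0 or random.random() < math.exp(log_a):
                    xs[i], xs[i + 1] = xs[i + 1], xs[i]
                    fs[i], fs[i + 1] = fs[i + 1], fs[i]
    best = min(range(len(temps)), key=lambda i: fs[i])
    return xs[best], fs[best]

f = lambda x: math.sin(3 * x) + 0.1 * x * x
print(parallel_tempering(f, x0=5.0, temps=[0.05, 0.2, 1.0, 5.0]))
```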
Also, SA is specifically useful when the original objective function is cheap to evaluate but its derivatives are too expensive. With SA, you don't need to compute derivatives or normalizing constants. You can try quasi-Newton methods and other approaches, but even these are computationally intractable in many cases. There are certain ways in which a problem can be non-convex that make SA an attractive alternative. In principle, this can be true even in low-dimensional problems, so it's not at all just a question of parallelism. It's also worth mentioning that SA lends itself very well to the GPU in some cases where traditional optimizers don't.
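One way to read the GPU point: the SA inner loop is purely elementwise, so thousands of independent chains can be stepped in lockstep. A hedged sketch in NumPy (this batched-chains framing is my interpretation, and the chain count and schedule are arbitrary; with cupy or jax.numpy in place of numpy the same code runs on an actual GPU):

```python
import numpy as np

def batched_sa(f, x0, n_chains=4096, step=0.5, T0=1.0, cooling=0.999, n_iters=2_000):
    """Thousands of independent SA chains advanced in lockstep; every
    operation is elementwise over the batch, which is what GPUs like."""
    rng = np.random.default_rng(0)
    x = np.full(n_chains, float(x0))
    fx = f(x)
    T = T0
    for _ in range(n_iters):
        cand = x + rng.normal(0.0, step, n_chains)
        fc = f(cand)
        # Metropolis acceptance, clamped so the exponent never overflows.
        accept = rng.random(n_chains) < np.exp(np.minimum(0.0, -(fc - fx) / T))
        x = np.where(accept, cand, x)
        fx = np.where(accept, fc, fx)
        T *= cooling
    i = int(np.argmin(fx))
    return x[i], fx[i]

# f only needs vectorized evaluations; still no derivatives anywhere.
f = lambda x: np.sin(3 * x) + 0.1 * x * x
print(batched_sa(f, x0=5.0))
```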