I think the same argument works if there are multiple peaks (even if my picture doesn’t cover that case) -- you just need the local properties around the optimum for the argument to go through. But in that case you can’t assume a local optimum is a global optimum, so it’s harder to apply.
As you say, in many cases we don’t need to worry about these complications, so I haven’t spent too much time on that.
And a monomodal assumption as well.
But many real-world distributions are approximately like that, so it’s a reasonable assumption.