(It may learn more about high points, but it must still learn about low points.)
That’s not how Bayesian optimization works. Broadly, the idea is that we use Bayesian optimization when evaluating the target function at a point and computing its gradient are both expensive or infeasible. Instead, we choose points at which to sample the target function, and those samples train a Gaussian process (or another nonparametric model of functions) that tells us what the function’s surface looks like. In such a procedure, we get the best performance by sampling points where either the expected function value or the model’s variance is particularly high. Thus, we choose points that we know are good, or points where we’re very uncertain, but we never specifically search for low points. We’ll probably encounter some when sampling points of high uncertainty, but we didn’t specifically seek them out.
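As a minimal sketch of that loop, here is one way it might look in Python using scikit-learn’s GaussianProcessRegressor with an upper-confidence-bound acquisition (mean plus a multiple of the standard deviation). The objective function, search range, and exploration weight below are all hypothetical stand-ins, not anything from the discussion above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):
    """Stand-in for an expensive black-box function (hypothetical)."""
    return np.sin(3 * x) - x**2 + 0.7 * x

rng = np.random.default_rng(0)
candidates = np.linspace(-2.0, 2.0, 500).reshape(-1, 1)  # candidate points to score

# A few initial samples to seed the surrogate model.
X = rng.uniform(-2.0, 2.0, size=(3, 1))
y = expensive_objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
kappa = 2.0  # exploration weight: how much high variance counts relative to high mean

for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Upper confidence bound: favor points with a high predicted value
    # (exploitation) or high uncertainty (exploration).
    ucb = mu + kappa * sigma
    x_next = candidates[np.argmax(ucb)].reshape(1, -1)
    y_next = expensive_objective(x_next).ravel()
    X = np.vstack([X, x_next])
    y = np.concatenate([y, y_next])

print("Best observed point:", X[np.argmax(y)], "value:", y.max())
```

Note that nothing in the acquisition rule asks for low function values; when a high-variance point is chosen and turns out to be low, that’s a side effect of exploration, not something the procedure sought out.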