Does anyone have advice on how to optimize the expectation of a noisy function? The naive approach I’ve used is to sample the function for a given parameter a decent number of times, average those together, and hope the result is close enough to stand in for the true objective function. This seems really wasteful though.
Most of the algorithms I’m coming across (like modelling the objective function with Gaussian process regression) would be useful, but are more high-powered than I need. Are there any simple techniques better than the naive approach? Any recommendations among the sophisticated approaches?
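For concreteness, here’s the naive approach I mean, with a made-up noisy quadratic standing in for the real simulation (the objective, noise level, and sample count are all placeholders):

```python
import numpy as np

def noisy_objective(theta, rng):
    # Hypothetical stand-in for the real simulation: the true objective
    # is (theta - 2)^2, observed with additive Gaussian noise.
    return (theta - 2.0) ** 2 + rng.normal(scale=0.5)

def averaged_objective(theta, n_samples=50, seed=0):
    # The naive approach: evaluate the noisy function many times at the
    # same parameter and treat the sample mean as the true objective.
    rng = np.random.default_rng(seed)
    return np.mean([noisy_objective(theta, rng) for _ in range(n_samples)])
```

Each call burns `n_samples` simulations just to get one (still noisy) objective value, which is what makes it feel so wasteful.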
There are some techniques that can be used with simulated annealing to deal with noise in the evaluation of the objective function. See Section 3 of Branke et al (2008) for a quick overview of proposed methods (they also propose new techniques in that paper). Most of these techniques come with the usual convergence guarantees that are associated with simulated annealing (but there are of course performance penalties in dealing with noise).
What is the dimensionality of your parameter space? What do you know about the noise? (e.g., if you know that the noise is mostly homoscedastic or if you can parameterize it, then you can probably use this to push the performance of some of the simulated annealing algorithms.)
Thanks for the SA paper!
The parameter space is only two dimensional here, so it’s not hard to eyeball roughly where the minimum is if I sample enough. I can say very little about the noise. I’m more interested in being able to approximate the optimum quickly (since simulation time adds up) than in hitting it exactly. The approach taken in this paper, based on a non-parametric tau test, looks interesting.
That rather depends on the particulars, for example, do you know (or have good reasons to assume) the characteristics of your noise?
Basically you have a noisy sample and want some kind of an efficient estimator, right?
Not really. In this particular case, I’m minimizing how long it takes a simulation to reach a given state, so the distribution ends up looking lognormal- or Poisson-ish.
Edit: Seeing your added question, I don’t need an efficient estimator in the usual sense per se. This is more about how to search the parameter space in a reasonable way to find where the minimum is, despite the noise.
Hm. Is the noise magnitude comparable with features in your search space? In other words, can you ignore noise to get a fast lock on a promising section of the space and then start multiple sampling?
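To make the suggestion concrete, a two-stage scheme along those lines might look like this: one cheap noisy evaluation per point to lock onto a promising region, then repeated sampling only at the few best-looking candidates. Everything here (grid size, number of candidates, repeat counts) is an arbitrary illustration, not a tuned recipe:

```python
import numpy as np

def two_stage_minimize(f, bounds, n_coarse=25, n_repeats=20, seed=0):
    """Coarse scan with single noisy evaluations, then repeated
    sampling only at the handful of most promising points."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    grid = np.linspace(lo, hi, n_coarse)
    # Stage 1: one noisy evaluation per grid point -- the "fast lock".
    coarse = np.array([f(x, rng) for x in grid])
    candidates = grid[np.argsort(coarse)[:3]]  # keep the 3 best-looking points
    # Stage 2: spend the sampling budget only on the candidates.
    means = [np.mean([f(x, rng) for _ in range(n_repeats)])
             for x in candidates]
    return candidates[int(np.argmin(means))]
```

This only works if the noise is small enough relative to the objective’s features that stage 1 doesn’t lock onto the wrong basin, which is exactly the question above.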
Simulated annealing, which has already been mentioned, is a good approach, but it can be slow to the point of being impractical for large search spaces.
Solutions to problems such as yours are rarely general and typically depend on the specifics of the problem—essentially it’s all about finding shortcuts.
The parameter space in this current problem is only two dimensional, so I can eyeball a plausible region, sample at a higher rate there, and iterate by hand. In another project, though, I had something with a very high-dimensional parameter space, so I figured it’s time I learned more about these techniques.
Any resources you can recommend on this topic then? Is there a list of common shortcuts anywhere?
Well, optimization (aka search in parameter space) is a large and popular topic. There are a LOT of papers and books about it.
And sorry, I don’t know of a list of common shortcuts. As I mentioned they really depend on the specifics of the problem.
You may find better ideas under the phrase “stochastic optimization,” but it’s a pretty big field. My naive suggestion (not knowing the particulars of your problem) would be a stochastic version of Newton’s method: (1) sample some points (x, y) in a region around your current guess, with enough spread to estimate both slope and curvature; (2) fit a locally weighted quadratic regression through the data; (3) subtract some constant times the identity matrix from the estimated Hessian to regularize it, choosing the constant just big enough that the move won’t exceed some maximum step size; (4) set your current guess to the minimizer of the regularized quadratic; (5) repeat, re-using old data if convenient.
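A rough sketch of one step of that loop in two dimensions. The sampling spread, the Gaussian local weighting, and the doubling schedule for the regularization constant are all illustrative choices, not part of the method itself:

```python
import numpy as np

def quadratic_features(D):
    # Design matrix for a full quadratic in 2-D displacements d = (d1, d2):
    # [1, d1, d2, d1^2, d1*d2, d2^2]
    d1, d2 = D[:, 0], D[:, 1]
    return np.column_stack([np.ones(len(D)), d1, d2, d1**2, d1 * d2, d2**2])

def newton_step(f, center, spread=0.5, n_points=40, max_step=1.0, rng=None):
    """One stochastic Newton step: fit a locally weighted quadratic to
    noisy samples around `center`, regularize its Hessian, and move to
    the minimizer of the fitted model."""
    rng = np.random.default_rng() if rng is None else rng
    D = rng.normal(scale=spread, size=(n_points, 2))   # sample displacements
    y = np.array([f(center + d) for d in D])
    # Locally weighted least squares: down-weight far-away samples.
    w = np.exp(-np.sum(D**2, axis=1) / (2 * spread**2))
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(quadratic_features(D) * sw,
                               y * sw.ravel(), rcond=None)
    g = beta[1:3]                                  # gradient estimate at center
    H = np.array([[2 * beta[3],     beta[4]],
                  [    beta[4], 2 * beta[5]]])     # Hessian estimate
    # Grow lam until H + lam*I is positive definite and the Newton step
    # is no longer than max_step.
    lam = 0.0
    for _ in range(60):
        Hreg = H + lam * np.eye(2)
        if np.linalg.eigvalsh(Hreg).min() > 1e-8:
            step = -np.linalg.solve(Hreg, g)
            if np.linalg.norm(step) <= max_step:
                return center + step
        lam = max(2 * lam, 1e-3)
    return center  # fall back if no usable step was found
```

Iterating `newton_step` from a starting guess gives the full scheme; re-using old samples would just mean appending them to `D` and `y` with appropriate weights.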