You want some sort of adaptive or sequential design (right?), so it’s not surprising that the optimal designs weren’t terribly helpful: they’re intended more for fixed, up-front design of experiments. They also tend to be oriented towards overall information gain or reduction of variance, which doesn’t necessarily correspond to your loss function. Having priors affects the optimal design somewhat (usually, you can spend fewer datapoints on the variables you already have prior information about). For a Bayesian experimental design, you can simulate a set of parameters from your priors, simulate drawing n datapoints under a particular experimental design, fit the model, compute your loss (or your entropy/variance), record the loss & design, and repeat many times; then pick the design with the best average loss.
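To make that simulation loop concrete, here is a minimal sketch in Python, assuming a toy 2×2 factorial with a linear model, Gaussian priors on the coefficients, and squared error on the fitted coefficients as the loss; the priors, candidate designs, and loss are stand-ins for whatever your actual problem uses, not anything specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: test score = b0 + b1*x1 + b2*x2 + noise, with 2 binary factors.
# Informative Gaussian priors on the coefficients (assumed values).
prior_mean = np.array([0.0, 1.0, 0.5])
prior_sd   = np.array([1.0, 0.5, 1.0])
noise_sd   = 1.0

# Candidate designs: how 20 datapoints are split across the 4 cells of the 2x2 factorial.
cells = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])   # (x1, x2) per cell
designs = {
    "balanced": [5, 5, 5, 5],
    "favor_x1": [8, 8, 2, 2],
    "favor_x2": [2, 8, 2, 8],
}

def average_loss(cell_counts, n_sims=2000):
    """Average squared error of the fitted coefficients under one design."""
    X = np.repeat(cells, cell_counts, axis=0)
    X = np.column_stack([np.ones(len(X)), X])            # add intercept
    losses = []
    for _ in range(n_sims):
        beta = rng.normal(prior_mean, prior_sd)           # draw "true" parameters from the prior
        y = X @ beta + rng.normal(0, noise_sd, len(X))    # simulate one experiment
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit the model
        losses.append(np.sum((beta_hat - beta) ** 2))     # loss for this simulation
    return np.mean(losses)

for name, counts in designs.items():
    print(name, round(average_loss(counts), 3))
# Run the design with the lowest average loss.
```

The only thing that varies between candidate designs is how the datapoints are allocated across the factorial cells; the prior, model, and loss stay fixed, so the average losses are directly comparable.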
If you are running the learning material experiment indefinitely and want to maximize cumulative test scores, then it’s a multi-armed bandit problem, and Thompson sampling on a factorial Bayesian model will work well & handle your 3 desiderata: set your informative priors on each learning material, model the test score as a linear model (with interactions?), and Thompson sample from the model+data.
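A minimal sketch of what that could look like, assuming 3 binary learning materials, a Gaussian prior over a linear model with pairwise interactions, and known noise variance; the prior values, effect sizes, and the `true_score` environment are all made up for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# All 8 combinations of 3 binary learning materials.
combos = np.array(list(product([0, 1], repeat=3)), dtype=float)

def features(x):
    """Intercept + main effects + pairwise interactions."""
    x1, x2, x3 = x
    return np.array([1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

Phi = np.array([features(c) for c in combos])   # feature matrix, one row per combination
d = Phi.shape[1]
noise_var = 1.0

# Informative Gaussian prior on the coefficients (made-up values);
# the posterior starts at the prior and is updated after each student.
post_mean = np.zeros(d)
post_prec = np.eye(d) / 4.0                     # precision = 1 / prior variance

def true_score(x):
    """Unknown environment, used only to simulate observed test scores."""
    return 1.0 + 0.8 * x[0] + 0.3 * x[1] - 0.2 * x[2] + 0.5 * x[0] * x[1] + rng.normal(0, 1)

for t in range(200):
    # Thompson sampling: draw one plausible coefficient vector from the posterior...
    beta = rng.multivariate_normal(post_mean, np.linalg.inv(post_prec))
    # ...and give the next student the combination that draw rates highest.
    i = int(np.argmax(Phi @ beta))
    y = true_score(combos[i])
    # Conjugate Bayesian linear-regression update with known noise variance.
    phi = Phi[i]
    new_prec = post_prec + np.outer(phi, phi) / noise_var
    post_mean = np.linalg.solve(new_prec, post_prec @ post_mean + phi * y / noise_var)
    post_prec = new_prec

print("posterior mean coefficients:", np.round(post_mean, 2))
```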
If you want to find which set of learning materials is optimal as fast as possible by the end of your experiment, then that’s the ‘best-arm identification’ multi-armed bandit problem. You can do a kind of Thompson sampling there too, best-arm Thompson sampling; see e.g. http://imagine.enpc.fr/publications/papers/COLT10.pdf , https://www.escholar.manchester.ac.uk/api/datastream?publicationPid=uk-ac-man-scw:227658&datastreamId=FULL-TEXT.PDF , http://nowak.ece.wisc.edu/bestArmSurvey.pdf , http://arxiv.org/pdf/1407.4443v1.pdf , & https://papers.nips.cc/paper/4478-multi-bandit-best-arm-identification.pdf . One version goes: with the full posteriors, find the action A with the best expected loss; for all the other actions B..Z, Thompson sample their possible values; take the action with the best loss out of A..Z. This explores the other arms in proportion to their remaining chance of being the best arm (better than A), while firming up the estimate of A’s value.
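One way to read that recipe in code, as a sketch with independent Normal arms standing in for the factor combinations; the priors, noise level, and true means are again made-up toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each factor combination treated as an independent arm with a Normal reward
# and a Normal prior on its mean (all made-up toy values).
true_means = np.array([0.5, 0.8, 1.0, 0.7])   # unknown in reality; used only to simulate
noise_sd   = 1.0
post_mean  = np.zeros(4)                      # prior mean for each arm
post_var   = np.full(4, 4.0)                  # prior variance for each arm

for t in range(500):
    # A = the arm currently believed best (best posterior mean)...
    best = int(np.argmax(post_mean))
    # ...enter A at its expected value, Thompson sample every other arm...
    draws = rng.normal(post_mean, np.sqrt(post_var))
    draws[best] = post_mean[best]
    # ...and pull whichever arm comes out on top.
    arm = int(np.argmax(draws))
    y = rng.normal(true_means[arm], noise_sd)
    # Conjugate Normal-Normal update for the pulled arm (known noise sd).
    prec = 1 / post_var[arm] + 1 / noise_sd**2
    post_mean[arm] = (post_mean[arm] / post_var[arm] + y / noise_sd**2) / prec
    post_var[arm] = 1 / prec

print("estimated best arm:", int(np.argmax(post_mean)))
```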
> You want some sort of adaptive or sequential design (right?), so it’s not surprising that the optimal designs weren’t terribly helpful: they’re intended more for fixed, up-front design of experiments.

So after looking at the problem I’m actually working on, I realize an adaptive/sequential design isn’t really what I’m after.
What I really want is a fractional factorial model that takes a prior (and minimizes regret between information learned and cumulative score). It seems like the goal of a multi-armed bandit is to do exactly that, but I only want to do it once, assuming a fixed prior which doesn’t update over time.
Do you think your Monte Carlo Bayesian experimental design is the best way to do this, or can I use some of the insights from Thompson sampling to make this process a bit less computationally expensive (which is important for my particular use case)?
> but I only want to do it once, assuming a fixed prior which doesn’t update over time.

I still don’t understand what you’re trying to do. If you’re trying to maximize test scores by picking textbooks, and this is done many times, you want a multi-armed bandit to help you find the best textbook over the many students exposed to different combinations. If you are throwing out the information from each batch and assuming the interventions are totally different each time, then your decision is made before you do any learning, and your optimal choice is simply whatever your prior says: the value of information lies in the subsequent decisions it affects, but since you’re not updating your prior, the information can’t change any decisions after the first one and is worthless.
> Do you think your Monte Carlo Bayesian experimental design is the best way to do this, or can I use some of the insights from Thompson sampling to make this process a bit less computationally expensive (which is important for my particular use case)?

Dunno. Simulation is the most general way of tackling the problem; it will work for just about anything, but can be extremely computationally expensive. There are many special cases which can reuse computations or have closed-form solutions, but they must be considered on a case-by-case basis.