gwern comments on Open thread, Sep. 19 - Sep. 25, 2016

gwern 23 Sep 2016 16:34 UTC
1 point

but I only want to do it once, assuming a fixed prior which doesn’t update over time.

I still don’t understand what you’re trying to do. If you’re trying to maximize test scores by picking textbooks, and this is done many times, you want a multi-armed bandit to help you find which textbook is best over the many students exposed to different combinations. If you are throwing out the information from each batch and assuming the interventions are totally different each time, then your decision is made before you do any learning, and your optimal choice is simply whatever your prior says. The value of information lies in the subsequent decisions it affects; but you’re not updating your prior, so the information can’t change any decisions after the first one and is worthless.
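To make the bandit case concrete, here is a minimal sketch of Thompson sampling over textbooks, assuming each student simply passes or fails and each textbook has a fixed unknown pass rate with a uniform Beta(1, 1) prior. The function name `thompson_sampling` and the pass rates in the usage note are hypothetical, made up for illustration, not anything from the original question.

```python
import random

def thompson_sampling(true_rates, n_students, seed=0):
    """Assign textbooks to students one at a time via Thompson sampling.

    true_rates: hypothetical pass rate of each textbook (unknown to the
    learner; used here only to simulate student outcomes).
    Returns per-textbook counts of passes and fails.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    passes = [0] * k
    fails = [0] * k
    for _ in range(n_students):
        # Draw one plausible pass rate per textbook from its Beta posterior
        # (Beta(1, 1) prior plus the outcomes observed so far).
        draws = [rng.betavariate(1 + passes[i], 1 + fails[i]) for i in range(k)]
        # Give this student the textbook whose draw is highest.
        arm = max(range(k), key=lambda i: draws[i])
        # Simulate the student's result and update that textbook's counts.
        if rng.random() < true_rates[arm]:
            passes[arm] += 1
        else:
            fails[arm] += 1
    return passes, fails
```

Calling something like `thompson_sampling([0.3, 0.5, 0.8], 2000)` should show most students ending up on the third textbook, since the posterior is carried forward from student to student. If the counts were reset every batch, every draw would come from the same prior and the data would never influence a later choice.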

Do you think your Monte Carlo Bayesian experimental design is the best way to do this, or can I utilize some of the insights from Thompson sampling to make this process a bit less computationally expensive (which is important for my particular use case)?

Dunno. Simulation is the most general way of tackling the problem and will work for just about anything, but it can be extremely computationally expensive. There are many special cases which can reuse computations or have closed-form solutions, but these must be considered on a case-by-case basis.
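As one illustration of the trade-off, here is a sketch of a toy special case where both routes are available. The assumptions are mine, not the original poster's setup: a new textbook has an unknown pass rate with a Beta(a, b) prior, the alternative is a baseline with known pass rate `p0`, and you observe `n` pilot students before committing to whichever option has the higher expected pass rate. The names `exact_value` and `simulated_value` are hypothetical.

```python
import math
import random

def log_beta(a, b):
    # Log of the Beta function, via log-gamma for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def exact_value(a, b, n, p0):
    """Expected pass rate of the final choice, by enumerating all outcomes.

    The number of passes s among n pilot students is Beta-binomial, so the
    expectation is a finite sum with no simulation needed.
    """
    total = 0.0
    for s in range(n + 1):
        log_pmf = (math.log(math.comb(n, s))
                   + log_beta(a + s, b + n - s) - log_beta(a, b))
        post_mean = (a + s) / (a + b + n)
        # After seeing s passes, pick the better of new textbook vs baseline.
        total += math.exp(log_pmf) * max(post_mean, p0)
    return total

def simulated_value(a, b, n, p0, n_sims=20000, seed=0):
    """The same quantity estimated by brute-force Monte Carlo simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        p = rng.betavariate(a, b)                       # draw a true pass rate
        s = sum(rng.random() < p for _ in range(n))     # simulate the pilot
        post_mean = (a + s) / (a + b + n)
        total += max(post_mean, p0)
    return total / n_sims
```

With, say, `a=2, b=2, n=10, p0=0.5`, the two functions should agree to within simulation noise, but the exact version takes 11 terms where the simulation takes 20,000 runs. Setting `n=0` in `exact_value` returns just `max(prior mean, p0)`: with no data that can change the decision, the experiment adds nothing.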