Does anyone have any insight into how VoI plays with Bayesian reasoning?
At a glance, it looks like VoI is usually not considered from a Bayesian viewpoint, the way it is here.
For instance, Wikipedia says:
"""
A special case is when the decision-maker is risk neutral, where VoC can be simply computed as:

VoC = “value of decision situation with perfect information” − “value of current decision situation”
"""
From the perspective of avoiding wireheading, an agent should be incentivized to gain information even when this information decreases its (subjective) “value of decision situation”. For example, consider a Bernoulli 2-armed bandit:
Suppose the agent’s prior over each arm’s reward probability is uniform over [0, 1], so the value of its current decision situation is .5 (playing arm1). After many observations, it learns (with high confidence) that arm1 has reward .1 and arm2 has reward .2. It should be glad to know this, since it can switch to the optimal policy of playing arm2, BUT the subjective value of the decision situation is now lower than when it was ignorant, because .2 < .5.
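A minimal sketch of those numbers, assuming Beta(1, 1) priors over each arm's heads probability and 0/1 rewards (the arm names, pull counts, and seed below are just illustrative, not from the original comment):

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = {"arm1": 0.1, "arm2": 0.2}   # ground truth, unknown to the agent

# Under a uniform Beta(1, 1) prior, each arm's expected reward is 0.5,
# so the subjective value of the current decision situation is 0.5.
print("value when ignorant:", 0.5)

# After many pulls of each arm, the posterior means converge on the truth.
pulls = {arm: rng.binomial(1, p, size=10_000) for arm, p in true_p.items()}
post_mean = {arm: (1 + x.sum()) / (2 + len(x)) for arm, x in pulls.items()}

# The informed value is the best posterior mean: ~0.2 < 0.5, even though
# the information let the agent switch to the genuinely better arm2.
print("value when informed:", max(post_mean.values()))
```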
There shouldn’t be any conflicts between VoI and Bayesian reasoning; I thought of all of my examples as Bayesian.
"""
From the perspective of avoiding wireheading, an agent should be incentivized to gain information even when this information decreases its (subjective) “value of decision situation”. For example, consider a Bernoulli 2-armed bandit:
"""
I don’t think that example describes the situation you’re talking about. Remember that VoI is computed in a forward-looking fashion; when one has a Beta(1, 1) distribution over an arm’s propensity, one thinks it equally likely that the true propensity is above .5 or below .5.
The VoI comes into that framework by being the piece that agitates for exploration. If you’ve pulled arm1 seven times and gotten four heads and three tails, and haven’t pulled arm2 yet, the expected value of pulling arm1 is higher than that of pulling arm2, but there’s a fairly substantial chance that arm2 has a higher propensity than arm1. Heuristics that say to do something like pull the lever with the higher 95th-percentile propensity bake in the VoI from pulling arms with lower means but higher variances.
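A quick way to check both claims, assuming a Beta(1, 1) prior so that four heads and three tails give arm1 a Beta(5, 4) posterior (a sketch, not anything from the original comment):

```python
import numpy as np
from scipy.stats import beta

# arm1: 4 heads, 3 tails on a Beta(1, 1) prior -> Beta(5, 4) posterior.
# arm2: never pulled -> still Beta(1, 1).
arm1, arm2 = beta(5, 4), beta(1, 1)
print(arm1.mean(), arm2.mean())        # ~0.556 vs 0.5: arm1 looks better...

rng = np.random.default_rng(0)
s1 = arm1.rvs(100_000, random_state=rng)
s2 = arm2.rvs(100_000, random_state=rng)
print((s2 > s1).mean())                # ...but arm2 beats arm1 ~44% of the time

# The 95th-percentile heuristic: score each arm by an upper quantile of
# its propensity, which folds exploration value into the score.
print(arm1.ppf(0.95), arm2.ppf(0.95))  # the unexplored arm2 scores higher
```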
If, from a forward-looking perspective, gaining information would decrease one’s subjective value of the decision situation, then one shouldn’t gain that information. That is, it’s a bad idea to pay for a test if you don’t expect the information to be worth the test’s cost. (Maybe you’ll continue to pull arm1 regardless of the results of pulling arm2, as in the case where arm1 has delivered heads 7 times in a row; then switching means taking a hit for nothing.)
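One way to make that concrete under the same Beta-Bernoulli assumptions (my sketch): after 7 straight heads, arm1’s posterior is Beta(8, 1), and no single result from arm2 can change the follow-up decision, so the myopic VoI of the test pull is zero while the pull itself forgoes expected reward.

```python
from fractions import Fraction as F

# arm1: 7 heads in a row on a Beta(1, 1) prior -> Beta(8, 1), mean 8/9.
# arm2: unpulled -> Beta(1, 1), mean 1/2, predictive P(heads) = 1/2.
arm1_mean = F(8, 9)

# One test pull of arm2 leaves it at Beta(2, 1) (mean 2/3) on heads
# or Beta(1, 2) (mean 1/3) on tails; neither beats arm1's 8/9, so the
# result can't change which arm you pull afterwards.
voi = F(1, 2) * max(F(2, 3) - arm1_mean, F(0)) \
    + F(1, 2) * max(F(1, 3) - arm1_mean, F(0))
print(voi)                  # 0: the test has no myopic VoI

# ...while taking the test pull itself costs expected reward:
print(arm1_mean - F(1, 2))  # 7/18 forgone by pulling arm2 instead of arm1
```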
One thing that’s important to remember here is conservation of expected evidence: if I believe now that running an experiment will lead me to believe that arm1 has a propensity of .1 and arm2 has a propensity of .2, then I should already believe those are the arms’ propensities, and so there’s no subjective loss of well-being.
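Conservation of expected evidence has a crisp Beta-Bernoulli form: the expected posterior mean after one more pull equals the prior mean exactly. A small check of this, in my own sketch:

```python
from fractions import Fraction as F

def expected_posterior_mean(a: int, b: int) -> F:
    """Expected posterior mean of a Beta(a, b) arm after one more pull."""
    p_heads = F(a, a + b)                # predictive probability of heads
    mean_if_heads = F(a + 1, a + b + 1)  # Beta(a+1, b) posterior mean
    mean_if_tails = F(a, a + b + 1)      # Beta(a, b+1) posterior mean
    return p_heads * mean_if_heads + (1 - p_heads) * mean_if_tails

# The experiment's expected outcome is already baked into the prior:
# the expected posterior mean equals the prior mean a/(a+b) exactly.
assert expected_posterior_mean(5, 4) == F(5, 9)
assert expected_posterior_mean(1, 1) == F(1, 2)
```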