Just because you might be wrong about utilities (I assume that’s the possibility you’re implying) doesn’t mean you should make the process you use to choose outcomes random.
Yes. What I meant is something more like this example:
You have 4 options.
Option A: estimated utility = 10 ± 5
Option B: estimated utility = 5 ± 10
Option C: estimated utility = 3 ± 2
Option D: estimated utility = −10 ± 30
It seems reasonable not to always choose A: sometimes choose B, and from time to time even D, at least until you gather enough data to improve the accuracy of your estimates.
I expect you can arrive at this policy by carefully calculating the probabilities of your estimates changing by various amounts, and how much more utility you could get if they do.
There’s been quite a lot of work on this sort of question, under the title of “Multi-armed bandits”. (As opposed to the “one-armed bandits” you find rows and rows of in casinos.)
Your response is very different from mine, so I’m wondering if I’m wrong.
The multi-armed bandit scenario applies when you are uncertain about the distributions produced by these options, and are going to have lots of interactions with them that you can use to discover more about them while extracting utility.
For a one-shot game, or if those estimated utilities are distributions you know each option will continue to produce every time, you just compute the expected utility and you’re done.
But suppose you know that each produces some distribution of utilities, but you don’t yet know what it is (though perhaps you know they’re all normally distributed and have some guess at the means and variances), and you get to interact with them over and over again. Then you will probably begin by trying them all a few times to get a sense of what they do, and as you learn more you will gradually prioritize maximizing expected-utility-this-turn over knowledge gain (and hence expected utility in the future).
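For concreteness, here is a minimal sketch of that explore-then-exploit pattern, using Thompson sampling on four Gaussian arms. The prior guesses are the A–D estimates above; the “true” arm means are made-up numbers for the simulation, and the shared, known reward noise is an assumption for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior guesses taken from the options above: mean and s.d. for A-D.
prior_means = np.array([10.0, 5.0, 3.0, -10.0])
prior_sds   = np.array([5.0, 10.0, 2.0, 30.0])

# Hypothetical "true" arm means, hidden from the agent (made up for illustration).
true_means = np.array([9.0, 12.0, 3.0, -8.0])
reward_sd = 5.0  # assume a known, shared reward noise for simplicity

# Normal-Normal conjugate posterior over each arm's mean.
post_prec = 1.0 / prior_sds**2
post_mean = prior_means.copy()

for t in range(1000):
    # Thompson sampling: draw one plausible mean per arm from its posterior
    # and play the arm with the highest draw.  Wide posteriors early on mean
    # B and even D get tried; as data accumulates, the best arm dominates.
    draws = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
    arm = int(np.argmax(draws))
    reward = rng.normal(true_means[arm], reward_sd)

    # Conjugate update for the pulled arm.
    new_prec = post_prec[arm] + 1.0 / reward_sd**2
    post_mean[arm] = (post_prec[arm] * post_mean[arm] + reward / reward_sd**2) / new_prec
    post_prec[arm] = new_prec

print("posterior means:", np.round(post_mean, 2))
```

Thompson sampling is just one standard bandit strategy; epsilon-greedy or UCB would show the same qualitative shift from exploration toward exploitation.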
I assume that when you write ‘10 +/- 5’, you mean that Option A could have a utility on the open interval with 0 and 10 as lower and upper bounds.
You can transform this into a decision problem under risk. Assuming that in option A you’re not treating 6 as more probable than 10 just because 6 is closer to 5 (your problem statement didn’t indicate anything like that), you can assign an expected utility to each option by putting an equiprobable prior over the set of possible utilities for each action. For example, since there are 10 members in the interval between 0 and 10, you would assign a probability of 0.1 to each member. The expected utility for each option then comes out as follows:
A = (0*0.1) + (1*0.1) + (2*0.1) + (3*0.1) + (4*0.1) + (5*0.1) + (6*0.1) + (7*0.1) + (8*0.1) + (9*0.1) + (10*0.1) = 5.5
B = 0
C = 0.3
D = −61
The expected utility formalism prescribes A. Choosing any other option violates the von Neumann–Morgenstern axioms.
However, my guess is that your utilities are secretly dollar values and that you have an implicit utility function over outcomes. You can represent this by introducing a term u into the expected utility calculations that weights each outcome by its real utility. This matters in the real world because of things like gambler’s ruin: in an idealized world where you had infinite money to lose, it would make sense to maximize expected value, but in the real world you can go broke, so evolution might make you loss-averse to compensate. This was the original motivation for formulating the notion of expected utility (some quantitative measure of desirability weighted by probability), as opposed to the earlier notion of expected value (dollar value weighted by probability).
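A toy illustration of the gap (the numbers, and the choice of log utility as the concave u, are my own, purely for illustration): a bet can have positive expected dollar value yet negative expected utility for someone who can go broke.

```python
import math

wealth = 100.0
# A gamble with positive expected dollar value: 50% win 100, 50% lose 90.
outcomes = [wealth + 100.0, wealth - 90.0]
probs = [0.5, 0.5]

ev = sum(p * w for p, w in zip(probs, outcomes))
# Expected utility under log utility -- an assumed concave u(w), for illustration.
eu_take = sum(p * math.log(w) for p, w in zip(probs, outcomes))
eu_pass = math.log(wealth)

print(f"expected value if taken: {ev:.1f}  (vs {wealth:.1f} if declined)")
print(f"expected log-utility:    {eu_take:.2f} (vs {eu_pass:.2f} if declined)")
# EV says take the bet (105 > 100); log utility says decline (3.80 < 4.61):
# nearly losing your whole stake hurts more than the comparable gain helps.
```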
Your analysis misses the point that you may play the game many times and change your estimates as you go.
For the record, 10 ± 5 means an interval from 5 to 15, not 0 to 10, and in any case I intended it as a shorthand for a bell-like distribution with a mean of 10 and a standard deviation of 5.
Yeah, I parsed it as 5 +/- 5 somehow. Might have messed up the other ones too.
Wouldn’t you just maximize expected utility in each iteration, regardless of what estimates are given in each iteration?
You would indeed maximize EV in each iteration, but this EV would also include a factor from value of information.
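Here is a crude sketch of how that factor shows up; the setup (one arm already known to pay 10 per round, one uncertain arm with the 5 ± 10 prior from the example, a fixed probing budget, 100 rounds) is my own simplification:

```python
import numpy as np

rng = np.random.default_rng(1)
rounds, trials, probe_pulls = 100, 20000, 5
known_value = 10.0   # an arm whose payoff is already well known (option A-ish)
reward_sd = 5.0      # assumed noise on the uncertain arm's payoffs

total_exploit = total_probe = 0.0
for _ in range(trials):
    mu = rng.normal(5.0, 10.0)  # uncertain arm's true mean, from the 5 +/- 10 prior

    # Strategy 1: never explore; always play the known arm.
    total_exploit += rounds * known_value

    # Strategy 2: spend a few pulls probing, then commit to whichever looks better.
    probe_rewards = rng.normal(mu, reward_sd, probe_pulls)
    best = mu if probe_rewards.mean() > known_value else known_value
    total_probe += probe_rewards.sum() + (rounds - probe_pulls) * best

print("never explore:", round(total_exploit / trials, 1))
print("probe first:  ", round(total_probe / trials, 1))
# The probing strategy wins on average: the few "irrational" pulls buy
# information that pays off over the remaining rounds.  That surplus is
# the value of information.
```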
Ah, okay. I went downstairs for a minute and thought to myself, “Well, the only way I get what he’s saying is if we go up a level and assume that the given utilities are not simply changing, but are changing according to some sort of particular rule.”
Also, I spent a long time writing my reply to your original problem statement, without refreshing the page, so I only read the original comment, not the edit. That might explain why I didn’t immediately notice that you were talking about value of information, if I seemed a little pedantic in my earlier comment with all of the math.
Back to the original point that brought this problem up: what’s going on inside the brain is that it has assigned utilities to outcomes, but there’s a tremble on its actions caused by the stochastic nature of neural networks. The brain isn’t so much uncertain about its utilities as it is confident that its estimates are accurate while randomly failing to do what it considers most desirable.
That’s why I wrote, in the original comment:
It just seems interesting to consider the consequences of the assumption that there is a decision-maker without a trembling hand.
Does that make sense?
Congratulations on good thinking and attitude :)
Yes, I get that. What I meant to suggest, in the broader picture, is that this “tremble” might be evolution’s way of crudely approximating a fully rational agent who makes decisions based on VOI.
So it’s not necessarily detrimental to us. Sometimes it might well be.
The main takeaway from all I have said is that replacing your intuition with “let’s always take option A because it’s the rational thing to do” just doesn’t do the trick when you play multiple games (as is often the case in real life).
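For what it’s worth, that claim is easy to check in simulation: a purely random tremble bolted onto greedy choice (epsilon-greedy, in bandit terms) already captures much of what deliberate VOI reasoning buys. A minimal sketch, reusing the made-up arms from the Thompson-sampling example above:

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([9.0, 12.0, 3.0, -8.0])  # same hypothetical arms as before
reward_sd, rounds, epsilon = 5.0, 1000, 0.1

est = np.array([10.0, 5.0, 3.0, -10.0])  # start from the stated estimates
counts = np.zeros(4)
total = 0.0
for _ in range(rounds):
    # Greedy choice with a tremble: with probability epsilon, act "irrationally".
    arm = int(rng.integers(4)) if rng.random() < epsilon else int(np.argmax(est))
    reward = rng.normal(true_means[arm], reward_sd)
    total += reward
    counts[arm] += 1
    est[arm] += (reward - est[arm]) / counts[arm]  # running-mean estimate

print(f"average reward with tremble: {total / rounds:.2f}")
# Always choosing A (estimated 10, true mean 9 here) would average about 9;
# the tremble stumbles onto B and ends up doing better.
```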