The substantive point here isn’t about EU calculations per se. Running a full analysis of everything that might happen and doing an EU calculation on that basis is fine, and I don’t think the OP disputes this.
The subtlety is about what numerical data can formally represent your full state of knowledge. The claim is that a mere probability of getting the $2 payout does not. It’s the case that on the first use of a box, the probability of the payout given its colour is 0.45 regardless of the colour.
However, if you merely hold onto that probability, then if you put in a coin and so learn something about the boxes you can’t update that probability to figure out what the probability of payout for the second attempt is. You need to go back and also remember whether the box is green or brown. The point of Jaynes and the A_p distribution is that it actually does screen off all other information. If you keep track of it you never need to worry about remembering the colour of the box, or the setup of the experiment. Just this “meta-distribution”.
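A minimal sketch of that point in Python. The box setup here is an assumption made for illustration, not something stated in this thread: a brown box pays out with a fixed rate of 0.45, while a green box is equally likely to pay out at rate 0.9 or never. The names `predictive`, `update`, `brown`, and `green` are hypothetical. Both boxes give 0.45 on the first play, but only the distribution over payout rates says what to believe after one coin.

```python
from fractions import Fraction

def predictive(meta):
    """Probability of a payout on the next play: the mean of the meta-distribution."""
    return sum(rate * weight for rate, weight in meta.items())

def update(meta, paid):
    """Bayes update of the weights over payout rates after observing one play."""
    likelihood = {rate: (rate if paid else 1 - rate) for rate in meta}
    total = sum(likelihood[rate] * weight for rate, weight in meta.items())
    return {rate: likelihood[rate] * weight / total for rate, weight in meta.items()}

# Assumed setup (hypothetical): maps payout rate -> weight on that rate.
brown = {Fraction(45, 100): Fraction(1)}
green = {Fraction(9, 10): Fraction(1, 2), Fraction(0): Fraction(1, 2)}

print(predictive(brown), predictive(green))      # 9/20 9/20
print(predictive(update(brown, paid=False)))     # 9/20
print(predictive(update(green, paid=False)))     # 9/110
```

Under these assumptions a single losing play leaves the brown box at 0.45 but drops the green box to roughly 0.08, which is the information a bare 0.45 cannot carry.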
However, a single probability for each outcome given each strategy is all the information needed. The problem is not with using single probabilities to represent knowledge about the world, it’s the straw math that was used to represent the technique. To me, this reasoning is equivalent to the following:
“You work at a store where management is highly disorganized. Although they precisely track the number of days you have worked since the last payday, they never remember when they last paid you, and thus every day of the work week has a 1⁄5 chance of being a payday. For simplicity’s sake, let’s assume you earn $100 a day.
You wake up on Monday and do the following calculation: If you go in to work, you have a 1⁄5 chance of being paid. Thus the expected payoff of working today is $20, which is too low for it to be worth it. So you skip work. On Tuesday, you make the same calculation, and decide that it’s not worth it to work again, and so you continue forever.
I visit you and immediately point out that you’re being irrational. After all, a salary of $100 a day clearly is worth it to you, yet you are not working. I look at your calculations, and immediately find the problem: You’re using a single probability to represent your expected payoff from working! I tell you that using a meta-probability distribution fixes this problem, and so you excitedly scrap your previous calculations and set about using a meta-probability distribution instead. We decide that a Gaussian sharply peaked at 0.2 best represents our meta-probability distribution, and I send you on your way.”
Of course, in this case, the meta-probability distribution doesn’t change anything. You still continue skipping work, because I have devised the hypothetical situation to illustrate my point (evil laugh). The point is that in this problem the meta-probability distribution solves nothing, because the problem is not with a lack of meta-probability, but rather a lack of considering future consequences.
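The gap between the myopic calculation and one that considers future consequences can be sketched as follows. This assumes (my reading of the hypothetical, not something it spells out) that each day is independently a payday with probability 1⁄5, that a payday pays $100 for every day worked since the last one, and that paydays occur whether or not you show up. The function names are made up for the example.

```python
from fractions import Fraction

DAILY_WAGE = 100
P_PAYDAY = Fraction(1, 5)

def myopic_value():
    """Value of working today if only today's payday chance is counted."""
    return DAILY_WAGE * P_PAYDAY

def value_of_working_today(days_left):
    """Expected pay eventually collected for today's work.

    days_left counts the days, including today, on which a payday could
    still fall. Today's work is paid if at least one of them is a payday.
    """
    p_ever_paid = 1 - (1 - P_PAYDAY) ** days_left
    return DAILY_WAGE * p_ever_paid

print(myopic_value())                        # 20
print(float(value_of_working_today(20)))     # close to the full $100
```

Both functions use the same single probability of 1⁄5. The difference is only in whether later paydays are counted, which is why swapping in a meta-probability distribution peaked at 0.2 leaves the myopic answer unchanged.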
In both the OP’s example and mine, the problem is that the math was done incorrectly, not that you need meta-probabilities. As you said, meta-probabilities are a method of screening off additional labels on your probability distributions for a particular class of problems where you are taking repeated samples that are entangled in a very particular sort of way. As I said above, I appreciate the exposition of meta-probabilities as a tool, and your comment as well has helped me better understand their instrumental nature, but I take issue with what sort of tool they are presented as.
If you do the calculations directly with the probabilities, your calculation will succeed if you do the math right, and fail if you do the math wrong. Meta-probabilities are a particular way of representing a certain calculation, one that succeeds or fails in its own right. If you use them to represent the correct direct probabilities, you will get the right answer, but they are only an aid in the calculation; they never fix any problem with direct probability calculations. The fixing of the calculation and the use of meta-probabilities are orthogonal issues.
To make a blunt analogy, this is like someone trying to plug an Ethernet cable into a phone jack, and then saying “when Ethernet fails, wifi works”, conveniently plugging in the wifi adapter correctly.
The crux of the dispute in my eyes is not whether wifi can work for certain situations, but whether there’s anything actually wrong with Ethernet in the first place.
So, my observation is that without meta-distributions (or A_p), or conditioning on a pile of past information (and thus tracking /more/ than just a probability distribution over current outcomes), you don’t have the room in your knowledge to be able to even talk about sensitivity to new information coherently. Once you can talk about a complete state of knowledge, you can begin to talk about the utility of long term strategies.
For instance, in your example, one would have the same probability of being paid today if 20% of employers actually paid you every day, whilst 80% of employers never paid you. But in such an environment, it would not make sense to work a second day in 80% of cases. The optimal strategy depends on what you know, and to represent that in general requires more than a straight probability.
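To put numbers on the two environments, here is a small sketch; the list-of-pairs representation and the names `p_paid_today`, `p_paid_next_day`, `disorganized`, and `split` are assumptions for illustration. Each environment is a list of (share of employers, daily probability of paying) pairs.

```python
from fractions import Fraction

def p_paid_today(employers):
    """Probability of being paid today, before observing anything."""
    return sum(share * p for share, p in employers)

def p_paid_next_day(employers, paid_today):
    """Probability of being paid tomorrow, given what happened today."""
    # Weight each employer type by how well it explains today's observation.
    joint = [(share * (p if paid_today else 1 - p), p) for share, p in employers]
    evidence = sum(weight for weight, _ in joint)
    return sum(weight * p for weight, p in joint) / evidence

disorganized = [(Fraction(1), Fraction(1, 5))]           # every employer pays 1 day in 5
split = [(Fraction(1, 5), Fraction(1)),                  # 20% pay every day
         (Fraction(4, 5), Fraction(0))]                  # 80% never pay

print(p_paid_today(disorganized), p_paid_today(split))   # 1/5 1/5
print(p_paid_next_day(disorganized, paid_today=False))   # 1/5
print(p_paid_next_day(split, paid_today=False))          # 0
```

The first-day probability is 1⁄5 in both, yet one unpaid day leaves the disorganized case untouched and rules out further pay in the split case.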
There are different problems coming from the distinction between choosing a long term policy to follow, and choosing a one shot action. But we can’t even approach this question in general unless we can talk sensibly about a sufficient set of information to keep track of. There are two distinct problems, one prior to the other.
Jaynes does discuss a problem which is closer to your concerns (that of estimating neutron multiplication in a 1-d experiment, 18.15, p. 579). He’s comparing two approaches, which for my purposes differ in their prior A_p distribution.
It may be helpful to read some related posts (linked by lukeprog in a comment on this post): Estimate stability, and Model Stability in Intervention Assessment, which comments on Why We Can’t Take Expected Value Estimates Literally (Even When They’re Unbiased). The first of those motivates the A_p (meta-probability) approach, the second uses it, and the third explains intuitively why it’s important in practice.
Jeremy, I think the apparent disagreement here is due to unclarity about what the point of my argument was. The point was not that this situation can’t be analyzed with decision theory; it certainly can, and I did so. The point is that different decisions have to be made in two situations where the probabilities are the same.
Your discussion seems to equate “probability” with “utility”, and the whole point of the example is that, in this case, they are not the same.
I guess my position is thus:
While there are sets of probabilities which by themselves are not adequate to capture the information about a decision, there always is a set of probabilities which is adequate to capture the information about a decision.
In that sense I do not see your article as an argument against using probabilities to represent decision information, but rather a reminder to use the correct set of probabilities.
My understanding of Chapman’s broader point (which may differ wildly from his understanding) is that determining which set of probabilities is correct for a situation can be rather hard, and so it deserves careful and serious study from people who want to think about the world in terms of probabilities.
Thanks, Jonathan, yes, that’s how I understand it.
Jaynes’ discussion motivates A_p as an efficiency hack that allows you to save memory by forgetting some details. That’s cool, although not the point I’m trying to make here.