There is a single correct distribution for our starting information, which is (1/3, 1/3, 1/3); the “distribution across possible distributions” is just a delta function there.
Whoa, you think the only correct interpretation of “there’s a die that returns 1, 2, or 3” is to be absolutely certain that it’s fair? Or what do you think a delta function in the distribution space means?
(This will have effects, and they will not be subtle.)
Any non-delta “distribution over distributions” is laden with some model of what’s going on in the die, and is a distribution over parts of that model.
One of the classic examples of this is three interpretations of “randomly select a point from a circle.” You could do this by selecting an angle for a radius uniformly, then selecting a point on that radius uniformly along its length. Or you could do those two steps, and then select a point uniformly at random along the chord through that point perpendicular to the radius. Or you could select x and y uniformly at random in a square bounding the circle, and reject any point outside the circle. Only the last one makes all areas in the circle equally likely: the first method makes areas near the center more likely, and the second makes areas near the edge more likely (if I remember correctly).
But I think that it generally is possible to reach consensus on what criterion you want (such as “pick a method such that any area of equal size has equal probability of containing the point you select”), and then it’s obvious what sort of method you want to use. (There’s a non-rejection-sampling way to get the equal-area method for the circle, by the way.) And so you probably need to be clever about how you parameterize your distributions, and what priors you put on those parameters, and eventually you do have hyperparameters that functionally have no uncertainty. (This is, for example, seeing a uniform as a beta(1,1), where you don’t have a distribution on the 1s.) But I think this is a reasonable way to go about things.
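The three methods above can be compared with a quick Monte Carlo sketch (my own illustration, not from the thread; it assumes a unit circle, and the function names are made up). By the equal-area criterion, the disc of radius 1/2 has a quarter of the total area, so it should capture about a quarter of the points:

```python
import math
import random

random.seed(0)

def angle_then_radius():
    # Method 1: uniform angle, then uniform distance along that radius.
    theta = random.uniform(0, 2 * math.pi)
    r = random.uniform(0, 1)
    return r * math.cos(theta), r * math.sin(theta)

def angle_radius_then_chord():
    # Method 2: as above, then slide uniformly along the chord
    # perpendicular to the radius at that distance from the center.
    theta = random.uniform(0, 2 * math.pi)
    d = random.uniform(0, 1)
    half_chord = math.sqrt(1 - d * d)
    t = random.uniform(-half_chord, half_chord)
    # d along the radius direction, t along the perpendicular, then rotate.
    return (d * math.cos(theta) - t * math.sin(theta),
            d * math.sin(theta) + t * math.cos(theta))

def rejection():
    # Method 3: uniform in the bounding square, rejecting points outside.
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def frac_in_inner_half(sampler, n=100_000):
    # Fraction of samples landing inside the disc of radius 1/2,
    # which holds 1/4 of the circle's area.
    return sum(1 for _ in range(n)
               if sum(c * c for c in sampler()) < 0.25) / n

for name, s in [("angle+radius", angle_then_radius),
                ("chord", angle_radius_then_chord),
                ("rejection", rejection)]:
    print(name, frac_in_inner_half(s))
```

In a run like this, the angle-then-radius method puts roughly half its points in that quarter-area (center-heavy), rejection sampling puts roughly a quarter there, and the chord method noticeably less than a quarter (edge-heavy), matching the recollection above.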
One of the classic examples of this is three interpretations of “randomly select a point from a circle.”
In a separate comment, Kurros worries about cases with “no preferred parameterisation of the problem”. I have the same worry as both of you, I think. I guess I’m less optimistic about the resolution. The parameterization seems like an empirical rabbit that Jaynes and other descendants of the Principle of Insufficient Reason are trying to pull out of an a priori hat. (See also section 3 of Seidenfeld’s paper, on re-partitioning the sample space.)
I’d appreciate it if someone could assuage—or aggravate—this concern. Preferably without presuming quite as much probability and statistics knowledge as Seidenfeld does (that one went somewhat over my head, toward the end).
Whoa, you think the only correct interpretation of “there’s a die that returns 1, 2, or 3” is to be absolutely certain that it’s fair? Or what do you think a delta function in the distribution space means?
I haven’t been able to follow this whole thread of conversation, but I think it’s pretty clear you’re talking about different things here.
Obviously, the long-run frequency distribution of the die can be many different things. One of them, (1/3, 1/3, 1/3), represents fairness, and is just one among many possibilities.
Equally obviously, the probability distribution that represents rational expectations about the first roll is only one thing. Manfred claims that it’s (1/3, 1/3, 1/3), which doesn’t by itself represent fairness: it could equally well represent being certain that the die is biased to land on only one side every time, while having no idea which side.
I think it’s pretty clear you’re talking about different things here.
I thought so too, which is why I asked him what he thought a delta function in the distribution space meant.
One of them, (1/3, 1/3, 1/3), represents fairness, and is just one among many possibilities.
Right; but putting a delta function there means you’re infinitely certain that’s what it is, because you give probability 0 to all other possibilities.
It could equally well represent being certain that it’s biased to land on only one side every time, but you have no idea which side.
Knowing that the die is completely biased, but not which side it is biased towards, would be represented by three delta functions, at (1,0,0), (0,1,0), and (0,0,1), each with a weight of 1/3. This is very different from the uniform case and from the delta at (1/3,1/3,1/3), as you can see by calculating the posterior distribution after observing that the die rolled a 1.
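The contrast can be sketched numerically (a toy sketch of my own; the variable and function names are made up, not from the thread). Both priors give the same degree-of-belief distribution for the first roll, but they update very differently:

```python
from fractions import Fraction

# Hypothesis space: candidate long-run distributions p = (p1, p2, p3)
# for a three-sided die, with prior weights.
delta_fair = {(Fraction(1, 3),) * 3: Fraction(1)}  # certain the die is fair
three_deltas = {                                   # certain it's fully biased,
    (Fraction(1), Fraction(0), Fraction(0)): Fraction(1, 3),  # but no idea which way
    (Fraction(0), Fraction(1), Fraction(0)): Fraction(1, 3),
    (Fraction(0), Fraction(0), Fraction(1)): Fraction(1, 3),
}

def predictive(prior):
    """Degree-of-belief distribution for the next roll: sum_p w(p) * p."""
    return tuple(sum(w * p[i] for p, w in prior.items()) for i in range(3))

def posterior(prior, outcome):
    """Bayes update after observing `outcome` (a 0-indexed face)."""
    unnorm = {p: w * p[outcome] for p, w in prior.items()}
    z = sum(unnorm.values())
    return {p: w / z for p, w in unnorm.items() if w > 0}

# Both priors predict (1/3, 1/3, 1/3) for the first roll...
print(predictive(delta_fair), predictive(three_deltas))
# ...but diverge after observing a single 1:
print(predictive(posterior(delta_fair, 0)))    # still (1/3, 1/3, 1/3)
print(predictive(posterior(three_deltas, 0)))  # (1, 0, 0): certain it always rolls 1
```

For comparison, the uniform case (a flat Dirichlet(1,1,1) prior over biases) would update to a predictive distribution of (1/2, 1/4, 1/4) after one observed 1, by Laplace’s rule of succession — different again from both delta-based priors.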
Okay, so you were just trying to make sure that Manfred knows that all this probability-of-distributions talk isn’t, as he seems to think, about the degree-of-belief-in-my-current-state-of-ignorance distribution for the first roll. Gotcha.
Okay… but do we agree that the degree-of-belief distribution for the first roll is (1/3, 1/3, 1/3), whether it’s a fair die or one completely biased in an unknown way?
Because I’m pretty sure that’s what Manfred’s talking about when he says
There is a single correct distribution for our starting information, which is (1/3,1/3,1/3),
and I think him going on to say
the “distribution across possible distributions” is just a delta function there.
was a mistake, because you were talking about different things.
EDIT:
I thought so too, which is why I asked him what he thought a delta function in the distribution space meant.
Ah. Yes. Okay. I am literally saying only things that you already know, aren’t I? My bad.
Whoa, you think the only correct interpretation of “there’s a die that returns 1, 2, or 3” is to be absolutely certain that it’s fair? Or what do you think a delta function in the distribution space means?
It’s not about whether the die is fair—my state of information is fair. Of that it is okay to be certain. Also, I think I figured it out—see my recent reply to Oscar’s parent comment.