Hah. The dice example and the application of maxent to it come originally from Jaynes himself; see page 4 of the linked paper.
I’ll try to reformulate the problem without the constraint rule, to clear matters up or maybe confuse them even more. Imagine that, instead of you throwing the die a billion times and obtaining a mean of 3.5, a truthful deity told you that the mean was 3.5. First question: do you think the maxent solution in that case is valid, for some meaning of “valid”? Second question: why do you think it disagrees with Bayesian updating as you throw the die a huge number of times and learn only the mean? Is the information you receive somehow different in quality? Third question: which answer is actually correct, and what does “correct” mean here?
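For concreteness, here is a minimal sketch of what the maxent solution looks like when the constraint is taken to be the expected face value (my reading of Jaynes’ setup; the function name and the use of SciPy’s root-finder are my own choices). With a constrained mean of 3.5 it simply returns the uniform distribution; with 4.5 it reproduces the tilted distribution of Jaynes’ Brandeis dice example.

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def maxent_die(target_mean):
    """Maximum-entropy distribution on {1,...,6} whose mean is target_mean.
    The solution has the exponential form p_i proportional to exp(lam * i);
    we solve numerically for the Lagrange multiplier lam."""
    def mean_error(lam):
        w = np.exp(lam * faces)
        return (w @ faces) / w.sum() - target_mean
    lam = brentq(mean_error, -50.0, 50.0)   # root-find the multiplier
    w = np.exp(lam * faces)
    return w / w.sum()

print(np.round(maxent_die(3.5), 4))  # uniform: the mean adds nothing beyond the range of faces
print(np.round(maxent_die(4.5), 4))  # Jaynes' Brandeis dice case, tilted toward the high faces
```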
I think I’d answer, “the mean of what?” ;)
I’m not really qualified to comment on the methodological issues since I have yet to work through the formal meaning of “maximum entropy” approaches. What I know at this stage is the general argument for justifying priors, i.e. that they should in some manner reflect your actual state of knowledge (or uncertainty), rather than be tainted by preconceptions.
If you appeal to intuitions involving a particular physical object (a die) and simultaneously pick a particular mathematical object (the uniform prior) without making a solid case that the latter is our best representation of the former, I won’t be overly surprised at some apparently absurd result.
It’s not clear to me for instance what we take a “possibly biased die” to be. Suppose I have a model that a cubic die is made biased by injecting a very small but very dense object at a particular (x,y,z) coordinate in a cubic volume. Now I can reason based on a prior distribution for (x,y,z) and ask what probability theory can tell me about the posterior distribution, given a number of throws with a certain mean.
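To make that concrete, here is a rough sketch of the kind of calculation I have in mind. The mapping from the weight’s position to face probabilities (face_probs below) is a made-up stand-in for the real physics, the face labeling is my own assumption, and I’ve used a modest number of throws so that plain importance sampling stays numerically sane; take it as an illustration of the structure of the inference rather than of any particular die.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed face labeling (not specified above): 1-6 along z, 2-5 along y,
# 3-4 along x, with opposite faces summing to 7 as on a standard die.
faces = np.array([1, 2, 3, 4, 5, 6])
normals = np.array([[ 0,  0, -1],   # 1
                    [ 0, -1,  0],   # 2
                    [-1,  0,  0],   # 3
                    [ 1,  0,  0],   # 4
                    [ 0,  1,  0],   # 5
                    [ 0,  0,  1]])  # 6

def face_probs(offset, k=4.0):
    """Crude stand-in for the physics: displacing the weight toward a face
    makes that face tend to finish on the bottom, so it shows up less often."""
    scores = np.exp(-k * normals @ offset)
    return scores / scores.sum()

# Prior: the weight's position is uniform in the cube, stored as an offset
# from the centre with components in [-1/2, 1/2].
n = 20_000
offsets = rng.uniform(-0.5, 0.5, size=(n, 3))
probs = np.array([face_probs(o) for o in offsets])

# Data: N throws of which we learn only the sample mean.
N, m_obs = 10_000, 3.5

# For large N the sample mean is approximately Normal(mu(p), var(p)/N),
# which gives a likelihood for each candidate weight position.
mu = probs @ faces
var = probs @ faces**2 - mu**2
log_w = -0.5 * N * (m_obs - mu) ** 2 / var - 0.5 * np.log(var)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Posterior-predictive distribution for the next throw.
print(np.round(w @ probs, 4))
```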
Now a six-sided die is normally symmetrical in such a way that 3 and 4 are on opposite sides, and I’m having trouble even seeing how a die could be biased “towards 3 and 4” under such conditions. Which means a prior that makes such a bias a more likely outcome than a fair die should probably be ruled out by our formalization, or else we should also model our uncertainty over which faces carry which numbers.
If the die is slightly shorter along the 3-4 axis than along the 1-6 and 2-5 axes, then the 3 and 4 faces will have slightly greater surface area than the other faces.
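As a toy illustration of the size of that effect (treating “probability proportional to face area” as a crude stand-in for the real dynamics of a tumbling die):

```python
# A die of dimensions a x a x b, with the short axis joining the 3 and 4 faces.
# The 3/4 faces keep area a^2; the other four faces shrink to a*b.
a, b = 1.0, 0.98
area_34, area_other = a * a, a * b

# Naive area-proportional landing probabilities, just to show the direction
# and rough magnitude of the bias.
total = 2 * area_34 + 4 * area_other
print(area_34 / total, area_other / total)   # ~0.169 vs ~0.166
```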
Our models differ, then: I was assuming a strictly cubic die. So maybe we should also model our uncertainty over the dimensions of the (parallelepipedic) die.
But it seems in any case that we are circling back to the question of model checking, via the requirement that we should first be clear about what our uncertainty is about.
Cyan, I was hoping you’d show up. What do you think about this whole mess?
I find myself at a loss to give a brief answer. Can you ask a more specific question?