Okay, I have an answer for you.
In the Bayesian updating method, you assumed that the die has some weights, and that the die having one set of weights or another is an event in event-space. This assumption is a very good one for a physical die, and the nature of the assumption is most obvious from the Kolmogorov and Savage perspectives.
Then, when translating the information that the expected roll was 5⁄2, you translated it as the constraint w1 + 2·w2 + 3·w3 = 5⁄2, writing wk for the weight on face k. (Note that this is not necessary! If your uncertainty about the weights is symmetric around that constraint, the expected roll can still be 5⁄2 even though no single weight vector you entertain satisfies the equation. Frequentist intuitions are so sneaky :P )
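To make that parenthetical concrete, here's a tiny sketch (the particular numbers are mine, just for illustration): two equally likely weight hypotheses, neither of which satisfies the constraint on its own, whose mixture still gives an expected roll of 5⁄2.

```python
# Two hypothetical weightings of a 3-sided die (faces 1, 2, 3).
# Neither satisfies w1 + 2*w2 + 3*w3 = 5/2 on its own.
faces = [1, 2, 3]
w_a = [0.5, 0.0, 0.5]   # expected roll 2.0
w_b = [0.0, 0.0, 1.0]   # expected roll 3.0

mean_a = sum(f * w for f, w in zip(faces, w_a))
mean_b = sum(f * w for f, w in zip(faces, w_b))

# If I'm 50/50 between the two hypotheses, my expected roll is still 5/2,
# even though the constraint fails for each hypothesis separately.
print(0.5 * mean_a + 0.5 * mean_b)   # 2.5
```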
What does the maximum entropy principle say if we give it that same information? The exact same answer you got! It maximizes entropy over those different possibilities in event-space, and the constraint w1 + 2·w2 + 3·w3 = 5⁄2 is interpreted in just the way you'd expect, leaving a straight line of possible weight vectors in event-space, each with equal probability. Thus, maxent gives the same answer as Bayes' theorem for this question, and it certainly seems like it did so given the same information you used for Bayes' theorem.
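Here's a minimal numerical sketch of that claim, assuming a uniform prior over the weight vectors (the parametrization below is my own, not anything from the original exchange):

```python
import numpy as np

# Weight vectors (w1, w2, w3) satisfying both constraints:
#   w1 + w2 + w3 = 1   and   w1 + 2*w2 + 3*w3 = 5/2.
# Solving gives w2 = 3/2 - 2*w3 and w1 = w3 - 1/2, with w3 in [1/2, 3/4]
# so every weight stays nonnegative.
t = np.linspace(0.5, 0.75, 10_001)                 # w3 along the constraint line
weights = np.stack([t - 0.5, 1.5 - 2 * t, t], axis=1)

# A uniform prior over weight vectors, conditioned on the constraint, stays
# uniform along this line; maximizing entropy over the same hypothesis space,
# restricted to the line, picks out the same uniform distribution.
posterior = np.full(len(t), 1 / len(t))

# Predictive probabilities for the next roll: the average weight vector.
print(posterior @ weights)   # roughly [0.125, 0.25, 0.625]
```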
Since maxent didn’t give the same answer before, we must have been solving a different set of equations. Different equations mean different information.
The state of information I use in the post is different because we have no knowledge that the probabilities come from some physical process with different possible weights. No physical events at all are entangled with the probabilities. It’s obvious why this is unintuitive: any real die has some physical weights underlying it. So calling our unknown number “the roll of a die” is actually highly misleading. My bad on that one; it looks like christopherj’s concerns about the example being unrealistic were totally legit.
However, that doesn’t mean we’ll never see our maximum entropy result in the physical world. Suppose I start out not knowing that the expected roll of the die is 5⁄2. Then someone offers to repeat not just “rolling the die,” but the whole experiment, with an equivalent state of knowledge each time. After 1000 such repeats, if the average roll came out really close to 5⁄2 they stop; if it didn’t, they run another 1000 repeats and keep going until it does.
Since the probability of each face, given my state of knowledge, is 1⁄3, I expect many repeats of experiments with that same state of knowledge to look like rolling a fair die many times and then keeping only the ensembles whose average is 5⁄2. If I then look at a surviving ensemble, one generated under that state of knowledge except for happening to have average roll 5⁄2, I will see a maximum entropy distribution of rolls. (proof left as an exercise :P ) This physical process encapsulates the information stated in the post in a way that rolling a die whose weights are distinct physical events does not.
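For anyone who’d rather simulate than prove it, here’s a rough sketch. One caveat: an average of exactly 5⁄2 over 1000 fair rolls is far too rare an event to hit by naive rejection sampling, so the sketch uses 20-roll batches instead (my own choice); the conditioning effect is the same, just with larger finite-size corrections.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rolls, n_batches = 20, 200_000
faces = np.array([1, 2, 3])

# "Experiments with the same state of knowledge": batches of fair rolls.
rolls = rng.integers(1, 4, size=(n_batches, n_rolls))
# Keep only the batches whose average roll is exactly 5/2 (sum = 50 for 20 rolls).
kept = rolls[rolls.sum(axis=1) == 50]
pooled = kept.ravel()
empirical = np.array([(pooled == f).mean() for f in faces])

# Maximum entropy distribution on {1, 2, 3} with mean 5/2: p_k proportional to r**k,
# where r = (1 + sqrt(13)) / 2 solves r**2 - r - 3 = 0.
r = (1 + np.sqrt(13)) / 2
maxent = r ** faces / np.sum(r ** faces)

print(empirical)   # frequencies inside the surviving ensembles
print(maxent)      # roughly [0.116, 0.268, 0.616]; agreement sharpens as n_rolls grows
```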