Very interesting. I agree that the MEP does not solve everything (though Solomonoff induction does).
The use of the mean is a premise. That is, assuming you know the mean, the Maximum Entropy distribution is the correct distribution. If you know some other statistic, then you can find the ME distribution that matches that statistic. If you don’t know anything about the distribution, then the Maximum Entropy principle still works by giving you the flat prior. If this is over all reals, it’s the “improper” prior, but it’s still the correct one.
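To make "the ME distribution given a known mean" concrete, here is a minimal sketch for the simplest case, a finite set of outcomes (a six-sided die). It relies on the standard result that the maximum-entropy distribution under a mean constraint has the form p_i ∝ exp(lam * x_i). The function name `maxent_given_mean`, the bisection bounds, and the die example are illustrative assumptions, not anything from the thread.

```python
import math

def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution on a finite set of values with a fixed mean.

    The solution has the form p_i proportional to exp(lam * x_i). The mean is
    increasing in lam, so lam can be found by bisection. Assumes target_mean
    lies strictly between min(values) and max(values).
    """
    def dist(lam):
        exps = [lam * x for x in values]
        m = max(exps)                       # subtract the max for numerical stability
        weights = [math.exp(e - m) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
uniform = maxent_given_mean(faces, 3.5)  # the fair-die mean gives back the flat distribution
skewed = maxent_given_mean(faces, 4.5)   # a higher mean tilts the weights toward 6
```

With the mean set to 3.5 the constraint adds no information beyond symmetry, so the result is the flat distribution; any other mean tilts it exponentially and no more than the constraint requires.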
Another issue with MEP is that it does not contain any intrinsic method to prevent overfitting.
The MEP doesn’t work if you assume you know statistics that you don’t. You shouldn’t use a thousand statistics from a data sample, because what you measure from the sample isn’t exactly the statistics of the true distribution. If you use only the statistics that you do know, then the MEP is precisely the non-overfitting principle: the result has exactly the information that you gave it.
The difficulty is in actually knowing any given statistic. Assuming you know one for the sake of actually getting anything done is where subjectivity comes in.
The MEP doesn’t work if you assume you know statistics that you don’t. You shouldn’t use a thousand statistics from a data sample, because what you measure from the sample isn’t exactly the statistics of the true distribution.
Right, but what people use the MEP for in practice is to do statistical modeling: one has a data set of outcomes and attempts to build a statistical model of it. So you never know any statistic—even the mean—with absolute confidence.
In the phrase “the correct one”, I have a problem with the word “the”. See the discussion of the Bertrand paradox in krey’s links.
For a specific example: I want to set a prior (an improper prior is okay!) for a constant in an Arrhenius equation for a chemical reaction. Oversimplified, the equation looks like “r = A * exp(T/T0)”. Oversimplify further, and pretend that T0 is known but that I know nothing about A. Do I set a flat prior on A? But what if I had instead chosen to write the equation as “r = exp(T/T0 + a)”? It’s the same equation, with A = exp(a). But a flat prior on a is not equivalent to a flat prior on A. Which do I choose?
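To put numbers on the mismatch between the two parameterizations, here is a small sketch. Improper priors can't be normalized, so it truncates A to a bounded range; the bounds `A_MIN` and `A_MAX` and the function names are hypothetical choices made only for illustration. By the change of variables p(A) = p(a) * |da/dA|, a flat prior on a = log(A) is the same as a density proportional to 1/A on A.

```python
import math

# Hypothetical truncation so both priors can be normalized (illustration only).
A_MIN, A_MAX = 1.0, 100.0

def prob_below_flat_in_A(x):
    """P(A < x) when A itself is uniform on [A_MIN, A_MAX]."""
    return (x - A_MIN) / (A_MAX - A_MIN)

def prob_below_flat_in_a(x):
    """P(A < x) when a = log(A) is uniform on [log(A_MIN), log(A_MAX)].

    This is the same as a density on A proportional to 1/A.
    """
    return (math.log(x) - math.log(A_MIN)) / (math.log(A_MAX) - math.log(A_MIN))

p_flat_A = prob_below_flat_in_A(10.0)  # 9/99: most of the mass sits at large A
p_flat_a = prob_below_flat_in_a(10.0)  # log(10)/log(100): half the mass is below 10
print(p_flat_A, p_flat_a)
```

On this range the two "know nothing" priors disagree about P(A < 10) by roughly a factor of five, so the choice of parameterization does carry information.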