Alex_Altair comments on The Principle of Maximum Entropy

Alex_Altair 10 Feb 2012 18:31 UTC
0 points
Very interesting. I agree that the MEP does not solve everything (though Solomonoff induction does).

The use of the mean is a premise. That is, assuming you know the mean, the Maximum Entropy distribution is the correct distribution. If you know some other statistic instead, then you can find the ME distribution that matches that statistic. If you don’t know anything about the distribution, then the Maximum Entropy principle still works by giving you the flat prior. If the support is all of the reals, that flat prior is “improper” (it cannot be normalized), but it’s still the correct one.
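
To make "assuming you know the mean" concrete, here is a minimal sketch of my own, not anything from the original post. It assumes a six-sided die as the support, and the names `maxent_given_mean` and `entropy` and the bisection bracket are hypothetical choices for illustration. On a finite support, the maximum-entropy distribution with a fixed mean has the form p_i ∝ exp(lam * x_i), so the only work is finding lam.

```python
import math

def maxent_given_mean(support, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution on a finite support with a fixed mean.

    The solution has the form p_i proportional to exp(lam * x_i). The mean
    is increasing in lam, so lam can be found by bisection.
    The bracket [lo, hi] is an assumption that suits small supports.
    """
    def dist(lam):
        exps = [lam * x for x in support]
        m = max(exps)  # subtract the max exponent for numerical stability
        weights = [math.exp(e - m) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), support))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

faces = [1, 2, 3, 4, 5, 6]
fair = maxent_given_mean(faces, 3.5)    # mean of the flat distribution, so lam is 0
loaded = maxent_given_mean(faces, 4.5)  # a higher mean tilts mass toward large faces
```

With a mean of 3.5 the constraint adds nothing beyond the flat distribution, so the flat distribution comes back. Any other mean gives a tilted distribution with strictly lower entropy.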

Another issue with MEP is that it does not contain any intrinsic method to prevent overfitting.

The MEP doesn’t work if you assume you know statistics that you don’t. You should not feed it a thousand statistics measured from a data sample, because those sample statistics aren’t exactly the statistics of the true distribution. If you use only the statistics that you do know, then the MEP is exactly the principle that does not overfit: the resulting distribution contains exactly the information that you gave it.

The difficulty is in actually knowing any given statistic. Assuming you know one for the sake of actually getting anything done is where subjectivity comes in.
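
A rough illustration of why a measured statistic is not a known one. This is a sketch under my own assumptions: a fair die as the "true distribution", and a hypothetical helper `sample_mean_spread`. It repeats the experiment "roll n times and take the mean" many times and reports how much the answer moves around.

```python
import random
import statistics

def sample_mean_spread(n, reps=200, seed=0):
    """Standard deviation of the sample mean of n fair-die rolls.

    The true mean is 3.5. Each repetition yields a different sample mean,
    and the spread of those means is the uncertainty you would be ignoring
    if you treated one sample mean as an exact constraint.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = [
        statistics.fmean([rng.randint(1, 6) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

spread_small = sample_mean_spread(10)    # few rolls: the mean is poorly pinned down
spread_large = sample_mean_spread(1000)  # many rolls: tighter, but still not zero
```

The spread shrinks roughly like 1/sqrt(n) but never reaches zero, which is the sense in which you "assume" a statistic rather than know it.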
- Daniel_Burfoot 12 Feb 2012 4:30 UTC
  0 points
  
  The MEP doesn’t work if you assume you know statistics that you don’t. You should not feed it a thousand statistics measured from a data sample, because those sample statistics aren’t exactly the statistics of the true distribution.
  
  Right, but what people use the MEP for in practice is to do statistical modeling: one has a data set of outcomes and attempts to build a statistical model of it. So you never know any statistic—even the mean—with absolute confidence.
- roystgnr 11 Feb 2012 2:12 UTC
  0 points
  In the phrase “the correct one”, I have a problem with the word “the”. See the discussion of the Bertrand paradox in krey’s links.
  
  For a specific example: I want to set a prior (an improper prior is okay!) for a constant in an Arrhenius equation for a chemical reaction. Oversimplified, the equation looks like “r = A * exp(-T0/T)”. Oversimplify more, and pretend that T0 is known but I know nothing about A. Do I set a flat prior on A? But what if I instead chose to write the equation as “r = exp(-T0/T + a)”? It’s the same equation, with A = exp(a). But the flat prior on a is not equivalent to the flat prior on A. Which do I choose?
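  
  To spell out why the two flat priors disagree, here is the standard change-of-variables calculation. The density symbols p_a and p_A are my own notation, and the priors are improper, so everything is only up to a constant.
  
  ```latex
  % Flat prior on a = \log A, pushed through the change of variables A = e^{a}
  p_a(a) \propto 1
  \quad\Longrightarrow\quad
  p_A(A) = p_a(\log A)\,\left|\frac{da}{dA}\right| \propto \frac{1}{A}, \qquad A > 0.
  
  % Flat prior on A, expressed in terms of a
  p_A(A) \propto 1
  \quad\Longrightarrow\quad
  p_a(a) = p_A(e^{a})\,\left|\frac{dA}{da}\right| \propto e^{a}.
  ```
  
  So "flat in a" is the 1/A prior on A, which puts equal mass on each decade of A, while "flat in A" puts equal mass on equal-width intervals of A. Neither parametrization is privileged by the equation itself.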