OK, suppose that I tell your robot that a random number generator produces integers from 1 to 4. It dutifully calculates the maximum-entropy distribution: a 1/4 chance for each. Now suppose that I tell it that a random number generator produces integers from 1 to 4, with an average value of 2.5. Though this is consistent with the original maximum-entropy answer, the maximum-entropy distribution under the constraints (P1 + P2 + P3 + P4 = 1 and average = 2.5) will be an exponential distribution, much like your distribution when the average was 3.
If your robot always calculates the maximum-entropy distribution given its constraints, it cannot instead treat new information as evidence for its previous hypothesis.
Me: What do you think is the distribution of this random number generator which returns integers from 1 to 4?
Robot: P = 0.25 for each, because that is maximum entropy.
Me: Guess what? The average value is 2.5, exactly as you predicted!
Robot: Excellent. My new prediction is that the distribution is exponential, instead of all 0.25, because that is now maximum entropy.
e^0 is also exponential.
e^0 is not the exponential you told your robot to choose in that situation.
The robot doesn’t care about your irrelevant technicality; it cares about maximum entropy. e^0 is exponential, but it is not the maximum-entropy distribution with a given mean. In this case, it is
Pr(X = x_k) = C*r^(x_k) for k = 1, 2, 3, 4,
where the positive constants C and r are determined by the requirements that the probabilities sum to 1 and the expected value is 2.5.
Just because I didn’t actually write the formula doesn’t mean it doesn’t exist, or that you can replace it with any formula you like. So if the robot works as described, this is what the robot will update its probabilities to upon learning that the mean is 2.5, and not 2.5*e^0 just because that would be convenient for you.
This is why many of us are terrified of the Singularity: the author of a program seldom anticipates its actual result. What’s even more terrifying is that this should have been obvious to you, since you gave the case where the mean was 3.0 as an example of your idea’s strength. Why are you upset that I pointed out the consequences when the mean was 2.5? Instead of acknowledging the fact, you entirely forget what you told your robot to do and blurt out a misleading technicality?
Allow me to explain less snarkily and more directly than Manfred.
As you correctly observe, the maximum-entropy distribution on {1,2,3,4} with any given mean is the one that gives k (k = 1, 2, 3, 4) probability A*r^k for some A and r, and these parameters are uniquely determined by the requirements that the probabilities sum to 1 and that the resulting mean equal the given one.
In the particular case where the mean is 2.5, the specific values in question are A=1/4 and r=1.
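One can confirm those values numerically. A minimal sketch in plain Python (the function names are mine, not from the thread), using bisection on r, which works because the mean of a distribution proportional to r^k is increasing in r:

```python
def mean_for(r):
    """Mean of the distribution Pr(X = k) proportional to r**k on {1, 2, 3, 4}."""
    weights = [r**k for k in range(1, 5)]
    return sum(k * w for k, w in zip(range(1, 5), weights)) / sum(weights)

def solve(target_mean, lo=1e-6, hi=100.0, tol=1e-12):
    """Find (A, r) with A * sum(r**k) = 1 and the given mean, by bisection on r."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    r = (lo + hi) / 2
    A = 1 / sum(r**k for k in range(1, 5))
    return A, r

A, r = solve(2.5)
print(A, r)  # approximately 0.25 and 1.0: the uniform distribution
```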
This distribution can be described as exponential, if you insist—but it also happens to be the same uniform distribution that’s maximum-entropy without knowing the mean.
So the inconsistency you seemed to be suggesting—of an entropy-maximizing Bayesian robot choosing one distribution on the basis of maxent, and then switching to a different one on having one property of that distribution confirmed—is not real. On learning that the mean is 2.5 as it already guessed, the robot does not switch to a different distribution.
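And the uniform distribution really is the entropy maximizer here. For illustration, compare it against another distribution on {1,2,3,4} with the same mean of 2.5 (the second distribution is made up for this check, not from the thread):

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # mean = 2.5
other = [0.35, 0.15, 0.15, 0.35]    # mean = 0.35*1 + 0.15*2 + 0.15*3 + 0.35*4 = 2.5

print(entropy(uniform))  # log(4), about 1.386
print(entropy(other))    # about 1.304, strictly less
```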
[EDITED: minor tweaks for clarity.]
It just occurred to me that I really ought to have checked that it was in fact different, rather than going, “what are the odds that, out of all the possibilities, that equation happens to be the uniform distribution?”. Guess I should have done that before posting.
It also occurs to me now that I didn’t even have to calculate out the equation (which I thought was too much effort for a “someone is wrong on the internet”), and could have just plugged in the values … and in fact I had already done that when finding the uniform distribution and its mean.
This post sponsored by “When someone is wrong on the internet, it’s sometimes you”
r^x = e^(kx), where e^k = r. So. Would you like to wager whether r = 1?