I am trying to learn some information theory.
It feels like the bits of information between 50% and 25%, and between 50% and 75%, should be the same.
But for probability p, the information is -log2(p).
But then the information of .5 → .25 is 1 bit, while from .5 → .75 it's .41 bits. What am I getting wrong?
I would appreciate blogs and YouTube videos.
I might have misunderstood you, but I wonder if you're mixing up calculating the self-information (or surprisal) of an outcome with the information gained when updating your beliefs from one distribution to another.
An outcome which has probability 50% contains −log(0.5)=1 bit of self-information, and an outcome which has probability 75% contains −log(0.75)=0.41 bits, which seems to be what you’ve calculated.
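If it helps to see those numbers come out, here's a minimal Python sketch of the self-information calculation (the function name surprisal is just my label, not anything standard):

```python
import math

def surprisal(p: float) -> float:
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

print(surprisal(0.5))   # 1.0 bit
print(surprisal(0.75))  # ~0.415 bits
```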
But since you're talking about the bits of information between two probabilities, I think the situation you have in mind is that I've started with 50% credence in some proposition A and ended up with 25% (or 75%). To calculate the information gained here, we need to find the entropy of our initial belief distribution and subtract the entropy of our final beliefs. The entropy of our beliefs about A is −p(A) log(p(A)) − p(¬A) log(p(¬A)).
So for 50% → 25% it's H(P_i) − H(P_f) = (−0.5 log(0.5) − 0.5 log(0.5)) − (−0.25 log(0.25) − 0.75 log(0.75)).
And for 50% → 75% it's H(P_i) − H(P_f) = (−0.5 log(0.5) − 0.5 log(0.5)) − (−0.75 log(0.75) − 0.25 log(0.25)).
So your intuition is correct: both give the same answer, 1 − 0.81 ≈ 0.19 bits.
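As a quick check, here's a small Python sketch (the name binary_entropy is mine) that computes both entropy differences and shows they agree:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of beliefs about A, in bits: -p*log2(p) - (1-p)*log2(1-p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h_initial = binary_entropy(0.5)            # 1.0 bit

print(h_initial - binary_entropy(0.25))    # 50% -> 25%: ~0.189 bits
print(h_initial - binary_entropy(0.75))    # 50% -> 75%: ~0.189 bits
```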