I am trying to learn some information theory.
It feels like the bits of information between 50% and 25%, and between 50% and 75%, should be the same.
But for probability p, the information is -log2(p).
But then the information of .5 → .25 is 1 bit, while from .5 → .75 it's .41 bits. What am I getting wrong?
I would appreciate blogs and YouTube videos.
I might have misunderstood you, but I wonder if you're mixing up calculating the self-information (or surprisal) of an outcome with the information gained when updating your beliefs from one distribution to another.
An outcome which has probability 50% contains −log(0.5)=1 bit of self-information, and an outcome which has probability 75% contains −log(0.75)=0.41 bits, which seems to be what you’ve calculated.
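If it helps to see those numbers come out, here's a minimal Python sketch of the self-information calculation (the function name surprisal is just my label, not anything standard):

```python
import math

def surprisal(p: float) -> float:
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

print(surprisal(0.5))   # 1.0 bit
print(surprisal(0.75))  # ~0.415 bits
```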
But since you're talking about the bits of information between two probabilities, I think the situation you have in mind is that I've started with 50% credence in some proposition A and ended up with 25% (or 75%). To calculate the information gained here, we need to find the entropy of our initial belief distribution and subtract the entropy of our final beliefs. The entropy of our beliefs about A is −p(A) log(p(A)) − p(¬A) log(p(¬A)).
So for 50% → 25% it's H(P_i) − H(P_f) = (−0.5 log(0.5) − 0.5 log(0.5)) − (−0.25 log(0.25) − 0.75 log(0.75)).
And for 50% → 75% it's H(P_i) − H(P_f) = (−0.5 log(0.5) − 0.5 log(0.5)) − (−0.75 log(0.75) − 0.25 log(0.25)).
So your intuition is correct: both give the same answer, 1 − 0.81 ≈ 0.19 bits.
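As a quick check, here's a small Python sketch (the name binary_entropy is mine) that computes both entropy differences and shows they agree:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of beliefs about A, in bits: -p*log2(p) - (1-p)*log2(1-p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h_initial = binary_entropy(0.5)            # 1.0 bit

print(h_initial - binary_entropy(0.25))    # 50% -> 25%: ~0.189 bits
print(h_initial - binary_entropy(0.75))    # 50% -> 75%: ~0.189 bits
```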