If OP were an entropy, then we'd simply take the weighted sum (1/2)(OP(X4) + OP(X7)) = (1/2)(1 + 3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to which state we were in.
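Here's a minimal sketch of that arithmetic in Python (assuming, as the numbers above imply, OP(X4) = 1 bit and OP(X7) = 3 bits, with equal probability on each state):

```python
import math

def mixture_entropy(probs, component_entropies):
    # Weighted sum of the components' own entropies...
    weighted = sum(p * h for p, h in zip(probs, component_entropies))
    # ...plus the entropy of the mixing distribution itself
    # (the "extra bit" for not knowing which state we're in).
    mixing = sum(-p * math.log2(p) for p in probs if p > 0)
    return weighted + mixing

# (1/2)(1 + 3) = 2, plus 1 bit of binary uncertainty = 3 bits total
print(mixture_entropy([0.5, 0.5], [1, 3]))  # 3.0
```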
Why do we add the extra bit? Doesn’t the weighted sum already represent that uncertainty?
Suppose the X’s had 0 entropy each—that is, they were states with no “internal moving parts,” like an electron.
Now imagine that you introduce ignorance into the problem: you don't know whether the electron is in state 4 or state 7, so you assign each state P = 0.5. What is the entropy of this distribution?
Well, it turns out the entropy (the amount of ignorance) is 1 bit, which is 1 bit more than the 0 bits of entropy that states 4 and 7 had on their own.
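Concretely, a sketch of that calculation: the two states contribute 0 bits each, so all of the entropy comes from the 50/50 distribution over them.

```python
import math

def shannon_entropy(probs):
    # Ignorance in bits: sum of -p * log2(p) over nonzero outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Known state (electron definitely in state 4): zero ignorance
print(shannon_entropy([1.0]))       # 0.0 bits
# 50/50 between states 4 and 7, each with no internal entropy:
print(shannon_entropy([0.5, 0.5]))  # 1.0 bit
```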
I’m confused.