The Shannon entropy of a distribution over random variable X conditional on the value of another random variable C can be written as H(X|C) = H(X) − H(C).
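For reference, here is a sketch of how this reduces from the standard chain rule; the condition on the last line is exactly what the replies below turn on:

$$
\begin{aligned}
H(X \mid C) &= H(X, C) - H(C) \\
&= H(X) + H(C \mid X) - H(C) \\
&= H(X) - H(C) \quad \text{iff } H(C \mid X) = 0.
\end{aligned}
$$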
If X and C are which face is up for two different, independent fair coins, H(X) = H(C) = 1. But then the formula would give H(X|C) = 0? I think this works out fine for your case because (a) I(X;C) = H(C): the mutual information between C (which well you’re in) and X (where you are) is the entropy of C, (b) H(C|X) = 0: once you know where you are, you know which well you’re in, and, relatedly, (c) H(X,C) = H(X): the entropy of the joint distribution just is the entropy over X.
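A quick numerical check of both cases, as a minimal sketch; the joint distributions and the `x // 2` "which well" labelling below are made up for illustration:

```python
import math

def H(probs):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(X|C) = H(X,C) - H(C), with the joint given as {(x, c): p}."""
    p_c = {}
    for (x, c), p in joint.items():
        p_c[c] = p_c.get(c, 0.0) + p
    return H(list(joint.values())) - H(list(p_c.values()))

# Case 1: two independent fair coins.
# H(X|C) = 1 bit, but H(X) - H(C) = 1 - 1 = 0, so the quoted formula fails here.
coins = {(x, c): 0.25 for x in (0, 1) for c in (0, 1)}
print(conditional_entropy(coins))  # 1.0

# Case 2: C is a (non-bijective) function of X, e.g. X uniform on {0,1,2,3}
# and C = X // 2 ("which well you're in"). Then H(C|X) = 0 and the formula
# holds: H(X|C) = H(X) - H(C) = 2 - 1 = 1.
wells = {(x, x // 2): 0.25 for x in range(4)}
print(conditional_entropy(wells))  # 1.0
```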
Yeah, I think the key point here more generally (I might be getting this wrong) is that C represents some partial state of knowledge about X, i.e. macro- rather than micro-state knowledge. In other words, it’s a (non-bijective) function of X. That’s why (b) is true, and the equation holds.
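Spelling out why (b) follows when C is a function of X (a sketch, using the usual convention that 0 log 0 = 0): if C = f(X), then each conditional probability p(c|x) is either 0 or 1, so every term vanishes:

$$
H(C \mid X) = -\sum_{x} p(x) \sum_{c} p(c \mid x) \log_2 p(c \mid x) = 0 .
$$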
Sorry for the delay. As both you and TheMcDouglas have mentioned, yeah, this relies on H(C|X) = 0. The way I’d worded it above was somewhere between misleading and wrong; I’ve modified it. Thanks for pointing this out!