Part of what confuses me about your objection is that it seems like averages of things can usually be treated the same as the individual things. E.g. an average number of apples is a number of apples, and average height is a height (“Bob is taller than Alice” is treated the same as “men are taller than women”). The sky is blue, by which we mean that the average photon frequency is in the range defined as blue; we also just say “a blue photon”.
A possible counter-example I can think of is temperature. Temperature is the average [something like] kinetic energy of the molecules, and we don’t tend to think of it as kinetic energy. It seems to be somehow transmuted in character by the averaging.
But entropy doesn’t feel like this to me. I feel comfortable saying “the entropy of a binomial distribution”, and throughout the sequence I’m clear about the “average entropy” thing just to remind the reader where it comes from.
I think it’s different because entropy is an expectation of a quantity that itself depends on the probability distribution you’re using to weight things.
Like, other things work like this: A(x) is the number of apples in outcome x, Σ p(x)·A(x) is the expected number of apples under distribution p, and Σ q(x)·A(x) is the expected number of apples under distribution q.
But entropy is… −log p(x) is the thing, and Σ p(x)·(−log p(x)) is the entropy.
And Σ q(x)·(−log p(x)) is… not entropy! (It’s the “cross-entropy”.)
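Here’s a rough sketch of that distinction in code (the outcomes, probabilities, and apple counts are all made up for illustration):

```python
import math

# Toy example (all numbers made up): three outcomes, a fixed quantity
# ("number of apples") per outcome, and two distributions p and q.
apples = [0, 1, 2]
p = [0.5, 0.25, 0.25]
q = [0.1, 0.3, 0.6]

# Expectation of a fixed quantity: swapping the weighting distribution
# just gives another expectation of the same quantity.
expected_apples_p = sum(pi * a for pi, a in zip(p, apples))   # 0.75
expected_apples_q = sum(qi * a for qi, a in zip(q, apples))   # 1.5

# For entropy, the quantity being averaged, -log2 p(x), is itself built
# from p. Averaging it under p gives the entropy of p...
entropy_p = sum(pi * -math.log2(pi) for pi in p)              # 1.5

# ...but averaging it under q gives the cross-entropy H(q, p), not an entropy.
cross_entropy_q_p = sum(qi * -math.log2(pi) for qi, pi in zip(q, p))  # 1.9

print(expected_apples_p, expected_apples_q, entropy_p, cross_entropy_q_p)
```

Swapping p for q in the apples case just gives another expectation of the same fixed quantity; swapping it in the entropy case changes the weighting but not the thing being averaged, and you land on cross-entropy instead.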
That makes sense. In my post I’m saying that entropy is defined relative to whatever binary string assignment you want, which does not depend on the probability distribution you’re using to weight things. It’s only when you ask for the minimum average string length that it ends up being expressed in terms of the probability distribution.
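As a rough sketch of that framing (the codewords and probabilities below are made up, and the codes are assumed prefix-free):

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}

# An arbitrary (prefix-free) assignment of binary strings to outcomes.
# The per-outcome string length doesn't mention p at all...
arbitrary_code = {"a": "110", "b": "0", "c": "10"}
lengths = {x: len(s) for x, s in arbitrary_code.items()}

# ...p only shows up once we average those lengths.
avg_len_arbitrary = sum(p[x] * lengths[x] for x in p)            # 2.25

# The average length is minimized (over prefix-free codes) at roughly
# -log2 p(x) bits per outcome, i.e. at the entropy of p.
optimal_code = {"a": "0", "b": "10", "c": "11"}
avg_len_optimal = sum(p[x] * len(optimal_code[x]) for x in p)    # 1.5
entropy_p = sum(px * -math.log2(px) for px in p.values())        # 1.5

print(avg_len_arbitrary, avg_len_optimal, entropy_p)
```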
Ah, I missed this on a first skim and only got it recently, so some of my comments are probably missing this context in important ways. Sorry, that’s on me.