I’m fine with choosing some other name, but I think all of the different “entropies” (in stat mech, information theory, etc) refer to weighted averages over a set of states, whose probability-or-whatever adds up to 1. To me that suggests that this should also be true of the abstract version.
So I stand by the claim that the negative logarithm of probability-or-whatever should have some different name, so that people don’t get confused by the ([other thing], entropy) → (entropy, average entropy) terminology switch.
I think “average entropy” is also (slightly) misleading because it suggests that the -log(p)’s of individual states are independent of the choice of which microstates are in your macrostate, which I think is maybe the root problem I have with footnote 17. (See new comment in that subthread)
Part of what confuses me about your objection is that it seems like averages of things can usually be treated the same as the individual things. E.g. an average number of apples is a number of apples, and average height is a height (“Bob is taller than Alice” is treated the same as “men are taller than women”). The sky is blue, by which we mean that the average photon frequency is in the range defined as blue; we also just say “a blue photon”.
A possible counter-example I can think of is temperature. Temperature is the average [something like] kinetic energy of the molecules, and we don’t tend to think of it as kinetic energy. Averaging seems to somehow transmute it into a different kind of quantity.
But entropy doesn’t feel like this to me. I feel comfortable saying “the entropy of a binomial distribution”, and throughout the sequence I’m clear about the “average entropy” thing just to remind the reader where it comes from.
I think it’s different because entropy is an expectation of a thing which depends on the probability distribution that you’re using to weight things.
Like, other things are maybe… A is the number of apples, sum of p×A is the expected number of apples under distribution p, sum of q×A is the expected number of apples under distribution q.
But entropy is… -log(p) is a thing, and sum of p × -log(p) is the entropy.
And the sum of q × -log(p) is… not entropy! (It’s “cross-entropy”)
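Concretely, a toy sketch of that distinction (the distributions here are made up purely for illustration):

```python
import math

def entropy(p):
    # sum of p * (-log p): weights and surprisals come from the same distribution
    return sum(-px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(q, p):
    # sum of q * (-log p): weights from q, surprisals from p
    return sum(-q[x] * math.log2(p[x]) for x in q if q[x] > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.1, "b": 0.1, "c": 0.8}

print(entropy(p))           # 1.5 bits
print(entropy(q))           # ~0.92 bits
print(cross_entropy(q, p))  # 1.9 bits -- not the entropy of p or of q
```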
That makes sense. In my post I’m saying that entropy is whatever binary string assignment you want, which does not depend on the probability distribution you’re using to weight things. And then if you want the minimum average string length, that minimum does depend on the probability distribution.
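As a sketch of what I mean (my own toy example): the length assignment itself is arbitrary and makes no reference to p; only the average length, and hence the minimum average length, depends on p:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Any assignment of binary string lengths, chosen with no reference to p:
fixed_lengths = {"a": 2, "b": 2, "c": 2, "d": 2}            # e.g. 00, 01, 10, 11
tuned_lengths = {x: -math.log2(px) for x, px in p.items()}  # 1, 2, 3, 3 bits

def average_length(lengths, p):
    # the average is where the probability distribution comes in
    return sum(p[x] * lengths[x] for x in p)

print(average_length(fixed_lengths, p))  # 2.0 bits
print(average_length(tuned_lengths, p))  # 1.75 bits: the entropy of p, and the minimum
```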
Ah, I missed this on a first skim and only got it recently, so some of my comments are probably missing this context in important ways. Sorry, that’s on me.
One thing I’m not very confident about is how working scientists use the concept of “macrostate”. If I had good resources for that I might change some of how the sequence is written, because I don’t want to create any confusion for people who use this sequence to learn and then go on to work in a related field. (...That said, it’s not like people aren’t already confused. I kind of expect most working scientists to be confused about entropy outside their exact domain’s use.)
I think it might be a bit of a mess, tbh.
In probability theory, you have outcomes (individual possibilities), events (sets of possibilities), and distributions (assignments of probabilities to all possible outcomes).
“microstate”: outcome.
“macrostate”: sorta ambiguous between event and distribution.
“entropy of an outcome”: not a thing working scientists or mathematicians say, ever, as far as I know.
“entropy of an event”: not a thing either.
“entropy of a distribution”: that’s a thing!
“entropy of a macrostate”: people say this, so they must mean a distribution when they are saying this phrase.
I think you’re within your rights to use “macrostate” in any reasonable way that you like. My beef is entirely about the type signature of “entropy” with regard to distributions and events/outcomes.
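If it helps, here’s roughly how I’d write those type signatures down (a sketch; all of these names are made up):

```python
from typing import Dict, FrozenSet
import math

Outcome = str                        # "microstate"
Event = FrozenSet[Outcome]           # a set of outcomes (one sense of "macrostate")
Distribution = Dict[Outcome, float]  # probabilities over all outcomes, summing to 1 (the other sense)

def entropy(p: Distribution) -> float:
    # entropy takes a whole distribution, not an outcome and not an event
    return sum(-px * math.log2(px) for px in p.values() if px > 0)
```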
Here’s another thing that might be adding to our confusion. It just so happens that in the particular system that is this universe, all states with the same total energy are equally likely. That’s not true for most systems (which don’t even have a concept of energy), and so it doesn’t seem like a part of abstract entropy to me. So e.g. macrostates don’t necessarily contain microstates of equal probability (which I think you’ve implied a couple times).
Honestly, I’m confused about this now.
I thought I recalled that “macrostate” was only used for the “microcanonical ensemble” (fancy phrase for a uniform-over-all-microstates-with-same-(E,N,V) probability distribution), but in fact it’s a little ambiguous.
Wikipedia says
Treatments on statistical mechanics[2][3] define a macrostate as follows: a particular set of values of energy, the number of particles, and the volume of an isolated thermodynamic system is said to specify a particular macrostate of it.
which implies microcanonical ensemble (the other ensembles are parametrized by things other than (E, N, V) triples), but then later it talks about both the canonical and microcanonical ensemble.
I think a lot of our confusion comes from the way physicists equivocate between macrostates as a set of microstates (with the probability distribution unspecified) and as a probability distribution. Wiki’s “definition” is ambiguous: a particular (E, N, V) triple specifies both a set of microstates (with those values) and a distribution (uniform over that set).
In contrast, the canonical ensemble is a probability distribution defined by a triple (T,N,V), with each microstate having probability proportional to exp(- E / kT) if it has particle number N and volume V, otherwise probability zero. I’m not sure what “a macrostate specified by (T,N,V)” should mean here: either the set of microstates with (N, V) (and any E), or the non-uniform distribution I just described.
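(As a concrete sketch, with made-up energy levels and Boltzmann’s constant absorbed into the units:)

```python
import math

def canonical_distribution(energies, kT):
    # p_i proportional to exp(-E_i / kT), normalized by the partition function Z
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0, 3.0]  # toy microstate energies (all with the same N and V)
p = canonical_distribution(energies, kT=1.0)
print(p)                                     # non-uniform: lower energies are more probable
print(sum(-pi * math.log2(pi) for pi in p))  # entropy of this distribution
```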
(By the way: note that when T is being used here, it doesn’t mean the average energy, kinetic or otherwise. kT isn’t the actual energy of anything, it’s just the slope of the exponential decay of probability with respect to energy. A consequence of this definition is that the expected kinetic energy in some contexts is proportional to temperature, but this expectation is for a probability distribution over many microstates that may have more or less kinetic energy than that. Another consequence is that for large systems, the average kinetic energy of particles in the actual true microstate is very likely to be very close to (some multiple of) kT, but this is because of the law of large numbers and is not true for small systems. Note that there are two different senses of “average” here.)
I agree that equal probabilities / uniform distributions are not a fundamental part of anything here and are just a useful special case to consider.
I’m not quite sure what the cruxes of our disagreement are yet. So I’m going to write up some more of how I’m thinking about things, which I think might be relevant.
When we decide to model a system and assign its states entropy, there’s a question of what set of states we’re including. Often, we’re modelling part of the real universe. The real universe is in only one state at any given time. But we’re ignorant of a bunch of parts of it (and we’re also ignorant about exactly what states it will evolve into over time). So to do some analysis, we decide on some stuff we do know about its state, and then we decide to include all states compatible with that information. But this is all just epistemic. There’s no one true set that encompasses all possible states; there’s just states that we’re considering possible.
And then there’s the concept of a macrostate. Maybe we use the word macrostate to refer to the set of all states that we’ve decided are possible. But then maybe we decide to make an observation about the system, one that will reduce the number of possible states consistent with all our observations. Before we make the observation, I think it’s reasonable to say that for every possible outcome of the observation, there’s a macrostate consistent with that outcome. The probability that we will find the system to be in that macrostate is the sum of the probability of its microstates. Thus the macrostate has p<1 before the observation, and p=1 after the observation. This feels pretty normal to me.
We can do this for any property that we can observe, and that’s why I defined a macrostate as, “collections of microstates … connotatively characterized by a generalized property of the state”.
I also don’t see why it couldn’t be a set containing a single state; a set of one thing is still a set. Whether that one thing has probability 1 or not depends on what you’re deciding to do with your uncertainty model.
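Here’s a toy sketch of the picture I have in mind (the states and probabilities are made up):

```python
# Our current uncertainty over microstates (probabilities sum to 1):
p = {"s1": 0.4, "s2": 0.3, "s3": 0.2, "s4": 0.1}

# A macrostate in my sense: the set of microstates consistent with one possible observation outcome.
macrostate = {"s1", "s3"}

# Before the observation, the probability of finding the system in that macrostate:
prob_macrostate = sum(p[s] for s in macrostate)
print(prob_macrostate)  # 0.6 < 1

# After observing that outcome, we condition on it, and the macrostate now has probability 1:
p_after = {s: p[s] / prob_macrostate for s in macrostate}
print(sum(p_after.values()))  # 1.0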
I think the crux of our disagreement [edit: one of our disagreements] is whether the macrostate we’re discussing can be chosen independently of the “uncertainty model” at all.
When physicists talk about “the entropy of a macrostate”, they always mean something of the form:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over all p’s. [EXPECTATION of -log p aka ENTROPY of the distribution]
They never mean something of the form:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over just some of the p’s. [???]
Or:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over just some of the p’s, divided by the sum of p over the same p’s. [CONDITIONAL EXPECTATION of -log p given some event]
Or:
There are a bunch of p’s that add up to 1. We want the sum of (-log p) over just some of the p’s, divided by the number of p’s we included. [ARITHMETIC MEAN of -log p over some event]
This also applies to information theorists talking about Shannon entropy.
I think that’s the basic crux here.
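To make the four candidate formulas above concrete, here’s a toy sketch (only the first is what anyone calls entropy):

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # a bunch of p's that add up to 1
A = {"a", "b"}                                     # "just some of the p's"

surprisal = {x: -math.log2(px) for x, px in p.items()}

entropy_of_p       = sum(p[x] * surprisal[x] for x in p)    # expectation over all p's: 1.75
partial_sum        = sum(p[x] * surprisal[x] for x in A)    # ???: 1.0
conditional_expect = partial_sum / sum(p[x] for x in A)     # conditional expectation given A: ~1.33
arithmetic_mean    = sum(surprisal[x] for x in A) / len(A)  # unweighted mean over A: 1.5

print(entropy_of_p, partial_sum, conditional_expect, arithmetic_mean)  # four different numbers
```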
This is perhaps confusing because “macrostate” is often claimed to have something to do with a subset of the microstates. So you might be forgiven for thinking “entropy of a macrostate” in statmech means:
For some arbitrary distribution p, consider a separately-chosen “macrostate” A (a set of outcomes). Compute the sum of p × (-log p) over every p whose corresponding outcome is in A, maybe divided by the total probability of A or something.
But in fact this is not what is meant!
Instead, “entropy of a macrostate” means the following:
For some “macrostate”, whatever the hell that means, we construct a probability distribution p. Maybe that’s the macrostate itself, maybe it’s a distribution corresponding to the macrostate, usage varies. But the macrostate determines the distribution, either way. Compute the sum of p × (-log p) over every p.
EDIT: all of this applies even more to negentropy. The “S_max” in that formula is always the entropy of the highest-entropy possible distribution, not anything to do with a single microstate.
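(As a toy sketch of what I mean: for a finite set of N microstates, the highest-entropy distribution is the uniform one, so S_max = log N, and negentropy is S_max minus the entropy of the actual distribution:)

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

S = sum(-px * math.log2(px) for px in p.values())  # entropy of this distribution: 1.75 bits
S_max = math.log2(len(p))                          # entropy of the uniform (maximum-entropy) distribution: 2 bits
negentropy = S_max - S
print(negentropy)  # 0.25 bits
```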