This piece reads to me like the output of someone who worked hard to comprehend a topic in full, without accepting the rest of society’s cloudy bullshit / papered-over-confusion / historical baggage in place of answers. And in a particularly thorny case, no less. And with significant effort made to articulate the comprehension clearly and intuitively to others.
For instance: saying “if we’re going to call all of these disparate concepts ‘entropy’, then let’s call the length of the name of a microstate ‘entropy’ also; this will tie the whole conceptual framework together and make what follows more intuitive” is a bold move, and looks like the product of swallowing the whole topic and then digesting it down into something organized and comprehensible. It strikes me as a unit of conceptual labor.
Respect.
I’m excited to see where this goes.
I almost agree, but I really do stand by my claim that Alex has nicely identified the correct abstract thing and then named the wrong part of it entropy.
[EDIT: I now think the abstract thing I describe below—statistical entropy—is not the full thing Alex is going for. A more precise claim is: Alex is describing some general thing, and calling part of it “entropy”. When I map that thing onto domains like statmech or information theory, his “entropy” doesn’t map onto the thing called “entropy” in those domains, even though the things called “entropy” in those domains do map onto each other. This might be because he wants it to map onto “algorithmic entropy” in the K-complexity setting, but I think this doesn’t justify the mismatch.]
The abstract thing [EDIT: “statistical entropy”] is shaped something like: there are many things (call ’em microstates).
Each thing has a “weight”, p. (Let’s not call it “probability” because that has too much baggage.)
We care a lot about the negative log of p. However, in none of the manifestations of this abstract concept is that called “entropy”.
We also care about the average of -log(p) over every possible microstate, weighted by p. That’s called “entropy” in every manifestation of this pattern (if the word is used at all), never “average entropy”.
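(To pin the shape down, here’s a minimal sketch in Python, with made-up weights for three microstates; the names and numbers are purely illustrative.)

```python
import math

# Toy weights for three microstates (numbers invented for illustration).
weights = {"s1": 0.5, "s2": 0.25, "s3": 0.25}

# The per-microstate quantity: -log(p). (Not called "entropy" in these fields.)
neg_log = {s: -math.log2(p) for s, p in weights.items()}

# The p-weighted average of -log(p). (This is what gets called "entropy".)
entropy = sum(p * neg_log[s] for s, p in weights.items())

print(neg_log)  # {'s1': 1.0, 's2': 2.0, 's3': 2.0}
print(entropy)  # 1.5
```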
I don’t see why it helps intuition to give these things the same name, and especially not why you would want to replace the various specific “entropy”s with an abstract “average entropy”.
I’m also unsure whether I would have made Alex’s naming choice. (I think he suggested that this naming fits with something he wants to do with K complexity, but I haven’t understood that yet, and will wait and see before weighing in myself.)
Also, to state the obvious, noticing that the concept wants a short name (if we are to tie a bunch of other things together and organize them properly) feels to me like a unit of conceptual progress regardless of whether I personally like the proposed pun.
On a completely different note, one of my personal spicy takes is that when we’re working in this domain, we should be working in log base 1⁄2 (or 1/e or suchlike, namely with a base 0 < b < 1). Which is very natural, because we’re counting the number of halvings (of probability / in statespace) that it takes to single out a (cluster of) state(s). This convention dispels a bunch of annoying negative signs.
(I also humbly propose the notation ə, pronounced “schwa”, for 1/e.)
((In my personal notation I use lug, pronounced /ləg/, for log base ə, and lug2, etc., but I’m not yet confident that this is a good convention.))
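(A minimal sketch of the convention, using the “lug”/“lug2” names proposed above purely for illustration:)

```python
import math

def lug2(p):
    # log base 1/2: counts the halvings needed to get from 1 down to p.
    return math.log(p, 1 / 2)

def lug(p):
    # log base ə = 1/e: same idea with the natural log, minus sign gone.
    return math.log(p, 1 / math.e)

print(lug2(1 / 8))      # ~3.0, i.e. -log2(1/8) without the negative sign
print(lug(1 / math.e))  # ~1.0
```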
I think he suggested that this naming fits with something he wants to do with K complexity
I didn’t mean something I’m doing, I meant that the field of K-complexity just straight-forwardly uses the word “entropy” to refer to it. Let me see if I can dig up some references.
K-complexity is apparently sometimes called “algorithmic entropy” (but not just “entropy”, I don’t think?)
Wiktionary quotes Niels Henrik Gregersen:
Algorithmic entropy is closely related to statistically defined entropy, the statistical entropy of an ensemble being, for any concisely describable ensemble, very nearly equal to the ensemble average of the algorithmic entropy of its members
I think this might be the crux!
Note the weird type mismatch: “the statistical entropy of an ensemble [...] the ensemble average of the algorithmic entropy of its members”.
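Spelling the quoted relation out (loosely, ignoring additive constants, and writing $K(x)$ for the algorithmic entropy / K-complexity of a member $x$ of an ensemble with distribution $P$):

$$H(P) \;=\; \sum_x P(x)\,\bigl(-\log P(x)\bigr) \;\approx\; \sum_x P(x)\,K(x).$$

The left side is a property of the whole distribution, while each $K(x)$ on the right is a property of a single member, which is exactly the type mismatch.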
So my story would be something like the following:
Many fields (thermodynamics, statistical mechanics, information theory, probability) use “entropy” to mean something equivalent to “the expectation of -log(p) for a distribution p”. Let’s call this “statistical entropy”, but in practice people call it “entropy”.
Algorithmic information theorists have an interestingly related but distinct concept, which they sometimes call “algorithmic entropy”.
Whoops, hang on a sec. Did you want your “abstract entropy” to encompass both of these?
If so, I didn’t realize that until now! That changes a lot, and I apologize sincerely if waiting for the K-complexity stuff would’ve dissipated a lot of the confusion.
Things I think contributed to my confusion:
(1) Your introduction only directly mentions / links to domain-specific types of entropy that are firmly under (type 1) “statistical entropy”
(2) This intro post doesn’t yet touch on (type 2) algorithmic entropy, and is instead a mix of type-1 and your abstract thing where description length and probability distribution are decoupled.
(3) I suspect you were misled by the unpedagogical phrase “entropy of a macrostate” from statmech, and didn’t realize that (as used in that field) the distribution involved is determined by the macrostate in a prescribed way (or is the macrostate).
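(Concretely, in the microcanonical case: a macrostate like fixed $(E, V, N)$ determines the uniform distribution over the $\Omega$ microstates compatible with it, and then

$$S \;=\; k_B \ln \Omega \;=\; k_B \cdot H(\text{uniform over those } \Omega \text{ microstates}),$$

so “entropy of a macrostate” is really the statistical entropy of that induced distribution.)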
I would add a big fat disclaimer that this series is NOT just limited to type-1 entropy, and (unless you disagree with my taxonomy here) emphasize heavily that you’re including type-2 entropy.
Did you want your “abstract entropy” to encompass both of these?
Indeed I definitely do.
I would add a big fat disclaimer
There are a bunch of places where I think I flagged relevant things, and I’m curious if these seem like enough to you:
The whole post is called “abstract entropy”, which should tell you that it’s at least a little different from any “standard” form of entropy
The third example, “It helps us understand strategies for (and limits on) file compression”, is implicitly about K-complexity
This whole paragraph: “Many people reading this will have some previous facts about entropy stored in their minds, and this can sometimes be disorienting when it’s not yet clear how those facts are consistent with what I’m describing. You’re welcome to skip ahead to the relevant parts and see if they’re re-orienting; otherwise, if you can get through the whole explanation, I hope that it will eventually be addressed!”
Me being clear that I’m not a domain expert
Footnote [4], which talks about Turing machines and links to my post on Solomonoff induction
Me going on and on about binary strings and how we’re associating these with individual states—I dunno, to me this just screams K-complexity to anyone who’s heard of it
“I just defined entropy as a property of specific states, but in many contexts you don’t care at all about specific states...”
… “I’ll talk about this in a future post; I think that “order” is synonymous with Kolmogorov complexity.” …
I struggled with writing the intro section of this post because it felt like there were half a dozen disclaimer-type things that I wanted to get out of the way first. But each one is only relevant to a subset of people, and eventually I need to get to the content. I’m not even expecting most readers to be holding any such type-1/type-2 distinction in their mind to start, so I’d have to go out of my way to explain it before giving the disclaimer.
All that aside, I am very open to the idea that we should be calling the single-state thing something different. The “minimum average” form covers the great majority of use cases.
I initially interpreted “abstract entropy” as meaning statistical entropy as opposed to thermodynamic or stat-mech or information-theoretic entropy. I think very few people encounter the phrase “algorithmic entropy” enough for it to be salient to them, so most confusion about entropy in different domains is about statistical entropy in physics and info theory. (Maybe this is different for LW readers!)
This was reinforced by the introduction because I took the mentions of file compression and assigning binary strings to states to be about (Shannon-style) coding theory, which uses statistical entropy heavily to talk about these same things and is a much bigger part of most CS textbooks/courses. (It uses phrases like “length of a codeword”, “expected length of a code [under some distribution]”, etc. and then has lots of theorems about statistical entropy being related to expected length of an optimal code.)
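(A toy sketch of that coding-theory framing, with a made-up distribution; the Shannon-style code lengths here are just ⌈−log₂ p⌉:)

```python
import math

# Made-up source distribution over four symbols.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Statistical entropy: the expected value of -log2(p).
entropy = sum(q * -math.log2(q) for q in p.values())

# Shannon-style codeword lengths, ceil(-log2 p), and the expected code length.
lengths = {x: math.ceil(-math.log2(q)) for x, q in p.items()}
expected_length = sum(p[x] * lengths[x] for x in p)

print(entropy)          # 1.75 bits
print(expected_length)  # 1.75 bits here (every p is a power of 1/2)
# In general, for an optimal prefix code: entropy <= expected length < entropy + 1.
```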
After getting that pattern going, I had enough momentum to see “Solomonoff”, think “sure, it’s a probability distribution, presumably he’s going to do something statistical-entropy-like with it”, and completely missed the statements that you were going to be interpreting K complexity itself as a kind of entropy. I also missed the statement about random variables not being necessary.
I suspect this would also happen to many other people who have encountered stat mech and/or information theory, and maybe even K complexity but not the phrase “algorithmic entropy”, but I could be wrong.
A disclaimer is probably not actually necessary, though, on reflection; I care a lot more about the “minimum” and “average” qualifiers both being included in statistical-entropy contexts. I don’t know exactly how to unify this with “algorithmic entropy” but I’ll wait and see what you do :)
I like the schwa and lug proposals. Trying to anticipate problems, I do suspect newcomers will see ‘lug’ and find themselves confused if it has never been explained to them. It even seems possible they may not connect it to logarithms sans explanation.
Also, to state the obvious, noticing that the concept wants a short name (if we are to tie a bunch of other things together and organize them properly) feels to me like a unit of conceptual progress regardless of whether I personally like the proposed pun
Agreed!
schwa and lug
Yeah, shorthand for this seems handy. I like these a lot, especially schwa, although I’m a little worried about ambiguous handwriting. My contest entry is nl (for “negative logarithm” or “ln but flipped”).
one of my personal spicy takes
Omfg, I love hearing your spicy takes. (I think I remember you advocating hard tabs, and trinary logic.)
XD XD guys I literally can’t
(Let’s not call it “probability” because that has too much baggage.)
This aside raises concerns for me, like it makes me worry that maybe we’re more deeply not on the same page. It seems to me like the weighting is just straightforward probability, and that it’s important to call it that.
I think I was overzealous with this aside and regret it.
I worry that the word “probability” has connotations that are too strong or are misleading for some use cases of abstract entropy.
But this is definitely probability in the mathematical sense, yes.
Maybe I wish mathematical “probability” had a name with weaker connotations.
This piece reads to me like the output of someone who worked hard to comprehend a topic in full
Extremely pleased with this reception! I indeed feel pretty seen by it.