While I think this post overall gives good intuition for the subject, it also creates some needless confusion.
Your concept of “abstract entropy” is just Shannon entropy applied to uniform distributions. Introducing Shannon entropy directly, while slightly harder, gives you a bunch of the ideas in this post more or less “for free”:
- Macrostates are just events and microstates are atomic outcomes (as defined in probability theory). Any rules about how the two relate to each other follow directly from the foundations of probability.
- The fact that E[−log p(x)] is the only reasonable function that can serve as a measure of information. You haven’t actually mentioned this (yet?), but having an axiomatic characterization of entropy hammers home that all of this stuff must be the way it is because of how probability works. For example, your “pseudo-entropy” of Rubik’s cube states (distance from the solved state) might be intuitive, but it is wrong! (See the first sketch after this list.)
- Derived concepts, such as conditional entropy and mutual information, fall out naturally from the probabilistic view (second sketch below).
- The fact that an optimal bit/guess must divide the probability mass in half, not the probability space (third sketch below).
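To make the first two points concrete, here is a minimal sketch (plain Python, all names are my own) of how Shannon entropy, defined as the expected surprisal E[−log₂ p(x)], collapses to log₂ N on a uniform distribution over N microstates, which is exactly the post’s “abstract entropy”, and of how surprisal is additive over independent events:

```python
import math

def surprisal(p: float) -> float:
    """Information content of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(dist: list[float]) -> float:
    """Shannon entropy: the expected surprisal under the distribution."""
    return sum(p * surprisal(p) for p in dist if p > 0)

# Uniform distribution over N microstates: entropy is exactly log2(N),
# i.e. the number of bits needed to pin down one microstate.
N = 8
uniform = [1 / N] * N
print(entropy(uniform), math.log2(N))   # 3.0 3.0

# Surprisal is additive over independent events: learning two independent
# facts gives the sum of their individual information contents.
p, q = 0.5, 0.25
print(surprisal(p * q), surprisal(p) + surprisal(q))  # 3.0 3.0
```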
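Second, a sketch of how the derived quantities fall out of a joint distribution with no extra machinery (the joint table here is made up purely for illustration):

```python
import math
from collections import defaultdict

# A made-up joint distribution P(X, Y) over two binary variables.
joint = {("a", "0"): 0.4, ("a", "1"): 0.1,
         ("b", "0"): 0.1, ("b", "1"): 0.4}

def H(dist):
    """Shannon entropy of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Sum the joint distribution over the other variable."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[axis]] += p
    return dict(m)

H_X  = H(marginal(joint, 0))
H_Y  = H(marginal(joint, 1))
H_XY = H(joint)

# Conditional entropy and mutual information are just arithmetic on these:
H_X_given_Y = H_XY - H_Y          # chain rule
I_XY        = H_X + H_Y - H_XY    # mutual information

print(H_X, H_Y, H_XY)             # 1.0 1.0 ~1.72
print(H_X_given_Y, I_XY)          # ~0.72 ~0.28
```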
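And third, a toy illustration of the last point (the distribution and both guessing strategies are my own choices): on a skewed distribution, questions that halve the probability mass reach the entropy bound, while questions that halve the outcome space do not:

```python
import math

# A skewed distribution over four outcomes.
dist = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

entropy = -sum(p * math.log2(p) for p in dist.values())   # 1.75 bits

def expected_questions(questions_needed):
    """Average number of yes/no questions, weighted by probability."""
    return sum(dist[x] * q for x, q in questions_needed.items())

# Strategy 1: halve the probability MASS each time
# ("Is it A?", then "Is it B?", then "Is it C?").
mass_halving = {"A": 1, "B": 2, "C": 3, "D": 3}

# Strategy 2: halve the outcome SPACE each time
# ("Is it in {A, B}?", then narrow down within that half).
space_halving = {"A": 2, "B": 2, "C": 2, "D": 2}

print(entropy)                              # 1.75
print(expected_questions(mass_halving))     # 1.75 -- meets the entropy bound
print(expected_questions(space_halving))    # 2.0  -- strictly worse
```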
I hope I don’t come across as overly harsh. I know that entropy is often introduced in confusing ways in various sciences, especially physics, where it’s hopelessly intertwined with the concrete subject matter. But that’s definitely not the case in computer science (more correctly called informatics), which is the field you should be looking at if you want to deeply understand the concept.