If you want an entirely self-contained example, consider a wall with 10 rows of 10 cubby-holes, and 10 heavy rocks. One person wants the rocks to fill the bottom row, another wants them to fill the left column, and a third wants them on the top row. At least if we take the state space to be just the positions of the rocks, each of these people wants the same amount of state-space shrinking, but the arrangements cost different amounts of physical work.
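To make the arithmetic concrete, here’s a quick sketch (the modeling choices are mine: indistinguishable rocks, one rock per hole, and work measured in rock-weight-times-row-height units):

```python
from math import comb, log2

# State space: 10 indistinguishable rocks distributed among 100 holes.
total_states = comb(100, 10)

# Each of the three preferences pins down exactly one configuration,
# so all three shrink the state space by the same number of bits.
shrinkage_bits = log2(total_states)  # ~44 bits, identical for everyone

# Physical work in (rock weight x row height) units, rows numbered 0-9
# from the bottom: lifting a rock to row r costs r units.
work = {
    "bottom row": sum(0 for _ in range(10)),   # 0 units
    "left column": sum(range(10)),             # 45 units
    "top row": sum(9 for _ in range(10)),      # 90 units
}

print(f"shrinkage per preference: {shrinkage_bits:.1f} bits")
print(work)  # same shrinkage, very different work
```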
I think what’s really going on in this example (and probably implicitly in your intuitions about this more generally) is that we’re optimizing only one subsystem, and the “resources” (e.g. energy in this case, or money) are what “couple” optimization of this subsystem with optimization of the rest of the world.
Here’s what that means in the context of this example. Why is putting rocks on the top row “harder” than putting them on the bottom row? Because it requires more work/energy expenditure. But why does energy matter? To phrase it more suggestively: why do we care about energy in the first place?
Well, we care about energy because it’s a limited and fungible resource. Limited: we only have so much of it. Fungible: we can expend energy to gain utility in many different ways in many different places/subsystems of the world. Putting rocks on the top row expends more energy, and that energy implicitly has an opportunity cost, since we could have used it to increase utility in some other subsystem.
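A toy version of that opportunity-cost point, with entirely made-up numbers (the utility curve’s shape is my own illustrative assumption, not anything derived):

```python
# Made-up numbers: one shared energy budget, plus a second subsystem
# where the same energy also buys utility (with diminishing returns).
BUDGET = 100.0  # joules

def utility_elsewhere(energy):
    """Hypothetical utility curve for 'the rest of the world'."""
    return 0.5 * energy ** 0.5

# Toy energy cost of each rock arrangement, matching the row-height intuition:
cost = {"bottom row": 0.0, "left column": 45.0, "top row": 90.0}

for arrangement, joules in cost.items():
    # Opportunity cost: utility forgone elsewhere by spending here.
    forgone = utility_elsewhere(BUDGET) - utility_elsewhere(BUDGET - joules)
    print(f"{arrangement}: opportunity cost = {forgone:.2f} utils")
```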
More generally, the coherence theorems typically used to derive expected utility maximization implicitly assume that we have exactly this sort of resource. They use the resource as a “measuring stick”; if a system throws away the resource unnecessarily (i.e. ends up in a state which it could have gotten to with strictly less expenditure of the resource), then the system is suboptimal for any possible utility function for which the resource is limited and fungible.
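Here’s a sketch of that measuring-stick logic as a toy dominance check (the plan structure and numbers are just illustrative assumptions):

```python
# Toy "measuring stick": among plans that reach the same world-state,
# any plan that spends strictly more of the resource is dominated under
# every utility function for which the resource is limited and fungible.
plans = [
    {"name": "careful", "end_state": "rocks on top row", "energy_spent": 90},
    {"name": "sloppy",  "end_state": "rocks on top row", "energy_spent": 97},
]

def is_dominated(plan, all_plans):
    """True if some other plan reaches the same end state more cheaply."""
    return any(
        other is not plan
        and other["end_state"] == plan["end_state"]
        and other["energy_spent"] < plan["energy_spent"]
        for other in all_plans
    )

for plan in plans:
    print(plan["name"], "dominated:", is_dominated(plan, plans))
# "sloppy" throws away 7 units of a fungible resource, so it is
# suboptimal no matter what the leftover energy would have been used for.
```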
Tying this all back to optimization-as-compression: we implicitly have an optimization constraint (i.e. the amount of the resource) and likely a broader world in which the limited resource can be used. For optimization-as-compression to match intuitions on this problem, we need to include those elements. If energy can be expended elsewhere in the world to reduce the description length of other subsystems, then there’s a corresponding implicit bit-length cost to placing rocks on the top row. (It’s conceptually very similar to thermodynamics: energy can be used to increase entropy in any subsystem, and temperature quantifies the entropy-cost of dumping energy into one subsystem rather than another.)
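If we take the thermodynamic analogy literally (my extrapolation, not something the compression frame strictly requires), Landauer’s principle gives an exchange rate between joules and bits: energy E dumped as heat at temperature T could instead have paid for erasing E / (kT ln 2) bits elsewhere. For instance:

```python
from math import log

K_B = 1.380649e-23  # Boltzmann constant, J/K

def forgone_bits(energy_joules, temperature_kelvin=300.0):
    """Bits of erasure that this much energy could have bought elsewhere,
    per Landauer's bound at the given temperature."""
    return energy_joules / (K_B * temperature_kelvin * log(2))

# Lifting a 1 kg rock up one extra 1 m shelf costs about 9.8 J of work.
print(f"{forgone_bits(9.8):.2e} bits")  # ~3.4e21 bits of forgone erasure
```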
Hmm. If we bring actual thermodynamics into the picture, then I think that energy stored in some very usable way (say, a charged battery) has a small number of possible states, whereas when you expend it, it generally ends up as waste heat that has a lot of possible states. In that case, if someone wants to take a bunch of stored energy and spend it on, say, making a robot rotate a huge die made of rock into a certain orientation, then that actually leads to a larger state space than someone else’s preference to keep the energy where it is, even though we’d probably say that the former is costlier than the latter. We could also imagine a third person who prefers to spend the same amount of energy arranging 1000 smaller dice—same “cost”, but exponentially (in the mathematical sense) different state space shrinkage.
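For concreteness (the only outside fact here is that a cube has 24 rotational orientations; the rest is just the arithmetic of my example):

```python
from math import log2

ORIENTATIONS = 24  # rotational orientations of a cube

# Pinning one huge die to a single orientation:
one_big_die_bits = log2(ORIENTATIONS)            # ~4.6 bits

# Pinning 1000 small dice, same (stipulated) energy expenditure:
thousand_dice_bits = 1000 * log2(ORIENTATIONS)   # ~4585 bits

print(f"{one_big_die_bits:.1f} vs {thousand_dice_bits:.0f} bits")
# The two state spaces differ by a factor of 24**999, i.e. exponentially
# different shrinkage for the same energy cost.
```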
It seems that, no matter how you conceptualize things, it’s fairly easy to construct examples in which state space shrinkage bears little, if any, correlation with either “expected utility” or “cost”.