The reason I think entropy minimization is basically an ok choice here is that there’s not much restriction on which variable’s entropy is minimized. There’s enough freedom that we can transform an expected-utility-maximization problem into an entropy-minimization problem.
In particular, suppose we have a utility variable U, and we want to maximize E[U]. As long as the possible values of U are bounded above, we can subtract a constant without changing anything, making U strictly negative. Then, we define a new random variable Z, which is generated from U in such a way that its entropy (given U) is −U bits. For instance, we could let Z be a list of ⌊−U⌋ 50⁄50 coin flips, plus one biased coin flip with bias chosen so the entropy of that flip is −U−⌊−U⌋, i.e. the fractional part of −U. Then, minimizing the entropy of Z (unconditional on U) is equivalent to maximizing E[U].
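To make the coin-flip construction concrete, here's a minimal Python sketch (mine, not from the post) that numerically solves for the bias of the last flip and checks that the total entropy given U = u comes out to −u bits; `binary_entropy`, `solve_bias`, and `entropy_of_Z_given_u` are names I made up for illustration:

```python
import math

def binary_entropy(q):
    """Entropy in bits of a coin with bias q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def solve_bias(h, tol=1e-12):
    """Bisect for the q in [0, 1/2] with binary_entropy(q) = h
    (binary entropy is increasing on [0, 1/2], so bisection works)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def entropy_of_Z_given_u(u):
    """H(Z | U = u) for the construction above: floor(-u) fair flips
    contribute floor(-u) bits, plus one biased flip carrying the
    fractional part of -u. Requires u <= 0."""
    n_fair = math.floor(-u)
    frac = -u - n_fair
    return n_fair + binary_entropy(solve_bias(frac))

print(entropy_of_Z_given_u(-3.7))  # ≈ 3.7, i.e. -u bits
```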
I had a little tinker with this. It's straightforward to choose a utility function where maximising it is equivalent to minimizing H(Z): just set U(z) = log p(z), so that E_z U(z) = −H(Z).
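A quick numeric check of that identity, taking logs base 2 so everything is in bits (the three-outcome distribution is just an arbitrary example):

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # arbitrary example distribution

H = -sum(pz * math.log2(pz) for pz in p.values())  # H(Z) in bits
EU = sum(pz * math.log2(pz) for pz in p.values())  # E_z U(z) with U(z) = log2 p(z)

print(H, EU)  # EU == -H, so maximising E[U] is exactly minimising H(Z)
```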
As far as I can see, the other way round is basically as you suggested, but a tiny bit more fiddly. We can indeed produce a nonpositive U′ from which we make a new RV Z′ with H(Z′|z) = −U′(z), as you suggested (e.g. with coin flips, etc.). But a simple shift of U isn't enough. We need U′(z) = m U(z) − log p(z) + c (for some scalars m > 0 and c); note the log p(z) term.
We take the z′ outcomes to be partitioned by z, i.e. each z′ is 'z happened, and also I got this particular coin-flip outcome'. Then P(z′) = P(z)·P(z′|z) (where z is understood to be the particular z associated with z′). That means H(Z|Z′) = 0, so H(Z′) = H(Z′, Z) = H(Z) + H(Z′|Z) (you can check this by spelling things out pointfully and rearranging, but I realised that I was just rederiving conditional entropy laws).
Then

−H(Z′) = −H(Z) − H(Z′|Z)
       = −H(Z) + E_z U′(z)
       = −H(Z) + m E_z U(z) − E_z log p(z) + c
       = m E_z U(z) + c

(using E_z log p(z) = −H(Z) in the last step),
so (since m > 0) minimizing H(Z′) is equivalent to maximising E_z U(z).
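And here's a sketch that checks the whole chain numerically: it builds Z′ out of coin flips for an arbitrary made-up p and U, then verifies both H(Z′) = H(Z) + H(Z′|Z) and −H(Z′) = m E_z U(z) + c (up to solver tolerance). The helper functions are repeated from the earlier sketch so this runs standalone:

```python
import math

def binary_entropy(q):
    """Entropy in bits of a coin with bias q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def solve_bias(h, tol=1e-12):
    """Bisect for the q in [0, 1/2] with binary_entropy(q) = h."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Arbitrary example: outcome distribution p and bounded-above utility U.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
U = {"a": 1.0, "b": 0.0, "c": -2.0}
m = 1.0

# Pick c so that U'(z) = m U(z) - log2 p(z) + c <= 0 for every z.
c = min(math.log2(p[z]) - m * U[z] for z in p)
U_prime = {z: m * U[z] - math.log2(p[z]) + c for z in p}

# Enumerate the distribution of Z' = (z, fair flips, biased flip),
# constructed so that H(Z'|z) = -U'(z). Each list entry is a distinct
# z' outcome, and the z-partition structure means each z' determines z.
z_prime_probs = []
for z, pz in p.items():
    h = -U_prime[z]
    n_fair = math.floor(h)
    q = solve_bias(h - n_fair)
    for _ in range(2 ** n_fair):  # each fair-flip string has prob 2^-n_fair
        for pq in ((q, 1.0 - q) if 0.0 < q < 1.0 else (1.0,)):
            z_prime_probs.append(pz * 0.5 ** n_fair * pq)

H_Z = -sum(pz * math.log2(pz) for pz in p.values())
H_Z_prime = -sum(pj * math.log2(pj) for pj in z_prime_probs if pj > 0.0)
H_Zp_given_Z = -sum(p[z] * U_prime[z] for z in p)  # H(Z'|Z) = E_z[-U'(z)]
EU = sum(p[z] * U[z] for z in p)

print(H_Z_prime, H_Z + H_Zp_given_Z)  # ≈ equal: H(Z') = H(Z) + H(Z'|Z)
print(-H_Z_prime, m * EU + c)         # ≈ equal: -H(Z') = m E[U] + c
```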
Requiring U′(z) = m U(z) − log p(z) + c to be nonpositive for all z maybe places more constraints on things? Certainly U needs to be bounded above, as you said. The constraint amounts to m U(z) + c ≤ log p(z) for every z; on a finite outcome space with all p(z) > 0 we can always satisfy it by picking c negative enough, but with infinitely many outcomes log p(z) is unbounded below, so e.g. a bounded U admits no valid c at all. It's also a bit weird and embedded, as you hinted, because this utility function depends on the probability of the outcome, which is the thing being controlled/regulated by the decision-maker. I don't know if there are systems where this might not be well-defined even for bounded U; I haven't dug into it.
Okay, I agree that if you remove their determinism & full observability assumption (as you did in the post), it seems like your construction should work.
I still think that the original paper seems awful (because it’s their responsibility to justify choices like this in order to explain how their result captures the intuitive meaning of a ‘good regulator’).
Oh absolutely, the original is still awful and their proof does not work with the construction I just gave.
BTW, this got a huge grin out of me: