The reason I think entropy minimization is basically an ok choice here is that there’s not much restriction on which variable’s entropy is minimized. There’s enough freedom that we can transform an expected-utility-maximization problem into an entropy-minimization problem.
In particular, suppose we have a utility variable U, and we want to maximize E[U]. As long as the possible values of U are bounded above, we can subtract a constant without changing anything, making U strictly negative. Then, we define a new random variable Z, which is generated from U in such a way that its entropy (given U) is −U bits. For instance, we could let Z be a list of ⌊−U⌋ 50⁄50 coin flips, plus one biased coin flip with bias chosen so the entropy of that flip is −U−⌊−U⌋, i.e. the fractional part of −U. Then, minimizing the entropy of Z (unconditional on U) is equivalent to maximizing E[U].
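To make the coin-flip construction concrete, here's a minimal Python sketch (mine, not from the post) that numerically solves for the bias of the last flip and checks that the total entropy given U = u comes out to −u bits; `binary_entropy`, `solve_bias`, and `entropy_of_Z_given_u` are names I made up for illustration:

```python
import math

def binary_entropy(q):
    """Entropy in bits of a coin with bias q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def solve_bias(h, tol=1e-12):
    """Bisect for the q in [0, 1/2] with binary_entropy(q) = h
    (binary entropy is increasing on [0, 1/2], so bisection works)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def entropy_of_Z_given_u(u):
    """H(Z | U = u) for the construction above: floor(-u) fair flips
    contribute floor(-u) bits, plus one biased flip carrying the
    fractional part of -u. Requires u <= 0."""
    n_fair = math.floor(-u)
    frac = -u - n_fair
    return n_fair + binary_entropy(solve_bias(frac))

print(entropy_of_Z_given_u(-3.7))  # ≈ 3.7, i.e. -u bits
```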
I had a little tinker with this. It's straightforward to choose a utility function where maximising it is equivalent to minimizing H(Z): just set U(z) = log p(z), so that E_z U(z) = −H(Z).
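A quick numeric check of that identity, taking logs base 2 so everything is in bits (the three-outcome distribution is just an arbitrary example):

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # arbitrary example distribution

H = -sum(pz * math.log2(pz) for pz in p.values())  # H(Z) in bits
EU = sum(pz * math.log2(pz) for pz in p.values())  # E_z U(z) with U(z) = log2 p(z)

print(H, EU)  # EU == -H, so maximising E[U] is exactly minimising H(Z)
```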
As far as I can see, the other way round is basically as you suggested, but a tiny bit more fiddly. We can indeed produce a nonpositive U′ from which we make a new RV Z′ with H(Z′|z) = −U′(z), as you suggested (e.g. with coin flips, etc.). But a simple shift of U isn't enough. We need U′(z) = m U(z) − log p(z) + c (for some scalars m > 0 and c); note the log p(z) term.
We take the z′ outcomes to be partitioned by z, i.e. each z′ is 'z happened, and also I got this particular coin-flip outcome'. Then P(z′) = P(z)·P(z′|z) (where z is understood to be the particular z associated with z′). That means H(Z|Z′) = 0, so H(Z′) = H(Z′, Z) = H(Z) + H(Z′|Z) (you can check this by spelling things out pointfully and rearranging, but I realised that I was just rederiving conditional entropy laws).
Then

−H(Z′) = −H(Z) − H(Z′|Z)
       = −H(Z) + E_z U′(z)
       = −H(Z) + m E_z U(z) − E_z log p(z) + c
       = m E_z U(z) + c

(using E_z log p(z) = −H(Z) in the last step),
so (since m > 0) minimizing H(Z′) is equivalent to maximising E_z U(z).
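And here's a sketch that checks the whole chain numerically: it builds Z′ out of coin flips for an arbitrary made-up p and U, then verifies both H(Z′) = H(Z) + H(Z′|Z) and −H(Z′) = m E_z U(z) + c (up to solver tolerance). The helper functions are repeated from the earlier sketch so this runs standalone:

```python
import math

def binary_entropy(q):
    """Entropy in bits of a coin with bias q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def solve_bias(h, tol=1e-12):
    """Bisect for the q in [0, 1/2] with binary_entropy(q) = h."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Arbitrary example: outcome distribution p and bounded-above utility U.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
U = {"a": 1.0, "b": 0.0, "c": -2.0}
m = 1.0

# Pick c so that U'(z) = m U(z) - log2 p(z) + c <= 0 for every z.
c = min(math.log2(p[z]) - m * U[z] for z in p)
U_prime = {z: m * U[z] - math.log2(p[z]) + c for z in p}

# Enumerate the distribution of Z' = (z, fair flips, biased flip),
# constructed so that H(Z'|z) = -U'(z). Each list entry is a distinct
# z' outcome, and the z-partition structure means each z' determines z.
z_prime_probs = []
for z, pz in p.items():
    h = -U_prime[z]
    n_fair = math.floor(h)
    q = solve_bias(h - n_fair)
    for _ in range(2 ** n_fair):  # each fair-flip string has prob 2^-n_fair
        for pq in ((q, 1.0 - q) if 0.0 < q < 1.0 else (1.0,)):
            z_prime_probs.append(pz * 0.5 ** n_fair * pq)

H_Z = -sum(pz * math.log2(pz) for pz in p.values())
H_Z_prime = -sum(pj * math.log2(pj) for pj in z_prime_probs if pj > 0.0)
H_Zp_given_Z = -sum(p[z] * U_prime[z] for z in p)  # H(Z'|Z) = E_z[-U'(z)]
EU = sum(p[z] * U[z] for z in p)

print(H_Z_prime, H_Z + H_Zp_given_Z)  # ≈ equal: H(Z') = H(Z) + H(Z'|Z)
print(-H_Z_prime, m * EU + c)         # ≈ equal: -H(Z') = m E[U] + c
```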
Requiring U′(z) = m U(z) − log p(z) + c to be nonpositive for all z maybe places more constraints on things? Certainly U needs to be bounded above, as you said. The constraint amounts to m U(z) + c ≤ log p(z) for every z; on a finite outcome space with all p(z) > 0 we can always satisfy it by picking c negative enough, but with infinitely many outcomes log p(z) is unbounded below, so e.g. a bounded U admits no valid c at all. It's also a bit weird and embedded, as you hinted, because this utility function depends on the probability of the outcome, which is the thing being controlled/regulated by the decision-maker. I don't know if there are systems where this might not be well-defined even for bounded U; I haven't dug into it.
Okay, I agree that if you remove their determinism & full observability assumption (as you did in the post), it seems like your construction should work.
I still think that the original paper seems awful (because it’s their responsibility to justify choices like this in order to explain how their result captures the intuitive meaning of a ‘good regulator’).
Oh absolutely, the original is still awful and their proof does not work with the construction I just gave.
BTW, this got a huge grin out of me: