I had a little tinker with this. It’s straightforward to choose a utility function where maximising it is equivalent to minimising H(Z) - just set U(z) = log p(z).
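A quick numerical check of this direction (a minimal sketch; the pmf here is made up):

```python
import math

# A hypothetical pmf over outcomes z (any finite distribution works).
p = {"a": 0.5, "b": 0.3, "c": 0.2}

# Entropy H(Z) = -sum_z p(z) log p(z)
H_Z = -sum(pz * math.log(pz) for pz in p.values())

# Expected utility with U(z) = log p(z)
E_U = sum(pz * math.log(pz) for pz in p.values())

# E_z U(z) = -H(Z), so maximising E_z U(z) is minimising H(Z)
assert abs(E_U + H_Z) < 1e-12
```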
As far as I can see, the other way round is basically as you suggested, but a tiny bit more fiddly. We can indeed produce a nonpositive U′ (nonpositive so that −U′ can be an entropy) from which we make a new RV Z′ with H(Z′|z) = −U′(z) as you suggested (e.g. with coinflips etc). But a simple shift of U isn’t enough. We need U′(z) = mU(z) − log p(z) + c (for some scalars m > 0 and c) - note the log p(z) term.
We take the z′ outcomes to be partitioned by z, i.e. they’re ‘z happened and also I got z′ coinflip outcome’. Then P(z′) = P(z)P(z′|z) (where z is understood to be the particular z associated with z′). That means H(Z|Z′) = 0, so H(Z′) = H(Z) + H(Z′|Z) (you can check this by spelling things out pointwise and rearranging, but I realised that I was just rederiving the conditional entropy laws).
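A sketch of this construction in code, under the simplifying assumption that each target conditional entropy −U′(z) fits in a single biased coinflip (i.e. lies in [0, log 2]); the pmf and the target entropies are made up:

```python
import math

def binary_entropy(q):
    """Entropy (in nats) of a coin with heads-probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def bias_for_entropy(h, tol=1e-12):
    """Find q in [0, 0.5] with binary_entropy(q) = h, by bisection.
    Requires 0 <= h <= log 2."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical pmf over z, and target conditional entropies -U'(z).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
neg_U_prime = {"a": 0.2, "b": 0.5, "c": 0.1}  # each in [0, log 2]

# Build Z' = (z, coinflip outcome) with H(Z'|z) = -U'(z).
joint = {}
for z, pz in p.items():
    q = bias_for_entropy(neg_U_prime[z])
    joint[(z, "heads")] = pz * q          # P(z') = P(z) P(z'|z)
    joint[(z, "tails")] = pz * (1 - q)

def entropy(dist):
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

H_Z = entropy(p)
H_Zp = entropy(joint)
E_cond = sum(p[z] * neg_U_prime[z] for z in p)  # H(Z'|Z)

# Chain rule with H(Z|Z') = 0: H(Z') = H(Z) + H(Z'|Z)
assert abs(H_Zp - (H_Z + E_cond)) < 1e-6
```

Targets outside [0, log 2] would need more than one coinflip per z, but the identity being checked is the same.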
Then

−H(Z′) = −H(Z) − H(Z′|Z) = −H(Z) + E_z U′(z) = −H(Z) + m E_z U(z) − E_z log p(z) + c = m E_z U(z) + c

(using E_z log p(z) = −H(Z) for the last step),
so minimising H(Z′) is equivalent to maximising E_z U(z) (provided m > 0).
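Checking that chain of equalities numerically (the pmf, U, m and c below are arbitrary made-up choices, with U bounded above and c negative enough that U′ ≤ 0 everywhere):

```python
import math

# Hypothetical pmf and bounded utility; m > 0, c chosen so U'(z) <= 0.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
U = {"a": 1.0, "b": 0.0, "c": -2.0}
m, c = 0.1, -2.0

# U'(z) = m U(z) - log p(z) + c, which must be nonpositive
U_prime = {z: m * U[z] - math.log(p[z]) + c for z in p}
assert all(u <= 0 for u in U_prime.values())

H_Z = -sum(pz * math.log(pz) for pz in p.values())
E_U = sum(p[z] * U[z] for z in p)

# H(Z') = H(Z) + H(Z'|Z), with H(Z'|z) = -U'(z)
H_Zp = H_Z + sum(p[z] * (-U_prime[z]) for z in p)

# The derivation: -H(Z') = m E_z U(z) + c
assert abs(-H_Zp - (m * E_U + c)) < 1e-12
```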
Requiring U′(z) = mU(z) − log p(z) + c to be nonpositive for all z presumably places further constraints on things? Certainly U needs to be bounded above, as you said. It’s also a bit weird and embedded, as you hinted, because this utility function depends on the probability of the outcome, which is exactly the thing being controlled/regulated by the decisioner. I don’t know if there are systems where this might not be well-defined even for bounded U; I haven’t dug into it.