Suppose the prior over policies is max-entropy (uniform over all action sequences). If the number of “actions” is greater than the number of bits it takes to specify my brain[1], it seems like it would conclude that my utility function is something like “1 if {acts exactly like [insert exact copy of my brain] would}, else 0″.
Yes. In fact, I’m not even sure we need your assumption about bits. Say policies are sequences of actions, and suppose at each time step we have N actions available. Then, in the process of approximating your perfect/overfitted utility “1 if {acts exactly like [insert exact copy of my brain] would}, else 0”, adding one more specified action to our U can be understood as adding one more symbol to its generating program, and so incrementing K(U) by 1. But adding one more (correctly) specified action also multiplies the denominator probability by 1/N (since the prior is uniform). So as long as N > 2, Pr[U] grows without bound as we keep refining the approximation to your utility.
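To spell out the bookkeeping (this assumes Pr[U] is proportional to 2^{−K(U)} divided by the prior probability that a random policy does as well under U as the observed behaviour, which is how I’m reading the setup; U′ and p below are just labels introduced for the comparison):

$$\frac{\Pr[U']}{\Pr[U]} \;=\; \frac{2^{-(K(U)+1)} \,/\, \bigl(p \cdot \tfrac{1}{N}\bigr)}{2^{-K(U)} \,/\, p} \;=\; \frac{N}{2},$$

where U′ is U with one more action specified and p is the old denominator probability. This ratio exceeds 1 exactly when N > 2, so each additional pinned-down action strictly increases the score of the overfitted U.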
And of course, this is solved by the simplicity prior, because it makes it easier for simple Us to achieve a low denominator probability. So a way simpler U (less overfitted to G*) will achieve almost the same low denominator probability as your function, because the only policies that maximize U better than G* does are too complex to carry much prior mass.
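As a rough sketch of that comparison (same assumed score as above, now with the simplicity prior 2^{−K(π)} over policies in the denominator; U_simple and U_overfit are hypothetical labels for the two utilities):

$$\Pr[U_{\text{simple}}] \;\propto\; \frac{2^{-K(U_{\text{simple}})}}{\sum_{\pi \,\succeq\, G^*} 2^{-K(\pi)}}, \qquad \Pr[U_{\text{overfit}}] \;\propto\; \frac{2^{-K(U_{\text{overfit}})}}{2^{-K(G^*)}},$$

where π ⪰ G* ranges over policies doing at least as well as G* under U_simple. If every such π other than G* is very complex, both denominators are roughly 2^{−K(G*)}, while K(U_simple) ≪ K(U_overfit), so the simpler U ends up with the higher score.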
Some kind of simplicity prior, as mentioned here.