“Policy alignment” seems like an improvement, especially since “policy approval” evokes government policy.
With respect to the rest:
On the one hand, I’m tempted to say that to the extent you recognize how confused you are about what probabilities are, and that this confusion has to do with how you reason in the real world, your PH is going to change a lot when updated on certain philosophical arguments. As a result, optimizing a strategy updatelessly via PH is going to take that into account, shifting behavior significantly in contingencies in which various philosophical arguments emerge, and potentially putting a significant amount of processing power toward searching for such arguments.
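To put that first point schematically (writing $P_H$ for the human prior I’ve been calling PH, with the rest of the notation purely illustrative):

$$\pi^{*} \;=\; \operatorname*{arg\,max}_{\pi}\; \mathbb{E}_{P_H}\!\left[\, U(\pi) \,\right],$$

where $\pi$ ranges over policies mapping observation histories to actions. Because $P_H(\,\cdot \mid \text{argument } A\,)$ can differ sharply from $P_H$ for certain philosophical arguments $A$, the optimal policy $\pi^{*}$ will already specify quite different behavior on the branches where such an $A$ shows up, and may devote resources to looking for one.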
On the other hand, I buy my “policy alignment” proposal only to the extent that I buy UDT, which is not entirely. I don’t know how to think about UDT together with the shifting probabilities which come from logical induction. The problem is similar to the one you outline: just as it is unclear that a human should think its own PH has any useful content worth locking in forever in an updateless reasoner, it is unclear that a fixed logical inductor state (after running for a finite amount of time) has any useful content one would want to lock in forever.
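To make that worry a bit more explicit (again, the notation is just a sketch): a logical inductor gives a sequence of belief states $P_1, P_2, P_3, \dots$ whose good properties are only guaranteed in the limit. Going updateless means picking some finite stage $n$ and committing to

$$\pi^{*}_{n} \;=\; \operatorname*{arg\,max}_{\pi}\; \mathbb{E}_{P_n}\!\left[\, U(\pi) \,\right],$$

but there is no particular $n$ at which $P_n$ looks like something worth locking in forever, since $P_{n+1}, P_{n+2}, \dots$ will keep correcting it.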
I don’t yet know how to think about this problem. I suspect there’s something non-obvious to be said about the extent to which PH trusts other belief distributions (i.e., something at least a bit more compelling than the answer I gave first, but not entirely different in form).