I think assuming that you have access to a proof of what Omega does means that you have already determined your own behavior.
You may not recognize it as such, especially if Omega is using a different axiom system than you. So you can still be ignorant of what you’ll do while knowing what Omega’s prediction of you is; but once that prediction is known with certainty, your probability distribution can no longer treat the two as correlated.
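One way to spell out that last step, in a generic expected-utility framing (nothing here is specific to the post’s setup): once you have conditioned on the prediction, so that $P(\hat{A} = \omega) = 1$ for some particular $\omega$, then $P(\hat{A} = \omega \mid A = a) = 1$ for every action $a$ you assign positive probability to, and the action-conditional expected utility collapses to

$$\mathbb{E}[U \mid A = a] = U(a, \omega),$$

so the prediction term no longer varies with your action at all; there is simply no correlation left in the distribution for a decision to exploit.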
But if that’s taken to be _part of the prior_, then it seems you no longer have the chance to (acausally) influence what Omega does.
Yeah, that’s the problem here.
And if it’s not part of the prior, then I think a value-learning agent with a good decision theory can get the $500.
Only if the agent takes that one proof out of the prior, but still has enough structure in the prior to see how the decision problem plays out. This is the problem of constructing a thin prior. You can (more or less) solve any decision problem by making the agent sufficiently updateless, but you run up against the problem of making it too updateless, at which point it behaves in absurd ways (lacking enough structure to even understand the consequences of policies correctly).
Hence the intuition that the correct prior to be updateless with respect to is the human one (which is, essentially, the main point of the post).
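To make the contrast concrete, here is a minimal sketch, using a standard Newcomb-style payoff table as a stand-in (not the post’s actual $500 problem) and an assumed predictor accuracy. It shows how an updateless evaluation, where the policy is chosen from the prior and the prediction still correlates with it, differs from an updateful one, where the prediction has already been conditioned on and is just a constant:

```python
# A toy Newcomb-style setup, used here only as a stand-in for "a decision problem
# where Omega's prediction correlates with your policy"; payoff numbers and the
# predictor accuracy are illustrative assumptions, not anything from the post.

PAYOFF = {  # (action, predicted_action) -> utility
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,
    ("two-box", "one-box"): 1_001_000,
    ("two-box", "two-box"): 1_000,
}
ACCURACY = 0.99  # assumed chance that Omega's prediction matches the chosen policy


def prediction_dist(policy):
    """Prior over Omega's prediction, which still depends on the agent's policy."""
    other = "two-box" if policy == "one-box" else "one-box"
    return {policy: ACCURACY, other: 1 - ACCURACY}


def updateless_value(policy):
    """Evaluate a policy from the prior: the prediction is still correlated with it."""
    return sum(p * PAYOFF[(policy, pred)]
               for pred, p in prediction_dist(policy).items())


def updateful_value(action, known_prediction):
    """Evaluate an action after conditioning on the prediction: it is now a constant,
    so it no longer moves with the action."""
    return PAYOFF[(action, known_prediction)]


# From the prior, committing to one-boxing wins...
print({pol: updateless_value(pol) for pol in ("one-box", "two-box")})
# ...but after updating on any fixed prediction, two-boxing dominates.
for pred in ("one-box", "two-box"):
    print(pred, {act: updateful_value(act, pred) for act in ("one-box", "two-box")})
```

The specific numbers don’t matter; the point is that once the prediction is treated as a known constant, nothing in the evaluation moves with the choice anymore, which is why it matters which prior the agent is updateless with respect to.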