Right. Another E[u(X)] problem would be this: the smart AI realizes that if the dumber human keeps thinking, they'll realize they're about to drive off a cliff, which would negatively impact their attainable utility estimate. Therefore, distract them.
I forgot to mention this in the sequence, but as you say, the formalisms aren't yet pinned down well enough to use as an explicit objective, due to lingering confusions about adjacent areas of agency. AUP-the-method attempts to get around that by penalizing catastrophically disempowering behavior, such that the low-impact AI doesn't obstruct our ability to get what we want (even though it isn't going out of its way to empower us, either). We'd be trying to make the agent impact/de facto non-obstructive, even though it isn't going to be intent non-obstructive.
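For concreteness, here's a minimal sketch of the kind of penalty AUP-the-method uses, in the style of the Conservative Agency formulation: the agent's reward is docked in proportion to how much an action shifts its attainable value across a set of auxiliary reward functions, relative to a no-op baseline. The names (`q_aux`, `noop_action`) and the hyperparameter `lam` are illustrative, not canonical:

```python
import numpy as np

def aup_penalty(q_aux, state, action, noop_action, lam=0.1):
    """Scaled mean absolute change in attainable value from acting.

    q_aux: sequence of Q-functions, one per auxiliary reward function;
           each maps (state, action) -> estimated attainable value.
    noop_action: the "do nothing" baseline action.
    lam: impact-tradeoff hyperparameter (higher = more conservative).
    """
    diffs = [abs(q(state, action) - q(state, noop_action)) for q in q_aux]
    return lam * float(np.mean(diffs))

def shaped_reward(primary_reward, q_aux, state, action, noop_action, lam=0.1):
    # The agent optimizes its primary reward minus the penalty, which
    # discourages actions that drastically change what it can attain,
    # i.e. catastrophic gains or losses of power.
    return primary_reward - aup_penalty(q_aux, state, action, noop_action, lam)
```

The hope is that disempowering us shows up as a large shift in attainable value and gets penalized, without the formalism ever having to serve as an explicit "preserve human attainable utility" objective.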