I was initially writing a comment about how AUPconceptual doesn't seem to work in every case, because there are actions that are catastrophic without raising the agent's power (such as killing someone).
And why exactly would it be motivated to kill someone? This is generally incentivized only insofar as it leads to… power gain, it seems. I think that AUPconceptual should work just fine for penalizing-increases-only.
It does seem that AUPconceptual gives the agent an incentive to avoid being shut off, though.
I think this is much less of a problem in the “penalize increases with respect to agent inaction” scenario.
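For concreteness, here is a minimal sketch of what "penalize increases with respect to agent inaction" can look like, in the spirit of the formal AUP penalty; the notation is mine, not the post's. Let $Q_{\text{aux}}(s,a)$ be the agent's attainable utility for an auxiliary goal after taking action $a$ in state $s$, let $\varnothing$ be the no-op action, and let $\lambda > 0$ be a penalty weight:

$$R_{\text{AUP}}(s,a) \;=\; R(s,a) \;-\; \lambda \, \max\bigl(0,\; Q_{\text{aux}}(s,a) - Q_{\text{aux}}(s,\varnothing)\bigr)$$

Only gains in attainable utility over the inaction baseline are penalized; actions that leave the agent's power unchanged, or that decrease it (e.g. allowing shutdown), incur no penalty.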
And why exactly would it be motivated to kill someone? This is generally incentivized only insofar as it leads to… power gain, it seems. I think that AUPconceptual should work just fine for penalizing-increases-only.
The case I had in mind was “you have an AI assistant trained to keep you healthy, and the objective is operationalized in such a way that it maxes out if you’re dead (because then you can’t get sick)”. If the AI kills you, that doesn’t seem to increase its power in any way – it would probably lead to other people shutting it off, which is a decrease in power. Or, more generally, any objective that can be achieved by just destroying stuff.
Yes, sure, but those aren’t catastrophes in the way I’ve defined the term here (see also Toby Ord’s The Precipice; he espouses a similar definition). They’re not existential threats, but you’re right that the agent might still do bad things.