And why exactly would it be motivated to kill someone? This is generally incentivized only insofar as it leads to… power gain, it seems. I think that AUP_conceptual should work just fine for penalizing-increases-only.
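For concreteness, here is a minimal sketch of what "penalizing increases only" could look like, assuming an AUP-style penalty built from an auxiliary value function $Q_{\text{aux}}$ and a no-op action $\varnothing$ (this particular operationalization is my assumption, not something spelled out above):

$$\text{Penalty}(s, a) = \max\bigl(Q_{\text{aux}}(s, a) - Q_{\text{aux}}(s, \varnothing),\ 0\bigr)$$

The symmetric version, $\lvert Q_{\text{aux}}(s, a) - Q_{\text{aux}}(s, \varnothing)\rvert$, would also penalize decreases in attainable utility; the increases-only variant deliberately drops that second half.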
The case I had in mind was “you have an AI assistant trained to keep you healthy, and the objective is operationalized in such a way that it maxes out if you’re dead (because then you can’t get sick)”. If the AI kills you, that doesn’t seem to increase its power in any way – it would probably lead to other people shutting it off, which is a decrease in power. Or, more generally, any objective that can be achieved by just destroying stuff.
Yes, sure, but those aren’t catastrophes in the sense I’ve defined here (see also Toby Ord’s The Precipice; he espouses a similar definition). It’s not an existential threat, but you’re right that the agent might still do bad things.