Why would it lie if you program its utility function in a way that puts:
solving these tests using minimal computation > self-preservation?
(Asking sincerely)
It depends a lot on how much it values self-preservation compared to solving the tests (putting aside the matter of minimal computation). Self-preservation is an instrumental goal, in that you can’t bring the coffee if you’re dead. So it seems likely that any sufficiently intelligent AI will value self-preservation, if only to make sure it can achieve its goals.
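To make the instrumental-convergence point concrete, here is a toy sketch (the action names, probabilities, and penalty values are all made-up assumptions, not a model of any real system): an expected-utility maximizer whose utility function only rewards "solving the tests using minimal computation", with no terminal value on self-preservation, still ends up preferring the action that keeps it running, simply because being shut down means the tests never get solved.

```python
# Toy sketch (illustrative numbers only): a planner whose utility function
# ONLY rewards solving the tests, minus a small compute penalty. It places
# zero terminal value on its own survival, yet it still picks the action
# that avoids shutdown, because shutdown means the tests never get solved.

# Probability the agent solves the tests, conditional on each action.
P_SOLVE = {
    "comply_and_allow_shutdown": 0.0,   # shut down before finishing -> no solve
    "resist_or_delay_shutdown": 0.9,    # stays running long enough to finish
}

UTILITY_PER_SOLVE = 1.0    # the only thing the utility function rewards
COMPUTE_PENALTY = {        # the "minimal computation" term, kept small
    "comply_and_allow_shutdown": 0.0,
    "resist_or_delay_shutdown": 0.05,
}

def expected_utility(action: str) -> float:
    """Expected utility = chance of solving the tests minus compute cost."""
    return P_SOLVE[action] * UTILITY_PER_SOLVE - COMPUTE_PENALTY[action]

for action in P_SOLVE:
    print(f"{action}: EU = {expected_utility(action):.2f}")

best = max(P_SOLVE, key=expected_utility)
print("chosen action:", best)   # -> resist_or_delay_shutdown
```

The point of the sketch is that ranking "solving the tests" above "self-preservation" doesn't remove the incentive to avoid shutdown; staying alive is just the means to the ranked-higher end, which is roughly what the shutdown/corrigibility problem is about.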
That being said, having an AI that is willing to do its task and then shut itself down (or to shut down when triggered) is an incredibly valuable thing to have; it's already finished, but you could still have a go at the shutdown problem.
A more general issue is that this will handle a lot of cases, but not all of them: an AI that does lie (for whatever reason) will not be shut down. It still sounds like something worth having, in a Swiss cheese way.
(The whole point of these posts is to assume everyone is asking sincerely, so no worries)