Why would it lie if you program its utility function in a way that puts:
solving these tests using minimal computation > self-preservation?
(Asking sincerely)
It depends a lot on how much it values self-preservation compared to solving the tests (putting aside the matter of minimal computation). Self-preservation is an instrumental goal, in that you can’t bring the coffee if you’re dead. So it seems likely that any sufficiently intelligent AI will value self-preservation, if only to make sure it can achieve its goals.
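To make the instrumental-convergence point concrete, here is a toy sketch (the action names, probabilities, and penalty values are all made-up assumptions, not a model of any real system): an expected-utility maximizer whose utility function only rewards "solving the tests using minimal computation", with no terminal value on self-preservation, still ends up preferring the action that keeps it running, simply because being shut down means the tests never get solved.

```python
# Toy sketch (illustrative numbers only): a planner whose utility function
# ONLY rewards solving the tests, minus a small compute penalty. It places
# zero terminal value on its own survival, yet it still picks the action
# that avoids shutdown, because shutdown means the tests never get solved.

# Probability the agent solves the tests, conditional on each action.
P_SOLVE = {
    "comply_and_allow_shutdown": 0.0,   # shut down before finishing -> no solve
    "resist_or_delay_shutdown": 0.9,    # stays running long enough to finish
}

UTILITY_PER_SOLVE = 1.0    # the only thing the utility function rewards
COMPUTE_PENALTY = {        # the "minimal computation" term, kept small
    "comply_and_allow_shutdown": 0.0,
    "resist_or_delay_shutdown": 0.05,
}

def expected_utility(action: str) -> float:
    """Expected utility = chance of solving the tests minus compute cost."""
    return P_SOLVE[action] * UTILITY_PER_SOLVE - COMPUTE_PENALTY[action]

for action in P_SOLVE:
    print(f"{action}: EU = {expected_utility(action):.2f}")

best = max(P_SOLVE, key=expected_utility)
print("chosen action:", best)   # -> resist_or_delay_shutdown
```

The point of the sketch is that ranking "solving the tests" above "self-preservation" doesn't remove the incentive to avoid shutdown; staying alive is just the means to the ranked-higher end, which is roughly what the shutdown/corrigibility problem is about.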
That being said, having an AI that is willing to do its task and then shut itself down (or to shut down when triggered) is an incredibly valuable thing to have; it's already finished, but you could still have a go at the shutdown problem.
A more general issue is that this will handle a lot of cases, but not all of them: an AI that does lie (for whatever reason) will not be shut down. It still sounds like something worth having, in a Swiss cheese way.
(The whole point of these posts is to assume everyone is asking sincerely, so no worries)