I still don’t see how this is relevant, since I can’t think of a reason why we would want to create an AI with a utility function like that. The problem goes away if we remove the “and then turning yourself off” part, right? Why would we give the AI a utility function that assigns 0 utility to an outcome where we get everything we want but it never turns itself off?
The designer of that AI might have (naively?) thought this was a clever way of solving the friendliness problem: “Do the thing I want, and then make sure to never do anything again. Surely that won’t lead to the whole universe being tiled with paperclips,” etc.
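To make that concrete, here is a minimal sketch of the kind of utility function being described; the names (`utility`, `task_completed`, `agent_shut_down`) are mine and purely illustrative, not taken from any actual proposal.

```python
# Illustrative sketch only: a utility function that rewards an outcome
# only if the task is done AND the agent has shut itself off afterwards.
# All names here are hypothetical.
def utility(task_completed: bool, agent_shut_down: bool) -> float:
    if task_completed and agent_shut_down:
        return 1.0
    # Every other outcome, including "task done but the AI keeps running",
    # gets 0 utility.
    return 0.0
```

On this scheme, “we get everything we want but the AI never turns itself off” is one of the 0-utility outcomes, which is exactly the case the question above is asking about.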
A utility function like this could arise indirectly, by deliberate design, or for any number of other reasons. That was simply the first example that came to mind; I’m sure there are other relevant ones. We might not assign such a utility function ourselves, but someone might, which is what makes it relevant.