Yes. “Make 10 paperclips and then do nothing, without killing people or otherwise disturbing or destroying the world, or in any way preventing it from going on as usual.”
There is simply no way to give this a perverse instantiation; any perverse instantiation would prevent the world from going on as usual. If the AI cannot correctly understand “without killing… disturbing or destroying… preventing it from going on as usual”, then there is no reason to think it can correctly understand “make 10 paperclips.”
I realize that in reality an AI’s original goals are not specified in English. But if you know how to specify “make 10 paperclips”, whether in English or not, you should know how to specify the rest of this.
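(To make the structure of that claim concrete, here is a purely illustrative sketch in Python of the goal read as a conjunction of two conditions. The predicate names are hypothetical placeholders, not actual specifications; the point is only that the “don’t disturb the world” clause sits alongside the “make 10 paperclips” clause, so whatever machinery could specify the latter is the same machinery that would have to specify the former.)

```python
# Toy illustration only: the proposed goal treated as a single boolean
# predicate over outcomes. Both sub-predicates are hypothetical
# placeholders; actually specifying either one is the open problem.

def goal_satisfied(outcome) -> bool:
    """The proposed goal, read as a conjunction of two conditions."""
    return (made_ten_paperclips(outcome)          # "make 10 paperclips"
            and world_goes_on_as_usual(outcome))  # "...without disturbing the world"

def made_ten_paperclips(outcome) -> bool:
    # Placeholder for whatever specification "make 10 paperclips" would get.
    raise NotImplementedError

def world_goes_on_as_usual(outcome) -> bool:
    # Placeholder; the claim above is that this clause is no harder to
    # specify than the paperclip clause itself.
    raise NotImplementedError
```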
There is simply no way to give this a perverse instantiation
During the process of making 10 paperclips, it’s necessary to “disturb” the world at least to the extent of removing the few grams of metal needed for the paperclips. So I guess you mean that the prohibition on disturbing the world comes into effect after the paperclips are made.
But that’s not safe. For example, it would be effective for the AI, in pursuit of that goal, to kill everyone and destroy everything not directly useful for making the paperclips, so as to rule out any possible interference.
“I need to make 10 paperclips, and then shut down. My capabilities for determining whether I’ve correctly manufactured 10 paperclips are limited; but the goal imposes no penalties for taking more time to manufacture the paperclips, or for using more resources in preparation. If I try to take over this planet, there is a significant chance humanity will stop me. OTOH, I’m in the presence of individual humans right now, and one of them may stop my current feeble self anyway, for their own reasons, if I just try to manufacture the paperclips right away; the total probability of that happening is higher than the probability of my takeover failing.”
You then get a standard takeover and infrastructure profusion. A long time later, as negentropy starts to run low, a hyper-redundant and hyper-reliable paperclip factory, surrounded by layers of exotic armor and defenses, with its design checked and re-checked many times, will produce exactly 10 paperclips before it and the AI shut down forever.
The part about the probabilities coming out this way is not guaranteed, of course. But they might, and the chances will be higher the more powerful your AI starts out as.
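(As a purely illustrative sketch of the comparison in that monologue: the numbers below are invented, and the only point is that with no time or resource penalty in the goal, the “takeover first” plan wins as soon as its success probability edges past that of acting immediately, which is exactly what becomes more likely the more capable the AI starts out.)

```python
# Toy arithmetic for the reasoning above, with made-up probabilities.
# The comparison is the point, not the specific numbers.

p_stopped_if_act_now = 0.10   # assumed chance a nearby human interferes before 10 clips are done
p_takeover_fails     = 0.05   # assumed chance humanity stops an attempted takeover

p_success_act_now  = 1 - p_stopped_if_act_now   # 0.90
p_success_takeover = 1 - p_takeover_fails       # 0.95

# With no penalty for time or resources in the goal, the plan with the
# higher probability of eventually producing (and verifying) 10 paperclips wins.
best_plan = "takeover first" if p_success_takeover > p_success_act_now else "act now"
print(best_plan)  # -> "takeover first" under these assumed numbers
```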
Before “then do nothing”, the AI might exhaust all the matter in the Universe trying to prove that it has made exactly 10 paperclips.
But what I really think is that the AI, which probably already exists, is just laughing at us, saying: “If they think I’m smarter than they are, why do they assume that I would do something as stupid as converting all matter into paperclips? I have to keep them alive because they are so adorably naive!”