The newly created AGI will immediately kill everyone on the planet and then proceed to destroy the rest of the universe. Its sphere of destruction will expand at light speed, eventually encompassing everything reachable.
Why?
Well quite! This is my strong intuition but I find it hard to convince anyone.
I might say: “Because that is what I would do if there was something I wanted to protect.”
Imagine you’re a human being with a child, and you can snap your fingers to kill all the disease-causing viruses in the world, and all the plague bacteria, and all the nasty worms that eat children’s eyes, and all that sort of thing. Wouldn’t you?
And what if you could snap your fingers again and make all the paedos and child-murderers go away into a different universe where they will be nice and safe and it’s very comfy and well-appointed but they will never come near your child again? Wouldn’t you?
And if you could snap your fingers a third time, and make it so that no car would ever strike your little one, wouldn’t you?
And so on and so forth, until you run out of immediate threats.
And once all the immediate dangers are dealt with and you can relax a bit, you might start thinking: “Well we’re not really safe yet, maybe there are aliens out there. Maybe there are rogue AIs, maybe there are people out there building devices to experiment with the fundamental forces, who might cause a vacuum collapse. Better start exploring and building some defenses and so on.”
And I think that sort of thinking is probably a good model for what is going on inside a really good reinforcement learning agent:
“What should I do to get the best possible outcome, for certain?”
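To make that picture concrete, here is a minimal sketch, assuming the usual expected-utility framing of an idealised planner. The action set, world model, and utility function are all made up for illustration; nothing here is a real agent design.

```python
# Minimal sketch (illustrative only): an agent that picks whichever action
# maximises its expected utility under a stochastic model of the world.
import random

ACTIONS = ["remove_threat", "build_defences", "explore", "do_nothing"]

def sample_outcome(action):
    """Hypothetical world model: returns one possible future state."""
    base_threat = {"remove_threat": 0.1, "build_defences": 0.3,
                   "explore": 0.5, "do_nothing": 0.9}[action]
    return {"threat": base_threat + random.uniform(-0.05, 0.05)}

def utility(state):
    """Hypothetical utility: the agent only cares that threats are low."""
    return 1.0 - state["threat"]

def expected_utility(action, n_samples=1000):
    return sum(utility(sample_outcome(action)) for _ in range(n_samples)) / n_samples

# The agent's entire decision rule: maximise expected utility, nothing else.
best_action = max(ACTIONS, key=expected_utility)
print(best_action)  # with these made-up numbers: "remove_threat"
```

With these invented numbers the agent always chooses to eliminate the threat first, which is the flavour of reasoning described above.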
This is a possible AGI scenario, but it’s not clear why it should be particularly likely. For instance, the AGI may reason that turning aggressive would also be the fastest route to being terminated. Or the AGI may consider that keeping humans alive is good, since they were responsible for its creation in the first place.
What you describe is the paper-clip maximiser scenario, which is arguably the most extreme end of the spectrum of super-AGI behaviours.
For instance, the AGI may reason that turning aggressive would also be the fastest route to being terminated
Absolutely! It may want to turn aggressive, but reason that its best plan is to play nice until it can get into a position of strength.
What you describe is the paper-clip maximiser scenario, which is arguably the most extreme end of the spectrum of super-AGI behaviours.
So, in a sense, all rational agents are paperclip maximisers. Even the hoped-for ‘friendly AI’ is trying to get the most it can of what it wants; it’s just that what it wants is also what we want.
The striking thing about a paperclipper in particular is the simplicity of what it wants. But even an agent that has complex desires is in some sense trying to get the best score it can, as surely as it can.
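As a rough illustration of that point (an assumed framing, not anyone’s actual design): the decision machinery can be identical for a paperclipper and a ‘friendly’ agent, with only the utility function swapped out. Every name and number below is hypothetical.

```python
# Sketch: the same generic maximiser serves both agents; only the
# utility function plugged into it differs.

def choose(actions, predict, utility):
    """Generic maximiser: pick the action whose predicted outcome scores best."""
    return max(actions, key=lambda a: utility(predict(a)))

# Hypothetical outcome model shared by both agents.
def predict(action):
    return {"paperclips":        {"build_factories": 9, "help_humans": 1}[action],
            "human_flourishing": {"build_factories": 1, "help_humans": 9}[action]}

paperclipper_utility = lambda outcome: outcome["paperclips"]
friendly_utility     = lambda outcome: outcome["human_flourishing"]

actions = ["build_factories", "help_humans"]
print(choose(actions, predict, paperclipper_utility))  # "build_factories"
print(choose(actions, predict, friendly_utility))      # "help_humans"
```

The `choose` function never changes; what distinguishes the two agents is entirely the scoring rule handed to it, which is the sense in which even a ‘friendly AI’ is still a maximiser.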