The important aspect is that there are many different things the AI could try. (Maybe including some that can’t be “ELI5”-ed. It is supposed to have superhuman intelligence.) Focusing on specific things is missing the point.
As a metaphor, imagine that a group of retarded people is trying to imprison MacGyver in a garden shed. Later MacGyver creates an explosive from his chewing gum, destroys a wall, and leaves. The moral of this story is not: “To imprison MacGyver reliably, you must take all the chewing gum from him.” The moral is: “If you are retarded, and your enemy is MacGyver, you almost certainly cannot imprison him in the garden shed.”
If you get this concept, then similar debates will feel like: “Let’s suppose we make really, really sure he has no chewing gum. We will even check his shoes, although, realistically, no one keeps chewing gum in their shoes. But we will be extra careful and check his shoes anyway. What could possibly go wrong?”
I see.
This is the gist of the AI Box experiment, no?
No. Bribes and rational persuasion are fair game too.