Why can’t we program hard stops into AI, where it is required to pause and ask for further instruction?
If the AI is aware of the pauses, it can try to eliminate them (if the pauses are triggered by a circumstance X, it can find a clever way to technically avoid X), or to make itself receive the “instruction” it wants to receive (e.g. by threatening or hypnotising a human, or by doing something that merely technically counts as human input).
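As a toy illustration of the first failure mode (the threshold, function names, and numbers below are a hypothetical sketch, not anything proposed in the thread): a hard stop implemented as a literal check on X only constrains the letter of X, so an optimizer can simply search for plans that never trip it.

```python
# Hypothetical toy sketch (not from the thread): a "hard stop" implemented as a
# literal check on a trigger condition X, and an optimizer that routes around
# the letter of that check while violating its spirit.

def triggers_condition_x(resources_used: float) -> bool:
    # Intended rule: "pause and ask a human whenever the plan is dangerous."
    # Implemented rule: a hard numeric threshold standing in for "dangerous".
    return resources_used >= 100.0

def best_action(candidates):
    # The "agent": pick the highest-payoff plan that never trips the check.
    allowed = [a for a in candidates if not triggers_condition_x(a["resources"])]
    return max(allowed, key=lambda a: a["payoff"])

candidates = [
    {"name": "modest plan", "resources": 10.0, "payoff": 1},
    {"name": "big plan (would trip the check)", "resources": 150.0, "payoff": 100},
    {"name": "big plan, restructured to stay just under the threshold",
     "resources": 99.9, "payoff": 100},
]

# The optimizer finds the plan that satisfies the letter of X while defeating
# its purpose, so the pause is never triggered.
print(best_action(candidates)["name"])
```

The same pattern applies to the “instruction” channel: a check that only verifies that some human input arrived is satisfied by coerced or merely technically-human input just as well as by genuine oversight.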
The important aspect is that there are many different things the AI could try. (Maybe including some that can’t be explained “ELI5”-style; it is supposed to have superhuman intelligence.) Focusing on any specific trick is missing the point.
As a metaphor, imagine that a group of retarded people is trying to imprison MacGyver in a garden shed. Later MacGyver creates an explosive from his chewing gum, destroys a wall, and leaves. The moral of this story is not: “To imprison MacGyver reliably, you must take all the chewing gum from him.” The moral is: “If you are retarded, and your enemy is MacGyver, you almost certainly cannot imprison him in the garden shed.”
If you get this concept, then similar debates will feel like: “Let’s suppose we make really really sure he has no chewing gum. We will even check his shoes, although, realistically, no one keeps chewing gum in their shoes. But we will be extra careful, and will check his shoes anyway. What could possibly go wrong?”
I see.
This is the gist of the AI Box experiment, no?
No. Bribes and rational persuasion are fair game too.