I’ve been wondering about the limits of AI boxing myself. The ultimate hazard from an AI that is set up to be even possibly useful arises when the AI transmits a (very limited) message before self-destructing, so that it cannot witness the result of any of its actions, including that message, and the message is still hazardous.
That, or the AI can somehow pervert mundane computing hardware into manipulators and sensors through software alone, despite the fact that mundane computing hardware is designed to be the opposite of that.
The self-fulfilling prophecy has been well-known in fiction for centuries. Or the ambiguous prophecy—consider what is said to have happened when Croesus asked the Oracle whether he should attack the Persians. “If you attack,” the Oracle reputedly said, “you will destroy a great empire.” Wanting to destroy the great Persian empire, and encouraged by this answer, Croesus immediately attacked...
...an action which led to the Persians promptly destroying Croesus’ empire.
Prophecy can be a weapon, and it can be turned against those who know what the prophecy says.
I don’t stop being amazed that some people here seriously think that a lesser intelligence can reliably contain a greater one. There are just too many outs a lesser intelligence cannot even conceive of. Even the best cage thought up by five-year-olds will not contain an adult, and the gap between a human and a full-blown AGI is much larger than that.
I agree that the boxing problem is BIG and that a useful super-AI cannot be boxed.
However, some things can just be too one-sided. I’m pretty much entirely talking about 1. No manipulators and 2. No communication.
Even a five-year-old can come up with three-meter-thick metal boxes.
My sample is: AI using 1000 kg of present-day computer hardware, powered by battery, attached to no manipulators, sensors, or comms hardware whatsoever, encased in multiple layers of vacuum, Faraday cages, antimatter screens, gaps containing ionizing and neutron radiation, and vibration isolators, inside a completely separate spacecraft with no extra fuel and controlled by primitive (possibly analog or mechanical) flight controllers, containing multiple nukes with fail-deadly time-delay or remote-activated fuses. The ship is placed at a Venusian Lagrange point and the AI (pre-loaded with some degree of knowledge) is turned on and allowed to FOOM for a period of time T before:
Option A) The nukes explode
Option B) N output bytes are copied by an isolated supervisor system and transmitted; the nukes then explode before there is time for a light-speed round trip from Venus Lagrange to Earth and back.
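To put a rough number on the Option B deadline, here is a back-of-the-envelope sketch of the light-speed round trip. The orbit figures, the circular coplanar orbits, and treating the Lagrange point as sitting at Venus's distance from the Sun are simplifying assumptions of this sketch, not part of the proposal above.

```python
# Rough bound on the Option B detonation deadline: light round trip Venus <-> Earth.
# Assumptions (not from the original comment): circular, coplanar orbits, and the
# Lagrange point treated as lying at Venus's own distance from the Sun.

AU_M = 149_597_870_700      # metres per astronomical unit (IAU definition)
C_M_PER_S = 299_792_458     # speed of light in vacuum, m/s
VENUS_ORBIT_AU = 0.723      # approximate semi-major axis of Venus's orbit
EARTH_ORBIT_AU = 1.0


def round_trip_seconds(separation_au: float) -> float:
    """Light travel time out and back over a separation given in AU."""
    return 2 * separation_au * AU_M / C_M_PER_S


# Same side of the Sun as Earth (closest) vs. opposite side (farthest).
closest = round_trip_seconds(EARTH_ORBIT_AU - VENUS_ORBIT_AU)
farthest = round_trip_seconds(EARTH_ORBIT_AU + VENUS_ORBIT_AU)

print(f"shortest round trip: {closest / 60:.1f} min")
print(f"longest round trip:  {farthest / 60:.1f} min")
```

Under those assumptions the window is on the order of five minutes at closest approach and just under half an hour at the far side of the orbit, so the fuse delay would have to be set against the actual separation at transmission time.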
I seem to remember hearing something about a one-line AI-Box, for extremely compact Oddly Persuasive Arguments or infohazards. Still, that involved the possibility of unboxing the AI, whereas this does not. Some people have implied that even Option A above is dangerous, which seems impossible to me unless the AI has extremely detailed, specific, almost atom-by-atom physics and physical data PLUS just the right sort of (poorly designed?) electronics. I find this… preposterous. This is of course a completely useless AI and does not obviate the need for Friendliness, or at least Obedience / Limitedness.
You probably meant to reply to ikrase.
Your conclusion is good, this premise isn’t:
“Let’s throw them down that well!”
“I am going to lock you in Daddy’s Jail cell!”
Many of the best cages thought up by five-year-olds will easily contain an adult (and sometimes accidentally outright kill or incapacitate them).