It seems unlikely that the first people to build fooming AGI will box it sufficiently thoughtfully.
I think boxing is likely to work if implemented very carefully by the first people to build AGI. For instance, if they were careful, a team of 100 people could manually watch everything the AI thinks, stopping its execution after every step and spending a year poring over its thoughts. With lots of fail-safes: people assigned to watch the researchers in case they try anything; several nested layers of people watching, so that if the AI infects an inner layer of people, an outer layer can just pull a lever and kill them all; the AI itself inside several layers of simulated realities, so that if it does bad things in an inner layer we just kill it; and so on. Plus a thousand other precautions that we could think up if we had a couple of centuries. Basically, there are asymmetries such that a little human effort can make it astronomically more difficult for an AI to escape. But it seems likely that we won’t take advantage of all these asymmetries, especially if e.g. there’s something like an arms race.
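To make the nested-oversight idea a bit more concrete, here is a minimal Python sketch of the kind of loop described above: the AI runs one step at a time, every layer of reviewers must approve the step's trace before the next step, and any layer can kill the run, including when it suspects the layer inside it has been compromised. All names here (BoxedAI, ReviewPanel, run_boxed) are hypothetical illustrations, not a real boxing implementation.

```python
# Toy model of step-by-step, layered oversight of a boxed AI.
# Everything here is illustrative; the real scheme would involve humans,
# air gaps, and simulated realities, not Python objects.

class EscapeAttempt(Exception):
    """Raised when any oversight layer decides to terminate the run."""


class BoxedAI:
    """Stand-in for an AGI whose execution can be paused after every step."""

    def step(self):
        # Each step's full trace of "thoughts" is exposed for inspection
        # before the next step is allowed to run.
        return {"thoughts": "...", "action": "..."}


class ReviewPanel:
    """One layer of human reviewers; outer layers also watch the inner ones."""

    def __init__(self, name, watches=None):
        self.name = name
        self.watches = watches  # the inner panel this panel oversees, if any

    def approve(self, trace) -> bool:
        # Placeholder for the (very slow) human review of the step trace.
        return True

    def inner_layer_compromised(self) -> bool:
        # Placeholder for "the AI infected the people in the inner layer".
        return False


def run_boxed(ai: BoxedAI, panels: list[ReviewPanel], max_steps: int):
    """Run the AI one step at a time; any panel can halt everything."""
    for _ in range(max_steps):
        trace = ai.step()
        for panel in panels:  # inner layer first, outer layers after
            if panel.watches is not None and panel.inner_layer_compromised():
                raise EscapeAttempt(f"{panel.name}: inner layer compromised, killing run")
            if not panel.approve(trace):
                raise EscapeAttempt(f"{panel.name}: rejected step, killing run")


# Example: two nested layers; the outer panel also watches the inner one.
inner = ReviewPanel("inner reviewers")
outer = ReviewPanel("outer reviewers", watches=inner)
run_boxed(BoxedAI(), [inner, outer], max_steps=10)
```

The point of the sketch is only the asymmetry: adding another panel or another nested layer is cheap for the overseers, while the AI must defeat every layer to get out.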
(See also this, which details several ways to box AIs.)
Seems like an ad hominem attack. Why wouldn’t the people working on this be aware of the issues? My contrarian point is that people concerned about FAI should be working on AI boxing instead.