Edit: I want to be clear that I think it’s fairly likely that SOME AI could do what’s described above and have absolute runaway capacity, but that the first few likely won’t. Prototypes just don’t work very well, in general. And yes, an AI is a bit different if it is self-modifying and its intelligence comes closely tied with agency. But there are still plenty of possible AIs that are very unlikely to exponentially self-improve (consider an in silico brain simulation: hardware requirements and fine-grained control over input and simulation speed would seem to make this pretty safe).
I find the story outlined above plausible, but not extremely probable, especially with regard to whatever the first AGI worth the name looks like. It will be smarter than people in some ways, but mostly dumber, or severely limited in some capacity. The architecture may be self-modifying, but it will be possible to monitor it to some degree, and also to prevent the system from having a full view of itself (in fact, it is easier to deny the system a complete view of itself than to give it one).
The runaway scenario you outline is more likely if the AI is allowed to run rampant, which is why any attempt to build an AGI should include plenty of security protocols, completely orthogonal to any in-code attempts to enforce friendliness (you can’t argue against redundancy when it comes to safety!). Sandbox the AI. Apart from a fixed initial store of information (which could be as large as the whole web), let nothing get in or out except for some very low bandwidth channel (text, video); a toy sketch of such a channel follows at the end of this comment. Make the barrier physical if necessary.
And have the whole thing run by paranoid bureaucrats. Treat the AI as dangerous unless contained. I’ll elaborate more later, but while there is certainly a failure mode for any security protocol, that’s not the same as saying that protocols aren’t worth developing, or that they aren’t valuable across a wide range of circumstances.
A prison can contain a prisoner smarter than the architect or any of the guards or wardens. Not one infinitely smarter, but there’s definitely a margin of safety.
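To make the “very low bandwidth channel” idea above a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: the names send_to_box and receive_from_box are hypothetical primitives assumed to be the only I/O the sandbox exposes, and the actual isolation would be enforced physically, not by this code.

```python
import time


class BoxedChannel:
    """Text-only gateway between human operators and a sandboxed system.

    Everything else (network, removable media, extra processes) is assumed
    to be physically absent, not merely disabled in software.
    """

    MAX_CHARS_PER_MESSAGE = 2_000        # hard cap on outbound verbosity
    MIN_SECONDS_BETWEEN_MESSAGES = 10.0  # throttle the conversation tempo

    def __init__(self, send_to_box, receive_from_box):
        # send_to_box / receive_from_box are hypothetical callables: the only
        # I/O primitives the sandbox exposes, carrying plain text and nothing else.
        self._send = send_to_box
        self._recv = receive_from_box
        self._last_exchange = 0.0

    def exchange(self, operator_message: str) -> str:
        # Enforce the rate limit before anything crosses the barrier.
        wait = self.MIN_SECONDS_BETWEEN_MESSAGES - (time.monotonic() - self._last_exchange)
        if wait > 0:
            time.sleep(wait)

        self._send(operator_message)
        reply = self._recv()
        self._last_exchange = time.monotonic()

        # Truncate rather than trust: the box never gets unlimited bandwidth out.
        return reply[: self.MAX_CHARS_PER_MESSAGE]
```

The rate limit and the output cap are the point of the sketch: a persuader that can only move a couple of thousand characters of text every few minutes has far fewer degrees of freedom than one with a network socket.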
let nothing get in or out except for some very low bandwidth channel (text, video)
You may want to read this. Basically it is the scenario you describe, except with a smart human taking the place of the AI, and the protocol turns out to be insufficient to contain the AI.
Yeah, I’ve seen this. It’s pretty frustrating because of the secrecy. All I know is two guys let Yudkowsky out of the box.
But I think there are a few reasons why that scenario is actually very favorable to the AI.
1) An AI that is a bit dumber than humans in all ways, dumber in some ways and smarter in others, or just a little bit smarter than humans can still teach you a lot about further AIs you’d want to build, and it seems at least plausible that an AI that’s 2x human intelligence will come along before one that’s 1000x human intelligence. We want to get as much as possible out of the 2x-10x AIs before we try building anything stronger, and IF it is possible to avoid accidentally making that 1000x AI before you can make sure it is safe, then there could be a point where you are dealing with 10x AIs and can do so safely.
So don’t pit your engineer against an unboundedly transhuman AI. Pit your engineer against a slightly transhuman AI.
2) Security-wise, you can do a ton better than somebody who is just “really really sure they wouldn’t let the AI out”. You can have the AI talk only to people who literally cannot let the AI out, and on top of that ensure that everyone is briefed on the risks. Make sure that, to the extent possible, letting the AI out is literally not an option (a toy sketch of this separation of roles follows after point 3). It is always an option to attempt to do so covertly, or to attempt to convince people to break self-imposed rules, sure, but how many prisoners in history have talked their way out of prison?
You can even tell people “Okay, we’re never ever letting this AI out. It is a prototype and considered dangerous. But hard, diligent work with attention to safety protocols will ensure we can someday build a safe AI that will have every capacity and more that this one does, to the enormous benefit of humanity”.
3) If you look at this answer and say “well, you sound like you think you’re that much better than the people who took the challenge and lost, not even to a transhuman AI but to Yudkowsky”, you’d be a little right. But mostly I believe that if we ever face that actual situation, the challenge the AI faces will be more akin to trying to cause a nuclear weapon launch with a telephone; not just one person but an entire formidable physical and bureaucratic security apparatus will (or SHOULD) stand in the way.
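One way to read point 2 above is as a capability-style separation of roles: the console used by whoever actually converses with the AI has no “release” operation at all, while release authority sits with people who never see the AI’s output. A minimal sketch under those assumptions (the class names and the five-signoff threshold are made up), reusing the channel object sketched earlier:

```python
class OperatorConsole:
    """What the people conversing with the AI get: read/write text, nothing else."""

    def __init__(self, channel):
        self._channel = channel  # e.g. the BoxedChannel sketched earlier

    def ask(self, message: str) -> str:
        return self._channel.exchange(message)

    # Deliberately, there is no release(), connect_to_network(), or similar
    # method here: persuading an operator is pointless if the console cannot
    # act on the persuasion.


class ReleaseAuthority:
    """A separate body that never interacts with the AI at all."""

    def __init__(self, required_signoffs: int = 5):
        self._required = required_signoffs
        self._signoffs: set[str] = set()

    def sign_off(self, official: str) -> None:
        self._signoffs.add(official)

    def release_approved(self) -> bool:
        # Even full approval only records a decision; nothing in this software
        # connects the box to the outside world, so acting on the decision
        # remains a deliberate, out-of-band human procedure.
        return len(self._signoffs) >= self._required
```

The design choice being illustrated is the separation of persuasion from authority: even a fully persuaded operator has nothing to hand over.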
There’s also the issue of the political problem of FAI. You might not have a choice but to rely on less-than-desirable safety protocols because of external pressures. Say a genuinely feasible plan for AGI is developed that would take 2 years of research, while even the most optimistic friendliness researchers doubt that the architecture could be made provably safe in fewer than 10. How will you confront the problem of convincing every lab in the world with sufficient resources to hold off for 10 years? Would that even be possible?
I guess we would have to look at the past and see how political and research institutions have behaved in the face of technologies with enormous potential that were also known to pose a concrete existential risk (yes, I am talking about nuclear weapons).
Did you just call Eliezer an AI..? X-D
I could give a serious response about “AI” being a stand-in for “the person playing the AI”. However, here are some other responses I could give:
I am firmly of the opinion that the distinction between artificial and natural is artificial.
Why do you think that Bruce Schneier started scanning his computer for copies of Eliezer Yudkowsky?
Yes, my comment was in the spirit of your second response :-)