Nice post! The moof scenario reminds me somewhat of Paul Christiano’s slow take-off scenario which you might enjoy reading about. This is basically my stance as well.
AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn’t have enough compute power to invent some novel form of transmission that isn’t known to all of science. Since there is a considerable computational gap between useful AI and “all of science”, you can do quite a bit with an AI in a box without worrying too much about it going rogue.
My major concern with AI boxing is the possibility that the AI might just convince people to let it out (ie remove the firewall, provide unbounded internet access, connect it to a Cloud). Maybe you can get around this by combining a limited AI output data stream with a very arduous gated process for letting the AI out in advance but I’m not very confident.
If the biggest threat from AI doesn’t come from AI Foom, but rather from Chinese-owned AI with a hostile world-view.
The biggest threat from AI comes from AI-owned AI with a hostile worldview—no matter how the AI gets created. If we can’t answer the question “how do we make sure AIs do the things we want them to do when we can’t tell them all the things they shouldn’t do?”, we might wind up with Something Very Smart scheming to take over the world while lacking at least one Important Human Value. Think Age of Em except the Ems aren’t even human.
Advancing AI research is actually one of the best things you can do to ensure a “peaceful rise” of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario is still that some time in the mid-2030s a single research team comes up with revolutionary new software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.
Because I’m still worried about making sure AI actually does the things we want it to do, I worry that faster AI advancement will make that problem harder to solve in time. Beyond that, I’m not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities even before AGI emerges. In a world of abundant labor and so on, the need for mutually beneficial economic games with other human players, let alone countries, will be much smaller.
I’m a little worried about military dominance though—since the country with the best military AI may leverage it to gain a radical geopolitical upper hand. Still, we were able to handle nuclear weapons, so we should probably be able to handle this too.
Agree. My point was that boxing a human-level AI is easy in principle (especially if that AI exists on a special-purpose device of which there is only one in the world), but in practice someone somewhere is going to unbox AI before it is even developed.
The biggest threat from AI comes from AI-owned AI with a hostile worldview—no matter how the AI gets created. If we can’t answer the question “how do we make sure AIs do the things we want them to do when we can’t tell them all the things they shouldn’t do?”
Beyond that, I’m not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities even before AGI emerges.
I think there’s a connection between these two things, but probably I haven’t made it terribly clear. The reason I talked about economic interactions is that they’re the best framework we currently have for describing positive-sum interactions between entities with vastly different levels of power.
I am certain that my bank knows much more about finance than I do. Likewise, my insurance company knows much more about insurance than I do. And my ISP probably knows more about networking than I do (although sometimes I wonder). If any of these entities wanted to totally screw me over at any point, they probably could. The reason I am able to successfully interact with them is not that they fear my retaliation or share my worldviews, but that they exist in a wider economy in which maintaining their reputation is valuable, because it allows them to engage in positive-sum trades in the future.
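To make that reputation logic concrete, here’s a toy sketch (Python; the function name and payoff numbers are mine, invented purely for illustration, not anything from the post): the powerful player keeps cooperating exactly when the discounted stream of future positive-sum trades is worth more than the one-time gain from screwing the weaker party over.

```python
# Toy model of the reputation argument above: a powerful player (the "bank")
# can defect and grab a one-time payoff, or keep cooperating and collect a
# smaller surplus from repeated positive-sum trades. All numbers are made up.

def best_strategy(defect_payoff, trade_surplus, discount, horizon=1000):
    """Compare a one-shot grab against the discounted stream of future trades."""
    value_of_cooperating = sum(trade_surplus * discount**t for t in range(horizon))
    return "cooperate" if value_of_cooperating > defect_payoff else "defect"

# With enough future trades at stake, screwing over a customer isn't worth it...
print(best_strategy(defect_payoff=100, trade_surplus=5, discount=0.99))  # cooperate
# ...but if the future is heavily discounted (or trades dry up), it is.
print(best_strategy(defect_payoff=100, trade_surplus=5, discount=0.5))   # defect
```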
Note that the degree to which this is true varies widely across time and space. People who are socially outcast in countries with poor rule of law cannot trust the bank. I propose that we ought to have less faith in our ability to control AI or its worldview and place more effort into making sure that potential AIs exist in a sociopolitical environment where it is to their benefit not to destroy us.
The reason I called this post the “China alignment problem” is that the techniques we might use to interact with China (a potentially economically powerful agent with an alien or even hostile worldview) are the same ones I think we should be using to align our interactions with AI. Our chances of changing China’s (or AI’s) worldview to match our own are fairly slim, but our ability to ensure their “peaceful rise” is much greater.
I believe the best framework to do this is to establish a pluralistic society in which no single actor dominates, and where positive-sum trades are the default as enforced by collective action against those who threaten or abuse others.
Still, we were able to handle nuclear weapons, so we should probably be able to handle this too.
Small nitpick, but “we were able to handle nuclear weapons” is a bit iffy. Looking up a list of near-misses during the Cold War is terrifying, to say nothing of countries like Iran or North Korea going through a succession crisis.
I propose that we ought to have less faith in our ability to control AI or its worldview and place more effort into making sure that potential AIs exist in a sociopolitical environment where it is to their benefit not to destroy us.
This is probably the crux of our disagreement. If an AI is indeed powerful enough to wrest power from humanity, the catastrophic convergence conjecture implies that it by default will. And if the AI is indeed powerful enough to wrest power from humanity, I have difficulty envisioning things we could offer it in trade that it couldn’t just unilaterally satisfy for itself in a cheaper and more efficient manner.
As an intuition pump for this, I think the AI-human power differential will be more similar to the human-animal differential than to the company-human differential. In the latter case, the company actually relies on humans for continued support (something an AI that can roll out its own human-level AIs will eventually not need to do) and thus has to maintain a level of trust. In the former case, well… people don’t really negotiate with animals at all.
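To put rough numbers on the “nothing left to trade” worry above: the surplus an agent gets from trading for something is roughly its own cost of producing it minus the price it pays, so once an AI’s cost of self-supply drops below anything humans can charge, that surplus goes to zero. A minimal sketch (Python, with invented numbers of my own):

```python
# Rough arithmetic behind the "nothing left to trade" worry: the surplus an
# agent gets from trading for a good is (its own cost of making it) minus
# (the price it pays). Numbers are invented for illustration only.

def gain_from_trade(own_cost, trade_price):
    """Surplus from buying instead of self-supplying (zero or less => don't bother)."""
    return own_cost - trade_price

human_price = 1.0  # cost at which humans can supply one unit of some service
for ai_own_cost in [10.0, 1.0, 0.01]:
    print(f"AI self-supply cost {ai_own_cost:>5}: "
          f"gain from trading with humans = {gain_from_trade(ai_own_cost, human_price):.2f}")
# As the AI's own cost of doing the task falls below the human price,
# the surplus from trade disappears.
```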