Logan Zoellner comments on AI Boxing for Hardware-bound agents (aka the China alignment problem)

Logan Zoellner 9 May 2020 7:31 UTC
1 point
My major concern with AI boxing is the possibility that the AI might just convince people to let it out
Agree. My point was that boxing a human-level AI is easy in principle (especially if that AI exists on a special-purpose device of which there is only one in the world), but in practice someone somewhere is going to unbox AI before it is even developed.
The biggest threat from AI comes from AI-owned AI with a hostile worldview, no matter how the AI gets created. If we can’t answer the question “how do we make sure AIs do the things we want them to do when we can’t tell them all the things they shouldn’t do?”
Beyond that, I’m not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities even before AGI emerges.

I think there’s a connection between these two things, but I probably haven’t made it terribly clear. The reason I talked about economic interactions is that they’re the best framework we currently have for describing positive-sum interactions between entities with vastly different levels of power.
I am certain that my bank knows much more about finance than I do. Likewise, my insurance company knows much more about insurance than I do. And my ISP probably knows more about networking than I do (although sometimes I wonder). If any of these entities wanted to totally screw me over at any point, they probably could. The reason I am able to successfully interact with them is not that they fear my retaliation or share my worldview. It is that they exist in a wider economy in which maintaining their reputation is valuable, because it allows them to engage in positive-sum trades in the future.
Note that the degree to which this is true varies widely across time and space. People who are socially outcast in countries with poor rule of law cannot trust the bank. I propose that we ought to have less faith in our ability to control AI or its worldview and place more effort into making sure that potential AIs exist in a sociopolitical environment where it is to their benefit not to destroy us.
The reason I called this post the “China alignment problem” is that the techniques we might use to interact with China (a potentially economically powerful agent with an alien or even hostile worldview) are the same ones I think we should be using to align our interactions with AI. Our chances of changing China’s (or an AI’s) worldview to match our own are fairly slim, but our ability to ensure their “peaceful rise” is much greater.
I believe the best framework to do this is to establish a pluralistic society in which no single actor dominates, and where positive-sum trades are the default as enforced by collective action against those who threaten or abuse others.

Still, we were able to handle nuclear weapons so we should probably be able to handle this too.
Small nitpick, but “we were able to handle nuclear weapons” is a bit iffy. Looking up a list of near-misses during the Cold War is terrifying, to say nothing of thinking about countries like Iran or North Korea going through a succession crisis.
- Isnasene 10 May 2020 6:56 UTC
  6 points
  I propose that we ought to have less faith in our ability to control AI or its worldview and place more effort into making sure that potential AIs exist in a sociopolitical environment where it is to their benefit not to destroy us.
  This is probably the crux of our disagreement. If an AI is indeed powerful enough to wrest power from humanity, the catastrophic convergence conjecture implies that it by default will. And if the AI is indeed powerful enough to wrest power from humanity, I have difficulty envisioning things we could offer it in trade that it couldn’t just unilaterally satisfy for itself in a cheaper and more efficient manner.
  As an intuition pump for this, I think that the AI-human power differential will be more similar to the human-animal differential than the company-human differential. In the latter case, the company actually relies on humans for continued support (something an AI that can roll out human-level AI won’t need to do at some point) and thus has to maintain a level of trust. In the former case, well… people don’t really negotiate with animals at all.