I think there’s an additional element of Hanson’s argument that is both likely true and important, and as far as I can tell unaddressed in your post. When Hanson talks about “othering” AIs, he’s often talking about the stuff you mentioned (projecting a propensity to do bad things onto the AIs), but he’s also saying that future AIs won’t necessarily form a natural, unified coalition against us. In other words, he’s rejecting a form of out-group homogeneity bias in how AIs are portrayed.
As an analogy, among humans, the class of “smart people” is not a natural coalition, even though smart people could, in principle, band together and defeat all the non-smart people in a head-to-head fight. Why don’t smart people do that? Well, one reason is that smart people don’t usually see themselves as part of a coherent identity that includes all the other smart people. To put it poetically, there isn’t much class consciousness among smart people as a unified group. They have diverse motives and interests that wouldn’t be furthered much by attempting to join such a front. Hanson’s argument is that AIs, in the same sense, will not form a natural, unified front against humans. The relevant boundaries in future conflicts over power will likely be drawn along other lines.
The idea that AIs won’t form a natural coalition has a variety of implications for the standard AI risk arguments. Most notably, it undermines the single-agent model that underlies many takeover stories and arguments for risk. More specifically, if AIs won’t form a natural coalition, then:
1. We shouldn’t model a bunch of AIs (or even a bunch of copies of the same model) as all basically being “a single AI”. For instance, in Ajeya’s story about AI takeover, she alternates between calling a single copy of an AI model an entity named “Alex” and calling the entire coalition of copies of the same model “Alex”. However, in the absence of any unified coalition among the AIs, this conflation makes much less sense.
2. It will be hard for AIs to coordinate a violent takeover, for basically the same reason it’s hard for humans to coordinate one. To coordinate a violent plan, you generally need to alert others to your intentions, but since they might not agree with those intentions, each party you inform is a chance for your plan to be exposed and stopped. The same applies to AIs that try to inform other AIs of their intentions.
3. If an AI is trained to catch deception in other AIs, there isn’t a strong reason to assume that it will defect from its training and join the other AI in deceiving the humans, because it won’t necessarily see itself as “an AI” fighting against “the humans”.
In my opinion, these examples only scratch the surface of the ways in which your story of AI might depart from the classic AI risk analysis if you don’t think AIs will form a natural, unified coalition. When you read standard AI risk stories (including from people like Ajeya who do not agree with Eliezer on a ton of things), you can often find the assumption that “AIs will form a natural, unified coalition” written all over them.