Rogue AIs are quite likely to at least attempt to ally with humans, and opposing human groups will indeed try to make some use of AI. So the situation might look like “rogue AIs + humans” vs. “non-rogue AIs + humans”. But I think there are good reasons to think that the non-rogue AIs will still be misaligned and might be ambivalent about which side they prefer.
I think if there’s a future conflict between AIs, with humans split between sides of the conflict, it just doesn’t make sense to talk about “misalignment” being the main cause for concern here. AIs are just additional agents in the world, who have separate values from each other just like how humans (and human groups) have separate values from each other. AIs might have on-average cognitive advantages over humans in such a world, but the tribal frame of thinking “us (aligned) vs. AIs (misaligned)” simply falls apart in such scenarios.
(This is all with the caveat that AIs could make war more likely for reasons other than misalignment, for example by accelerating technological progress and bringing about the creation of powerful weapons.)
Sure, but I might think a given situation would be nearly entirely resolved without misalignment. (Edit: without technical issues related to misalignment, e.g., if AI creators could trivially avoid serious misalignment.)
E.g., if an AI escapes from OpenAI’s servers and then allies with North Korea, that situation would have been avoided absent misalignment issues.
You could also solve or mitigate this type of problem in the example by resolving all human conflicts (so the AI doesn’t have a group to ally with), but this might be quite a bit harder than solving technical problems related to misalignment (either via control-type approaches or by removing misalignment).
What do you mean by “misalignment”? In a regime with autonomous AI agents, I usually understand “misalignment” to mean “has different values from some other agent”. In this frame, you can be misaligned with some people but not others. If an AI is aligned with North Korea, then it’s not really “misaligned” in the abstract—it’s just aligned with someone who we don’t want it to be aligned with. Likewise, if OpenAI develops AI that’s aligned with the United States, but unaligned with North Korea, this mostly just seems like the same problem but in reverse.
In general, conflicts don’t really seem well-described as issues of “misalignment”. Sure, in the absence of all misalignment, wars would probably not occur (though they may still happen due to misunderstandings and empirical disagreements). But for the most part, wars seem better described as arising from a breakdown of institutions that are normally tasked with keeping the peace. You can have a system of lawful yet mutually-misaligned agents who keep the peace, just as you can have an anarchic system with mutually-misaligned agents in a state of constant war. Misalignment just (mostly) doesn’t seem to be the thing causing the issue here.
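As a minimal sketch of that last point: consider a toy one-shot game (payoff numbers chosen purely for illustration) between two agents with fixed, mutually misaligned payoffs. Without an enforcement institution the only equilibrium is mutual conflict; with a modest penalty on defection the equilibrium becomes peaceful coexistence, even though neither agent’s values change.

```python
# Toy sketch: two mutually-misaligned agents choose "coop" (keep the peace) or "defect" (fight).
# An enforcement institution fines defection. The agents' values never change, but the
# equilibrium flips from conflict to peace once the institution is strong enough.
from itertools import product

# Payoffs to (row, col) for each action pair; the agents want different things
# (asymmetric payoffs), and both lose something in open conflict.
BASE_PAYOFFS = {
    ("coop", "coop"): (3, 2),      # peaceful coexistence, valued differently by each side
    ("coop", "defect"): (0, 4),    # being exploited vs. grabbing resources
    ("defect", "coop"): (4, 0),
    ("defect", "defect"): (1, 1),  # costly mutual conflict
}

def payoffs(penalty: float):
    """Payoff table after an institution fines each defecting agent by `penalty`."""
    return {
        (a, b): (u - (penalty if a == "defect" else 0.0),
                 v - (penalty if b == "defect" else 0.0))
        for (a, b), (u, v) in BASE_PAYOFFS.items()
    }

def nash_equilibria(table):
    """Brute-force the pure-strategy Nash equilibria of a 2x2 game."""
    actions = ["coop", "defect"]
    eqs = []
    for a, b in product(actions, actions):
        u, v = table[(a, b)]
        row_ok = all(u >= table[(a2, b)][0] for a2 in actions)
        col_ok = all(v >= table[(a, b2)][1] for b2 in actions)
        if row_ok and col_ok:
            eqs.append((a, b))
    return eqs

for penalty in (0.0, 2.5):
    print(f"enforcement penalty = {penalty}: equilibria = {nash_equilibria(payoffs(penalty))}")
# With penalty 0 the only equilibrium is (defect, defect); with penalty 2.5 it is (coop, coop),
# even though the two agents' underlying (misaligned) values are unchanged.
```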
You could also solve or mitigate the problem by resolving all human conflicts (so the AI doesn’t have a group to ally with)
Note that I’m not saying
AIs will aid in existing human conflicts, picking sides along the ordinary lines we see today
I am saying:
AIs will likely have conflicts amongst themselves, just as humans have conflicts amongst themselves, and future conflicts (when considering all of society) don’t seem particularly likely to be AI vs. human, as opposed to AI vs AI (with humans split between these groups).
Yep, I was just referring to my example scenario and scenarios like it.
Like, the basic question is the extent to which human groups form a cartel/monopoly on human labor vs. ally with different AI groups. (And existing conflict between human groups makes a full cartel much less likely.)
Sorry, by “without misalignment” I mean “without misalignment-related technical problems”. As in, it’s trivial to avoid misalignment from the perspective of AI creators.
This doesn’t clear up the confusion for me. That mostly pushes my question to “what are misalignment related technical problems?” Is the problem of an AI escaping a server and aligning with North Korea a technical or a political problem? How could we tell? Is this still in the regime where we are using AIs as tools, or are you talking about a regime where AIs are autonomous agents?
I mean, it could be resolved in principle by technical means and might be resolvable by political means as well. I’m assuming the AI creator didn’t want the AI to escape to North Korea and therefore failed to implement some technical solution that would have prevented this.
I’m imagining very powerful AIs, e.g. AIs that can speed up R&D by large factors. These are probably running autonomously, but in a way which is de jure controlled by the AI lab.