There are quite a few arguments for why we should move beyond the standard single-AGI safety paradigm.
Fwiw, I would classify all of #5-#8 as reasons that AI governance should care about multiple AI systems (which it always has); I don’t see why they require technical AI alignment research to move beyond the single-AGI paradigm.
(Here “AI alignment” is the problem of “how do you ensure that your AI system is not adversarially optimizing against you”, and not making any claims about what other AI systems will do.)
I’d say that each of #5-#8 changes the parts of “AI alignment” that you focus on. For example, you may be confident that your AI system is not optimising against you, without being confident that 1000 copies of your AI system working together won’t be optimising against you. Or you might be confident that your AI system won’t do anything dangerous in almost all situations, but no longer confident once you realise that threats are adversarially selected to be extreme.
Whether you count these shifts as “moving beyond the standard paradigm” depends, I guess, on how much they change alignment research in practice. It seems like proponents of #7 and #8 believe that, conditional on those claims, alignment researchers’ priorities should shift significantly. And #5 has already contributed to a shift away from the agent foundations paradigm. On the other hand, I’m a proponent of #6, and I don’t currently believe that this claim should significantly change alignment research (although maybe further thought will identify some ways).
I think I’ll edit the line you quoted to say “beyond standard single-AGI safety paradigms” to clarify that there’s no single paradigm everyone buys into.
Whether you count these shifts as “moving beyond the standard paradigm” depends, I guess, on how much they change alignment research in practice. It seems like proponents of #7 and #8 believe that, conditional on those claims, alignment researchers’ priorities should shift significantly.
I would say that proponents of #7 and #8 believe that longtermists’ priorities should shift significantly (in the case of #8, might just be negative utilitarians). They are proposing that we focus on other problems that are not AI alignment (as I defined it above).
This might just be a semantic disagreement, but I do think it’s an important point: I wouldn’t want people to say things like “people argue that it will become easier to engineer biological weapons than to build AGI, and therefore biosecurity is more important. Thus we need to move beyond the AGI paradigm to the emerging technologies paradigm”. Like, it’s correct, but it creates too much generality; it is important to be able to focus on specific problems and make claims about those problems. Arguments #7-#8 feel to me like “look, there’s this other problem besides AI alignment that might be more important”; I don’t deny that this could change what you do, but it doesn’t change what the field of AI alignment should do.
(You might say that you were talking about AI safety generally, and not AI alignment, but then I dispute that AI safety ever had a “single-AGI” paradigm; people have been talking about multipolar outcomes for a long time.)
And #5 has already contributed to a shift away from the agent foundations paradigm.
Yes, but not to a multiagent paradigm, which I thought was your main claim.
This all seems straightforwardly correct, so I’ve changed the line in question accordingly. Thanks for the correction :)
One caveat: technical work to address #8 currently involves either preventing AGIs from being misaligned in ways that lead them to make threats, or preventing AGIs from being aligned in ways which make them susceptible to threats. The former seems to qualify as an aspect of the “alignment problem”, the latter not so much. I should have used the former as an example in my original reply to you, rather than using the latter.