Rohin Shah comments on Eight claims about multi-agent AGI safety

Rohin Shah 10 Jan 2021 19:54 UTC
LW: 4 AF: 3
Whether you count these shifts as “moving beyond the standard paradigm” depends, I guess, on how much they change alignment research in practice. It seems like proponents of #7 and #8 believe that, conditional on those claims, alignment researchers’ priorities should shift significantly.
I would say that proponents of #7 and #8 believe that longtermists’ priorities should shift significantly (in the case of #8, perhaps only negative utilitarians’ priorities). They are proposing that we focus on problems other than AI alignment (as I defined it above).
This might just be a semantic disagreement, but I do think it’s an important point: I wouldn’t want people to say things like “people argue that it will become easier to engineer biological weapons than to build AGI, and therefore biosecurity is more important. Thus we need to move beyond the AGI paradigm to the emerging technologies paradigm”. Like, it’s correct, but it generalizes too far; it is important to be able to focus on specific problems and make claims about those problems. Arguments 7-8 feel to me like “look, there’s this other problem besides AI alignment that might be more important”; I don’t deny that this could change what you do, but it doesn’t change what the field of AI alignment should do.
(You might say that you were talking about AI safety generally, and not AI alignment, but then I dispute that AI safety ever had a “single-AGI” paradigm; people have been talking about multipolar outcomes for a long time.)
And #5 has already contributed to a shift away from the agent foundations paradigm.
Yes, but not to a multiagent paradigm, which I thought was your main claim.
- Richard_Ngo 10 Jan 2021 20:44 UTC
  LW: 5 AF: 4
  This all seems straightforwardly correct, so I’ve changed the line in question accordingly. Thanks for the correction :)
  One caveat: technical work to address #8 currently involves either preventing AGIs from being misaligned in ways that lead them to make threats, or preventing AGIs from being aligned in ways which make them susceptible to threats. The former seems to qualify as an aspect of the “alignment problem”, the latter not so much. I should have used the former as an example in my original reply to you, rather than using the latter.