I have since been given access to a sample of such non-public discussions. (The sample is small but I think at least somewhat representative.) Worryingly, it seems that there’s a disconnect between the kind of global coordination that AI governance researchers are thinking and talking about, and the kind that technical AI safety researchers often talk about nowadays as necessary to ensure safety.
In short, the Google docs I’ve seen all seem to assume that a safe and competitive AGI can be achieved at some reasonable level of investment into technical safety, and that the main coordination problem is how to prevent a “race to the bottom” whereby some actors try to obtain a lead in AI capabilities by underinvesting in safety. However, current discussion among technical AI safety researchers suggests that a safe and competitive AGI perhaps can’t be achieved at any feasible level of investment into technical safety, and that at a certain point we’ll probably need global coordination to stop, limit, or slow down progress in and/or deployment/use of AI capabilities.
Questions I’m trying to answer now: 1) Is my impression from the limited sample correct? 2) If so, how best to correct this communications gap (and prevent similar gaps in the future) between the two groups of people working on AI risk?
I appreciate how you turned the most useful private info into public conversation while largely minimising the amount of private info that had to become public.
To respond directly, yes, your observation matches my impression of folks working on governance issues who aren’t very involved in technical alignment (with the exception of Bostrom). I have no simple answer to the latter question.
Seems right to me, yes.
Convince the researchers at OpenAI, FHI and Open Phil, and maybe DeepMind and CHAI, that it’s not possible to get safe, competitive AI; then ask them to pass it on to governance researchers.
I have a feeling it’s not that simple. See the last part of “Generate evidence of difficulty” as a research purpose on biases. For example, I know at least one person who quit an AI safety org (in part) because they became convinced that it’s too difficult to achieve safe, competitive AI (or at least that the approach pursued by the org wasn’t going to work). Another person privately told me they have little idea how their research will eventually contribute to a safe, competitive AI, but hasn’t written anything like that publicly AFAIK. (And note that I don’t actually have that many opportunities to speak privately with other AI safety researchers.) Another thing is that most AI safety researchers probably don’t think it’s part of their job to “generate evidence of difficulty”, so I would have to convince them of that first.
Unless these problems are solved, I might be able to convince a few safety researchers to go to governance researchers and tell them they think it’s not possible to get safe, competitive AI, but their concerns will probably just be dismissed as outliers. I think a better step forward would be to build a private forum where these kinds of concerns can be more frankly discussed, as well as a culture where doing so is normative. This would address some of the possible biases, though I’m still not sure about the others.
This is pretty strongly different from my impressions, but I don’t think we could resolve the disagreement without talking about specific examples of people, so I’m inclined to set this aside.
I would guess three main disagreements are:
i) are the kinds of transformative AI that we’re reasonably likely to get in the next 25 years unalignable?
ii) how plausible are the extreme levels of cooperation Wei Dai wants?
iii) how important is career capital/credibility?
I’m perhaps midway between Wei Dai’s view and the median governance view so may be an interesting example. I think we’re ~10% likely to get transformative general AI in the next 20 years, and ~6% likely to get an incorrigible one, and ~5.4% likely to get incorrigible general AI that’s insufficiently philosophically competent. Extreme cooperation seems ~5% likely, and is correlated with having general AI. It would be nice if more people worked on that, or on whatever more-realistic solutions would work for the transformative unsafe AGI scenario, but I’m happy for some double-digit percentage of governance researchers to keep working on less extreme (and more likely) solutions to build credibility.
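For readers who want to see what those three estimates imply, here is a minimal sketch of the arithmetic. It assumes the scenarios are nested (each is a subset of the one before), which the comment suggests but does not state outright; the variable and function names are made up for illustration.

```python
# Estimates quoted in the comment above, written as probabilities.
p_tai = 0.10                   # transformative general AI in the next 20 years
p_incorrigible = 0.06          # ...that is also incorrigible
p_incorrigible_unphil = 0.054  # ...and insufficiently philosophically competent


def implied_conditionals(p_a, p_ab, p_abc):
    """Back out conditional probabilities from nested joint estimates.

    Assumption (not stated in the original comment): each scenario is a
    subset of the previous one, so the ratio of successive estimates is a
    conditional probability.
    """
    return p_ab / p_a, p_abc / p_ab


p_inc_given_tai, p_unphil_given_inc = implied_conditionals(
    p_tai, p_incorrigible, p_incorrigible_unphil
)

print(f"P(incorrigible | transformative AI) ~ {p_inc_given_tai:.2f}")
print(f"P(philosophically insufficient | incorrigible) ~ {p_unphil_given_inc:.2f}")
```

Under that nesting assumption, the figures correspond to roughly a 60% chance that a transformative general AI is incorrigible, and roughly a 90% chance that an incorrigible one is also insufficiently philosophically competent.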