In my experience, a highly predictive feature of “agreeing with safety concerns” is how much someone thinks about/works on RL (or decision-making more broadly). For example, many scientists in RL-driven labs (DeepMind, OpenAI) agree with safety concerns, while among NLP researchers (a field mainly driven by supervised and unsupervised learning) there is almost no understanding of safety concerns, let alone agreement that they are valid. It’s easier to intuitively motivate safety concerns from the perspective of RL, with demos of where it fails (especially concerns like reward gaming and instrumental convergence). Thus, a useful way to decompose Rohin’s question is:
1. How many top researchers doing AGI-related work think about RL?
   1.1. How many of the above researchers agree with safety concerns?
2. How many top researchers doing AGI-related work don’t think about RL?
   2.1. How many of the above researchers agree with safety concerns?
To estimate the fraction of top AGI researchers who agree with safety concerns, we can add the answers to 1.1 and 2.1 and divide by the total number of top AGI researchers (here, 2000).
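As a minimal sketch of that arithmetic (the symbols $n_{1.1}$ and $n_{2.1}$ are my notation, not from the comment itself):

$$
\text{fraction agreeing with safety concerns} \approx \frac{n_{1.1} + n_{2.1}}{2000}
$$

where $n_{1.1}$ and $n_{2.1}$ are the counts of researchers from questions 1.1 and 2.1.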
I’d argue that more researchers are in category 2 (non-RL researchers) than in category 1 (RL researchers). A lot of recent progress towards AGI has been driven by improvements in representation learning, supervised learning, and unsupervised learning. Work in these areas is AGI-related in the sense relevant to Rohin’s question: if we develop AGI without RL (e.g., GPT-10), we’d need the non-RL researchers who develop these models to coordinate and agree with safety concerns about e.g. releasing the models. I think it will continue to be the case for the foreseeable future that >50% of AGI-related researchers aren’t doing RL, as representation learning, supervised learning, unsupervised learning, etc. all seem quite important to developing AGI.
The upshot (downshot?) of the above is that we’ll probably need a good chunk of non-RL but AGI-related researchers to agree with safety concerns. Folks in this group seem less receptive to safety concerns (probably because the concerns don’t come up as obviously or as often as in RL work), so I think it’ll take a pretty intuitively compelling demonstration/warning shot to convince people in the non-RL group, one visible enough to reach across several subfields of ML. Preferably, these demos would be directly relevant to the work of non-RL researchers (e.g., directly showing how NLP systems are Stuart-Russell-style dangerous, to convince NLP people). I think we’ll need pretty advanced systems to get these kinds of demos, roughly +/-5 years from when we get AGI (vs. Rohin’s prior estimate of 1-10 years before AGI). So overall, I think Rohin’s posterior should be shifted right by ~5 years.
Here is my Elicit snapshot of what I think Rohin’s posterior will be after updating on all the comments here. It seems like all the other comments are more optimistic than Rohin’s prior, so I predict that Rohin’s posterior will become more optimistic, even though I think the concerns I’ve outlined above outweigh/override some of the considerations in the other comments. In particular, I think you’ll get an overestimate of “agreement on safety concerns” by looking only at the NeurIPS/ICML community, which is pretty RL-heavy relative to e.g. the NLP and Computer Vision communities (which will still face AGI-related coordination problems). The same can be said about researchers who explicitly self-identify with “Artificial Intelligence” or “Artificial General Intelligence” (fields historically focused on decision-making and games). Looking at the 100 most cited NLP researchers on Google Scholar, I found only one whom I could recognize as probably sympathetic to safety concerns (Wojciech Zaremba), and similarly for Computer Vision.
Thanks, this is an important point I hadn’t considered before.
I don’t buy that the entire distribution should be shifted right 5 years—presumably as time goes on it will be more clear what subfields of AI are relevant to AGI, we’ll have better models of how AGI will be built, and safety arguments will be more tailored to the right subfields. I do buy that it should reduce the probability I assign in the near future (e.g. the next 5-10 years).