It’s really largely Eliezer and some MIRI people. Most alignment researchers (e.g. at ARC, DeepMind, OpenAI, Anthropic, CHAI) and most of the community [ETA: had wrong link here before] disagree (I count myself among those who disagree, although I am concerned about a big risk here), and think MIRI doesn’t have good reasons to support the claim of almost certain doom.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes) has a good chance of working well enough to make better controls, and so on. For an AI apocalypse it’s not only required that unaligned superintelligent AI outwit humans, but that all the safety/control/interpretability gains yielded by AI along the way also fail, creating a very challenging situation for misaligned AI.
Hm? I was recently at a 10-15 person lunch for people with >75% on doom, which included a number of non-MIRI people, including at least one person each from FHI, DeepMind, and CHAI.
(Many of the people had interacted with MIRI or at some time worked with/for them, but work at other places now.)
Just registering that your comment feels a little overstated, but you’re right to say a lot of this emanates from some folks at MIRI. For one, I had been betting a lot on MIRI, and now feel like a lot more responsibility has fallen on my plate.
You’ve now linked to the same survey twice in different discussions of this topic, even though this survey, as far as I can tell, provides no evidence for the position you are trying to argue for. To copy Thomas Kwa’s response to your previous comment:
I don’t see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.
We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.
[...]
Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.
It also seems straightforwardly wrong that it’s just Eliezer and some MIRI people. While there is a wide variance in opinions on probability of doom from people working in AI Alignment, there are many people at Redwood, OpenAI and other organizations who assign very high probability here. I don’t think it’s at all accurate to say this fits neatly along organizational boundaries, nor is it at all accurate to say that this is “only” a small group of people. My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Whoops, you’re right that I linked the wrong survey. I see others posted the link to Rob’s survey (done in response to some previous similar claims) and I edited my comment to fix the link.
I think you can identify a cluster of near-certain-doom views, e.g. ‘logistic success curve’ and odds of success on the order of 1% (vs 10%, or 90%), based around MIRI/Eliezer, with a lot of epistemic deference involved (visible on LW). I would say it is largely attributable there, and without sufficient support.
“My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.”
What do you make of Rob’s survey results (correct link this time)?
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Depending on how you choose the survey population, I would bet that it’s fewer than 35%, at 2:1 odds.
(Though perhaps you’ve already updated against this based on Rob’s survey results below; that survey happened because I offered to bet against a similar claim of doom probabilities from Rob, a bet that I would have won if we had made it.)
Where would you put the numbers, roughly?
I’d just say the numbers from the survey below? Maybe slightly updated towards doom; I think probably some of the respondents have been influenced by the recent wave of doomism.
If you had a more rigorously defined population, such that I could tell how that population differs from the population surveyed below, I could predict more differences.
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Not what you were asking for (time has passed, the Q is different, and the survey population is different too), but in my early 2021 survey of people who “[research] long-term AI topics, or who [have] done a lot of past work on such topics” at a half-dozen orgs, 3⁄27 ≈ 11% of those who marked “I’m doing (or have done) a lot of technical AI safety research.” gave an answer above 80% to at least one of my attempts to operationalize ‘x-risk from AI’. (And at least two of those three were MIRI people.)
The weaker claim “risk (on at least one of the operationalizations) is at least 80%” got agreement from 5⁄27 ≈ 19%, and “risk (on at least one of the operationalizations) is at least 66%” got agreement from 9⁄27 ≈ 33%.
MIRI doesn’t have good reasons to support the claim of almost certain doom
I recently asked Eliezer why he didn’t expect ELK to be helpful, and it seemed that one of his major reasons was that Paul was “wrongly” excited about IDA. It seems that at this point in time, neither Paul nor Eliezer are excited about IDA, but Eliezer got to the conclusion first. The IDA-bearishness may be for fundamentally different reasons, though; I haven’t tried to figure that out yet.
Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.
It seems that at this point in time, neither Paul nor Eliezer are excited about IDA
I’m still excited about IDA.
I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that’s more skeptical than my strongest pro-IDA claim here in early 2017:
I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice
That said:
I think it’s up for grabs whether we’ll end up with something that counts as “this basic strategy.” (I think imitative generalization is the kind of thing I had in mind in that sentence, but many of the ELK schemes we are thinking about definitely aren’t; it’s pretty arbitrary.)
Also note that in that post I’m talking about something that produces a benign agent in practice, and in the other I’m talking about “indefinitely scalable.” Though my probability on “produces a benign agent in practice” is also definitely lower.
Did Eliezer give any details about what exactly was wrong about Paul’s excitement? Might just be an intuition gained from years of experience, but the more details we know the better, I think.
Some scattered thoughts in this direction:
Eliezer has an opaque intuition that weird recursion is hard to get right on the first try. I want to interview him and write this up, but I don’t know if I’m capable of asking the right questions. Probably someone should do it.
Eliezer thinks people tend to be too optimistic in general
I’ve heard other people have an intuition that IDA is unaligned because HCH is unaligned because real human bureaucracies are unaligned
I found this comment where Eliezer has detailed criticism of Paul’s alignment agenda, including finding problems with “weird recursion”.
I’ll add that when I asked John Wentworth why he was IDA-bearish, he mentioned the inefficiency of bureaucracies and told me to read the following post to learn why interfaces and coordination are hard: Interfaces as a Scarce Resource.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes).
Unfinished sentence?
Nitpick: I think this should either be a comment or an answer to Yitz’ upcoming followup post, since it isn’t an attempt to convince them that humanity is doomed.
(I moved it to “comments” for this reason. I missed the part where Yitz said there’d be an upcoming followup post, although I think that’d be a good idea, and this comment would make a good answer there. I would be interested in seeing top-level posts arguing the opposite view.)
[Edited to link correct survey.]