It’s a combination of not finding Paul+Katja’s counterarguments convincing (AI Impacts has a slightly different version of the post; I think of this as the Paul+Katja post since I don’t know how much each of them did), having various other arguments that they didn’t consider, and thinking they may be making mistakes in how they frame things and what questions they ask. I originally planned to write a line-by-line rebuttal of the Paul+Katja posts, but instead I ended up writing a sequence of posts that collectively constitute my (indirect) response. If you want a more direct response, I can put it on my list of things to do, haha… sorry… I am a bit overwhelmed… OK, here are maybe some quick (mostly cached) thoughts:
1. What we care about is the point of no return (PONR), NOT GDP doubling in a year or whatever.
2. PONR seems not particularly correlated with GDP acceleration time or speed, and thus maybe Paul and I are just talking past each other—he’s asking and answering the wrong questions.
3. Slow takeoff means shorter timelines, so if our timelines are independently pretty short, we should update against slow takeoff. My timelines are independently pretty short. (See my other sequence.) Paul runs this argument in the other direction, I think: since takeoff will be slow, and we aren’t seeing the beginnings of it now, timelines must be long. (I don’t know how heavily he leans on this argument, though; probably not much. Ajeya does this too, and leans on it too much, I think.) Also, concretely, if crazy AI stuff happens in <10 years, probably the EMH (efficient markets hypothesis) has failed in this domain, probably we can get AI just by scaling up existing stuff, and therefore probably takeoff will be fairly fast. (At least, it seems that way extrapolating from GPT-1, GPT-2, and GPT-3: each about a year apart, and each significantly better than the last, qualitatively and quantitatively. If that’s what progress looks like as we enter the “human range,” we will cross it quickly, it seems.) There’s a toy Bayes sketch of this update at the end of this list.
4. Discontinuities totally do sometimes happen. I think we shouldn’t expect them by default, but they aren’t super low-prior either; thus, we should do gears-level modelling of AI rather than trying to build a reference class or analogy to other tech.
5. Most of Paul+Katja’s arguments seem to be about continuity vs. discontinuity, which I think is the wrong question to be asking. What we care about is how long it takes (in clock time, or perhaps clock-time-given-compute-and-researcher-budget-X, given current and near-future ideas/algorithms) for AI capabilities to go from “meh” to “dangerous.” THEN once we have an estimate of that, we can use that estimate to start thinking about whether this will happen in a distributed way across the whole world economy, or in a concentrated way in a single AI project, etc. (Analogy: We shouldn’t try to predict greenhouse gas emissions by extrapolating world temperature trends, since that gets the causation backwards.)
6. I think the arguments Paul+Katja make aren’t super convincing on their own terms. They are sufficient to convince me that the slow-takeoff world they describe is possible and deserves serious consideration (more so than e.g. Age of Em or CAIS), but not convincing enough overall for me to say “Bostrom and Yudkowsky were probably wrong.” I could go through them one by one, but I think I’ll stop here for now.
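To make the direction of the update in point 3 concrete, here’s a toy Bayes calculation. The numbers are completely made up, just to illustrate the shape of the argument; the evidence is “crazy AI within ~10 years, yet no visible slow-takeoff ramp-up in the economy today.”

```python
# Toy Bayes update for point 3. All numbers are made up for illustration;
# they are not my actual credences.
#
# Evidence E = "crazy AI arrives within ~10 years, yet the economy today shows
# no sign of the gradual ramp-up that a slow takeoff would predict."

prior_slow = 0.5          # hypothetical prior on slow (Paul-style) takeoff
p_E_given_slow = 0.2      # E is surprising if takeoff is slow and drawn out
p_E_given_fast = 0.6      # E is less surprising if takeoff is fast/concentrated

# Bayes' rule: P(slow | E) = P(E | slow) * P(slow) / P(E)
p_E = p_E_given_slow * prior_slow + p_E_given_fast * (1 - prior_slow)
posterior_slow = p_E_given_slow * prior_slow / p_E

print(f"P(slow takeoff | E) = {posterior_slow:.2f}")  # 0.25, down from the 0.50 prior
```

(Paul’s version runs the same likelihoods in the other direction: if you’re confident takeoff is slow, the absence of a ramp-up today is evidence that timelines are long.)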
Thanks! My understanding of the Bostrom+Yudkowsky takeoff argument goes like this: at some point, some AI team will discover the final piece of deep math needed to create an AGI; they will then combine this final piece with all of the other existing insights and build an AGI, which will quickly gain in capability and take over the world. (You can search “a brain in a box in a basement” on this page or see here for some more quotes.)
In contrast, the scenario you imagine seems to be more like this (I’m not very confident I’m getting all of it right): there isn’t some piece of deep math needed in the final step. Instead, we already have the tools (mathematical, computational, data, etc.) needed to build an AGI, but nobody has decided to just go for it. When one project finally decides to go for an AGI, this EMH failure allows them to maintain enough of a lead to do crazy stuff (conquistadors, persuasion tools, etc.), and this leads to a DSA (decisive strategic advantage). Or maybe the EMH failure isn’t even required, just enough of a clock-time lead to be able to do the crazy stuff.
If the above is right, then it does seem quite different from Paul+Katja, but also different from Bostrom+Yudkowsky, since the reason the outcome is unipolar is different. Whereas Bostrom+Yudkowsky say one project gets ahead because there is some hard step at the end, you instead say it’s because of some combination of EMH failure and natural lag between projects.
Ah, this is helpful, thanks—I think we just have different interpretations of Bostrom+Yudkowsky. You’ve probably been around since before I was and read more of their stuff, but I first got interested in this around 2013, pre-ordered Superintelligence, read it with keen interest, etc., and the scenario you describe as mine is what I always thought Bostrom+Yudkowsky believed was most likely. The scenario you describe as theirs, involving “deep math” and “one hard step at the end,” is something I thought they held up as an example of how things could be super fast, but not as what they actually believed was most likely.
From what I’ve read, Yudkowsky did seem to think, a decade or two ago, that there would be more insights and less “just make the blob of compute bigger,” but he’s long since updated towards “dear lord, people really are just going to make big blobs of inscrutable matrices, the fools!” I don’t think this counts as a point against his epistemics, because predicting the future is hard, and I’d bet most everyone else around him did even worse.
Ok I see, thanks for explaining. I think what’s confusing to me is that Eliezer did stop talking about the deep math of intelligence sometime after 2011, and started talking about big blobs of matrices, as you say, around 2016, but as far as I know he has never gone back to his older AI takeoff writings and said “actually I don’t believe this stuff anymore; I think hard takeoff is actually more likely to be due to EMH failure and natural lag between projects”. (He has done similar things for older writings he no longer thinks are true, so I would have expected him to do the same for the takeoff stuff if his beliefs had indeed changed.) So I’ve been under the impression that Eliezer actually believes his old writings are still correct, and that somehow his recent remarks and old writings are all consistent. He also hasn’t (as far as I know) written up a more complete sketch of how he thinks takeoff is likely to go given what we now know about ML. So when I see him saying things like what’s quoted in Rob’s OP, I feel like he is referring to the pre-2012 “deep math” takeoff argument. (I also don’t remember whether Bostrom gave any sketch of how he expects hard takeoff to go in Superintelligence; I couldn’t find one after spending a bit of time looking.)
If you have any links/quotes related to the above, I would love to know!
(By the way, I was a lurker on LessWrong starting back in 2010-2011, but was only vaguely familiar with AI risk stuff back then. It was only around the publication of Superintelligence that I started following along more closely, and only much later, in 2017, that I started putting significant amounts of my time into AI safety and making it my overwhelming priority. I did write several timelines though, and recently did a pretty thorough reading of AI takeoff arguments for a modeling project, so that is mostly where my knowledge of the older arguments comes from.)
For all I know you are right about Yudkowsky’s pre-2011 view about deep math. However, (a) that wasn’t Bostrom’s view AFAICT, and (b) I think that’s just not what this OP quote is talking about. From the OP:
I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is “from Yudkowsky/Bostrom to What Failure Looks Like part 2 part 1”) and I still don’t totally get why.
It’s Yudkowsky/Bostrom, not just Yudkowsky. And it’s WFLLp1, not p2. Part 2 is the one where the AIs do a treacherous turn; part 1 is the one where everything is fine except that “you get what you measure” and our dumb, obedient AIs are optimizing for the things we told them to optimize for rather than for what we want.
I am pretty confident that WFLLp1 is not the main thing we should be worrying about; WFLLp2 is closer, but even it involves this slow-takeoff view (in the strong sense, in which the economy is growing fast before the point of no return), which I’ve argued against. I do not think the reason people shifted from “Yudkowsky/Bostrom” (which in this context seems to mean “single AI project builds AI in the wrong way, AI takes over world”) to WFLLp1 is that they rationally considered all the arguments and decided that WFLLp1 was on balance more likely. I think instead that probably some sort of optimism bias was involved, and, more importantly, a win by default (Yud + Bostrom stopped talking about their scenarios and arguing for them, whereas Paul wrote a bunch of detailed posts laying out his scenarios and arguments, and so in the absence of visible counterarguments Paul wins the debate by default). Part of my feeling about this is that it’s a failure on my part: when Paul+Katja wrote their big post on takeoff speeds I disagreed with it and considered writing a big point-by-point response, but never did, even after various people posted questions asking “has there been any serious response to Paul+Katja?”
Re (a): I looked at chapters 4 and 5 of Superintelligence again, and I can kind of see what you mean, but I’m also frustrated that Bostrom seems really non-committal in the book. He lists a whole bunch of possibilities but doesn’t seem to actually come out and give his mainline visualization / “median future.” For example, he looks at historical examples of technology races and compares how much lag there was, which seems a lot like the kind of thinking you are doing; but then he also says things like “For example, if human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without even touching the intermediary rungs,” which sounds like the deep-math view. Another relevant quote:
Building a seed AI might require insights and algorithms developed over many decades by the scientific community around the world. But it is possible that the last critical breakthrough idea might come from a single individual or a small group that succeeds in putting everything together. This scenario is less realistic for some AI architectures than others. A system that has a large number of parts that need to be tweaked and tuned to work effectively together, and then painstakingly loaded with custom-made cognitive content, is likely to require a larger project. But if a seed AI could be instantiated as a simple system, one whose construction depends only on getting a few basic principles right, then the feat might be within the reach of a small team or an individual. The likelihood of the final breakthrough being made by a small project increases if most previous progress in the field has been published in the open literature or made available as open source software.
Re (b): I don’t disagree with you here. (The only part that worries me is that I don’t have a good idea of what percentage of “AI safety people” shifted from one view to the other, whether there were also new people with different views coming into the field, etc.) I realize the OP was mainly about failure scenarios, but it did also mention takeoffs (“takeoffs won’t be too fast”), and I was most curious about that part.
I also wish I knew what Bostrom’s median future was like, though I perhaps understand why he didn’t put it in his book—the incentives all push against it. Predicting the future is hard and people will hold it against you if you fail, whereas if you never try at all and instead say lots of vague prophecies, people will laud you as a visionary prophet.
Re (b): cool, I think we are on the same page then. Re takeoff being too fast: I think a lot of people these days expect plenty of big scary warning shots and fire alarms that motivate people to care about AI risk and take it seriously. That suggests they expect a fairly slow takeoff, slower than I think is warranted. It might happen, yes, but I don’t think Paul+Katja’s arguments are convincing enough to establish that takeoff will be that slow. It’s a big source of uncertainty for me, though.