(I’m trying to answer and clarify some of the points in the comments based on my interpretation of Yudkowsky in this post. So take the interpretations with a grain of salt, not as “exactly what Yudkowsky meant”)
Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive).
Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth.
So if you’re going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it’s best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it’s the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures).
My summary of what you’re defending here: because hardware progress is (according to you) the major driver of AI innovation, we should invest a lot of our forecasting resources into forecasting it, and we should leverage it as the strongest source of evidence available for thinking about AGI timelines.
I feel like this is not in contradiction with what Yudkowsky wrote in this post? I doubt he agrees that additional compute alone is the main driver of progress (after all, the Bitter Lesson mostly tells you that insights and innovations leveraging more compute will beat hardcoded ones), but insofar as he expects us to have next to no knowledge of how to build AGI until around 2 years before it is done (and then only for those with the Thielian secret), compute is indeed the next best thing we have for estimating timelines.
Yet Yudkowsky’s point is that being the next best thing doesn’t mean it’s any good.
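For concreteness, here is a toy version of the kind of input-driven production function being discussed, as I understand it. The functional form and exponents are made up for illustration, not taken from Cotra’s report or from the comment I’m replying to:

```python
# Toy Cobb-Douglas-style "AI progress" function. The exponents alpha and
# beta are made-up placeholders, not estimates from any source.

def ai_progress(compute, labor, alpha=0.7, beta=0.3):
    """Progress as a function of growth in compute and research labor inputs."""
    return (compute ** alpha) * (labor ** beta)

# Suppose compute grew ~1e8x while research labor grew ~1e2x over some period:
compute_term = ai_progress(1e8, 1.0)  # ~4e5: modeled progress attributable to compute
labor_term = ai_progress(1.0, 1e2)    # ~4:   modeled progress attributable to labor
print(compute_term, labor_term)
```

On almost any choice of exponents, the compute term dominates once its input has grown by many more orders of magnitude than labor has, which is how I read the “pride of place” claim; the worry above is that this still doesn’t tell you where on the curve “AGI” sits.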
Thinking about hardware has a lot of helpful implications for constraining timelines:
Evolutionary anchors, combined with paleontological and other information (if you’re worried about Rare Earth miracles), mostly cut off extremely high input estimates for AGI development, like Robin Hanson’s, and we can say from known human advantages relative to evolution that credence should be suppressed some distance short of that (moreso with more software progress)
Evolution being an upper bound makes sense, and I think Yudkowsky agrees. But it’s an upper bound on the whole human optimization process, and the search space of human optimization is tricky to think about. I see much of Yudkowsky’s criticism of biological estimates here as saying “this biological anchor doesn’t express the cost of evolution’s optimization in terms of human optimization, but instead goes for a proxy which doesn’t tell you anything”.
So if someone captured both evolution and human optimization in the same search space, and found an upper bound on the cost (in terms of optimization power) that evolution spent to find humans, then I expect Yudkowsky would agree that this is an upper bound on the optimization power that humans will use. But he might still retort that translating optimization power into compute is not obvious.
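To make the worry about translating optimization power into compute concrete, here is a minimal sketch of the kind of arithmetic an evolutionary-anchor upper bound relies on. Every number below is an order-of-magnitude placeholder I made up for illustration, not a figure from Cotra’s report:

```python
# Illustrative evolutionary-anchor upper bound. All inputs are made-up
# order-of-magnitude placeholders; the point is the structure of the
# estimate, and how much hinges on how each factor is converted to FLOPs.

years_of_neural_evolution = 1e9    # assumed: time since nervous systems appeared
seconds_per_year = 3.15e7
average_population = 1e20          # assumed: mostly tiny-brained organisms
flops_per_brain_second = 1e5       # assumed: compute to emulate one tiny brain

total_flops = (years_of_neural_evolution * seconds_per_year
               * average_population * flops_per_brain_second)
print(f"'Compute spent by evolution' upper bound: ~{total_flops:.0e} FLOP")
# ~3e41 FLOP with these placeholders
```

Each factor is a modeling choice that can plausibly move by several orders of magnitude, which is one way of stating the worry that the translation from “optimization power” to compute is doing a lot of unexamined work.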
You should have lower a priori credence in smaller-than-insect brains yielding AGI than in more middle-of-the-range compute budgets
Okay, I’m going to propose what I think is the chain of arguments you’re using here:
Currently, we can train what sounds like the compute equivalent of insect brains, and yet we don’t have AGI. Hence we’re not currently able to build AGI with “smaller-than-insect brains”, which means AGI is less likely to be created with “smaller-than-insect brains”.
I agree that we don’t have AGI.
The “compute equivalent” stuff is difficult, as I mentioned above, but I don’t think this is the main issue here.
Going from “we don’t know how to do that now” to “we should expect that it is not how we will do it” doesn’t really work IMO. As Yudkowsky points out, the requirements for AGI are constantly dropping, and maybe a new insight will turn out to make smaller neural nets far more powerful before the bigger models reach AGI.
Evolution created insect-sized brains and they were clearly not AGI, so we have evidence against AGI with that amount of resources.
Here the fact that evolution is far worse an optimizer than humans breaks most of the connection between evolution creating insects and humans creating AGI. Evolution merely shows that insects can be made with insect-sized brains, not that AGI cannot be extracted by better use of the same resources.
From my perspective this is exactly what Yudkowsky is arguing against in this post: knowing a bunch of paths through the search space doesn’t tell you what a cleverer optimizer could find. There are ways to use a bunch of known paths as data for understanding the search space, but then you need to argue either that they are somehow dense in the search space, or that the sort of paths you’re interested in look similar to the ones you know. And at the moment, I don’t see an argument of either form.
By default we should expect AGI to have a decent minimal size because of its complexity, hence smaller models have a lower credence.
Agree with the principle (it sounds improbable that AGI will be made in 10 lines of LISP), but the threshold is where most of the difficulty lies: how much is too little? A hundred neurons sounds clearly too small, but when you reach insect-sized brains, it’s not obvious (at least to me) that better use of resources couldn’t bring you most of the way to AGI (see the rough sketch after this list for the orders of magnitude involved).
(I wonder if there’s an availability bias here where the only good models we have nowadays are huge, hence we expect that AGI must be a huge model?)
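As a rough illustration of why the “compute equivalent of an insect brain” is slippery, and of why the threshold question is hard, here is a standard back-of-the-envelope with hedged inputs: the neuron count is in the right ballpark for a honeybee, but the synapse count, firing rate, and FLOP-per-spike factors are assumptions that reasonable people set very differently:

```python
# Rough "insect brain in FLOP/s" estimate. Neuron count is roughly
# honeybee-scale; the other factors are assumptions, and defensible
# choices for them span several orders of magnitude.

neurons = 1e6                  # ~honeybee-scale brain
synapses_per_neuron = 1e3      # assumed
avg_firing_rate_hz = 10        # assumed
flops_per_synaptic_event = 1   # assumed

insect_brain_flops = (neurons * synapses_per_neuron
                      * avg_firing_rate_hz * flops_per_synaptic_event)
print(f"~{insect_brain_flops:.0e} FLOP/s")  # ~1e10 FLOP/s with these choices
```

A single modern accelerator delivers far more than 1e10 FLOP/s, so by this accounting we already have “insect-scale” compute many times over; whether that says anything about AGI depends entirely on how well the compute is used, which is where the disagreement about the lower bound actually lives.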
It lets you see you should concentrate probability mass in the next decade or so because of the rapid scaleup of compute investment (with a supporting argument from the increased growth of AI R&D effort) covering a substantial share of the orders of magnitude between where we are and levels that we should expect are overkill
I think this is where the crux of whether the current paradigm can just scale matters a lot. The main point Yudkowsky uses in the dialogue to argue against your concentration of probability mass is that he doesn’t agree that deep learning scales to AGI in that way. In his view (which I’m not yet clear on, and which I haven’t seen anyone who actually studies LMs hold), the increase in performance will break down before reaching AGI. And so the concentration of probability mass shouldn’t happen, because the fact that you can reach the anchor is irrelevant if we don’t know a way to turn that compute into AGI (according to Yudkowsky’s view).
It gets you likely AGI this century, and on the closer part of that, with a pretty flat prior over orders of magnitude of inputs that will go into success
Here too, it depends on transforming the optimization power of evolution into compute and other requirements, and then knowing how this compute is supposed to be turned into efficiency and AGI. (That being said, I think Yudkowsky agrees with the conclusion, just not with that specific way of reaching it.)
It suggests lower annual probability later on if Moore’s Law and friends are dead, with stagnant inputs to AI
Not clear to me what you mean here (it might be clearer with the right link to the section of Cotra’s report about this). But note that, based on Yudkowsky’s model in this post, the cost to make AGI should continue to drop as long as the world doesn’t end, which creates a weird situation where the probability of AGI keeps increasing with time (not sure how to turn that into a distribution, though...).
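For what it’s worth, one standard way to turn “the annual probability keeps increasing” into a distribution is to treat it as a hazard rate. A minimal sketch, with a hazard curve I made up purely to show the mechanics:

```python
# Turn an increasing annual probability (hazard rate) into a distribution
# over the year AGI arrives. The hazard values are made up for illustration.

def agi_year_distribution(hazards):
    """hazards[i] = P(AGI in year i | no AGI before year i)."""
    dist, survival = [], 1.0
    for h in hazards:
        dist.append(survival * h)   # P(AGI arrives exactly in year i)
        survival *= (1.0 - h)       # P(still no AGI after year i)
    return dist, survival           # survival = P(no AGI within the horizon)

# Example: hazard creeping up from 1%/year to 20%/year over 50 years.
hazards = [0.01 + 0.19 * i / 49 for i in range(50)]
dist, never = agi_year_distribution(hazards)
print(f"P(no AGI within 50 years) = {never:.3f}")
```

Of course, the shape of the hazard curve is doing all the work here, so this only relocates the disagreement rather than resolving it.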
These are all useful things highlighted by Ajeya’s model, and by earlier work like Moravec’s. In particular, I think Moravec’s forecasting methods are looking pretty good, given the difficulty of the problem. He and Kurzweil (like the computing industry generally) were surprised by the death of Dennard scaling and general price-performance of computing growth slowing, and we’re definitely years behind his forecasts in AI capability, but we are seeing a very compute-intensive AI boom in the right region of compute space. Moravec also did anticipate it would take a lot more compute than one lifetime run to get to AGI. He suggested human-level AGI would be in the vicinity of human-like compute quantities being cheap and available for R&D. This old discussion is flawed, but makes me feel the dialogue is straw-manning Moravec to some extent.
This is in the same spirit as a bunch of comments on this post, and I feel like it’s missing the point of the post? Like, it’s not about Moravec’s estimate being wildly wrong, it’s about the unsoundness of the methods by which Moravec reaches his conclusion. Your analysis doesn’t give strong enough evidence of Moravec’s predictive accuracy for us to conclude that he has a really strong method that merely looks bad to Yudkowsky but is actually sound. And I feel points like that don’t engage with the cruxes at all (the soundness of the method); instead they mostly correct a “too harsh judgment” by Yudkowsky, without invalidating his points.
Ajeya’s model puts most of the modeling work on hardware, but it is intentionally expressive enough to let you represent a lot of different views about software research progress; you just have to contribute more of that yourself when adjusting weights on the different scenarios, or effective software contribution year by year. You can even represent a breakdown of the expectation that software and hardware significantly trade off over time, and very specific accounts of the AI software landscape and development paths. Regardless, modeling the most importantly changing input to AGI is useful, and I think this dialogue misleads with respect to that by equivocating between hardware not being the only contributing factor and not being an extremely important to dominant driver of progress.
Hmm, my impression here is that Yudkowsky is actually arguing that he is modeling AGI timelines that way: if you don’t add unwarranted assumptions and don’t misuse the analogies to biological anchors, you get his model, which is completely unable to give the sort of answers Cotra’s model outputs.
Or said differently, I expect that Yudkowsky thinks that if you reason correctly and only use actual evidence instead of unsound lines of reasoning, you get his model; but doing that in the explicit context of biological anchors is like trying to quit sugar in a sweetshop: the whole setting just makes it far harder. And given that he expects to get the right constraints on models without the biological-anchors stuff, the latter is completely redundant AND unhelpful.