I do agree that the halving-of-compute-costs-every-2.5-years estimate seems too slow to me; that seems to be the rate of “normal incremental progress,” but when you account for the sort of really important ideas (or accumulations of ideas, or shifts in research direction toward more fruitful paths) that happen about once a decade, the rate should be faster than that.
I don’t think this is what Yudkowsky is saying at all in the post. Actually, I think he is saying the exact opposite: that the 2.5-year estimate is too fast as an estimate that is supposed to always work. If I understand correctly, his point is that progress is significantly slower than that most of the time, except during the initial growth after paradigm shifts, when you’re pushing as much compute as you can onto your new paradigm. (That being said, Yudkowsky seems to agree with you that this should make us directionally update towards AGI arriving in less time.)
My interpretation seems backed by this quote (and the fact that he’s presenting these points as if they’re clearly wrong):
Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to:
Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology.
[...]
Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power.
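As a quick sanity check on how these numbers hang together, a steady halving time compounds like this (illustrative arithmetic only; the 14-year window is my assumption, not a figure from the post):

```python
# Illustrative only: how a steady halving time for compute costs
# compounds into an overall efficiency multiplier.

def efficiency_multiplier(years, halving_time=2.5):
    """Efficiency gain implied by halving compute costs every
    `halving_time` years, compounded over `years` years."""
    return 2 ** (years / halving_time)

# About 14 years of steady 2.5-year halvings lands near the
# "around fifty times" figure quoted above.
print(round(efficiency_multiplier(14), 1))  # → 48.5
```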
This seems true but changing the subject. Insofar as the subject is “what should our probability distribution over date-of-AGI-creation look like” then Ajeya’s framework (broadly construed) is the right way to think about it IMO. Separately, we should worry that this will never let us predict with confidence that it is happening in X years, and thus we should be trying to have a general policy that lets us react quickly to e.g. two years of warning.
I don’t understand how Yudkowsky can be changing the subject when his subject has never been about “probability distribution over date-of-AGI-creation”? His point IMO is that this is a bad question to ask, not because you wouldn’t want the true answer if you could magically get it, but because we don’t have, and won’t have, even close to the amount of evidence needed to answer it non-trivially until 2 years before AGI (and maybe not even then, because you need to know the Thielian secrets). As such, to reach an answer of that type, you must contort the evidence and extract more bits of information than the analogies actually contain, which means this is a recipe for saying nonsense.
(Note that I’m not arguing Yudkowsky is right, just that I think this is his point, and that your comment is missing it — might be wrong about all of those ^^)
I think OpenPhil is totally right here. My own stance is that the 2050-centered distribution is a directional overestimate because e.g. the long-horizon anchor is a soft upper bound (in fact I think the medium-horizon anchor is a soft upper bound too; see Fun with +12 OOMs).
Here too this sounds like missing Yudkowsky’s point, which is made in the paragraph just after your original quote:
Eliezer: Mmm… there’s some justice to that, now that I’ve come to write out this part of the dialogue. Okay, let me revise my earlier stated opinion: I think that your biological estimate is a trick that never works and, on its own terms, would tell us very little about AGI arrival times at all. Separately, I think from my own model that your timeline distributions happen to be too long.
My interpretation is that he’s saying that:
The model, and the whole approach, is a fundamentally bad and misguided way of thinking about these questions, which fails in the many ways he argues for earlier in the dialogue
If he stops talking about whether the model is bad, and just looks at its output, then he thinks that’s an overestimate compared to the output of his own model.
Thanks for this comment (and the other comment below also).
I think we don’t really disagree that much here. I may have just poorly communicated, slash maybe I’m objecting to the way Yudkowsky said things because I read it as implying things I disagree with.
I don’t think this is what Yudkowsky is saying at all in the post. Actually, I think he is saying the exact opposite: that the 2.5-year estimate is too fast as an estimate that is supposed to always work. If I understand correctly, his point is that progress is significantly slower than that most of the time, except during the initial growth after paradigm shifts, when you’re pushing as much compute as you can onto your new paradigm. (That being said, Yudkowsky seems to agree with you that this should make us directionally update towards AGI arriving in less time.)
That’s what I think too—normal incremental progress is probably slower than 2.5-year doubling, but there’s also occasional breakthrough progress which is much faster, and it all balances out to a faster-than-2.5-year-doubling, but in such a way that makes it really hard to predict, because so much hangs on whether and when breakthroughs happen. I think I just miscommunicated.
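One way to see the “balances out, but hard to predict” dynamic is a toy model: steady incremental progress with a slow halving time, plus rare breakthroughs that each buy a large one-off efficiency jump. All parameters here are invented for illustration:

```python
import math
import random

def effective_halving_time(years=40, incremental_halving=4.0,
                           breakthrough_gain=8.0, breakthrough_rate=0.1,
                           seed=0):
    """Toy model: slow steady progress plus rare large breakthroughs.
    Returns the effective overall halving time of compute costs."""
    rng = random.Random(seed)
    log2_gain = 0.0
    for _ in range(years):
        log2_gain += 1.0 / incremental_halving         # incremental progress
        if rng.random() < breakthrough_rate:           # roughly once a decade
            log2_gain += math.log2(breakthrough_gain)  # one-off jump
    return years / log2_gain
```

With the breakthrough rate set to zero this returns exactly the incremental halving time; any breakthrough makes the effective halving time shorter, and the realized value swings a lot depending on when the breakthroughs happen to land, which is the unpredictability point.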
Eliezer: Mmm… there’s some justice to that, now that I’ve come to write out this part of the dialogue. Okay, let me revise my earlier stated opinion: I think that your biological estimate is a trick that never works and, on its own terms, would tell us very little about AGI arrival times at all. Separately, I think from my own model that your timeline distributions happen to be too long.
--The model, and the whole approach, is a fundamentally bad and misguided way of thinking about these questions, which fails in the many ways he argues for earlier in the dialogue.
--If he stops talking about whether the model is bad, and just looks at its output, then he thinks that’s an overestimate compared to the output of his own model.
Here I think I share your interpretation of Yudkowsky; I just disagree with Yudkowsky. I agree on the second part; the model overestimates median TAI arrival time. But I disagree on the first part—I think that having a probability distribution over when to expect TAI / AGI / AI-PONR etc. is pretty important/decision-relevant, e.g. for advising people on whether to go to grad school, or for deciding what sort of research project to undertake. (Perhaps Yudkowsky agrees with this much.) And I think that Ajeya’s framework is the best framework I know of for getting that distribution. I think any reasonable distribution should be formed by Ajeya’s framework, or some more complicated model that builds off of it (adding more bells and whistles such as e.g. a data-availability constraint or a probability-of-paradigm-shift mechanic). Insofar as Yudkowsky was arguing against this, and saying that we need to throw out the whole model and start from scratch with a different model, I was not convinced. (Though maybe I need to reread the post and/or your steelman summary)
Here I think I share your interpretation of Yudkowsky; I just disagree with Yudkowsky. I agree on the second part; the model overestimates median TAI arrival time. But I disagree on the first part—I think that having a probability distribution over when to expect TAI / AGI / AI-PONR etc. is pretty important/decision-relevant, e.g. for advising people on whether to go to grad school, or for deciding what sort of research project to undertake. (Perhaps Yudkowsky agrees with this much.)
Hum, I would say Yudkowsky seems to agree with the value of a probability distribution for timelines.
(Quoting The Weak Inside View (2008) from the AI FOOM Debate)
So to me it seems “obvious” that my view of optimization is only strong enough to produce loose, qualitative conclusions, and that it can only be matched to its retrodiction of history, or wielded to produce future predictions, on the level of qualitative physics.
“Things should speed up here,” I could maybe say. But not “The doubling time of this exponential should be cut in half.”
I aspire to a deeper understanding of intelligence than this, mind you. But I’m not sure that even perfect Bayesian enlightenment would let me predict quantitatively how long it will take an AI to solve various problems in advance of it solving them. That might just rest on features of an unexplored solution space which I can’t guess in advance, even though I understand the process that searches.
On the other hand, my interpretation of Yudkowsky strongly disagrees with the second part of your paragraph:
And I think that Ajeya’s framework is the best framework I know of for getting that distribution. I think any reasonable distribution should be formed by Ajeya’s framework, or some more complicated model that builds off of it (adding more bells and whistles such as e.g. a data-availability constraint or a probability-of-paradigm-shift mechanic). Insofar as Yudkowsky was arguing against this, and saying that we need to throw out the whole model and start from scratch with a different model, I was not convinced. (Though maybe I need to reread the post and/or your steelman summary)
So my interpretation of the text is that Yudkowsky says that you need to know how compute will be transformed into AGI to estimate the timelines (then you can plug in your estimates for the compute), and that any approach which relies on biological analogies for that part will, by default, be spouting nonsense, because evolution and biology optimize in fundamentally different ways than human researchers do.
For each of the three examples, he goes into more detail about the way this is instantiated. My understanding of his criticism of Ajeya’s model is that he disagrees that current deep learning algorithms alone are actually a recipe for turning compute into AGI, and so saying “we keep to current deep learning and estimate the required compute” doesn’t make sense and doesn’t solve the question of how to turn compute into AGI. (Note that this might be the place where you, or someone defending Ajeya’s model, want to disagree with Yudkowsky. I’m just pointing out that this is a more productive place to debate him, because it might actually make him change his mind, or change yours if he convinces you.)
The more general argument (the reason why “the trick” doesn’t work) is that if you actually have a way of transforming compute into AGI, that means you know how to build AGI. And if you do, you’re very, very close to the end of the timeline.
I guess I would say: Ajeya’s framework/model can incorporate this objection; this isn’t a “get rid of the whole framework” objection but rather a “tweak the model in the following way” objection.
Like, I agree that it would be bad if everyone who used Ajeya’s model had to put 100% of their probability mass into the six bio anchors she chose. That’s super misleading/biasing/ignores loads of other possible ways AGI might happen. But I don’t think of this as a necessary part of Ajeya’s model; when I use it, I throw out the six bio anchors and just directly input my probability distribution over OOMs of compute. My distribution is informed by the bio anchors, of course, but that’s not the only thing that informs it.
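A minimal sketch of the kind of calculation being described here: combine a probability distribution over required OOMs of compute with a projected compute trajectory to get P(TAI by year). Every number below is made up for illustration; none comes from Ajeya’s actual report:

```python
# Made-up distribution over how many OOMs (log10 FLOP) of effective
# compute are required for TAI.
needed_oom_probs = {30: 0.10, 32: 0.20, 34: 0.30, 36: 0.25, 38: 0.15}

def available_oom(year, start_year=2022, start_oom=26.0, oom_per_year=0.5):
    """Projected largest training run (log10 FLOP) in a given year,
    under an assumed steady growth rate."""
    return start_oom + oom_per_year * (year - start_year)

def p_tai_by(year):
    """P(TAI by `year`) = P(required compute <= projected available compute)."""
    return sum(p for oom, p in needed_oom_probs.items()
               if oom <= available_oom(year))
```

Richer versions add the bells and whistles mentioned above (data constraints, paradigm shifts, uncertainty over the compute trajectory itself), but the skeleton is the same.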
First, I want to clarify that I feel we’re getting into more interesting territory, where there’s a better chance that you might find a point that invalidates Yudkowsky’s argument, and could thus convince him of the value of the model.
But it’s also important to realize that, IMO, Yudkowsky is not just saying that biological anchors are bad. The more general problem (which is also developed in this post) is that predicting the Future is really hard. In his own model of AGI timelines, the factor that is basically impossible to predict until you can build AGI is how many resources are needed to build AGI.
So saying “let’s just throw away the biological anchors” doesn’t evade the general counterargument: to predict timelines at all, you need information on how many resources are needed to build AGI, and that is incredibly hard to get. If you or Ajeya can point to actual evidence on that last question, then yeah, I expect Yudkowsky might well update on the validity of the timeline estimates.
But at the moment, in this thread, I see no argument like that.