Thanks for the thoughtful and detailed comments! I'll respond to a few points; otherwise, in general, I'm just nodding in agreement.
I think it's important to emphasize (a) that Davidson's model is mostly about pre-AGI takeoff (20% automation to 100%) rather than post-AGI takeoff (100% to superintelligence), but that it strongly suggests the latter will be very fast relative to what most people naively expect: probably on the order of weeks, and very likely less than a year.
And it’s a good model, so we need to take this seriously. My only quibble would be to raise again the possibility (only a possibility!) that progress becomes more difficult around the point where we reach AGI, because that is the point where we’d be outgrowing human training data. I haven’t tried to play with the model and see whether that would significantly affect the post-AGI takeoff timeline.
(Oh, and now that I think about it more, I'd guess that Davidson's model significantly underestimates the speed of post-AGI takeoff, because it might just treat anything above AGI as merely 100% automation, whereas actually there are different degrees of 100% automation corresponding to different levels of intelligence quality; 100% automation by ASI will yield significantly more research oomph than 100% automation by AGI. But I'd need to reread the model to decide whether this is true or not. You've read it recently, what do you think?)
I want to say that he models this by equating the contribution of one ASI to more than one AGI, i.e. treating additional intelligence as equivalent to a speed boost. But I could be misremembering, and I certainly don't remember how he translates intelligence into speed. If it's just that each post-AGI factor of two in algorithm/silicon improvements is modeled as yielding twice as many AGIs per dollar, then I'd agree that might be an underestimate (because one IQ 300 AI might be worth a very large number of IQ 150 AIs, or whatever).
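To make that concrete, here's a toy contrast between the two modeling choices (this is not Davidson's actual math; the quality exponent below is something I invented purely for illustration):

```python
# Toy contrast: two ways of turning a post-AGI capability gain g (e.g. a
# factor-of-two algorithm/silicon improvement) into research output.
# Not Davidson's model; all numbers are invented for illustration.

def output_if_gain_is_parallelism(base_agis: float, g: float) -> float:
    """Treat the gain as buying g times as many AGI-equivalents per dollar."""
    return base_agis * g  # linear in the gain

def output_if_gain_is_quality(base_agis: float, g: float, quality_exp: float = 3.0) -> float:
    """Treat the gain as smarter researchers: one 'IQ 300' AI may be worth far
    more than two 'IQ 150' AIs, so output is convex in the gain."""
    return base_agis * g ** quality_exp

for g in [1, 2, 4, 8]:
    print(g, output_if_gain_is_parallelism(1000, g), output_if_gain_is_quality(1000, g))
```

If the model does the first thing while reality looks more like the second, the post-AGI phase would indeed be underestimated.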
And (b) Davidson's model says that while there is significant uncertainty over how fast takeoff will be if it happens in the 30s or beyond, if it happens in the 20s (i.e. if AGI is achieved in the 20s) then it's pretty much gotta be pretty fast. Again, this can be seen by playing around with the widget on takeoffspeeds.com.
Yeah, even without consulting any models, I would expect that any scenario where we achieve AGI in the 20s is very scary, for many reasons.
--I work at OpenAI and I see how the sausage gets made. Already, things like Copilot and ChatGPT are (barely, but noticeably) accelerating AI R&D. I can see a clear path to automating more and more parts of the research process. My estimate is that going 10x faster is something like a lower bound on what would happen if we had AGI (e.g. if AutoGPT worked well enough that we could basically use it as a virtual engineer + scientist), and my central estimate would be "it's probably about 10x when we first reach AGI, but then it quickly becomes 100x, 1000x, etc. as qualitative improvements kick in." There's a related question of how much 'room to grow' there is, i.e. how much low-hanging fruit there is to pick that would improve our algorithms, supposing we started from something like "It's AutoGPT but good, as good as an OAI employee." My answer is "Several OOMs at least." So my nose-to-the-ground impression is, if anything, more bullish/fast-takeoff-y than Davidson's model predicts.
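For what it's worth, here's the kind of back-of-envelope compounding I have in mind. Everything in it is an invented illustration, not a forecast: the 0.1 OOMs/year baseline rate, the 3 OOMs of low-hanging fruit, and the "each OOM of progress gives another 10x speedup" rule are all assumptions.

```python
# Sketch of the "starts ~10x, quickly becomes 100x, 1000x" intuition:
# assume a finite stock of algorithmic low-hanging fruit, and that each OOM
# of algorithmic progress multiplies research speed by another 10x.
# All parameters are invented for illustration.

BASELINE_OOMS_PER_YEAR = 0.1  # algorithmic progress at human-only research speed
FRUIT_OOMS = 3.0              # "several OOMs at least" of room to grow
INITIAL_SPEEDUP = 10.0        # research speedup when we first reach AGI

def simulate(dt: float = 0.1, horizon: float = 3.0) -> None:
    speedup, progress, t = INITIAL_SPEEDUP, 0.0, 0.0
    while t < horizon and progress < FRUIT_OOMS:
        progress += BASELINE_OOMS_PER_YEAR * speedup * dt  # research done this step
        speedup = INITIAL_SPEEDUP * 10 ** min(progress, FRUIT_OOMS)
        t += dt
        print(f"t = {t:.1f}y   speedup ~ {speedup:,.0f}x")

simulate()
```

With these made-up numbers the speedup runs away from 10x to the full 10,000x cap within about a year, which is the shape of the "then it quickly becomes 100x, 1000x" claim above.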
What is your feeling regarding the importance of other inputs, i.e. training data and compute?
> I think of AI progress as being driven by a mix of cognitive input, training data, training FLOPs, and inference FLOPs. Davidson models the impact of cognitive input and inference FLOPs, but I didn't see training data or training FLOPs taken into account. ("Doesn't model data/environment inputs to AI development.") My expectation is that as RSI drives an increase in cognitive input, training data and training FLOPs will be a drag on progress. (Training FLOPs will be increasing, but not as quickly as cognitive inputs.)
Training FLOPs is literally the most important and prominent variable in the model: it's the "AGI training requirements" variable. I agree that possible data bottlenecks are ignored; if it turns out that data is the bottleneck, timelines to AGI will be longer (and possibly takeoff slower? It depends on how the data problem eventually gets solved; takeoff could be faster in some scenarios...). Personally I don't think the data bottleneck will slow us down much, but I could be wrong.
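To spell out how I understand that variable's role, here's my cartoon of the structure (a paraphrase of takeoffspeeds.com, not its actual code; the 1e36 threshold below is a placeholder, not the model's real parameter value):

```python
# Cartoon of how "AGI training requirements" functions in a compute-centric
# model: AGI arrives when the effective FLOP of the largest training run
# (physical FLOP x cumulative algorithmic efficiency) crosses a threshold.
# Paraphrase for illustration only; the threshold value is a placeholder.

AGI_TRAINING_REQUIREMENTS = 1e36  # effective FLOP (placeholder value)

def effective_flop(physical_flop: float, algo_efficiency: float) -> float:
    """Algorithmic progress multiplies how far a given physical budget goes."""
    return physical_flop * algo_efficiency

def reached_agi(physical_flop: float, algo_efficiency: float) -> bool:
    return effective_flop(physical_flop, algo_efficiency) >= AGI_TRAINING_REQUIREMENTS

# e.g. a 1e28-FLOP physical run plus a cumulative 1e8x algorithmic improvement:
print(reached_agi(1e28, 1e8))  # True
```

As I understand it, the model's dynamics then come from how fast physical FLOP and algorithmic efficiency grow, and how much automation accelerates the latter.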
Ugh! This was a big miss on my part, thank you for calling it out. I skimmed too rapidly through the introduction. I saw references to biological anchors and I think I assumed that meant the model was starting from an estimate of the FLOPs performed by the brain (i.e. during "inference") and projecting when the combination of more-efficient algorithms and larger FLOP budgets (due to more $$$ plus better hardware) would cross that threshold. But on re-read, of course you are correct and the model does focus on training FLOPs.
Sounds like we are basically on the same page!
Re: your question:
Compute is a very important input, important enough that it makes sense IMO to use it as the currency by which we measure the other inputs (this is basically what Bio Anchors + Tom's model do).
There is a question of whether we'll be bottlenecked on it in a way that throttles takeoff; having AGI may not matter much if the only way to get AGI+ is to wait for another, even bigger training run to complete.
I think in some sense we will indeed be bottlenecked by compute during takeoff… but that nevertheless we'll be going something like 10x to 1000x faster than we currently go, because labor can substitute for compute to some extent (not so much if it's going at 1x speed, but very much if it's going at 10x or 100x speed) and we'll have a LOT of sped-up labor. Like, I do a little exercise where I think about what my coworkers are doing and imagine what would happen if they had access to AGI that was exactly as good as they are at everything, only 100x faster. I feel like they'd make progress on their current research agendas about 10x as fast. Could be a bit less, could be a lot more. Especially once we start getting qualitative intelligence improvements over typical OAI researchers, it could be a LOT more, because in scientific research there seem to be HUGE returns to quality: the smartest geniuses seem to accomplish more in a year than 90th-percentile scientists accomplish in their lifetimes.
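To gesture at why sped-up labor helps even with compute held fixed, here's a minimal sketch treating research output as a CES combination of labor and compute. The functional form and parameters are my illustration, not Tom's calibration:

```python
# Minimal sketch of labor partially substituting for compute: a CES production
# function F(L, C) = (a * L**rho + b * C**rho) ** (1/rho) with rho < 1, so the
# two inputs are imperfect substitutes. Parameters are made up for illustration.

def research_output(labor: float, compute: float,
                    rho: float = 0.5, a: float = 0.5, b: float = 0.5) -> float:
    return (a * labor ** rho + b * compute ** rho) ** (1.0 / rho)

baseline = research_output(labor=1.0, compute=1.0)
for labor_mult in [1, 10, 100, 1000]:
    out = research_output(labor=labor_mult, compute=1.0)  # compute held fixed
    print(f"{labor_mult:>4}x labor -> ~{out / baseline:.0f}x research output")
```

With these made-up parameters, 10x, 100x, and 1000x labor buy roughly 4x, 30x, and 270x output even with compute frozen: compute is a real drag, but not a hard ceiling.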
Training data also might be a bottleneck. However, I think that by the time we are about to hit AGI, or have just hit AGI, it won't be. Smart humans are able to generate their own training data, so to speak; the entire field of mathematics is a bunch of people talking to each other, iteratively adding proofs to the blockchain (so to speak), and learning from each other's proofs. That's just an example, I think, of how around AGI we should basically have a self-sustaining civilization of AGIs talking to each other, evaluating each other's outputs, and learning from them. And this is just one of several ways in which the training data bottleneck could be overcome. Another is better algorithms that are more data-efficient. The human brain seems to be more data-efficient than modern LLMs, for example. Maybe we can figure out how it manages that.