Note: I wrote this comment as notes while reading, to work out what I thought of your arguments as I went, more than as a polished piece.
I think your calibration on the ‘slow scenario’ is off. What you claim is the slowest plausible scenario is fairly clearly the median one, given that it pretty much just follows current trends, and slower-than-current-trend progress is clearly plausible. Things have already slowed way down, with advancements in very narrow areas being the only real change. There is a reason OpenAI hasn’t dared to name anything GPT-5, for instance. Even o3 isn’t really an improvement on general LLM duties, and that is the ‘exciting’ new thing, as you pretty much say.
Advancement is disappointingly slow in the AI that I personally use (mostly image generation, where newer, larger models have often not been clearly better overall for the past year or so, and the newest ones mostly use LLM-style architectures), and it is plausible that there will be barely any clear quality improvement for general uses over the next couple of years. Image generation should also be easier to improve than general LLMs, because it should be earlier on the curve of diminishing returns to scale (the models are much smaller). Note that since most image models are also diffusion models, they already use an image-space equivalent of the trick o1 and o3 introduced, which I would argue is effectively chain of thought. For some reason, all the advancements I hear about these days seem like uninspired copies of things that already happened in image generation.
The one exception is ‘agents’, but those show no signs of present-day usefulness. Who knows how quickly such things will become useful, but historical trends for new tech, especially in AI, say ‘not soon’ for real use. A lot of people and companies are very interested in the idea for obvious reasons, but that doesn’t mean it will happen fast. See also self-driving cars, which have taken many times longer than expected, despite seeming like a probable success story in the making (for the distant future). In fact, self-driving cars are the real-world equivalent of a narrow agent, and the immense difficulty they are having is strong evidence against agents becoming transformatively useful soon.
I do think that AI as it currently exists will have a transformative impact in the near term for certain activities (image generation for non-artists like me is already one of them), but I think the smartphone comparison is a good one; I still don’t bother to use a smartphone (though it has many significant uses). I would be surprised, for instance, if AI had as big an impact as the World Wide Web on a year-for-year basis, counting from the beginning of the web (supposedly 1989) for that, and from 2017, when transformers were invented (or even 2018, when GPT-1 became a thing), for AI. I like the comparison to the web because I think that AI going especially well would be a change to our information capacities similar to an internet 3.0 (assuming you count the web as 2.0).
As to the fast scenario, that does seem like the fastest scenario that isn’t completely ridiculous, but I think the probability you assign to it is dramatically too high. I do agree that if self-play (in the AlphaGo sense) to generate good data is doable for poorly definable problems, that would alleviate the data scarcity we suffer in large parts of the space, but it is unlikely to actually improve the quality of the data in the near term, and there are already a lot of data quality issues. I personally do not believe that o1 and o3 have at all ‘shown’ that synthetic data is a solved issue, and it won’t be solved for quite a while, if ever.
Note that image generation models have already been using synthetic data from teacher models for a while now, with ‘SDXL Turbo’ and later adversarial distillation schemes. This did manage a several-fold speed boost, but at the cost of some quality, as all such schemes do. Crucially, no one has managed to increase quality this way, because the teacher provides a maximum quality level you can’t go beyond (except by pure luck).
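To make the ceiling concrete, here is a minimal toy sketch of plain teacher-to-student output distillation (this is not SDXL Turbo’s actual adversarial setup; the models and data are trivial stand-ins): the student only ever learns to reproduce what the teacher would have produced, so it can get faster but not better.

```python
# Toy sketch: distilling a "teacher" denoiser into a "student" by matching
# outputs. The student's quality is capped by the teacher it imitates.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a real diffusion U-Net; purely illustrative."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x_noisy, t):
        return self.net(x_noisy)  # ignores the timestep for brevity

teacher = TinyDenoiser().eval()   # stands in for a frozen, pretrained model
student = TinyDenoiser()          # the model being distilled
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(100):
    x_noisy = torch.randn(32, 64)            # fake noisy latents
    t = torch.randint(0, 1000, (32,))
    with torch.no_grad():
        target = teacher(x_noisy, t)         # the teacher's output is the ceiling
    loss = nn.functional.mse_loss(student(x_noisy, t), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```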
Speculatively, you could perhaps improve quality by having a third model select the absolute best outputs of the teacher and training only on those until you have something better than the teacher, then promoting that ‘better than the teacher’ model to teacher and automatically starting to train a new student (or perhaps retraining the old teacher?). The problem is: how do you get a selection model that is actually better than the things you are trying to improve, through its own self-play-style learning, rather than just getting the generators to fit a static model of a good output? Human data creation cannot be replaced in general without massive advancements in the field. You might, though, be able to shift human data generation to just training the selection model.
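Purely as a sketch of that loop (with toy stand-in models, and with the genuinely hard part, a selector that really is better than the generators, simply assumed in a placeholder scoring function):

```python
# Toy sketch of selection-based bootstrapping: a selector keeps only the
# teacher's best outputs, the student trains on those, and once the student
# scores higher it is promoted to teacher. Nothing here solves the real
# problem of obtaining a trustworthy selector.
import torch
import torch.nn as nn

def make_model(dim=32):
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

def selector_score(x):
    # Placeholder "quality" score; in reality this is the unsolved piece.
    return -x.pow(2).mean(dim=1)

teacher, student = make_model(), make_model()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for generation in range(3):
    # 1. Teacher generates candidates from random inputs.
    z = torch.randn(1024, 32)
    with torch.no_grad():
        candidates = teacher(z)
    # 2. Selector keeps only the top slice of outputs.
    keep = selector_score(candidates).topk(128).indices
    best = candidates[keep].detach()
    # 3. Student trains to reproduce only the selected outputs.
    for _ in range(200):
        loss = nn.functional.mse_loss(student(z[keep]), best)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # 4. If the student now scores higher on average, promote it to teacher.
    with torch.no_grad():
        if selector_score(student(z)).mean() > selector_score(teacher(z)).mean():
            teacher.load_state_dict(student.state_dict())
```

The whole scheme stands or falls on step 2: if the selector is just a static picture of a good output, the generators converge on that picture rather than surpassing the teacher, which is exactly the objection above.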
In some areas, you could perhaps train the AI directly on automatically generated data from sensors in the real world, but that seems like it would throttle the speed of progress to the speed of the real world, unless you get an exponential increase in sensor data instead.
I do agree that in a fast scenario, it would clearly be algorithmic improvements rather than scale leading to it.
Also, o1 and o3 are only ‘better’ because of a willingness to spend immensely more compute at inference time, and given that people already can’t afford them, that route seems like it will be played out after not too many generations of scaling, especially since hardware is improving so slowly these days. Chain of thought should probably be largely replaced with something more like what image generation models currently use, where each step iterates on the current result. The two could be combined, of course.
Diffusion models build a latent picture out of a bunch of different areas, and each of those influences every other area in later steps, so in text generation you could analogously have a chain of thought that is used in its entirety to create a new chain of thought. For example, you could have a ten-deep chain of thought used to create another ten-deep chain of thought, nine times over, instead of a hundred independent options (with the first ten being generated from just the input, of course). If you’re crazy, it could literally be exponential, where you generate one thought for the first step, two in the second… 16 in the fifth, and so on.
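A rough sketch of that refinement loop, with `generate_chain` as a hypothetical stand-in for the model call (it is not a real API; it just returns placeholder strings here):

```python
# Sketch of "iterate on the whole chain of thought", by analogy with
# diffusion refinement: each round conditions on the ENTIRE previous chain
# and emits a fresh chain, instead of sampling many independent branches.

def generate_chain(prompt: str, prior_chain: list[str], depth: int = 10) -> list[str]:
    # Hypothetical LLM call: sees the prompt plus the whole previous chain.
    context = prompt + "\n" + "\n".join(prior_chain)
    return [f"thought {i} (context was {len(context)} chars)" for i in range(depth)]

def iterated_chain_of_thought(prompt: str, depth: int = 10, rounds: int = 10) -> list[str]:
    chain: list[str] = []
    for _ in range(rounds):                           # 10 rounds of 10 = 100 thoughts,
        chain = generate_chain(prompt, chain, depth)  # but each round refines the whole
    return chain                                      # previous chain rather than branching.

print(iterated_chain_of_thought("Why is the sky blue?"))
```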
“Identifying The Requirements for a Short Timeline”
I think you are missing an interesting way to tell whether AI is accelerating AI research. A lot of normal research eventually gets integrated into the next generation of products. If AI really were accelerating the process, you would see those integrations happening much more quickly, with a shorter lag between ‘new idea first published’ and ‘new idea integrated into a fully formed product’ that is actually good. A human might take several months to test an idea, but if an AI could do the research, it could also replicate other research incredibly quickly and see how the ideas work in combination.
(Ran out of steam when my computer crashed during the above paragraph, though I don’t seem to have lost any of what I wrote since I do it in notepad.)
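Returning to the lag-time idea above: tracking it would be almost trivial. A toy sketch, with all names and dates invented as placeholders:

```python
# Toy sketch of the "paper to product" lag metric. Entries are invented
# placeholders, not real measurements.
from datetime import date

milestones = [
    # (idea, first published, first appearance in a genuinely good product)
    ("idea_A", date(2023, 1, 15), date(2024, 3, 1)),
    ("idea_B", date(2024, 6, 1), date(2024, 11, 20)),
]

for name, published, shipped in milestones:
    print(f"{name}: {(shipped - published).days} days from paper to product")
# A lag that keeps shrinking across generations would be evidence that AI is
# genuinely accelerating AI research; a flat lag would be evidence against.
```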
I would say the best way to tell you are in a shorter timeline is if the gains from each advancement start broadening rather than narrowing. If each advancement applies narrowly, you need a truly absurd number of advancements; if they are broad, far fewer.
Honestly, I see very little likelihood of what I consider AGI in the next couple of decades at least (at least if you want it to have surpassed humanity), and if we don’t break out of the current paradigm, not for much, much longer than that, if ever. You do have some interesting points and seem reasonable, but I really can’t agree with the idea that we are at all close to it. Also, your fast scenario seems more like 20 years than 4; 4 years isn’t the ‘fast’ scenario, it is the ‘miracle’ scenario. The ‘slow’ scenario reads like ‘this might be the work of centuries, or maybe half of one if we are lucky’. The strong disagreement on how long these scenarios would take comes down to the fact that the point we are at now is far, far below where you seem to believe it is. We aren’t even vaguely close.
As far as your writing goes, I think it was fairly well written structurally and somewhat interesting, and I even agree that large parts of the ‘fast’ scenario as you laid it out make sense; but since you are wrong about the amounts of time to associate with the scenarios, the overall analysis is very far off. I did find it worth my time to read.