I would guess that progress towards AGI would be slower than progress on ImageNet. Here are two reasons I think are particularly important:
ImageNet accuracy is a metric that can be gamed in many ways, so you can make progress on ImageNet that does not transfer to more general image classification tasks. As an example of this, in this paper the authors run experiments confirming that adversarially robust training on ImageNet degrades ImageNet test or validation accuracy, yet robustly trained models generalize better to classification tasks on more diverse datasets when fine-tuned on them.
This suggests that a lot of the progress on ImageNet is actually “overlearning”: it doesn’t generalize in a useful way to tasks we actually care about in the real world. There’s good reason to believe that part of this overlearning would show up as algorithmic progress in our framework, since people can fit their models more closely to ImageNet even without extra compute or data.
Researchers have stronger feedback loops on ImageNet: they can try something directly on the benchmark they care about, see the results, and immediately update on their findings. This lets them iterate much faster, and iteration is a crucial component of progress in any engineering problem. In contrast, our iteration loops towards AGI are considerably slower. This point is also made by Ajeya Cotra in her biological anchors report, and it’s why she chooses to cut the software progress speed estimates from Hernandez and Brown (2020) in half when computing her AGI timelines.
Such an adjustment seems warranted here, but I think the way Cotra does it is not very principled and certainly doesn’t do justice to the importance of the question of software progress.
Overall I agree with your point that training AGI is a different kind of task. I would be more optimistic about progress in a very broad domain such as computer vision or natural language processing translating to progress towards AGI, but I suspect the conversion will still be significantly less favorable than any explicit performance metric would suggest. I would not recommend using point estimates of software progress on the order of a doubling of compute efficiency per year for forecasting timelines.
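To give a rough sense of why the size of this discount matters so much, here is a minimal sketch of the arithmetic. It assumes a constant exponential rate of software progress and reads "cutting the estimate in half" as halving the number of doublings per year; both are simplifying assumptions of mine, and the 20-year horizon is a purely hypothetical choice for illustration.

```python
def efficiency_multiplier(years: float, doublings_per_year: float) -> float:
    """Cumulative compute-efficiency gain after `years` at a constant rate."""
    return 2.0 ** (doublings_per_year * years)


# Roughly one doubling per year, the order of magnitude mentioned above.
benchmark_rate = 1.0
# Assumed reading of "cut in half": half as many doublings per year.
discounted_rate = benchmark_rate / 2

# Hypothetical forecasting horizon, chosen only for illustration.
horizon_years = 20

benchmark_gain = efficiency_multiplier(horizon_years, benchmark_rate)
discounted_gain = efficiency_multiplier(horizon_years, discounted_rate)

print(f"benchmark rate:  {benchmark_gain:,.0f}x")
print(f"discounted rate: {discounted_gain:,.0f}x")
print(f"ratio between the two: {benchmark_gain / discounted_gain:,.0f}x")
```

Under these assumptions, halving the rate takes the square root of the cumulative gain, so the gap between the two forecasts grows exponentially with the horizon. That is part of why a point estimate, or an ad hoc discount applied to one, can move timelines a lot.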