I expect AI to look qualitatively like (i) “stack more layers,”… The improvements AI systems make to AI systems are more like normal AI R&D … There may be important innovations about how to apply very large models, but these innovations will have quantitatively modest effects (e.g. reducing the compute required for an impressive demonstration by 2x or maybe 10x rather than 100x)
Your view seems to implicitly assume that an AI with an understanding of NN research at the level necessary to contribute SotA results will not be able to leverage its similar level of understanding of neuroscience, GPU hardware/compilers, architecture search, and NN theory. If we instead assume the AI can bring together these domains, it seems to me that AI-driven research will look very different from business as usual. We should instead expect advances like heavily optimized, partially binarized, spiking neural networks, all developed in a single paper/library. In this scenario, it seems natural to assume something more like 100x efficiency progress.
Take-off debates seem to focus on whether we should expect AI to suddenly acquire far super-human capabilities in a specific domain, i.e. locally. However, this assumption seems unnecessary: fast take-off may only require bringing together expert domain knowledge across multiple domains in a weakly super-human way. I see two possible cruxes here: (1) Will AI be able to globally interpolate across research fields? (2) Given the ability to globally interpolate, will fast take-off occur?
As weak empirical evidence in favor of (1), I see DALL-E 2’s ability to generate coherent images from a composition of two concepts as independent of the concept-distance (/co-occurrence frequency) of those concepts. E.g. “Ukiyo-e painting of a cat hacker wearing VR headsets” is no harder for DALL-E 2 than “Ukiyo-e painting of a cat wearing a kimono”. Granted, this is an anecdotal impression, but one based on a sample of roughly 50 prompts.
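To make that impression concrete, here is a rough sketch of how one could test it; this is my own illustration, not something from the thread, and `generate_image` and `coherence_score` are hypothetical stand-ins for DALL-E 2 access and a human or CLIP-based rater. The idea is to generate images for prompts whose concepts co-occur frequently versus rarely, and compare average coherence across the two groups.

```python
# Sketch of a compositionality check; illustrative assumptions only.
import statistics

def generate_image(prompt):
    """Hypothetical stand-in for a call to an image generator such as DALL-E 2."""
    raise NotImplementedError

def coherence_score(image, prompt):
    """Hypothetical stand-in for a human or CLIP-based coherence rating in [0, 1]."""
    raise NotImplementedError

near_prompts = [  # concept pairs that co-occur frequently in training data
    "Ukiyo-e painting of a cat wearing a kimono",
    "Oil painting of a fisherman mending a net",
]
far_prompts = [  # concept pairs that rarely co-occur
    "Ukiyo-e painting of a cat hacker wearing VR headsets",
    "Oil painting of a fisherman debugging a quantum computer",
]

def average_coherence(prompts, samples_per_prompt=5):
    scores = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            scores.append(coherence_score(generate_image(prompt), prompt))
    return statistics.mean(scores)

# The claim in the text is that these two averages come out roughly equal:
# print("near:", average_coherence(near_prompts))
# print("far: ", average_coherence(far_prompts))
```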
Metaculus Questions
There are a few relevant Metaculus questions to consider. The first two don’t distinguish fast/radical AI-driven research progress from mundane AI-driven research progress. Nevertheless, I would be interested to see both sides’ predictions.
Date AIs Capable of Developing AI Software | Metaculus
Transformers to accelerate DL progress | Metaculus
Years Between GWP Growth >25% and AGI | Metaculus
I’m classifying “optimized, partially binarized, spiking neural networks” as architecture changes. I expect those to be gradually developed by humans and to represent modest and hard-won performance improvements. I expect them to eventually be developed faster by AI, but that before they are developed radically faster by AI they will be developed slightly faster. I don’t think interdisciplinarity is a silver bullet for making faster progress on deep learning.
I don’t think I understand the Metaculus questions precisely enough to predict on them; it seems like the action is in implicit quantitative distinctions:
In “Years between GWP Growth > 25% and AGI,” the majority of the AGI definition is carried by a 2-hour adversarial Turing test. But the difficulty of this test depends enormously on the judges and on the comparison human. If you use the strongest possible definition of the Turing test, then I’m expecting the answer to be negative (though the mean is still large and positive, because it is extraordinarily hard for the answer to go very negative). If you take the kind of Turing test I’d expect someone to use in an impressive demo, I expect it to be >5 years, and this is mostly just a referendum on timelines.
For “AI capable of developing AI software,” it seems like all the action is in quantitative details of how good (/novel/etc.) the code is; I don’t think that literally meeting the task definition would have a large impact on the world.
For “transformers to accelerate DL progress,” I guess the standard is clear, but it seems like a weird operationalization—would the question already resolve positively if we were using LSTMs instead of transformers, because of papers like this one? If not, then it seems like the action comes down to unstated quantitative claims about how good the architectures are. I think that transformers will work better than RNNs for these applications, but that this won’t have a large effect on the overall rate of progress in deep learning by 2025.
before they are developed radically faster by AI they will be developed slightly faster.
I see a couple reasons why this wouldn’t be true:
First, consider LLM progress: overall perplexity improves relatively smoothly, while particular capabilities emerge abruptly. As such, the ability to construct a coherent arXiv paper interpolating between two papers from different disciplines seems likely to emerge abruptly. I.e., currently asking an LLM to do this would generate a paper with zero useful ideas, and we have no reason to expect that the first GPT-N able to do this will generate only half an idea, or one idea; it is just as likely to generate five or more very useful ideas.
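As a toy illustration of why smooth average improvement is compatible with abrupt emergence (my own illustration; the step count and probabilities are made-up assumptions): if generating a useful paper requires chaining many reasoning steps, a smoothly rising per-step success rate produces an end-to-end success rate that sits near zero and then jumps.

```python
# Toy model (illustrative assumption, not a claim about real LLM scaling):
# a "useful paper" requires chaining K_STEPS steps, each succeeding
# independently with probability p. Per-step ability p improves smoothly,
# but end-to-end success p**K_STEPS stays near zero and then rises abruptly.

K_STEPS = 50  # hypothetical number of reasoning steps a useful paper needs

for p in [0.80, 0.90, 0.95, 0.98, 0.99, 0.995]:
    task_success = p ** K_STEPS
    print(f"per-step accuracy {p:.3f} -> end-to-end success {task_success:.4f}")

# Output shows the jump: 0.80 -> ~0.00001, 0.95 -> ~0.08, 0.99 -> ~0.61
```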
There are a couple of ways one might expect continuity via acceleration in AI-driven research in the run-up to GPT-N (both of which I disagree with). First, Quoc Le-style AI-based NAS is likely to have continued apace in the run-up to GPT-N, but for this to provide continuity you have to claim that, in the year GPT-N starts moving AI research forward, AI NAS will have built up to just the right rate of progress needed to allow GPT-N to fit the trend. Alternatively, there might be a sequence of research-relevant, intermediate tasks which GPT-(N-i) develops competency on, thereby accelerating research. I don’t see what those tasks would be.[1]
I don’t think interdisciplinarity is a silver bullet for making faster progress on deep learning.
Second, I agree that interdisciplinarity, when building upon a track record of within-discipline progress, would be continuous. However, we should expect arXiv- and/or GitHub-trained LLMs to skip the mono-disciplinary research-acceleration phase. In effect, I expect there to be no gap between when we can get useful answers to “Modify transformer code so that gradients are more stable during training” and to “Modify transformer code so that gradients are more stable during training, but change the transformer architecture to make use of spiking”.
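For concreteness, the first request can be answered with a routine, within-discipline patch; a minimal sketch is below, assuming a standard PyTorch training loop with `model`, `optimizer`, `loss_fn`, and `loader` defined elsewhere. The second request has no comparably incremental answer, which is the point of the contrast.

```python
# Minimal sketch of the "make gradients more stable" request (assumes a
# standard PyTorch setup; model, optimizer, loss_fn and loader are not shown).
import torch

def train_one_epoch(model, optimizer, loss_fn, loader, max_grad_norm=1.0):
    model.train()
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # The incremental, within-discipline fix: clip gradient norms so a
        # single bad batch can't destabilize training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
```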
If you disagree, how do you imagine continuous progress leading up to the above scenario? An important case is if Codex/GitHub Copilot improves continuously along the way, taking a larger and larger role in ML repo authorship. If we assume that AGI arrives without depending on LLMs achieving understanding of recent arXiv papers, then I agree that this scenario is much more likely to feature continuity in AI-driven AI research. I’m highly uncertain about how this assumption will play out; off the top of my head, I’d put 40% on Codex-driven research reaching AGI before arXiv-level understanding.
[1] Perhaps better and better versions of Ought’s work. I doubt this work will scale to the levels of research utility relevant here.