The weighty conclusion of the “recursive self-improvement” meme is not “expect seed AI”. The weighty conclusion is “sufficiently smart AI will rapidly improve to heights that leave humans in the dust”.
Note that this conclusion is still, to the best of my knowledge, completely true, and recursive self-improvement is a correct argument for it.
This whole discussion seems relevant to me because it feels like it keeps coming up when you and Eliezer talk about why prosaic AI alignment doesn’t help, sometimes explicitly (“Even if this helped with capabilities produced by SGD, why would it help in the regime that actually matters?”) and often because it just seems to be a really strong background assumption for you that leads to you having a very different concrete picture of what is going to happen.
It doesn’t seem like recursive self-improvement is a cheap lower bound argument; it seems like you really think that what I think of as the “normal, boring” world just isn’t going to happen. So I’m generally interested in talking about that, getting clear about what you think is going on here, and hopefully getting some predictions on the record.
This also gives me the sense that you feel quite strongly about your view of recursive self-improvement. If you had a 50% chance on “something like boring business as usual with SGD driving crucial performance improvements at the crucial time” then your dismissal of prosaic AI alignment seems strange to me.
(ETA: there’s actually a lot going on here, I’d guess this is like 1/4th of the total disagreement.)
Robin Hanson was claiming things along the lines of ‘The power is in the culture; superintelligences wouldn’t be able to outstrip the rest of humanity.’
Worth noting that Robin seems to strongly agree that “recursive self-improvement” is going to happen, it’s just that he has a set of empirical views for which that name sounds silly and it won’t be as local or fast as Eliezer thinks.
Relatedly, Eliezer saying “Robin was wrong for doubting RSI; if other crazy stuff will happen before RSI then he’s just even more wrong” seems wrong. In Age of Em I think Robin speculates that within a few years of the first brain emulations, there will be more-alien AI systems that are able to double their own productivity within a few weeks (and then a few weeks later it will be even crazier)! That sure sounds like he’s on board with the part of RSI that is obvious, and what he’s saying is precisely that other crazy stuff will happen first, essentially that we will use computers to replace the hardware of brains before we replace the software. (The book came out in 2016 but I think Robin has had the basic outline of this view since 2012 or earlier.)
The big update over the last decade has been that humans might be able to fumble their way to AGI that can do crazy stuff before it does much self-improvement.
This feels to me like it’s still missing a key part of the disagreement, at least with people like me. As best I can tell/guess, this is also an important piece of the disagreement with Robin Hanson and with some of the OpenAI or OpenPhil people who don’t like your discussion of recursive self-improvement.
Here’s how the situation seems to me:
“Making AI better” is one of the activities humans are engaged in.
If AI were about as good at things as humans, then AI would be superhuman at “making AI better” at roughly the same time it was superhuman at other tasks.
In fact there will be a lot of dispersion, and prima facie we’d guess that there are a lot of tasks (say 15-60% of them, as a made-up 50% confidence interval) where AI is superhuman before AI R&D.
What’s more, even within R&D we expect some complementarity, where parts of the project get automated while humans still add value in other places, leading to more continuous (but still fairly rapid, i.e. over years rather than decades) acceleration (see the toy sketch below, after this list).
That said, at the point when AI is capable of doing a lot of crazy stuff in other domains, “AI R&D” is a crazy important part of the economy, and so this will be a big but not overwhelmingly dominant part of what AI is applied to (and relatedly, a big but not overwhelmingly dominant part of where smart people entering the workforce go to work, and where VCs invest, and so on).
The improvements AI systems make to AI systems are more like normal AI R&D, and can be shared across firms in the same way that modern AI research can be.
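To make the complementarity point concrete (flagged in the item above), here is a minimal Amdahl’s-law-style toy model. This is my own illustrative sketch, not a claim about how AI R&D actually decomposes; it just shows why automating a growing fraction of tasks yields acceleration that increases continuously rather than jumping at a single point.

```python
# Toy complementarity model (illustrative sketch only): suppose AI automates a
# fraction f of the tasks in AI R&D and does them essentially for free, while
# humans remain the bottleneck on the remaining (1 - f) of tasks. The overall
# speedup is then 1 / (1 - f): it grows smoothly as automation spreads, with no
# single threshold at which things suddenly change character.

def rd_speedup(automated_fraction: float) -> float:
    """Amdahl's-law-style speedup when the automated share of tasks becomes ~free."""
    assert 0.0 <= automated_fraction < 1.0
    return 1.0 / (1.0 - automated_fraction)

for f in [0.2, 0.5, 0.8, 0.9, 0.95, 0.99]:
    print(f"{f:.0%} of R&D tasks automated -> overall R&D runs ~{rd_speedup(f):.1f}x faster")
```

Roughly, the “boring” picture above corresponds to f creeping upward over years; the question raised by the contrasting picture below is whether anything qualitatively special happens at some particular point on this curve.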
As far as I can make out from Eliezer and your comments, you think that instead the action is crossing a criticality threshold of “k>1,” which suggests a perspective more like:
AI is able to do some things and not others.
The things AI can do, it typically does much better/faster/cheaper than humans.
Early AI systems can improve some but not all parts of their own design. This leads to rapid initial progress, but diminishing returns (basically they are free-riding on parts of the design already done by humans).
Eventually AI is able to improve enough stuff that there are increasing rather than diminishing returns to scale even within the subset of improvements that the AI is able to make.
Past this point progress is accelerating even without further human effort (which also leads to expanding the set of improvements at which the AI is superhuman). So from here the timescale for takeoff is very short relative to the timescale of human-driven R&D progress.
This is reasonably likely to happen from a single innovation that pushes you over a k>1 threshold.
This dynamic is a central part of the alignment and policy problem faced by humans right now who are having this discussion. I.e. prior to the time when this dynamic happens most research is still being done by humans, the world is relatively similar to the world of today, etc.
The improvements made by AI systems during this process are very unlike modern R&D, and so can’t be shared between AI labs in the same way that e.g. architectural innovations for neural networks or new training strategies can be.
I feel like the first picture is looking better and better with each passing year. Every step towards boring automation of R&D (e.g. by code models that can write mediocre code and thereby improve the efficiency of normal software engineers and ML researchers) suggests that AI will be doing recursive self-improvement around the same time it is doing other normal tasks, with timescales and economic dynamics closer to those envisioned by more boring people.
On what I’m calling the boring picture, “k>1” isn’t a key threshold. Instead we have k>1 and increasing returns to scale well before takeoff. But the rate of AI progress is slow-but-accelerating relative to human abilities, and therefore we can forecast takeoff speed by looking at the rate of AI progress when driven by human R&D.
You frame this as an update about “fumbling your way to an AGI that can do crazy stuff before it does much self-improvement,” but that feels to me like it’s not engaging with the basic argument at issue here: why would you think the AI is likely to be so good at “making further AI progress” relative to human researchers and engineers? Why should we be at all surprised by what we’ve seen over recent years, where software-engineering AI seems like it behaves similarly to AI in other domains (and looks poised to cross human level around broadly the same time rather than much earlier)? Why should this require fumbling rather than being the default state of affairs (as presumably imagined by someone more skeptical of “recursive self-improvement”)?
My impression of the MIRI view here mostly comes from older writing by Eliezer, where he often talks about how an AI would be much better at programming because humans lack what you might call a “codic cortex” and so are very bad programmers relative to their overall level of intelligence. This view does not seem to me like it matches the modern world very well—actual AI systems that write code (and which appear on track to accelerate R&D) are learning to program using similar styles and tools to humans, rather than any kind of new perceptual modality.
(As an aside, in most other ways I like the intelligence explosion microeconomics writeup. It just seems like there’s some essential perspective that isn’t really argued for but suffuses the document, most clear in its language of “spark a FOOM” and criticality thresholds and so on.)
Also: It’s important to ask proponents of a theory what they predict will happen, before crowing about how their theory made a misprediction. You’re always welcome to ask for my predictions in advance.
I’d be interested to get predictions from you and Eliezer about what you think is going to happen in relevant domains over the next 5 years. If we aren’t able to get those predictions, then it seems reasonable to just do an update based on what we would have predicted if we took your view more seriously (since that’s pretty relevant if we are now deciding whether to take your views seriously).
If you wanted to state any relevant predictions I’d be happy to comment on those. But I understand how it’s annoying to leave the ball in your court, so here are some topics where I’m happy to give quantitative predictions if you or Eliezer have a conflicting intuition:
I expect successful AI-automating-AI to look more like AI systems doing programming or ML research, or other tasks that humans do. I think they are likely to do this in a relatively “dumb” way (by trying lots of things, taking small steps, etc.) compared to humans, but that the activity will look basically similar and will lean heavily on oversight and imitation of humans rather than being learned de novo (performing large searches is the main way in which it will look particularly unhuman, but probably the individual steps will still look like human intuitive guesses rather than something alien). Concretely, we could measure this by either performance on benchmarks or economic value, and we could distinguish the kinds of systems I imagine from the kind you imagine by e.g. you telling a story about fast takeoff and then talking about some systems similar to those involved in your takeoff story.
I expect that the usefulness and impressiveness of AI systems will generally improve continuously. I expect that in typical economically important cases we will have a bunch of people working on relevant problems, and so will have trend lines to extrapolate, and that those will be relatively smooth rather than exhibiting odd behavior near criticality thresholds.
At the point when the availability of AI is doubling the pace of AI R&D, I expect that technically similar AI systems will be producing at least hundreds of billions of dollars a year of value in other domains, and my median is more like $1T/year. I expect that we can continue to meaningfully measure things like “the pace of AI R&D” by looking at how quickly AI systems improve at standard benchmarks.
I expect the most powerful AI systems (e.g. those responsible for impressive demonstrations of AI-accelerated R&D progress) will be built in large labs, with compute budgets at least in the hundreds of millions of dollars and most likely larger. There may be important innovations about how to apply very large models, but these innovations will have quantitatively modest effects (e.g. reducing the compute required for an impressive demonstration by 2x or maybe 10x rather than 100x) and so a significant fraction of the total value added / profit will flow to firms that train large models or who build large computing clusters to run them.
I expect AI to look qualitatively like (i) “stack more layers,” (ii) loss functions and datasets that capture cognitive abilities we are interested in with less noise, (iii) architecture and optimization improvements that yield continuous progress in performance, (iv) cleverer ways to form large teams of trained models that result in continuous progress. This isn’t a very confident prediction but it feels like I’ve got to have higher probability on it than you all, perhaps I’d give 50% that in retrospect someone I think is reasonable would say “yup definitely a significant majority of the progress was in categories (i)-(iv) in the sense that I understood them when that comment was written in 2022.”
It may be that we agree about all of these predictions. I think that’s fine, and the main upshot is that you shouldn’t cite anything over the next 5 years as evidence for your views relative to mine. Or it may be that we disagree but it’s not worth your time to really engage here, which I also think is quite reasonable given how much stuff there is to do (although I hope then you will have more sympathy for people who misunderstood your position in the future).
Perhaps more importantly, if you didn’t disagree with me about any 5 year predictions then I feel like there’s something about your position I don’t yet understand or think is an error:
Why isn’t aligning future AI systems similar to aligning existing AI systems? It feels to me like it should be about (i) aligning the systems doing the R&D, and (ii) aligning the kinds of systems they are building. Is that wrong? Or perhaps: why do you think they will be building such different systems from “stack more layers”? (I certainly agree they will be eventually, but the question seems to just be whether there is a significant probability of doing stack more layers or something similar for a significant subjective time.)
Why does continuous improvement in the pace of R&D, driven by AI systems that are contributing to the same R&D process as humans, lead to a high probability of incredibly fast takeoff? It seems to me like there is a natural way to get estimated takeoff speeds from growth models + trend extrapolation, which puts a reasonable probability on “fast takeoff” according to the “1 year doubling before 4 year doubling” view (and therefore I’m very sympathetic to people disagreeing with that view on those grounds) but puts a very low probability on takeoff over weeks or by a small team.
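For concreteness, here is a rough sketch of one way to operationalize the “1 year doubling before 4 year doubling” test mentioned above. Both the check and the trajectory are my own illustration with hypothetical numbers; the criterion as people actually state it differs in details.

```python
# Rough operationalization (my own sketch; the real criterion varies in detail):
# "fast takeoff" = the first complete 1-year GWP doubling finishes no later than
# the first complete 4-year GWP doubling. The trajectory below is hypothetical.

def first_doubling_end(gwp, window):
    """Index of the first year t with gwp[t] >= 2 * gwp[t - window], or None."""
    for t in range(window, len(gwp)):
        if gwp[t] >= 2 * gwp[t - window]:
            return t
    return None

def is_fast_takeoff(gwp):
    one_year = first_doubling_end(gwp, window=1)
    four_year = first_doubling_end(gwp, window=4)
    if one_year is None:
        return False
    return four_year is None or one_year <= four_year

# Hypothetical smooth acceleration: ~3% annual growth whose rate compounds 25%/year.
gwp = [100.0]
rate = 0.03
for _ in range(30):
    gwp.append(gwp[-1] * (1 + rate))
    rate *= 1.25

print(is_fast_takeoff(gwp))
# -> False: this smooth trajectory completes a 4-year doubling (around year 11)
#    before its first 1-year doubling (around year 17), i.e. "slow takeoff" by
#    this test, even though growth eventually becomes explosive.
```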
it seems like you really think that what I think of as the “normal, boring” world just isn’t going to happen.
I agree. I don’t think that RSI is a crux for me on that front, FYI.
It sounds from skimming your comment (I’m travelling at the moment, so I won’t reply in much depth, sorry) like there is in fact a misunderstanding in here somewhere. Like:
If you had a 50% chance on “something like boring business as usual with SGD driving crucial performance improvements at the crucial time” then your dismissal of prosaic AI alignment seems strange to me.
I do not have that view, and my alternative view is not particularly founded on RSI.
Trotting out some good old fashioned evolutionary analogies, my models say that something boring with natural selection pushed humans past thresholds that allowed some other process (that was neither natural selection nor RSI) to drive a bunch of performance improvements, and I expect that shocks like that can happen again.
RSI increases the upside from such a shock. But also RSI is easier to get started in a clean mind than in a huge opaque model, so \shrug maybe it won’t be relevant until after the acute risk period ends.
That sure sounds like he’s on board with the part of RSI that is obvious, and what he’s saying is precisely that other crazy stuff will happen first, essentially that we will use computers to replace the hardware of brains before we replace the software.
Which crazy stuff happens first seems pretty important to me, in adjudicating between hypotheses. So far, the type of crazy that we’ve been seeing undermines my understanding of Robin’s hypotheses. I’m open to the argument that I simply don’t understand what his hypotheses predict.
As far as I can make out from Eliezer and your comments, you think that instead the action is crossing a criticality threshold of “k>1,”
Speaking for myself, it looks like the action is in crossing the minimum of [some threshold humans crossed and chimps didn’t] and [the threshold for recursive self-improvement of the relevant mind] (and perhaps-more-realistically [the other thresholds we cannot foresee], given that this looks like thresholdy terrain), where the RSI threshold might in principle be the lowest one on a particularly clean mind design, but it’s not looking like we’re angling towards particularly clean minds.
(Also, to be clear, my median guess is that some self-modification probably does wind up being part of the mix. But, like, if we suppose it doesn’t, or that it’s not playing a key role, then I’m like “huh, I guess the mind was janky enough that the returns on that weren’t worth the costs \shrug”.)
My guess is that past-Eliezer and/or past-I were conflating RSI thresholds with other critical thresholds (perhaps by not super explicitly tracking the difference) in a way that bred this particular confusion. Oops, sorry.
I’d be interested to get predictions from you and Eliezer about what you think is going to happen in relevant domains over the next 5 years.
For what it’s worth, the sort of predictions I was reverse-soliciting were predictions of the form “we just trained the system X on task Y which looks alignment-related to us, and are happy to share details of the setup, how do you think it performed?”. I find it much easier to generate predictions of that form, than to generate open-ended predictions about what the field will be able to pull off in the near-term (where my models aren’t particularly sharply concentrated (which means that anyone who wants to sharply concentrate probability has an opportunity to take Bayes points off of me! (though ofc I’d appreciate the option to say either “oh, well sure, that’s obvious” or “that’s not obvious to me!” in advance of hearing the results, if you think that someone’s narrow prediction is particularly novel with respect to me))).
I don’t know why the domain looks thresholdy to you. Do you think some existing phenomena in ML look thresholdy in practice? Do you see a general argument for thresholds even if the k>1 criticality threshold argument doesn’t pan out? Is the whole thing coming down to generalization from chimps → humans?
Some central reasons the terrain looks thresholdy to me:
Science often comes with “click” moments, where many things slide into place and start making sense.
As we enter the ‘AI can do true science’ regime, it becomes important that AI can unlock new technologies (both cognitive/AI technologies, and other impactful technologies), new scientific disciplines and subdisciplines, new methodologies and ways of doing intellectual inquiry, etc.
‘The ability to invent new technologies’ and ‘the ability to launch into new scientific fields/subfields’, including ones that may not even be on our radar today (whether or not they’re ‘hard’ in an absolute sense — sometimes AI will just think differently from us), is inherently thresholdy, because ‘starting or creating an entirely new thing’ is a 0-to-1 change, more so than ‘incrementally improving on existing technologies and subdisciplines’ tends to be.
Many of these can also use one discovery/innovation to reach other discoveries/innovations, increasing the thresholdiness. (An obvious example of this is RSI, but AI can also just unlock a scientific subdiscipline that chains into a bunch of new discoveries, leads to more new subdisciplines, etc.)
Empirically, humans did not need to evolve separate specialized-to-the-field modules in order to be able to do biotechnology as well as astrophysics as well as materials science as well as economics as well as topology. Some combination of ‘human-specific machinery’ and ‘machinery that precedes humans’ sufficed to do all the sciences (that we know of), even though those fields didn’t exist in the environment our brain was being built in. Thus, general intelligence is a thing; you can figure out how to do AI in such a way that once you can do one science, you have the machinery in hand to do all the other sciences.
Empirically, all of these fields sprang into existence almost simultaneously for humans, within the space of a few decades or centuries. So in addition to the general points above about “clicks are a thing” and “starting new fields and inventing new technologies is threshold-y”, it’s also the case that AGI is likely to unlock all of the sciences simultaneously in much the same way humans did.
That one big “click” moment, that unlocks all the other click moments and new sciences/technologies and sciences-and-technologies-that-chain-off-of-those-sciences-and-technologies, implies that many different thresholds are likely to get reached at the same time.
Which increases the probability that even if one specific threshold wouldn’t have been crazily high-impact on its own, the aggregate effect of many of those thresholds at once does end up crazily high-impact.
you can figure out how to do AI in such a way that once you can do one science, you have the machinery in hand to do all the other sciences
And indeed, I would be extremely surprised if we find a way to do AI that only lets you build general-purpose par-human astrophysics AI, but doesn’t also let you build general-purpose par-human biochemistry AI.
(There may be an AI technique like that in principle, but I expect it to be a very weird technique you’d have to steer toward on purpose; general techniques are a much easier way to build science AI. So I don’t think that the first general-purpose astrophysics AI system we build will be like that, in the worlds where we build general-purpose astrophysics AI systems.)
Which crazy stuff happens first seems pretty important to me, in adjudicating between hypotheses. So far, the type of crazy that we’ve been seeing undermines my understanding of Robin’s hypotheses. I’m open to the argument that I simply don’t understand what his hypotheses predict.
FWIW, I think everyone agrees strongly with “which crazy stuff happens first seems pretty important”. Paul was saying that Robin never disagreed with eventual RSI, but just argued that other crazy stuff would happen first. So Robin shouldn’t be criticized on the grounds of disagreeing about the importance of RSI, unless you want to claim that RSI is the first crazy thing that happens (which you don’t seem to believe particularly strongly). But it’s totally fair game to e.g. criticize the prediction that ems will happen before de-novo AI (if you think that now looks very unlikely).
Relatedly, Eliezer saying “Robin was wrong for doubting RSI; if other crazy stuff will happen before RSI then he’s just even more wrong” seems wrong.
Eliezer’s argument for localized Foom (and for localized RSI in particular) wasn’t ‘no cool tech will happen prior to AGI; therefore AGI will produce a localized Foom’. If it were, then it would indeed be bizarre to cite an example of pre-AGI cool tech (AlphaGo Zero) and say ‘aha, evidence for localized Foom’.
Rather, Eliezer’s argument for localized Foom and localized RSI was:
It’s not hard to improve on human brains.
You can improve on human brains with relatively simple algorithms; you don’t need a huge library of crucial proprietary components that are scattered all over the economy and need to be carefully accumulated and assembled.
The important dimensions for improvement aren’t just ‘how fast or high-fidelity is the system’s process of learning human culture?’.
General intelligence isn’t just a bunch of heterogeneous domain-specific narrow modules glued together.
Insofar as general intelligence decomposes into parts/modules, these modules work a lot better as one brain than as separate heterogeneous AIs scattered around the world. (See Permitted Possibilities, & Locality.)
I.e.:
Localized Foom isn’t blocked by humans being near a cognitive ceiling in general.
Localized Foom isn’t blocked by “there’s no algorithmic progress on AI” or “there’s no simple, generally applicable algorithmic progress on AI”.
Localized Foom isn’t blocked by “humans are only amazing because we can accumulate culture; and humans already cross that threshold, so it won’t be that big of a deal if something else crosses the exact same threshold; and since AI will be dependent on painstakingly accumulated human culture in the same way we are, it won’t be able to suddenly pull ahead”.
Localized Foom isn’t blocked by “getting an AI that’s par-human at one narrow domain or task won’t mean you have an AI that’s par-human at anything else”.
Localized Foom isn’t blocked by “there’s no special advantage to doing the cognition inside a brain, vs. doing it in distributed fashion across many different AIs in the world that work very differently”.
AlphaGo and its successors were indeed evidence for these claims, to the extent you can get evidence for them by looking at performance on board games.
Insofar as Robin thinks ems come before AI, impressive AI progress is also evidence for Eliezer’s view over Robin’s; but this wasn’t the focus of the Foom debate or of Eliezer’s follow-up. This would be much more of a crux if Robin endorsed ‘AGI quickly gets you localized Foom, but AGI doesn’t happen until after ems’; but I don’t think he endorses a story like that. (Though he does endorse ‘AGI doesn’t happen until after ems’, to the extent ‘AGI’ makes sense as a category in Robin’s ontology.)
AlphaGo and its successors are also evidence that progress often surprises people and comes in spurts: there weren’t a ton of people loudly saying ‘if a major AGI group tries hard in the next 1-4 years, we’ll immediately blast past the human range of Go ability even though AI has currently never beaten a Go professional’ one, two, or four years before AlphaGo. But this is more directly relevant to the Paul-Eliezer disagreement than the Robin-Eliezer one, and it’s weaker evidence insofar as Go isn’t economically important.
[...] When I remarked upon how it sure looked to me like humans had an architectural improvement over chimpanzees that counted for a lot, Hanson replied that this seemed to him like a one-time gain from allowing the cultural accumulation of knowledge.
I emphasize how all the mighty human edifice of Go knowledge, the joseki and tactics developed over centuries of play, the experts teaching children from an early age, was entirely discarded by AlphaGo Zero with a subsequent performance improvement. These mighty edifices of human knowledge, as I understand the Hansonian thesis, are supposed to be the bulwark against rapid gains in AI capability across multiple domains at once. I said, “Human intelligence is crap and our accumulated skills are crap,” and this appears to have been borne out.
Similarly, single research labs like DeepMind are not supposed to pull far ahead of the general ecology, because adapting AI to any particular domain is supposed to require lots of components developed all over the place by a market ecology that makes those components available to other companies. AlphaGo Zero is much simpler than that. To the extent that nobody else can run out and build AlphaGo Zero, it’s either because Google has Tensor Processing Units that aren’t generally available, or because DeepMind has a silo of expertise for being able to actually make use of existing ideas like ResNets, or both.
Sheer speed of capability gain should also be highlighted here. Most of my argument for FOOM in the Yudkowsky-Hanson debate was about self-improvement and what happens when an optimization loop is folded in on itself. Though it wasn’t necessary to my argument, the fact that Go play went from “nobody has come close to winning against a professional” to “so strongly superhuman they’re not really bothering any more” over two years just because that’s what happens when you improve and simplify the architecture, says you don’t even need self-improvement to get things that look like FOOM.
Yes, Go is a closed system allowing for self-play. It still took humans centuries to learn how to play it. Perhaps the new Hansonian bulwark against rapid capability gain can be that the environment has lots of empirical bits that are supposed to be very hard to learn, even in the limit of AI thoughts fast enough to blow past centuries of human-style learning in 3 days; and that humans have learned these vital bits over centuries of cultural accumulation of knowledge, even though we know that humans take centuries to do 3 days of AI learning when humans have all the empirical bits they need; and that AIs cannot absorb this knowledge very quickly using “architecture”, even though humans learn it from each other using architecture. If so, then let’s write down this new world-wrecking assumption (that is, the world ends if the assumption is false) and be on the lookout for further evidence that this assumption might perhaps be wrong.
AlphaGo clearly isn’t a general AI. There’s obviously stuff humans do that make us much more general than AlphaGo, and AlphaGo obviously doesn’t do that. However, if even with the human special sauce we’re to expect AGI capabilities to be slow, domain-specific, and requiring feed-in from a big market ecology, then the situation we see without human-equivalent generality special sauce should not look like this.
To put it another way, I put a lot of emphasis in my debate on recursive self-improvement and the remarkable jump in generality across the change from primate intelligence to human intelligence. It doesn’t mean we can’t get info about speed of capability gains without self-improvement. It doesn’t mean we can’t get info about the importance and generality of algorithms without the general intelligence trick. The debate can start to settle for fast capability gains before we even get to what I saw as the good parts; I wouldn’t have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum.
I think it’s good to go back to this specific quote and think about how it compares to AGI progress.
A difference I think Paul has mentioned before is that Go was not a competitive industry and competitive industries will have smaller capability jumps. Assuming this is true, I also wonder whether the secret sauce for AGI will be within the main competitive target of the AGI industry.
The thing the industry is calling AGI and targeting may end up being a specific style of shallow deployable intelligence, while “real” AGI is a different style of “deeper” intelligence (with, say, less economic value at partial stages and therefore relatively unpursued). This would allow a huge jump like AlphaGo in AGI even in a competitive industry targeting AGI.
Both possibilities seem plausible to me and I’d like to hear arguments either way.
I expect AI to look qualitatively like (i) “stack more layers,”… The improvements AI systems make to AI systems are more like normal AI R&D … There may be important innovations about how to apply very large models, but these innovations will have quantitatively modest effects (e.g. reducing the compute required for an impressive demonstration by 2x or maybe 10x rather than 100x)
Your view seems to implicitly assume that an AI with an understanding of NN research at the level necessary to contribute SotA results will not be able to leverage its similar level of understanding of neuroscience, GPU hardware/compilers, architecture search, and NN theory. If we instead assume the AI can bring together these domains, it seems to me that AI-driven research will look very different from business as usual. Instead we should expect advances like heavily optimized, partially binarized, spiking neural networks—all developed in one paper/library. In this scenario, it seems natural to assume something more like 100x efficiency progress.
Takeoff debates seem to focus on whether we should expect AI to suddenly acquire far super-human capabilities in a specific domain, i.e. locally. However, this assumption seems unnecessary; instead, fast takeoff may only require bringing together expert domain knowledge across multiple domains in a weakly super-human way. I see two possible cruxes here: (1) Will AI be able to globally interpolate across research fields? (2) Given the ability to globally interpolate, will fast takeoff occur?
As weak empirical evidence in favor of (1), I see DALL-E 2’s ability to generate coherent images from a composition of two concepts as independent of the concept-distance (/co-occurrence frequency) of those concepts. E.g. “Ukiyo-e painting of a cat hacker wearing VR headsets” is no harder than “Ukiyo-e painting of a cat wearing a kimono” for DALL-E 2. Granted, this is an anecdotal impression, but over a sample of N~50 prompts.
Metaculus questions: There are a few relevant Metaculus questions to consider. The first two don’t distinguish fast/radical AI-driven research progress from mundane AI-driven research progress. Nevertheless, I would be interested to see both sides’ predictions.
I’m classifying “optimized, partially binarized, spiking neural networks” as architecture changes. I expect those to be gradually developed by humans and to represent modest and hard-won performance improvements. I expect them to eventually be developed faster by AI, but that before they are developed radically faster by AI they will be developed slightly faster. I don’t think interdisciplinarity is a silver bullet for making faster progress on deep learning.
I don’t think I understand the Metaculus questions precisely enough in order to predict on them; it seems like the action is in implicit quantitative distinctions:
In “Years between GWP Growth > 25% and AGI,” the majority of the AGI definition is carried by a 2-hour adversarial Turing test. But the difficulty of this test depends enormously on the judges and on the comparison human. If you use the strongest possible definition of the Turing test then I’m expecting the answer to be negative (though the mean is still large and positive because it is extraordinarily hard for it to go very negative). If you take the kind of Turing test I’d expect someone to use in an impressive demo, I expect it to be >5 years and this is mostly just a referendum on timelines.
For “AI capable of developing AI software,” it seems like all the action is in quantitative details of how good (/novel/etc.) the code is, I don’t think that a literal meeting of the task definition would have a large impact on the world.
For “transformers to accelerate DL progress,” I guess the standard is clear, but it seems like a weird operationalization—would the question already resolve positively if we were using LSTMs instead of transformers, because of papers like this one? If not, then it seems like the action comes down to unstated quantitative claims about how good the architectures are. I think that transformers will work better than RNNs for these applications, but that this won’t have a large effect on the overall rate of progress in deep learning by 2025.
before they are developed radically faster by AI they will be developed slightly faster.
I see a couple reasons why this wouldn’t be true:
First, consider LLM progress: overall perplexity improves relatively smoothly, while particular capabilities emerge abruptly. As such, the ability to construct a coherent Arxiv paper interpolating between two papers from different disciplines seems likely to emerge abruptly. I.e. currently asking an LLM to do this would generate a paper with zero useful ideas, and we have no reason to expect that the first GPT-N able to do this will generate only half an idea, or one idea. It is just as likely to generate five+ very useful ideas.
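One standard toy picture of why particular capabilities can emerge abruptly even while loss improves smoothly (my own illustration, not a claim about any specific model): if a task requires chaining many steps, and the model’s per-step reliability improves smoothly, then end-to-end success can still jump from roughly zero to substantial over a narrow range of per-step quality.

```python
# Toy emergence model (illustration only): per-step reliability improves
# smoothly, but the probability of completing a long chain of steps does not.

def end_to_end_success(per_step_p: float, n_steps: int) -> float:
    """Chance of completing an n-step task if each step independently succeeds
    with probability per_step_p."""
    return per_step_p ** n_steps

for p in [0.90, 0.93, 0.96, 0.98, 0.99]:
    print(f"per-step reliability {p:.2f} -> 100-step task success {end_to_end_success(p, 100):.4f}")
# ~0.0000, 0.0007, 0.0169, 0.1326, 0.3660: the last few increments of smooth
# per-step improvement account for nearly all of the end-to-end capability.
```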
There are a couple of ways one might expect continuity via acceleration in AI-driven research in the run-up to GPT-N (both of which I disagree with). First, Quoc Le-style AI-based NAS is likely to have continued apace in the run-up to GPT-N, but for this to provide continuity you have to claim that by the year GPT-N starts moving AI research forward, AI-based NAS had built up to just the right rate of progress needed to allow GPT-N to fit the trend. Second, there might be a sequence of research-relevant, intermediate tasks which GPT-(N-i) develops competency on, thereby accelerating research. I don’t see what those tasks would be[1].
I don’t think interdisciplinarity is a silver bullet for making faster progress on deep learning.
Second, I agree that interdisciplinarity, when building upon a track record of within-discipline progress, would be continuous. However, we should expect Arxiv and/or Github-trained LLMs to skip the mono-disciplinary research acceleration phase. In effect, I expect there to be no time in between when we can get useful answers to “Modify transformer code so that gradients are more stable during training”, and “Modify transformer code so that gradients are more stable during training, but change the transformer architecture to make use of spiking”.
If you disagree, how do you imagine continuous progress leading up to the above scenario? An important case is if Codex/Github Copilot improves continuously along the way, taking a larger and larger role in ML repo authorship. If we assume that AGI arrives without depending on LLMs achieving understanding of recent Arxiv papers, then I agree that this scenario is much more likely to feature continuity in AI-driven AI research. I’m highly uncertain about how this assumption will play out. Off the top of my head, I’d put 40% on Codex-driven research reaching AGI before Arxiv understanding.
This is an expression from Eliezer’s Intelligence Explosion Microeconomics. In this context, we imagine an AI making some improvement to its own operation, and then k is the number of new improvements which it is able to find and implement. If k>1, then each improvement allows the AI to make more new improvements, and we imagine the quality of the system growing exponentially.
It’s intended as a simplified model, but I think it simplifies too far to be meaningful in practice. Even very weak systems can be built with “k > 1,” the interesting question will always be about timescales—how long does it take a system to make what kind of improvement?
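To illustrate the timescales point with a toy sketch (my own, not anyone’s actual model): two systems can both sit at k > 1, i.e. each implemented improvement opens up more than one further improvement, and still differ by many orders of magnitude in how fast anything dramatic happens, because the time per improvement cycle matters at least as much as k.

```python
# Toy version of the k > 1 story (illustration only): each generation of
# implemented improvements reveals k new ones, and each generation takes a
# fixed amount of time to implement. Both systems below have the same k > 1,
# so both diverge eventually, but on wildly different timescales.

def improvements_after(years: float, k: float, years_per_generation: float) -> float:
    """Cumulative improvements after `years`, with geometric growth per generation."""
    generations = int(years / years_per_generation)
    return sum(k ** g for g in range(generations + 1))

fast = improvements_after(years=1, k=1.5, years_per_generation=1 / 52)  # weekly cycle
slow = improvements_after(years=1, k=1.5, years_per_generation=0.5)     # six-month cycle

print(f"k=1.5, weekly cycles:    ~{fast:.3g} improvements in a year")
print(f"k=1.5, six-month cycles: ~{slow:.3g} improvements in a year")
# Same k, but the first explodes within the year while the second barely moves,
# which is why "is k > 1?" alone doesn't settle how fast takeoff goes.
```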
Do you think that things won’t look thresholdy even in a capability regime in which a large actor can work out how to melt all the GPUs?
FWIW, I think everyone agrees strongly with “which crazy stuff happens first seems pretty important”. Paul was saying that Robin never disagreed with eventual RSI, but just argued that other crazy stuff would happen first. So Robin shouldn’t be criticized on the grounds of disagreeing about the importance of RSI, unless you want to claim that RSI is the first crazy thing that happens (which you don’t seem to believe particularly strongly). But it’s totally fair game to e.g. criticize the prediction that ems will happen before de-novo AI (if you think that now looks very unlikely).
Eliezer’s argument for localized Foom (and for localized RSI in particular) wasn’t ‘no cool tech will happen prior to AGI; therefore AGI will produce a localized Foom’. If it were, then it would indeed be bizarre to cite an example of pre-AGI cool tech (AlphaGo Zero) and say ‘aha, evidence for localized Foom’.
Rather, Eliezer’s argument for localized Foom and localized RSI was:
It’s not hard to improve on human brains.
You can improve on human brains with relatively simple algorithms; you don’t need a huge library of crucial proprietary components that are scattered all over the economy and need to be carefully accumulated and assembled.
The important dimensions for improvement aren’t just ‘how fast or high-fidelity is the system’s process of learning human culture?’.
General intelligence isn’t just a bunch of heterogeneous domain-specific narrow modules glued together.
Insofar as general intelligence decomposes into parts/modules, these modules work a lot better as one brain than as separate heterogeneous AIs scattered around the world. (See Permitted Possibilities, & Locality.)
I.e.:
Localized Foom isn’t blocked by humans being near a cognitive ceiling in general.
Localized Foom isn’t blocked by “there’s no algorithmic progress on AI” or “there’s no simple, generally applicable algorithmic progress on AI”.
Localized Foom isn’t blocked by “humans are only amazing because we can accumulate culture; and humans already cross that threshold, so it won’t be that big of a deal if something else crosses the exact same threshold; and since AI will be dependent on painstakingly accumulated human culture in the same way we are, it won’t be able to suddenly pull ahead”.
Localized Foom isn’t blocked by “getting an AI that’s par-human at one narrow domain or task won’t mean you have an AI that’s par-human at anything else”.
Localized Foom isn’t blocked by “there’s no special advantage to doing the cognition inside a brain, vs. doing it in distributed fashion across many different AIs in the world that work very differently”.
AlphaGo and its successors were indeed evidence for these claims, to the extent you can get evidence for them by looking at performance on board games.
Insofar as Robin thinks ems come before AI, impressive AI progress is also evidence for Eliezer’s view over Robin’s; but this wasn’t the focus of the Foom debate or of Eliezer’s follow-up. This would be much more of a crux if Robin endorsed ‘AGI quickly gets you localized Foom, but AGI doesn’t happen until after ems’; but I don’t think he endorses a story like that. (Though he does endorse ‘AGI doesn’t happen until after ems’, to the extent ‘AGI’ makes sense as a category in Robin’s ontology.)
AlphaGo and its successors are also evidence that progress often surprises people and comes in spurts: there weren’t a ton of people loudly saying ‘if a major AGI group tries hard in the next 1-4 years, we’ll immediately blast past the human range of Go ability even though AI has currently never beaten a Go professional’ one, two, or four years before AlphaGo. But this is more directly relevant to the Paul-Eliezer disagreement than the Robin-Eliezer one, and it’s weaker evidence insofar as Go isn’t economically important.
Quoting Eliezer’s AlphaGo Zero and the Foom Debate:
I think it’s good to go back to this specific quote and think about how it compares to AGI progress.
A difference I think Paul has mentioned before is that Go was not a competitive industry and competitive industries will have smaller capability jumps. Assuming this is true, I also wonder whether the secret sauce for AGI will be within the main competitive target of the AGI industry.
The thing the industry is calling AGI and targeting may end up being a specific style of shallow, deployable intelligence, while “real” AGI is a different style of “deeper” intelligence (with, say, less economic value at partial stages and therefore relatively unpursued). That would allow an AlphaGo-sized jump in AGI even in a competitive industry targeting AGI.
Both possibilities seem plausible to me and I’d like to hear arguments either way.
Your view seems to implicitly assume that an AI with an understanding of NN research at the level necessary to contribute SotA results will not be able to leverage a similar level of understanding of neuroscience, GPU hardware/compilers, architecture search, and NN theory. If we instead assume the AI can bring these domains together, it seems to me that AI-driven research will look very different from business as usual: we should expect advances like heavily optimized, partially binarized, spiking neural networks, all developed in one paper/library. In this scenario, it seems natural to expect something more like 100x efficiency progress.
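As a purely illustrative aside on one ingredient named above: “binarized” means the weights are constrained to ±1 at inference time, with a real-valued copy kept around for training, which is where the efficiency win comes from. Below is a minimal NumPy sketch of a binarized linear layer’s forward pass (my own toy, not anyone’s proposed architecture); the per-layer scale loosely follows the XNOR-Net choice of the mean absolute weight. “Partially binarized” would just mean applying this to only some layers.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Project real-valued weights onto {-1, +1} by taking their sign."""
    return np.where(w >= 0, 1.0, -1.0)

def binarized_linear(x: np.ndarray, w_real: np.ndarray) -> np.ndarray:
    """Forward pass of a toy binarized linear layer.

    Real-valued weights are kept for training; the forward pass uses only
    their signs plus one per-layer scale, so the matmul reduces to adds and
    subtracts on suitable hardware.
    """
    scale = np.abs(w_real).mean()      # XNOR-Net-style scaling factor
    return scale * (x @ binarize(w_real))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))            # one input with 4 features
w = rng.normal(size=(4, 3))            # layer mapping 4 features to 3 units
print(binarized_linear(x, w))          # shape (1, 3)
```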
Take-off debates seem to focus on whether we should expect AI to suddenly acquire far superhuman capabilities in a specific domain, i.e. locally. This assumption seems unnecessary, though: fast takeoff may only require bringing together expert domain knowledge across multiple domains in a weakly superhuman way. I see two possible cruxes here: (1) Will AI be able to globally interpolate across research fields? (2) Given the ability to globally interpolate, will fast take-off occur?
As weak empirical evidence in favor of (1), I see DALL-E 2’s ability to generate coherent images from a composition of two concepts as independent of the concept-distance (/co-occurrence frequency) of those concepts. E.g. “Ukiyo-e painting of a cat hacker wearing VR headsets” is no harder than “Ukiyo-e painting of a cat wearing a kimono” for DALL-E 2. Granted, this is an anecdotal impression, but over a sample of N~50 prompts.
Metaculus questions: there are a few relevant Metaculus questions to consider. The first two don’t distinguish fast/radical AI-driven research progress from mundane AI-driven research progress. Nevertheless, I would be interested to see both sides’ predictions.
Date AIs Capable of Developing AI Software | Metaculus
Transformers to accelerate DL progress | Metaculus
Years Between GWP Growth >25% and AGI | Metaculus
I’m classifying “optimized, partially binarized, spiking neural networks” as architecture changes. I expect those to be gradually developed by humans and to represent modest and hard-won performance improvements. I expect them to eventually be developed faster by AI, but that before they are developed radically faster by AI they will be developed slightly faster. I don’t think interdisciplinarity is a silver bullet for making faster progress on deep learning.
I don’t think I understand the Metaculus questions precisely enough to predict on them; it seems like the action is in implicit quantitative distinctions:
In “Years Between GWP Growth >25% and AGI,” the majority of the AGI definition is carried by a 2-hour adversarial Turing test. But the difficulty of this test depends enormously on the judges and on the comparison human. If you use the strongest possible definition of a Turing test, then I expect the answer to be negative (though the mean is still large and positive, because it is extraordinarily hard for it to go very negative). If you take the kind of Turing test I’d expect someone to use in an impressive demo, I expect it to be >5 years, and this is mostly just a referendum on timelines.
For “AI capable of developing AI software,” it seems like all the action is in quantitative details of how good (/novel/etc.) the code is; I don’t think that literally meeting the task definition would have a large impact on the world.
For “transformers to accelerate DL progress,” I guess the standard is clear, but it seems like a weird operationalization—would the question already resolve positively if we were using LSTMs instead of transformers, because of papers like this one? If not, then it seems like the action comes down to unstated quantitative claims about how good the architectures are. I think that transformers will work better than RNNs for these applications, but that this won’t have a large effect on the overall rate of progress in deep learning by 2025.
I see a couple reasons why this wouldn’t be true:
First, consider LLM progress: overall perplexity falls relatively smoothly, while particular capabilities emerge abruptly. As such, the ability to construct a coherent arXiv paper interpolating between two papers from different disciplines seems likely to emerge abruptly. I.e. currently asking an LLM to do this would generate a paper with zero useful ideas, and we have no reason to expect that the first GPT-N able to do this will generate only half an idea, or one idea; it is just as likely to generate five or more very useful ideas. (A toy sketch of this smooth-metric/abrupt-capability point follows below.)
There are a couple of ways one might expect continuity via acceleration in AI-driven research in the run-up to GPT-N (both of which I disagree with): (1) Quoc Le-style AI-based NAS is likely to have continued apace in the run-up to GPT-N, but for this to provide continuity you have to claim that, in the year GPT-N starts moving AI research forward, AI NAS will have built up to just the right rate of progress needed to let GPT-N fit the trend. (2) There might be a sequence of research-relevant intermediate tasks on which GPT-(N-i) develops competency, thereby accelerating research; I don’t see what those tasks would be[1].
Second, I agree that interdisciplinarity, when building upon a track record of within-discipline progress, would be continuous. However, we should expect arXiv- and/or GitHub-trained LLMs to skip the mono-disciplinary research-acceleration phase. In effect, I expect there to be no gap between when we can get useful answers to “Modify transformer code so that gradients are more stable during training” and to “Modify transformer code so that gradients are more stable during training, but change the transformer architecture to make use of spiking”.
If you disagree, how do you imagine continuous progress leading up to the above scenario? An important case is if Codex/GitHub Copilot improves continuously along the way, taking a larger and larger role in ML repo authorship. If we assume that AGI arrives without depending on LLMs achieving understanding of recent arXiv papers, then I agree that this scenario is much more likely to feature continuity in AI-driven AI research. I’m highly uncertain about how this assumption will play out; off the top of my head, I’d put ~40% on Codex-driven research reaching AGI before arXiv understanding.
[1] Perhaps better and better versions of Ought’s work. I doubt this work will scale to the levels of research utility relevant here.
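To make the smooth-metric/abrupt-capability intuition from the first point concrete, here is a toy simulation with made-up numbers; it is my own illustrative construction, not data from any real model. The assumption is that per-step success improves smoothly with scale, while a useful research artifact requires many steps to all succeed, so the downstream success rate hugs zero and then jumps.

```python
import numpy as np

K = 200                              # hypothetical number of steps/ideas that must all land
s = np.linspace(0, 10, 11)           # log model scale, arbitrary units
p = 1 - 0.5 * np.exp(-0.5 * s)       # per-step success improves smoothly with scale

whole_artifact = p ** K              # probability that the entire K-step chain works

for scale, step_p, whole in zip(s, p, whole_artifact):
    print(f"scale={scale:4.1f}  per-step={step_p:.3f}  whole-artifact={whole:.2e}")
```

Nothing in this toy settles whether real research ability behaves this way; it only shows that a smoothly improving training metric is compatible with an abrupt-looking downstream threshold.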
Can someone clarify what “k>1” refers to in this context? Like, what does k denote?
This is an expression from Eliezer’s Intelligence Explosion Microeconomics. In this context, we imagine an AI making some improvement to its own operation, and then k is the number of further improvements this enables it to find and implement (by analogy with the neutron multiplication factor in a nuclear chain reaction). If k > 1, then each improvement unlocks more than one new improvement, and we imagine the quality of the system growing exponentially.
It’s intended as a simplified model, but I think it simplifies too far to be meaningful in practice. Even very weak systems can be built with k > 1; the interesting question will always be about timescales: how long does it take a system to make what kind of improvement?
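A minimal sketch of that point, my own toy construction rather than anything from the original essay: hold k fixed and vary only how long each “generation” of improvements takes, and the trajectories over a fixed horizon differ by many orders of magnitude.

```python
# Toy chain-reaction model: generation n contains k**n improvements, and each
# generation takes a fixed amount of wall-clock time to work through. Whether
# k > 1 matters in practice then depends entirely on that cycle time.

def total_improvements(t_days: float, k: float, days_per_generation: float) -> float:
    """Total improvements found after t_days under the toy assumptions above."""
    generations = int(t_days // days_per_generation)
    return sum(k ** n for n in range(generations + 1))

for days_per_generation in (7, 365):
    total = total_improvements(t_days=2 * 365, k=1.2, days_per_generation=days_per_generation)
    print(f"k=1.2, {days_per_generation:>3} days per generation, after two years: ~{total:.3g} improvements")
```

The same k = 1.2 yields either a handful of improvements or an astronomically large number over two years, depending only on the assumed cycle time, which is the sense in which the criticality condition by itself says little about real-world speed.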