Are you saying that a human-designed model is expected to achieve superhuman intelligence without being able to change its own basic structure and algorithms? I think it is a bold claim, and not one being made by Eliezer or anyone else in the highly-alarmed alignment community.
Are you saying that a human-designed model is expected to achieve superhuman intelligence without being able to change its own basic structure and algorithms?
I think we have lots of evidence of AI systems achieving superhuman performance on cognitive tasks (like how GPT-4 is better than humans at next-token prediction, even tho it doesn’t seem to have all of human intelligence). I think it would not surprise me if, when we find a supervised learning task close enough to ‘intelligence’, a human-designed architecture trained on that task achieves superhuman intelligence.
Now, you might think that such a task (or combination of related tasks) doesn’t exist, but if so, it’d be interesting to hear why (and whether or not you were surprised by how many linguistic capabilities next-token-prediction unlocked).
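[To make the GPT-4 comparison concrete: “better at next-token prediction” is usually operationalized as lower average per-token cross-entropy (equivalently, lower perplexity) on held-out text. Here is a minimal sketch of that scoring, with toy numbers rather than figures from any actual GPT-4-vs-human measurement:]

```python
import math

def next_token_metrics(predictions, targets):
    """Average per-token cross-entropy (in nats) and top-1 accuracy.

    predictions: one dict per position, mapping candidate next token -> probability.
    targets: the token that actually came next at each position.
    """
    total_nll, correct = 0.0, 0
    for dist, actual in zip(predictions, targets):
        p = dist.get(actual, 1e-12)            # probability assigned to the true token
        total_nll += -math.log(p)              # negative log-likelihood of the truth
        correct += max(dist, key=dist.get) == actual
    return total_nll / len(targets), correct / len(targets)

# Toy comparison on a single position of "the cat sat on the ___":
target = ["mat"]
model_guess = [{"mat": 0.6, "rug": 0.3, "hat": 0.1}]
human_guess = [{"mat": 0.3, "rug": 0.3, "floor": 0.4}]
print(next_token_metrics(model_guess, target))  # lower cross-entropy = better prediction
print(next_token_metrics(human_guess, target))
```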
[edit: also, to be clear, I think ‘transformative AI’ is different from superintelligence; that’s why I put the ‘non-super’ in front. It’s not obvious to me that ‘superintelligence’ is the right classification when I’m mostly concerned about problems of security / governance / etc.; superintelligence is sometimes used to describe systems smarter than any human and sometimes used to describe systems smarter than all humans put together, and while both of those spook me I’m not sure that’s even necessary; systems that are similarly smart to humans but substantially cheaper would themselves be a big deal.]
I think it is a bold claim, and not one being made by Eliezer or anyone else in the highly-alarmed alignment community.
Maybe my view is skewed, but I think I haven’t actually seen all that many “RSI (recursive self-improvement) → superintelligence” claims in the last ~7 years. I remember thinking at the time that AlphaGo was a big update about the simplicity of human intelligence; one might have thought that you needed emulators for lots of different cognitive modules to surpass human Go ability, but it turned out that the visual cortex plus some basic planning was enough. And so once the deep learning revolution got into full swing, it became plausible that we would reach AGI without RSI first (whereas beforehand I think the gap to AGI looked large enough that it was necessary to posit something like RSI to cross it).
[I think I said this privately and semi-privately, but don’t remember writing any blog posts about it because “maybe you can just slap stuff together and make an unaligned AGI!” was not the sort of blog post I wanted to write.]
I remember thinking at the time that AlphaGo was a big update about the simplicity of human intelligence
The thing that spooks me about this is not so much the simplicity of the architecture as the fact that, for example, Leela Zero plays superhuman Go with only ~50M parameters. Put that in the context of modern LLMs with ~300B parameters and the distinction is in the training data. With sufficiently clever synthetic data generation (which might be just some RL setup), “non-giant training runs” might well suffice for general superintelligence, rendering any governance efforts that are not AGI-assisted futile.
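[A back-of-the-envelope for the ~50M figure, assuming Leela Zero’s largest standard network shape (40 residual blocks of 256 3×3 filters) and ignoring the input convolution, batch-norm parameters, and the policy/value heads, which add comparatively little:]

```python
# Approximate parameter count of a Leela-Zero-style residual tower.
# Assumption: 40 residual blocks, 256 filters, 3x3 convolutions.
blocks, filters, k = 40, 256, 3
per_block = 2 * (k * k * filters * filters)   # two 3x3 convs per residual block
tower = blocks * per_block
print(f"residual tower: ~{tower / 1e6:.0f}M parameters")  # ~47M

# Compare with a ~300B-parameter LLM:
print(f"ratio: ~{300e9 / tower:,.0f}x")                   # ~6,400x larger
```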
I think the claim is that one can get a transformative AI by pushing a current architecture, but probably not an extinction-level AI?
“Current architecture” is a narrower category than “human-designed architecture.” You might have said in 2012 “current architectures can’t beat Go,” but that wouldn’t have meant we needed RSI to beat Go[1]; we just needed to design something better than what we had then.
I think it is likely that a human-designed architecture could be an extinction-level AI. I think it is not obvious whether the first extinction-level AI will be human-designed or AI-designed, as that depends on both technological and political uncertainties.
I think it is plausible that if you did massive scaling on a current architecture, you could get an extinction-level AI, but it is pretty unlikely that this will be the first such system, or even that it will be seriously attempted. [Like, if we had a century of hardware progress and no software progress, could GPT-2123 be extinction-level? I’m not gonna rule it out, but I am going to consider “a century of hardware progress with no software progress” extremely unlikely.]
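[Purely illustrative arithmetic for that hypothetical, assuming hardware price-performance were to keep doubling every ~2 years for the whole century:]

```python
# Illustrative only: how much more compute "a century of hardware progress"
# buys at a fixed budget, if price-performance doubles every 2 years.
years, doubling_time = 100, 2
factor = 2 ** (years / doubling_time)
print(f"~{factor:.1e}x more compute per dollar")  # ~1.1e15x
```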
Does expert iteration count as recursive self-improvement? IMO “not really but it’s close.” Obviously it doesn’t let you overcome any architecture-imposed limitations, but it lets you iteratively improve based on your current capability level. And if you view some of our current training regimes as already RSI-like, then this conversation changes.
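[For readers who haven’t seen the term: expert iteration is the AlphaZero-style loop in which search amplifies the current network’s play and the network is then retrained to imitate that amplified play. A schematic sketch follows; search, self_play, and train are stand-ins, not any particular system’s API:]

```python
def expert_iteration(network, search, self_play, train, iterations, games_per_iter):
    """Schematic AlphaZero-style expert-iteration loop (illustrative only).

    search(network, state) -> improved move distribution for `state`
    self_play(policy)      -> (states, search_policies, outcome) for one game
    train(network, data)   -> network with updated weights
    """
    for _ in range(iterations):
        # 1. Amplification: search guided by the current network plays
        #    stronger than the raw network (the "expert").
        dataset = []
        for _ in range(games_per_iter):
            states, policies, outcome = self_play(lambda s: search(network, s))
            dataset.extend(zip(states, policies, [outcome] * len(states)))
        # 2. Distillation: train the raw network to imitate the expert's
        #    move choices and predict the game outcome.
        network = train(network, dataset)
        # Note: the architecture stays fixed throughout; only the weights
        # (and the data the system generates for itself) improve.
    return network
```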
I’m quite confident that it’s possible, but not very confident that such a thing would likely be the first general superintelligence. I expect a period during which humans develop increasingly better models, until one or more of them can develop generally better models on their own. That last capability isn’t necessary for AI-caused doom, but it’s certainly one that would greatly increase the risks.
One of my biggest contributors to “no AI doom” credence is the possibility that there are technical problems that prevent us from ever developing anything sufficiently smarter than ourselves to threaten our survival. I don’t think it’s certain that we can build such a thing, but I think the odds are that we can, that we almost certainly will if we can, and that it will likely happen comparatively soon (decades rather than centuries or millennia).