the very first AGI we train explodes like a nuclear bomb and unilaterally sets the atmosphere on fire, killing everyone instantly
To understand whether this is the kind of thing that could be true or false, it seems like you should say what “the very first AGI” means. What makes a system an AGI?
I feel like this view is gradually looking less plausible as we build increasingly intelligent and general systems, and they persistently don’t explode (though only gradually because it’s unclear what the view means).
It looks to me like what is going to happen is that AI systems will gradually get better at R&D. They can help with R&D by some unknown combination of complementing and replacing human labor, either way leading to acceleration. The key question is the timeline for that acceleration—how long between “AIs are good enough at R&D that their help increases the rate of progress (on average, over relevant domains) by more than doubling the speed of human labor would” and “Dyson sphere.” I think plausible views range from 2 months to 20 years, based on quantitative questions about returns curves, complementarity between AI and humans, and the importance of capital. I’d overall guess 2-8 years.
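To illustrate how much rides on those quantitative questions, here is a toy simulation (made-up parameters and a deliberately crude model, not a forecast or either discussant's actual model): AI capability supplies research labor alongside a fixed human contribution, and the exponent on the returns curve alone determines whether the gap between “AI doubles the pace of R&D” and extreme speedups is a couple of years, a decade, or much longer.

```python
# Toy model: how long from "AI doubles the effective pace of R&D" to a
# 1000x speedup, under different assumed returns curves. All numbers are
# made up for illustration only.

def years_until_speedup(returns, k=0.5, target=1000.0, dt=0.001, horizon=100.0):
    """AI capability supplies research labor alongside one unit of fixed
    human labor, and capability grows as dC/dt = k * (1 + C) ** returns.
    Start where AI labor equals human labor (a 2x speedup over humans alone);
    return the years until effective labor reaches `target` times the
    human-only level, or None if that doesn't happen within `horizon` years."""
    capability, t = 1.0, 0.0
    while t < horizon:
        labor = 1.0 + capability  # humans plus AI, treated as substitutes
        if labor >= target:
            return round(t, 1)
        capability += dt * k * labor ** returns  # crude Euler step
        t += dt
    return None

# Superlinear returns produce a fast blowup; linear returns give a steady
# exponential; strongly diminishing returns stretch things out indefinitely.
for r in (1.5, 1.0, 0.5):
    print(f"returns exponent {r}: {years_until_speedup(r)} years to a 1000x speedup")
```

With these made-up numbers, the superlinear case hits the threshold in roughly three years, the linear case in roughly twelve, and the diminishing-returns case not within a century. A more serious model would also have to represent complementarity between human and AI labor and the role of capital, which this sketch deliberately omits.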
Yeah, I think there’s a sharp-ish discontinuity at the point where we get to AGI. “General intelligence” is, to wit, general — it implements some cognition that can efficiently derive novel heuristics for solving any problem/navigating arbitrary novel problem domains. And a system that can’t do that is, well, not an AGI.
Conceptually, the distinction between an AGI and a pre-AGI system feels similar to the distinction between a system that’s Turing-complete and one that isn’t:
Any Turing-complete system implements a set of rules that suffices to represent any mathematical structure/run any program. A system that’s just “slightly below” Turing-completeness, however, is dramatically more limited.
Similarly, an AGI has a complete set of some cognitive features that make it truly universal — features it can use to bootstrap any other capability it needs, from scratch. By contrast, even a slightly “pre-AGI” system would be qualitatively inferior, not simply quantitatively so.
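As a concrete, if smaller-scale, instance of that kind of threshold (finite automata versus machines with unbounded memory, rather than Turing-completeness itself): a matcher with only a fixed amount of state can never recognize arbitrarily nested parentheses, no matter how many states it has, while adding a single unbounded counter makes the problem trivial. The gap is qualitative, not a matter of degree. A minimal sketch:

```python
# A smaller-scale instance of the analogy above: a machine with only finitely
# many states (no unbounded memory) cannot recognize arbitrarily nested
# parentheses, however many states it has, while one unbounded counter
# suffices. The limitation is qualitative, not a matter of degree.

def balanced_with_counter(s: str) -> bool:
    """One unbounded counter handles any nesting depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def balanced_with_bounded_memory(s: str, max_depth: int) -> bool:
    """A finite-state machine can only track nesting up to a bound fixed in
    advance; for any bound, some inputs are simply beyond it."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
            if depth > max_depth:
                raise ValueError("nesting exceeds what this machine can represent")
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

deep = "(" * 100 + ")" * 100
print(balanced_with_counter(deep))  # True
try:
    balanced_with_bounded_memory(deep, max_depth=10)
except ValueError as e:
    print(e)  # the bounded machine cannot even represent the input
```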
There’s still some fuzziness around the edges, such as whether significantly useful R&D capabilities appear only at post-AGI levels of cognition, or to what extent being an AGI is a sufficient condition for an omnicidal explosion rather than merely a necessary one.
But I do think there’s a meaningful sense in which AGI-ness is a binary, not a continuum. (I’m also hopeful regarding nailing all of this down mathematically, instead of just vaguely gesturing at it like this.)
It seems like a human is universal in the sense that they can think about new problem-solving strategies, evaluate them, adopt successful ones, etc. Most of those new problem-solving strategies are developed by long trial and error and cultural imitation of successful strategies.
If a language model could do the same thing with chain of thought, would you say that it is an AGI? So would the existence of such systems, without an immediate intelligence explosion, falsify your view?
If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim? Or maybe to put this more sharply: how did you decide that text-davinci-003, prompted to pursue an open-ended goal and given the ability to instantiate and delegate to new copies of itself, isn’t an AGI?
It seems to me that you will probably have an “AGI” in the sense you are gesturing at here well before the beginning of explosive R&D growth. I don’t really see the argument for why an intelligence explosion would follow quickly from that point. (Indeed, my view is that you could probably build an AGI in this sense out of text-davinci-003, though the result would be uneconomical.)
If such a system seemed intuitively universal but wasn’t exploding, what kind of observation would tell you that it isn’t universal after all, and therefore salvage your claim?
Given that we don’t understand how current LLMs work, or what the “space of problems” generally looks like, it’s difficult to come up with concrete tests that I’m confident I won’t goalpost-move on. A prospective one might be something like this:
If you invent or find a board game of similar complexity to chess that [the ML model] has never seen before and explain the rules using only text (and, if [the ML model] is multimodal, also images), [a pre-AGI model] will not be able to perform as well at the game as an average human who has never seen the game before and is learning it for the first time in the same way.
I. e., an AGI would be able to learn problem-solving in completely novel, “basically-off-distribution” domains. And if a system that has capabilities like this doesn’t explode (and it’s not deliberately trained to be myopic or something), that would falsify my view.
But for me to be confident in putting weight on that test, we’d need to pin down some specific details about the “minimum level of complexity” of the new board game, and establish that it’s “different enough” from all known board games for the AI to be unable to just generalize from them… And given that it’s unclear in which directions it’s easy to generalize, I expect I wouldn’t be confident in any metric we’d be able to come up with.
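For what it's worth, the mechanics of such a test are easy to sketch; the hard part is exactly the one flagged above, choosing a game that is genuinely novel and complex enough. In the sketch below, `Game`, `query_model`, and `novice_human_move` are hypothetical stand-ins rather than real APIs:

```python
# A sketch of the proposed protocol: teach a model a novel board game purely
# from a textual rulebook, then compare its play against human novices who
# learned the game from the same text. `Game`, `query_model`, and
# `novice_human_move` are hypothetical placeholders; nothing here solves the
# hard problem of picking a sufficiently novel, sufficiently complex game.

def win_rate(rules_text, game, move_a, move_b, n_games=50):
    """Fraction of games won by player A, alternating who plays first."""
    wins = 0
    for i in range(n_games):
        state = game.initial_state()
        # Map the game's player indices (0, 1) onto our two move functions,
        # swapping sides every other game.
        movers = (move_a, move_b) if i % 2 == 0 else (move_b, move_a)
        while not game.is_over(state):
            mover = movers[game.current_player(state)]
            state = game.apply(state, mover(rules_text, state))
        if game.winner(state) == i % 2:  # index of move_a's side this game
            wins += 1
    return wins / n_games

def model_move(rules_text, state):
    # The model sees only the rulebook and the current position, exactly like
    # a human novice reading the rules for the first time.
    prompt = f"Rules:\n{rules_text}\n\nPosition:\n{state}\n\nYour move:"
    return query_model(prompt)  # hypothetical API call

# Pass criterion (roughly): the model's win rate against rulebook-only human
# novices should be at least comparable to a novice-vs-novice baseline.
# score = win_rate(rules_text, game, model_move, novice_human_move)
```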
I guess a sufficient condition for AGI would be “is able to invent a new scientific field out of whole cloth, with no human steering, as an instrumental goal towards solving some other task”. But that’s obviously an overly high bar.
As far as empirical tests for AGI-ness go, I’m hoping for interpretability-based ones instead. I. e., that we’re able to formalize what “general intelligence” means, then search for search in our models.
As far as my epistemic position goes, I expect three scenarios here:
We develop powerful interpretability tools, and they directly show whether my claims about general intelligence hold.
We don’t develop powerful interpretability tools before an AGI explodes like I fear and kills us all.
AI capabilities gradually improve until we get to scientific-field-inventing AGIs, and my mind changes only then.
In the scenarios where I’m wrong, I mostly don’t expect to ever encounter a black-box test of the kind you’re suggesting that seems convincing to me, before I encounter overwhelming evidence that makes convincing me a moot point.
(Which is not to say my position on this can’t be moved at all — I’m open to mechanical arguments about why cognition doesn’t work how I’m saying it does, etc. But I don’t expect to ever observe an LLM whose mix of capabilities and incapabilities makes me go “oh, I guess that meets my minimal AGI standard but it doesn’t explode, guess I was wrong on that”.)