expecting LLMs to not be the right kind of algorithm for future powerful AGI—the kind that can … do innovative science
I don’t know what could serve as a crux for this. When I don’t rule out LLMs, what I mean is that I can’t find an argument with the potential to convince me to become mostly confident that scaling LLMs to 1e29 FLOPs in the next few years won’t produce something clunky and unsuitable for many purposes, yet still barely sufficient to then develop a more reasonable AI architecture within 1-2 more years. And by “an LLM that does this” I mean the overall system: the LLM’s scaffolding environment creates and deploys newly tuned models, using fresh preference data that lets each new LLM variant do better on the particular tasks the old variant encountered, or even pre-trains models on datasets with heavy doses of LLM-generated problem sets with solutions, so as to distill the topics that the previous generation of models needed extensive search to stumble through. This takes a lot of time and compute to retrain models in a particular, stilted way where a more reasonable algorithm would do it much more efficiently.
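To make that loop concrete, here’s a minimal sketch of the kind of setup I have in mind. Every function below is a hypothetical placeholder standing in for an expensive real-world step; none of the names refer to any real API, library, or lab pipeline.

```python
# Hypothetical sketch of the scaffolding-driven improvement loop described above.
# All functions are placeholders for expensive real-world steps, not real APIs.

def deploy(model, task):
    """Run the current model (with scaffolding/search) on a task; return transcripts."""
    return [f"{model} transcript for {task}"]

def collect_preference_data(transcripts):
    """Turn transcripts into preference pairs / graded solutions (placeholder)."""
    return [(t, "preferred variant of " + t) for t in transcripts]

def finetune(model, preference_data):
    """Produce a newly tuned model variant from preference data (placeholder)."""
    return model + "+tuned"

def generate_problem_sets(model, topics):
    """Have the model generate problem sets with solutions, for distillation."""
    return [f"{model} problem set on {topic}" for topic in topics]

def pretrain(base_corpus, synthetic_data):
    """Pre-train a next-generation model on a corpus heavy with synthetic data."""
    return f"model pretrained on {len(base_corpus) + len(synthetic_data)} docs"

model = "llm-v0"
tasks = ["hard task A", "hard task B"]
hard_topics = ["topic the old model needed extensive search to get through"]

for generation in range(3):
    # The old variant encounters tasks; the scaffolding records how it did.
    transcripts = [t for task in tasks for t in deploy(model, task)]
    # New preference data lets the next variant do better on those same tasks.
    prefs = collect_preference_data(transcripts)
    tuned = finetune(model, prefs)
    # Optionally, distill the search-heavy topics into the next pre-training run.
    synthetic = generate_problem_sets(tuned, hard_topics)
    model = pretrain(base_corpus=["web text"], synthetic_data=synthetic)
    print(f"generation {generation}: {model}")
```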
Many traditionally non-LLM algorithms reduce to such a setup, at an unreasonable but possibly still affordable cost. So this does fit the description of LLMs as not being “the right kind of algorithm”, but my prediction is that the scaling experiment could go either way, and that there is no legible way to be confident in either outcome before it’s done.
I feel like “Will LLMs scale to AGI?” is right up there with “Should there be government regulation of large ML training runs?” as a black-hole-like attractor state that sucks up way too many conversations. :) I want to fight against that: this post is not about the question of whether or not LLMs will scale to AGI.
Rather, this post is conditioned on the scenario where future AGI will be an algorithm that (1) does not involve LLMs, and (2) will be invented by human AI researchers, as opposed to being invented by future LLMs (whether scaffolded, multi-modal, etc. or not). This is a scenario that I want to talk about; and if you assign an extremely low credence to that scenario, then whatever, we can agree to disagree. (If you want to argue about what credence is appropriate, you can try responding to me here or at the links therein, but note that I probably won’t engage; it’s generally not a topic I like to talk about, for “infohazard” reasons [see footnote here if anyone reading this doesn’t know what that means].)
I find that a lot of alignment researchers don’t treat this scenario as their modal expectation, but still assign it like >10% credence, which is high enough that we should be able to agree that thinking through that scenario is a good use of time.
if you assign an extremely low credence to that scenario, then whatever
I don’t assign low credence to the scenario where LLMs don’t scale to AGI (and my point doesn’t depend on this). I assign low credence to the scenario where it’s knowable today that LLMs very likely won’t scale to AGI: that is, that there is a thing I could study that should change my mind on this. This is more of a crux than the question as a whole; studying that thing would be actionable if I knew what it was.
whether or not LLMs will scale to AGI
This wording mostly answers one of my questions: I’m now guessing that, if the scenario I described comes to pass, you would say that LLMs were (in hindsight) “the right kind of algorithm”, which wasn’t clear to me from the post.
Yeah, when I say things like “I expect LLMs to plateau before TAI”, I tend not to say it with the supremely high confidence and swagger that you’d hear from e.g. Yann LeCun, François Chollet, Gary Marcus, Dileep George, etc. I’d be more likely to say “I expect LLMs to plateau before TAI … but, well, who knows, I guess. Shrug.” (The last paragraph of this comment is me bringing up a scenario with a vaguely similar flavor to the thing you’re pointing at.)