I think it’s worth disentangling LLMs and Transformers and so on in discussions like this one—they are not one and the same. For instance, the following are distinct positions that have quite different implications:
The precise current transformer LM setup, but bigger, will never achieve AGI
A transformer trained on the language modelling objective will never achieve AGI (but a transformer network trained with other modalities or objectives or whatever will)
A language model with the transformer architecture will never achieve AGI (but a language model with some other architecture or training process will)
Which interventions make sense depends a lot on your precise model of why current models are not AGI, and I would consequently expect modelling things at the level of “LLMs vs not LLMs” to be less effective.
You didn’t ask me, but let me answer.
I more or less believe all three.
For the first, I will just mention that labs are already moving away from the pure transformer architecture. Don’t take it from me: Sam Altman is on record saying they’re moving away from pure scaling.
For the second, I don’t think it’s a question of modalities. Text is definitely rich enough (“text is the universal interface”).
Yes to the third. Like Byrnes, I don’t feel I want to talk too much about it. (But I will say it’s not too different from what’s being deployed right now. In particular, I don’t think there is something completely mysterious about intelligence, or that we need GOFAI or an understanding of consciousness, or anything like that.)
The trouble is that AGI is an underdefined concept. In my conception, current LLMs simply don’t have the right type signature / architecture. Part of the problem is that many people conceive of General Intelligence in terms of capabilities, which I think is misleading: a child is generally intelligent but a calculator is not.
The only caveat here is that it is conceivable that a large enough LLM might have a generally intelligent mesa-optimizer within it. I’m confused about how likely this is.
That is not to say LLMs won’t be absolutely transformative; they will! And it is not to say timelines are long (I fear most of what is needed for AGI is already known by various people).