Thanks for this extremely interesting post.
Of course most of your points are very valid (although some are more questionable, such as treating blakening as an anthropomorphism, which sounds more dismissive than explanatory), but there’s a natural intermediate position between yours and that of Mahowald et al. 2023:
As of 2023, LLMs produce something most would call thought, if only they weren’t so bad at maintaining functional coherence.
In other words, we could argue endlessly about the exact point at which we should call that bits of thought or bits of language. But the actual, non-trivial point is: is it likely that coherence will appear spontaneously with more computing resources (plus maybe a few more or less minor tricks)? Or is it likely that LLM2AGI requires something more?
I mostly agree with the latter, while also thinking it’s very possible that, in the limit of excessively (?) large computing power, LLMs could grok their way to functional coherence at some point.
I argue for the former in the section “Linguistic capability circuits inside LLM-based AI could be sufficient for approximating general intelligence”. Insisting that AGI action must be a single Transformer inference is pointless: sure, The Bitter Lesson suggests that things will eventually converge in that direction, but the first AGI is unlikely to be like that.
Then I misread this section as arguing that LLMs could yada yada, not that it was likely. Would you like to bet?
Yes, we agree not to care about completing a single inference with what I called more or less minor tricks, like using a context document telling the model to play the role of, say, a three-headed lizardwoman from Venus (say it fits your parental caring needs better than Her).
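For concreteness, here is a minimal sketch of that kind of trick, assuming (purely for illustration) the OpenAI Python client; the persona text, prompt, and model name are made up, and any chat API that separates a system message from a user message would work the same way:

```python
# Minimal sketch: a "context document" (system message) sets the persona,
# and the model is asked to stay coherent with it across the exchange.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical persona document, standing in for any role-setting context.
persona = (
    "You are a three-headed lizardwoman from Venus. "
    "Stay in character, and keep your three heads consistent with one another."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice; any chat model would do for the sketch
    messages=[
        {"role": "system", "content": persona},          # the context document
        {"role": "user", "content": "How should I plan my week?"},  # example user turn
    ],
)

print(response.choices[0].message.content)
```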