A few corollaries and alternative conclusions to the same premises:
There are two distinct interesting things here: a magic cross-domain property that can be learned, and an inner architecture that can learn it.
There may be several small efficient architectures. The ones in human brains may not be like the ones in language models. We have plausibly found one efficient architecture; this is not much evidence about unrelated implementations.
Since the learning transfers to other domains, it isn't language-specific. Large language models are just where we happened to first build good enough models. You quote discussion of the special properties of natural language statistics, but, by assumption, similar statistical properties exist in other domains. The more a property is specific to language, or necessary because of language's special properties, the less likely it is to be a universal property that transfers to other domains.
Thanks for pointing this out!