While Wolfram’s explanation is likely the fundamental premise upon which ChatGPT operates (from an initial design perspective), much of this article assumes a deeper functioning that, as is plainly admitted by the author, is unknown. We don’t KNOW how LLMs work. To attribute anything more than reasonably understood neural weighting algos to its operations is blue sky guessing. Let’s not waste time on that, nor on speculation in the face of limited accessible evidence one way or the other.
As I understand it, the point of neural net architectures is that they can learn a wide variety of objects, with some architectural specialization to suit various domains. Thus, during training there is a sense in which they ‘take on’ the structure of objects in the domain over which they operate. That’s one thing I am assuming. I furthermore believe that, since GPTs work in the domain of language, and language is a highly structured domain, some knowledge of how language is structured is relevant to understanding what GPTs are doing.

That, however, is not a mere assumption. We have some evidence about it. Here’s a passage from my working paper, ChatGPT intimates a tantalizing future, its core LLM is organized on multiple levels, and it has broken the idea of thinking:
With this in mind, I want to turn to some work published by Christopher D. Manning et al. in 2020.[1] They investigated the syntactic structures represented in BERT (Bidirectional Encoder Representations from Transformers). Early in the paper they observe:
One might expect that a machine-learning model trained to predict the next word in a text will just be a giant associational learning machine, with lots of statistics on how often the word restaurant is followed by kitchen and perhaps some basic abstracted sequence knowledge such as knowing that adjectives are commonly followed by nouns in English. It is not at all clear that such a system can develop interesting knowledge of the linguistic structure of whatever human language the system is trained on. Indeed, this has been the dominant perspective in linguistics, where language models have long been seen as inadequate and having no scientific interest, even when their usefulness in practical engineering applications is grudgingly accepted.
That is not what they found. They found syntax. They discovered that neural networks induce
representations of sentence structure which capture many of the notions of linguistics, including word classes (parts of speech), syntactic structure (grammatical relations or dependencies), and coreference (which mentions of an entity refer to the same entity, such as, e.g., when “she” refers back to “Rachel”). [...] Indeed, the learned encoding of a sentence to a large extent includes the information found in the parse tree structures of sentences that have been proposed by linguists.
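To make that finding a bit more concrete, here is a minimal sketch, in Python, of where such probing work starts: it pulls per-token vectors from a pretrained BERT model and checks which tokens sit close together in that space. This is not the structural probe Manning and his colleagues actually use; their probe additionally learns a linear map under which vector distances approximate parse-tree distances. The model name, the example sentence, and the raw-distance heuristic are my own illustrative choices, and the code assumes the Hugging Face transformers library and PyTorch are installed.

```python
# A rough sketch of inspecting BERT token vectors for structure.
# NOT the structural probe from Manning et al. (2020): their probe learns a
# linear transformation under which distances track parse-tree distances;
# here we only look at raw last-layer distances as an illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Rachel said she would meet us at the restaurant kitchen."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    # Last hidden layer: one 768-dimensional vector per (sub)word token.
    hidden = model(**inputs).last_hidden_state[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

# Pairwise Euclidean distances between all token vectors.
dist = torch.cdist(hidden.unsqueeze(0), hidden.unsqueeze(0))[0]

# For each word, report its nearest neighbor in embedding space,
# skipping the special [CLS] and [SEP] tokens and the word itself.
for i, tok in enumerate(tokens):
    if tok in ("[CLS]", "[SEP]"):
        continue
    d = dist[i].clone()
    d[i] = float("inf")
    d[0] = float("inf")       # [CLS]
    d[-1] = float("inf")      # [SEP]
    j = int(torch.argmin(d))
    print(f"{tok:>12} -> {tokens[j]}")
```

The point of the sketch is only to show the raw material the probing work operates on: the per-token vectors the network has already computed. The paper's contribution is in showing how much parse-tree and coreference information can be read out of those vectors.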
While BERT is a different kind of language technology from GPT, it does seem reasonable to assume that ChatGPT implements syntactic structure as well. Wouldn’t that have been the simplest, most parsimonious explanation for its syntactic prowess? It would be a mistake, however, to think of story structure as just scaled-up syntactic structure.
[1] Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy, “Emergent linguistic structure in artificial neural networks trained by self-supervision,” PNAS, Vol. 117, No. 48, June 3, 2020, pp. 30046-30054, https://doi.org/10.1073/pnas.1907367117