If compositionality is necessary, then this sort of “deep learning” implements compositionality, even if this fact is not superficially obvious from its structure.
But compositionality mostly isn’t necessary for the kind of writing GPT-2 does. Try getting it to tell you how many legs a dog has. One? Three? Five? It doesn’t know, because people rarely write things like “a dog has four legs” in its input data. Here’s GPT-2:
A dog has the same number of legs as a man, but has fewer legs than a gorilla. It has a lot of brains, but they are divided equally between the two front legs.
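(This sort of sample is easy to reproduce. Below is a minimal sketch using the Hugging Face transformers library and the public gpt2 checkpoint; that checkpoint and these sampling settings are my assumptions rather than whatever produced the samples quoted here, and the output will differ on every run.)

```python
# Minimal sketch: sample a GPT-2 continuation of a "dog legs" prompt.
# Assumes the public "gpt2" checkpoint from Hugging Face transformers;
# sampling settings are illustrative, and output varies run to run.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "A dog has"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    do_sample=True,       # sample rather than greedy-decode
    max_length=40,
    top_k=40,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```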
It’s very good at coming up with sentences that are grammatically and stylistically correct, but it has no concept of whether they’re true. Now, maybe that’s just a result of interacting exclusively with language—people generally learn how many legs a dog has by looking at one, not by hearing about them. But even when it comes to purely linguistic issues, it basically doesn’t make true statements. This is typified by its habit of contradicting itself (or repeating itself) within the same sentence:
A convincing argument requires certain objects to exist. Otherwise it’s not really science. For instance, according to Stromberg, all the atoms in a dog’s body exist, and the image of the dog, even though it does not exist, exists (p. 120).
Hmm… I think you are technically right, since “compositionality” is typically defined as a property of the way phrases/sentences/etc. in a language relate to their meanings. Since language modeling is a task defined in terms of words, without involving their referents at all, GPT-2 indeed does not model/exhibit this property of the way languages mean things.
But the same applies identically to every property of the way languages mean things! So if this is really the argument, there’s no reason to focus specifically on “compositionality.” On the one hand, we would never expect to get compositionality out of any language model, whether a “deep learning” model or some other kind. On the other hand, the argument would fail for any deep learning model that has to connect words with their referents, like one of those models that writes captions for images.
If we read the passage I quoted from 2019!Marcus in this way, it’s a trivially true point about GPT-2 that he immediately generalizes to a trivially false point about deep learning. I think when I originally read the passage, I just assumed he couldn’t possibly mean this, and jumped to another interpretation: he’s saying that deep learning lacks the capacity for structured representations, which would imply an inability to model compositionality even when needed (e.g. when doing image captioning as opposed to language modeling).
Fittingly, when he goes on to describe the problem, it doesn’t sound like he’s talking about meaning but about having flat rather than hierarchical representations:
Surprisingly, deep learning doesn’t really have any direct way of handling compositionality; it just has information about lots and lots of complex correlations, without any structure.
In The Algebraic Mind, Marcus critiqued some connectionist models on the grounds that they cannot support “structured representations.” Chapter 4 of the book, “Structured Representations,” is all about this; it focuses mostly on meaning (he talks a lot about “structured knowledge”), but the argument is not tied to meaning specifically. Syntax and semantics are treated as equally in need of hierarchical representations, equally impossible without them, and equally possible with them.
Unlike the point about meaning and language models, this is a good and nontrivial argument that actually works against some neural nets once proposed as models of syntax or knowledge. So when 2019!Marcus wrote about “compositionality,” I assumed that he was making this argument, again, about GPT-2. In that case, GPT-2’s proficiency with syntax alone is a relevant datum, because Marcus and conventional linguists believe that syntax alone requires structured representations (as against some of the connectionists, who didn’t).
For what it’s worth, I think you’re saying the same thing as my critique about concept modeling, if that’s what you’re referring to.