Hmm… I think you are technically right, since “compositionality” is typically defined as a property of the way phrases/sentences/etc. in a language relate to their meanings. Since language modeling is a task defined in terms of words, without involving their referents at all, GPT-2 indeed does not model/exhibit this property of the way languages mean things.
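To make that word/meaning split concrete, here’s a toy sketch (purely illustrative; the two-word lexicon and the bigram counts are invented for the example): a compositional semantics computes the meaning of a phrase from the meanings of its parts, while a language model only defines statistics over word sequences and never touches referents.

```python
# Toy illustration: compositional semantics vs. language modeling.
# (Invented mini-lexicon and bigram counts, purely for the example.)

# Compositional side: the meaning of "red ball" is a function of the meanings
# of "red" and "ball" and of how they combine (adjective applied to noun).
LEXICON = {
    "ball": {"kind": "ball"},                      # noun meaning: a referent description
    "red": lambda noun: {**noun, "color": "red"},  # adjective meaning: a function on noun meanings
}

def interpret(adj, noun):
    """Meaning of [Adj N] = Adj-meaning applied to N-meaning (the compositional step)."""
    return LEXICON[adj](LEXICON[noun])

# Language-modeling side: nothing but statistics over word sequences; referents never appear.
BIGRAM_COUNTS = {("red", "ball"): 3, ("red", "car"): 1}

def next_word_prob(prev, word):
    total = sum(count for (p, _), count in BIGRAM_COUNTS.items() if p == prev)
    return BIGRAM_COUNTS.get((prev, word), 0) / total

print(interpret("red", "ball"))       # {'kind': 'ball', 'color': 'red'}  <- a meaning
print(next_word_prob("red", "ball"))  # 0.75  <- just a statistic over words
```

Nothing on the language-modeling side ever mentions a referent, which is the sense in which a language model doesn’t model this property of the way languages mean things.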
But the same applies identically to every property of the way languages mean things! So if this is really the argument, there’s no reason to focus specifically on “compositionality.” On the one hand, we would never expect to get compositionality out of any language model, whether a “deep learning” model or some other kind. On the other hand, the argument would fail for any deep learning model that has to connect words with their referents, like one of those models that writes captions for images.
If we read the passage I quoted from 2019!Marcus in this way, it’s a trivially true point about GPT-2 that he immediately generalizes to a trivially false point about deep learning. I think when I originally read the passage, I just assumed he couldn’t possibly mean this, and jumped to another interpretation: he’s saying that deep learning lacks the capacity for structured representations, which would imply an inability to model compositionality even when needed (e.g. when doing image captioning as opposed to language modeling).
Fittingly, when he goes on to describe the problem, it doesn’t sound like he’s talking about meaning but about having flat rather than hierarchical representations:
Surprisingly, deep learning doesn’t really have any direct way of handling compositionality; it just has information about lots and lots of complex correlations, without any structure.
In The Algebraic Mind, Marcus critiqued some connectionist models on the grounds that they cannot support “structured representations.” Chapter 4 of the book is called “Structured Representations” and is all about this; it focuses mostly on meaning (he talks a lot about “structured knowledge”), but the argument is not tied to meaning specifically. Syntax and semantics are treated as equally in need of hierarchical representations, equally impossible without them, and equally possible with them.
Unlike the point about meaning and language models, this is a good and nontrivial argument that actually works against some neural nets once proposed as models of syntax or knowledge. So when 2019!Marcus wrote about “compositionality,” I assumed that he was making this argument, again, about GPT-2. In that case, GPT-2’s proficiency with syntax alone is a relevant datum, because Marcus and conventional linguists believe that syntax alone requires structured representations (as against some of the connectionists, who didn’t).
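To give a concrete sense of why syntax alone is taken to require structured representations, here’s one more toy sketch (again purely illustrative: a hand-built parse and a two-word number lexicon, nothing from GPT-2 or from Marcus): with a hierarchical representation, subject-verb agreement follows from the head of the subject phrase, while a flat “agree with the nearest noun” rule gets fooled by an attractor like “the keys to the cabinet.”

```python
# Toy sketch: why syntax alone seems to call for hierarchical structure.
# (Hand-built parse and invented lexicon; target sentence: "The keys to the cabinet ARE on the table.")

NUMBER = {"keys": "plural", "cabinet": "singular"}

# Hierarchical representation: the subject NP records its head noun explicitly,
# even though another noun ("cabinet") sits closer to the verb in the string.
SUBJECT_TREE = {
    "label": "NP",
    "head": "keys",
    "children": ["the", "keys", {"label": "PP", "children": ["to", "the", "cabinet"]}],
}

def agree_from_tree(np):
    """Agreement is read off the head of the subject NP, wherever it sits linearly."""
    return "are" if NUMBER[np["head"]] == "plural" else "is"

def agree_from_flat(words):
    """Flat heuristic with no structure: agree with the linearly nearest noun."""
    nearest_noun = [w for w in words if w in NUMBER][-1]
    return "are" if NUMBER[nearest_noun] == "plural" else "is"

print(agree_from_tree(SUBJECT_TREE))                             # 'are' -- correct, driven by structure
print(agree_from_flat(["the", "keys", "to", "the", "cabinet"]))  # 'is'  -- fooled by the attractor noun
```

Agreement across an intervening noun like this is exactly the kind of syntactic behavior that GPT-2’s proficiency bears on.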
For what it’s worth, I think you’re saying the same thing as my critique about concept modeling, if that’s what you’re referring to.