As far as I’m aware, there has not been (in recent decades at least) any controversy over whether word/punctuation choice is associative. We even have famous psycholinguistics experiments telling us that thinking of the word “goose” makes us more likely to think of the word “moose” as well as “duck” (linguistic priming is the one type of priming that has held up through the replication crisis, as far as I know). To the extent that linguists bothered to build computational models, I think those would have failed to produce human-like speech because their associative models were not powerful enough.
The appearance of a disagreement in this thread seems to hinge on an ambiguity in the phrase “word choice.”
If “word choice” just means something narrow like “selecting which noun you want to use, given that you are picking the inhabitant of a ‘slot’ in a noun phrase within a structured sentence and have a rough idea of what concept you want to convey,” then perhaps priming and other results about perceptions of “word similarity” might tell us something about how it is done. But no one ever thought that kind of word choice could scale up to full linguistic fluency, since you need some other process to provide the syntactic context. The idea that syntax can be eliminatively reduced to similarity-based choices on the word level is a radical rejection of linguistic orthodoxy. Nor does anyone (as far as I’m aware) believe GPT-2 works like this.
If “word choice” means something bigger that encompasses syntax, then priming experiments about single words don’t tell us much about it.
I do take the point that style as such might be a matter of the first, narrow kind of word choice, in which case GPT-2’s stylistic fluency is less surprising than its syntactic fluency. In fact, I think that’s true: intellectually, I am more impressed by the syntax than the style.
But the conjunction of the two impresses me to an extent greater than the sum of its parts. Occam’s Razor would have us prefer one mechanism to two when we can get away with it, so if we used to think two phenomena required very different mechanisms, a model that gets both using one mechanism should make us sit up and pay attention.
It’s more a priori plausible that all the distinctive things about language are products of a small number of facts about brain architecture, perhaps adapted to do only some of them with the rest arising as spandrels/epiphenomena—as opposed to needing N architectural facts to explain N distinctive things, with none of them yielding predictive fruit beyond the one thing it was proposed to explain. So, even if we already had a (sketch of a) model of style that felt conceptually akin to a neural net, the fact that we can get good style “for free” out of a model that also does good syntax (or, if you prefer, good syntax “for free” out of a model that also does good style) suggests we might be scientifically on the right track.
Neither one is surprising to me at all. In fact, I don’t think there is a sharp divide between syntax and style: “syntax” is just the word we use for culturally shared style. That’s why we can define specialized syntaxes for dialectal differences. And as a structural rule, syntax/style is very relevant to word choice, since it prohibits certain combinations. A big enough network will have a large enough working memory to “keep in mind” enough contextual information to effectively satisfy the syntax rules describing the styles it learned.