That’s not a novel result, though. We’ve basically known those aspects of speech to be associative for decades. Indeed, it is pretty hard to explain many frequent errors in human speech without associative generative models. There are some outliers, like Chomsky, who persist in pushing unrealistic models of human speech, but for the most part the field has assumed something like the Transformer model is how the lower levels of speech production worked.
Now reducing that assumption to practice is a huge engineering accomplishment, which I don’t mean to belittle. But the OP is wondering why linguists are not all infatuated with GPT-2. The answer is that there wasn’t that much to be learned from a theorist’s perspective. They already assigned >90% probability that GPT-2 models something like how speech production works. So having it reduced to practice isn’t that big of an update, in terms of Bayesian reasoning. It’s just the wheel of progress turning forward.
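To make the Bayesian point concrete, here is a minimal sketch in Python (the specific likelihood numbers are illustrative assumptions of mine, not anything stated in the thread):

```python
# Why observing an outcome you already assigned ~90% probability to
# barely moves your posterior, in odds / Bayes-factor terms.

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H | E) computed via prior odds times the likelihood ratio."""
    odds = (prior / (1 - prior)) * (p_evidence_if_true / p_evidence_if_false)
    return odds / (1 + odds)

# A linguist who already assigned 0.9 to "associative models capture
# low-level speech production" sees GPT-2 work. Suppose that outcome is
# 95% likely if the hypothesis is true and 40% likely even if it is false
# (both numbers are made up for illustration).
print(posterior(0.9, 0.95, 0.40))   # ~0.955 -- a modest update

# Someone starting from a low prior, as Corey Washington describes below,
# gets a much bigger update from the same evidence.
print(posterior(0.1, 0.95, 0.40))   # ~0.209 -- roughly doubles
```

The point of the sketch: when the prior odds are already high, even strongly confirming evidence shifts the posterior only slightly, whereas the same evidence is a large update for a skeptic.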
They already assigned >90% probability that GPT-2 models something like how speech production works.
Is that truly the case? I recall reading Corey Washington, a former linguist (who left the field for neuroscience in frustration with its culture and methods), claim that when he was a linguist the general attitude was that there was no way in hell something like GPT-2 would ever work even close to the degree that it does.
Found it:
Steve: Corey’s background is in philosophy of language and linguistics, and also neuroscience, and I have always felt that he’s a little bit more pessimistic than I am about AGI. So I’m curious — and answer honestly, Corey, no revisionist thinking — before the results of this GPT-2 paper were available to you, would you not have bet very strongly against the procedure that they went through working?
Corey: Yes, I would’ve said no way in hell actually, to be honest with you.
Steve: Yes. So it’s an event that caused you to update your priors.
Corey: Absolutely. Just to be honest, when I was coming up, I was at MIT in the mid ’80s in linguistics, and there was this general talk about how machine translation just would never happen and how it was just lunacy, and maybe if they listened to us at MIT and took a little linguistics class they might actually figure out how to get this thing to work, but as it is they’re going off and doing this stuff which is just destined to fail. It’s a complete falsification of that basic outlook, which I think, looking back, of course, had a lot of hubris behind it but very little evidence behind it.
I was just recently reading a paper in Dutch, and I just simply… First of all, the OCR recognized the Dutch language and it gave me a little text version of the page. I simply copied the page, pasted it into Google Translate, and got a translation that allowed me to basically read this article without much difficulty. That would’ve been thought to be impossible 20, 30 years ago — and it’s not even close to predicting the next word, or writing in the style that is typical of the corpus.
Do you know of any promising theories of the higher levels of speech production (i.e., human verbal/symbolic reasoning)? That seems to me to be one of the biggest missing pieces, at this point, in a theoretical understanding of human intelligence (and in AGI theory), and I wonder if there’s actually good theoretical work out there that I’m just not aware of.
for the most part the field has assumed something like the Transformer model is how the lower levels of speech production worked
Can you be more specific about what you mean by “something like the Transformer model”? Or is there a reference you recommend? I don’t think anyone believes that there are literally neurons in the brain wired up into a Transformer, or anything like that, right?