“You need the right architecture. You need, maybe, just maybe, an architecture that can tell us a thing or two about the human brain.”
I liked this article. I don’t think GPT-2 can tell us anything about how the human brain works, though.
Regardless of how well GPT-2 writes, it does not remotely understand language. I’ve taken an excerpt from https://openai.com/blog/better-language-models/ :
“GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text”
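To make that objective concrete, here is a minimal sketch of next-word-prediction training. It is not GPT-2’s actual code (GPT-2 is a large Transformer trained on millions of web pages); the toy context encoder, variable names, and tiny corpus below are illustrative assumptions, but the training signal is the same kind the excerpt describes: predict word t from words 0 through t-1.

```python
# Minimal sketch of the "predict the next word, given all of the previous words"
# objective. NOT GPT-2 itself: GPT-2 is a large Transformer; here a toy context
# encoder (mean of previous-word embeddings -> linear layer) stands in for the
# architecture so that only the training signal is on display.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "the deer stood in the brush near the small birds".split()
vocab = sorted(set(text))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in text])

class ToyNextWordModel(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, prefix_ids):
        # "All of the previous words": here simply averaged; a Transformer
        # would attend over them instead.
        context = self.embed(prefix_ids).mean(dim=0)
        return self.head(context)  # logits over the next word

model = ToyNextWordModel(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    loss = torch.zeros(())
    for t in range(1, len(ids)):
        # Predict word t from words 0..t-1; the loss measures how surprised
        # the model was by the word that actually came next.
        logits = model(ids[:t])
        loss = loss + F.cross_entropy(logits.unsqueeze(0), ids[t].unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Everything the model has at the end of that loop is what the next paragraph describes: weights tuned so that the word which actually came next gets high likelihood.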
Comparing AI to human neurology is off the mark in my estimation, because AIs don’t really learn rules. They can predict outcomes (within a narrow context), but the AI has no awareness of the actual “rules” that are leading to that outcome—all it knows is weights and likelihoods.
This reality actually drives one of the key distinctions between human neurology and AI—humans often only need one record in their “training set” in order to begin making wildly accurate predictions, because humans turn that record into a rule that can be immediately generalized, while an AI would be positively useless with so little data.
A good example:
Imagine a human being is introduced to a deer for the first time. They are told that the animal they are looking at is called a “deer”. From then on, every single time they see a deer, they will know it’s a deer, without any other pieces of data.
In contrast, building an AI that could begin correctly identifying images of deer after being exposed to just one training record, without sacrificing the AI’s ability to be as discerning as it needs to be (is that a bird in that brush?), is extraordinarily out of reach at the moment.
EDIT: Also thought I’d point out that GPT-2’s efficacy declines substantially as the length of its writing increases (I forget the exact number but after 300 words or something it all goes to mumbo jumbo). That, to me, strongly indicates that GPT-2 is not imitating human neurology at all.
What you describe is correct about GPT-2 and a correct response to the article, but be careful not to over-generalize. There are ways of making AIs with “one-shot learning.”
I had a sense I was kind of overstepping when I wrote that...
Do those AI frameworks tend to be very discerning though? I imagine they tend to have high recall and low precision on valid test cases too dissimilar from the single training case.
The Wikipedia article is pretty good on this subject:
https://en.wikipedia.org/wiki/One-shot_learning
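As a rough sketch of how one common family of these systems works (matching-network / prototypical / siamese-style approaches like those surveyed in the Wikipedia article): an encoder is trained on other classes to produce useful embeddings, and the new class is then recognized from its single example by similarity in that embedding space. Everything below is illustrative, with a random projection standing in for the pretrained encoder and an arbitrary similarity threshold.

```python
# Sketch of one common shape for one-shot classification: store the embedding of
# the single labeled example and label new inputs by similarity to it.
# ASSUMPTIONS: the "encoder" here is a fixed random projection standing in for a
# real pretrained / metric-learned feature extractor, and the threshold value is
# arbitrary; this illustrates the idea, it is not a published system.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32 * 32))  # stand-in encoder for 32x32 grayscale images

def embed(image):
    v = W @ image.reshape(-1)
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(a @ b)  # both inputs are unit vectors

# The entire "training set": one labeled deer image becomes the class prototype.
deer_example = rng.random((32, 32))
deer_prototype = embed(deer_example)

def is_deer(image, threshold=0.8):
    # The threshold is where the precision/recall question above lives:
    # lower it and more real deer are accepted (recall up) but so are
    # birds in the brush (precision down); raise it and the reverse.
    return cosine(embed(image), deer_prototype) >= threshold

query = rng.random((32, 32))
print(is_deer(query), is_deer(deer_example))
```

How discerning such a system can be comes down almost entirely to how good the learned embedding is on inputs unlike the single stored example, which is exactly the worry raised above.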
I should say first that I completely agree with you about the extreme data inefficiency of many systems that get enthusiastically labeled “AI” these days—it is a big problem which calls into question many claims about these systems and their displays of “intelligence.”
Especially a few years ago (the field has been getting better about this over time), there was a tendency to define performance with reference to some fixed collection of tasks similar to the training task, without acknowledging that broader generalization capacity, and generalization speed in terms of “number of data points needed to learn the general rule,” are key components of any intuitive/familiar notion of intelligence. I’ve written about this in a few places, like the last few sections of this post, where I talk about the “strange simpletons.”
However, it’s not clear to me that this limitation is inherent to neural nets or to “AI” in the way you seem to be saying. You write:
Comparing AI to human neurology is off the mark in my estimation, because AIs don’t really learn rules. They can predict outcomes (within a narrow context), but the AI has no awareness of the actual “rules” that are leading to that outcome—all it knows is weights and likelihoods.
If I understand you correctly, you’re taking a position that Marcus argued against in The Algebraic Mind. I’m taking Marcus’ arguments there largely as a given in this post, because I agree with them and because I was interested specifically in the way Marcus’ Algebraic Mind arguments cut against Marcus’ own views about deep learning today.
If you want to question the Algebraic Mind stuff itself, that’s fine, but if so you’re disagreeing with both me and Marcus more fundamentally than (I think) Marcus and I disagree with one another, and you’ll need a more fleshed-out argument if you want to bridge a gulf of this size.