Common Deep Learning Critique “It’s just memorization”
Critique:
Let’s say some intelligent behavior does emerge from these huge models. These researchers have given up on the idea that we should understand intelligence. They’re just playing the memorization game: using petabytes and petabytes of data to build ever bigger models, memorizing everything with brute force. This strategy cannot scale. They will run out of space before anything more interesting happens.
Connor Leahy response:
What you just described is what I believed one year ago. I believed bigger models just memorized more.
But when I sat down and read the GPT-3 paper, I almost fell out of my chair. GPT-3 did not even complete a full epoch on its training data; it saw most of its data only once.
But even so, it had wide knowledge of topics it could not have seen more than once. This implies it is capable of learning complete concepts in a single update step. Everyone says deep learning can’t do this, yet GPT-3 seems to have acquired some kind of meta-learning algorithm that lets it rapidly pick up new concepts it sees only one time.
Human babies need to hear a new word many times to learn it, perhaps over weeks. Human adults, on the other hand, can immediately understand a new word the first time it is introduced. GPT-3 has learned this without any change in architecture.
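That “understand a new word the first time it is introduced” ability is what the GPT-3 paper probes with one-shot prompts built around invented words. A minimal sketch of that kind of prompt is below; `build_one_shot_prompt` and the filler sentence “We saw a whatpu at the zoo” are my own illustrative assumptions, not code or text from the paper, though the nonsense words “whatpu” and “farduddle” do appear in its novel-word experiments.

```python
def build_one_shot_prompt(word: str, definition: str,
                          task_word: str, task_definition: str) -> str:
    """Compose a one-shot prompt: one worked example of using a
    brand-new word in a sentence, then the same task for a second
    new word the model has never seen before."""
    return (
        f'A "{word}" is {definition}. '
        f'An example of a sentence that uses the word {word} is: '
        f'We saw a {word} at the zoo.\n'
        f'To do a "{task_word}" means {task_definition}. '
        f'An example of a sentence that uses the word {task_word} is:'
    )

# The model is asked to continue this prompt with a sensible sentence
# using the second invented word, despite one single exposure to it.
prompt = build_one_shot_prompt(
    "whatpu", "a small, furry animal native to Tanzania",
    "farduddle", "to jump up and down really fast",
)
print(prompt)
```

The point of the format: nothing about the model changes between the two halves of the prompt; the “learning” of the second word happens entirely in context, in a single forward pass.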
Paraphrased from this podcast segment: https://www.youtube.com/watch?v=HrV19SjKUss&t=5460s