I have the impression that within-lifetime human learning is orders of magnitude more sample-efficient than that of large language models.
Yes, I think this is clearly true, at least with respect to the number of word tokens a human must be exposed to in order to obtain full understanding of one’s first language.
Suppose for the sake of argument that someone encounters (through either hearing or reading) 50,000 words per day on average, starting from birth, and that it takes 6000 days (so about 16 years and 5 months) to obtain full adult-level linguistic competence (I can see an argument that full linguistic competence happens years before this, but I don’t think you could really argue that it happens much after this).
This would mean that the person encounters a total of 300,000,000 words in the course of gaining full language understanding. By contrast, the training-data figures I have seen for LLMs are typically in the hundreds of billions of tokens, roughly three orders of magnitude more.
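Here is a quick back-of-envelope sketch of that arithmetic; the specific LLM figure of 300 billion tokens is just an illustrative assumption within the hundreds-of-billions range, not a number from any particular model.

```python
# Back-of-envelope comparison of human vs. LLM language exposure.
words_per_day = 50_000       # assumed average exposure (heard or read)
days_to_competence = 6_000   # roughly 16 years and 5 months

human_tokens = words_per_day * days_to_competence
print(f"Human exposure: {human_tokens:,} words")      # 300,000,000

# Illustrative LLM training-set size (assumption); reported figures
# are typically in the hundreds of billions of tokens.
llm_tokens = 300_000_000_000

ratio = llm_tokens / human_tokens
print(f"LLM / human ratio: {ratio:,.0f}x")            # ~1,000x, i.e. ~3 orders of magnitude
```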
And I think there is evidence that humans can obtain linguistic fluency with exposure to far fewer words/tokens than this.
Children born deaf, for example, can only be exposed to a sign-language token when they are looking at the person making the sign, and so probably receive fewer tokens by default than hearing children, who can overhear conversations happening elsewhere; yet they can still become fluent in sign language.
Even people whose parents did not talk much, and who didn't go to school or learn to read, are almost always able to acquire linguistic competence (except in cases of extreme deprivation).