Don’t know enough neuroscience or machine learning to have an opinion here
Same as above
Seems unlikely. Capabilities emerging at particular levels of scale (a phenomenon still present even in the GPT-3.5 → GPT-4 jump) make me doubt that GPT-2 has any sort of human-level linguistic competence. That said, the task GPT performs when modelling language (next-token prediction) is not the same task humans perform, and it seems likely that GPT-2 was already superhuman at that task? Alternatively, performance on downstream tasks is not an “ecological evaluation” of GPT.
LLaMA demonstrates that it is possible to trade off training-data size against model size and attain performance gains even with models trained well past the Chinchilla-optimal point; this also seems to reduce the inference compute cost.
Thoughts:
Strongly agree re the unfair comparison
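The LLaMA trade-off above can be made concrete with the standard back-of-the-envelope approximations C_train ≈ 6ND training FLOPs and ≈ 2N inference FLOPs per token (N = parameters, D = training tokens). A minimal sketch, using the published ballpark figures for Chinchilla (70B params / 1.4T tokens) and LLaMA-13B (13B params / ~1T tokens); the helper functions here are illustrative, not from any paper's code:

```python
# Rough sketch of the training/inference compute trade-off:
# a smaller model trained on more tokens (past Chinchilla-optimal
# D ≈ 20N) costs more per unit of capability at training time but
# far less per token at inference time.

def train_flops(n_params: float, n_tokens: float) -> float:
    # Standard approximation: ~6 FLOPs per parameter per token.
    return 6 * n_params * n_tokens

def infer_flops_per_token(n_params: float) -> float:
    # Standard approximation: ~2 FLOPs per parameter per token.
    return 2 * n_params

models = {
    "Chinchilla-70B": (70e9, 1.4e12),  # roughly Chinchilla-optimal (D/N ≈ 20)
    "LLaMA-13B":      (13e9, 1.0e12),  # over-trained (D/N ≈ 77)
}

for name, (n, d) in models.items():
    print(f"{name}: train ≈ {train_flops(n, d):.2e} FLOPs, "
          f"inference ≈ {infer_flops_per_token(n):.2e} FLOPs/token")
```

Under these approximations the 13B model is roughly 5× cheaper per inference token than the 70B one, which is the sense in which over-training a smaller model "buys back" inference cost.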