Same. Specifically, I went from predicting a 50% chance of human-level AGI within 40 years to a 50% chance within 10 years.
Andrew Mayne was also given access to the GPT-3 API. You can read his impressions here: https://andrewmayneblog.wordpress.com/
I found his results very impressive as well. For example, he’s able to prompt GPT-3 to summarize a Wikipedia article on quantum computing at either a second-grade or an eighth-grade level, depending on the prompt.
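For the curious, here’s a minimal sketch of what that kind of grade-level prompt could look like against the 2020-era GPT-3 completions API; the article excerpt and exact prompt wording are illustrative stand-ins, not Mayne’s:

```python
# Illustrative sketch, not Mayne's actual prompt: grade-level
# summarization via the original (2020-era) GPT-3 completions API.
import openai

openai.api_key = "YOUR_API_KEY"

article = "A quantum computer exploits superposition and entanglement ..."  # excerpt

prompt = (
    "My second grader asked me what this passage means:\n"
    f'"""\n{article}\n"""\n'
    "I rephrased it for him, in plain language a second grader can understand:\n"
    '"""\n'
)

response = openai.Completion.create(
    engine="davinci",   # the original GPT-3 base model
    prompt=prompt,
    max_tokens=150,
    temperature=0.5,
    stop='"""',         # stop at the closing triple quote
)
print(response.choices[0].text.strip())
```

Swapping “second grader” for “eighth grader” in the prompt is the only change needed to shift the reading level.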
I actually put together a presentation on GPT-like architectures and their uses for my advisor: https://docs.google.com/presentation/d/1kCJ2PJ_3UteHBX5TWZyrF5ontEdNx_B4vi6KTmQmPNo/edit?usp=sharing
It’s not really meant to be a standalone explanation, but it does list some of GPT-2/3’s more impressive abilities. After compiling the presentation, I’ve come to think we’ll look back on GPT-3 as the “Wright brothers” moment for AGI.
Consider: this post suggests GPT-3 cost ~$4.6 million to train: https://lambdalabs.com/blog/demystifying-gpt-3. It would be well within the budget of Google/Microsoft/Amazon/DoD/etc. to increase model size by another 2 (possibly 3) orders of magnitude. Based on the jump in GPT-3’s performance going from 13B parameters to 175B parameters, such a “GPT-4” would be absolutely stunning.
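As a crude sanity check on the budget claim (assuming, very roughly, that training cost scales linearly with parameter count at fixed data; real costs would differ):

```python
# Back-of-envelope extrapolation of training cost with model size.
# Crude assumption: cost scales roughly linearly with parameter count.
gpt3_params = 175e9  # GPT-3: 175B parameters
gpt3_cost = 4.6e6    # ~$4.6M training cost (Lambda Labs estimate)

for oom in (2, 3):  # +2 and +3 orders of magnitude
    params = gpt3_params * 10**oom
    cost = gpt3_cost * 10**oom
    print(f"{params:.2e} params -> ~${cost / 1e9:.2f}B at GPT-3-era prices")
```

That comes out to roughly $0.5B and $4.6B, large but plausible sums for a big-tech lab or the DoD.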
On the bright side, according to OpenAI’s scaling laws paper, GPT-3 is about the size at which scaling was predicted to start breaking down. So maybe GPT-4 won’t actually be better than GPT-3. I’m not counting on it, though.
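To make that concrete: the paper (Kaplan et al. 2020) fits test loss to a power law in non-embedding parameter count, L(N) = (N_c/N)^α_N, with α_N ≈ 0.076 and N_c ≈ 8.8e13. Here’s a quick sketch of what the raw law predicts at these scales (the 100× point is my own illustrative extrapolation):

```python
# Kaplan et al. (2020) parameter-count scaling law:
#   L(N) = (N_c / N) ** alpha_N
# The raw power law tends toward zero loss, eventually undershooting
# any irreducible entropy of text -- one reason it must break down.
ALPHA_N = 0.076
N_C = 8.8e13  # parameters

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy test loss (nats/token)."""
    return (N_C / n_params) ** ALPHA_N

for n in (13e9, 175e9, 175e11):  # GPT-3 13B, GPT-3 175B, a 100x "GPT-4"
    print(f"{n:.0e} params -> {predicted_loss(n):.2f} nats/token")
```

Each 10× in parameters only shaves a fixed fraction off the loss, so the curve has to flatten against the entropy floor somewhere; the open question is whether that happens near GPT-3’s scale.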
It’s possible that GPT-3 is roughly where the maximally naive simple text LM begins to hit the constant wall, but I don’t regard this as important; as I emphasize at every turn, there are many distinct ways to improve it greatly using purely known methods, never mind future research approaches. The question is not whether there is any way GPT-4 might fail, but whether there is any way in which it might succeed.
There’s a typo in your Andrew Mayne link, but thanks for linking it—that’s wild!
https://andrewmayneblog.wordpress.com/
Thanks, fixed.