Some things that I feel undermine your case: your sample size is fairly small here, and it would have been valuable if you had sampled maybe 10–20 times for each. Also, these code snippets are either the kind of thing I’d expect to be in the training dataset, or are trivial. Plus, GPT-3 wasn’t used as a base model for AlphaCode, so the difference can’t be due to “fine-tuning and filtering tricks”. Finally, GPT-3 is way bigger than any AlphaCode model.
I had missed this step. In retrospect it should have been obvious: of course you don’t start from a huge text-predictor model to build a code-predictor model that only needs to predict compilable code. Thanks for the clarification.
I think the fact that GPT-3 is controlled by OpenAI and AlphaCode is a DeepMind project has more to do with it. Of course you don’t need to hot-start via transfer learning, but it’s a good idea anyway if you can, which is why DeepMind not using its own GPT-3 equivalent (Gopher, trained at considerable expense) has drawn comment.