If the issue is the storage cost of keeping a separate fine-tuned copy of the model for each individual task you care about, why not just fine-tune one model on all your tasks simultaneously? GPT-3 has plenty of capacity.
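To make that concrete, here is a minimal sketch (not the setup from any of the papers discussed here) of what "fine-tune one model on all your tasks simultaneously" means in practice: pool examples from every task, shuffle them together, and take gradient steps on a single shared model. GPT-2 via Hugging Face transformers is used purely as a stand-in, and the toy task data is made up for illustration.

```python
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Each task contributes (prompt, target) pairs; a task prefix in the prompt is
# one common way to tell the model which task an example belongs to.
tasks = {
    "sentiment": [("sentiment: I loved this film. ->", " positive")],
    "qa":        [("qa: What is the capital of France? ->", " Paris")],
}

# Multi-task = one shuffled pool of all tasks, not one task after another.
pool = [(name, p, t) for name, pairs in tasks.items() for p, t in pairs]

model.train()
for epoch in range(3):
    random.shuffle(pool)  # interleave tasks every epoch
    for name, prompt, target in pool:
        enc = tokenizer(prompt + target, return_tensors="pt")
        # Standard causal-LM loss over the whole sequence; a real setup would
        # usually mask the prompt tokens out of the loss.
        out = model(**enc, labels=enc["input_ids"])
        optimizer.zero_grad()
        out.loss.backward()
        optimizer.step()
```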
People do this a lot with BERT, and it has its own problems—the first section of this recent paper gives a good overview.
Then of course there is plenty of work on mitigating those problems, like that paper . . . but there are various ways of doing so, with no clear consensus on which one to use. So a more general statement of few-shot’s promise might be “you don’t have to worry about which fine-tuning setup to use, out of the many available alternatives, all of which have pitfalls.”
I think the results in that paper argue that it’s not really a big deal as long as you don’t make some basic errors like trying to fine-tune on tasks sequentially. MT-A outperforms Full in its Table 1. GPT-3 is already a multi-task learner (as is BERT), so it would be very surprising if fine-tuning on a smaller, explicit set of tasks were too difficult for it.
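For contrast, the “basic error” mentioned above would look something like the following: sequential fine-tuning, finishing one task before starting the next on the same model. Later tasks tend to overwrite what earlier ones taught (catastrophic forgetting), which is exactly what shuffling the tasks together, as in the earlier sketch, avoids. This reuses the placeholder `model`, `tokenizer`, `optimizer`, and `tasks` defined there.

```python
# Anti-pattern: train task A to completion, then task B, and so on.
for name, pairs in tasks.items():
    for epoch in range(3):
        for prompt, target in pairs:
            enc = tokenizer(prompt + target, return_tensors="pt")
            out = model(**enc, labels=enc["input_ids"])
            optimizer.zero_grad()
            out.loss.backward()
            optimizer.step()
```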