I think the results in that paper argue that it’s not really a big deal as long as you don’t make some basic errors like trying to fine-tune on tasks sequentially. MT-A outperforms Full in Table 1. GPT-3 is already a multi-task learner (as is BERT), so it would be very surprising if training on fewer tasks were too difficult for it.