Why would training GPT-3 on its own output improve it at all?
Self-distillation is a thing, even outside a DRL setting. (“Best-of” sampling is similar to self-distillation in being a way to get better output out of GPT-3 using just GPT-3.) There’s also the issue that “sampling can prove the presence of knowledge but not the absence”: abilities you haven’t prompted for may simply be locked away. In a very timely paper yesterday, OA demonstrated that GPT-3 models have much better translation abilities than anyone realized, and that you can train GPT-3 on its own output to improve its zero-shot translation into English & make that power accessible: “Unsupervised Neural Machine Translation with Generative Language Models Only”, Han et al 2021.
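For concreteness, here is a minimal Python sketch of the best-of/self-distillation loop: sample *n* completions, rerank them, then fine-tune the model on the winners so it learns to produce in one sample what previously took *n* samples plus reranking. The `generate`/`log_prob`/`fine_tune` helpers are hypothetical stand-ins for whatever LM API you use, and ranking by the model’s own likelihood is just one possible selection criterion, not necessarily what Han et al 2021 do (their setup is translation-specific).

```python
def generate(model, prompt: str) -> str:
    """Hypothetical: sample one completion from `model` for `prompt`."""
    raise NotImplementedError

def log_prob(model, prompt: str, completion: str) -> float:
    """Hypothetical: model's log-probability of `completion` given `prompt`."""
    raise NotImplementedError

def fine_tune(model, pairs):
    """Hypothetical: fine-tune `model` on (prompt, completion) pairs."""
    raise NotImplementedError

def best_of_n(model, prompt: str, n: int = 16) -> str:
    # Sample n candidates and keep the one the model itself rates best.
    candidates = [generate(model, prompt) for _ in range(n)]
    return max(candidates, key=lambda c: log_prob(model, prompt, c))

def self_distill(model, prompts, n: int = 16):
    # Distill the sample-and-rerank procedure back into the model's weights:
    # the expensive best-of-n output becomes cheap single-sample behavior.
    pairs = [(p, best_of_n(model, p, n)) for p in prompts]
    fine_tune(model, pairs)
```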