Are people doing anything with LLMs like the classic StyleGAN training-data bootstrapping pattern?
Start with bad data and train a bad model. It's bad, but it's still good enough to rank your training data. Now you have better training data, so train a better model. The architecture is different, of course, but is there anything analogous?
Yes, it's my understanding that OpenAI did this for GPT-4. It's discussed in the system card PDF: they used early versions of GPT-4 to generate synthetic test data and also used them as an evaluator of GPT-4's responses.
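
For concreteness, here's a rough sketch of that loop in Python. This is just the pattern, not anyone's actual pipeline: train_model and the toy length-based scorer are hypothetical stand-ins for a real fine-tuning step and an LLM-as-judge quality score.

    from typing import Callable, List, Tuple

    Example = str                       # a raw document or prompt/response pair
    Model = Callable[[Example], float]  # anything that can score an example's quality

    def bootstrap(
        raw_data: List[Example],
        train_model: Callable[[List[Example]], Model],  # hypothetical stand-in for training
        rounds: int = 3,
        keep_fraction: float = 0.5,
    ) -> Tuple[Model, List[Example]]:
        """Train on the data, rank the data with the resulting model,
        keep the best-scoring portion, and retrain on the cleaner set."""
        data = list(raw_data)
        model = train_model(data)          # round 0: bad model from bad data
        for _ in range(rounds):
            scored = sorted(data, key=model, reverse=True)   # rank with the current model
            keep = max(1, int(len(scored) * keep_fraction))
            data = scored[:keep]           # smaller, better training set
            model = train_model(data)      # better model
        return model, data

    if __name__ == "__main__":
        # Toy demo only: "training" here just builds a length-heuristic scorer,
        # standing in for real fine-tuning plus LLM-as-judge scoring.
        def toy_trainer(dataset: List[Example]) -> Model:
            avg_len = sum(len(x) for x in dataset) / len(dataset)
            return lambda ex: 1.0 / (1.0 + abs(len(ex) - avg_len))

        corpus = ["short", "a medium length example", "x" * 200, "another decent example"]
        final_model, kept = bootstrap(corpus, toy_trainer, rounds=2)
        print(kept)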