Language models seem to do a pretty good job at judging text “quality” in a way that agrees with humans. And of course, they’re good at generating new text. Could it be useful for a model to generate a bunch of output, filter it for quality by its own judgment, and then continue training on its own output? If so, would it be possible to “bootstrap” arbitrary amounts of extra training data?
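As a rough sketch of the loop (the `generate`, `judge_quality`, and `finetune` helpers here are illustrative assumptions, not any real library's API):

```python
def bootstrap_round(model, prompts, threshold=0.8):
    """One bootstrap round: generate, self-judge, filter, continue training.

    Assumes a hypothetical model object with .generate(), .judge_quality()
    (returning a score in [0, 1]), and .finetune() methods.
    """
    samples = [model.generate(p) for p in prompts]
    # The model scores its own outputs and keeps only what it rates highly.
    kept = [s for s in samples if model.judge_quality(s) >= threshold]
    # Continue training on the self-filtered data; repeat for more rounds.
    return model.finetune(kept)
```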
It might be even better to augment the data with quality judgments instead of only keeping the high-quality samples. This way, quality can take the form of a natural-language description rather than a single built-in scalar, and you can later prime the model for whatever sense/dimension/direction of quality you want, as a kind of objective, without retraining.
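A sketch of this augmentation variant, under the same assumptions (the tag format and the `describe_quality` helper are made up for illustration):

```python
def augment(model, samples):
    """Prefix every sample with the model's own free-form quality judgment,
    then train on all of it rather than discarding low-quality samples."""
    return [f"[quality: {model.describe_quality(s)}]\n{s}" for s in samples]

# At inference time, prime the model with the quality description you want,
# so the chosen sense of "quality" acts as a steerable objective:
#   model.generate("[quality: clear, well-argued]\n" + prompt)
```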