RLHF is done after the pre-training process. I believe this is referring to including examples like this in the pre-training process itself.
Though in broad strokes, I agree with you. It’s not inconceivable to me that they’ll turn (or are already turning) their ChatGPT data into training data for future models, using this concept of corrected mistakes.