To solve this problem you would need a very large dataset of mistakes made by LLMs and their true continuations. [...] This dataset is unlikely to ever exist, given that it would need to be many times larger than the entire internet.
I had assumed that creating that dataset was a major reason for doing a public release of ChatGPT. “Was this a good response?” [thumbs-up] / [thumbs-down] → dataset → more RLHF. Right?
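As a rough illustration of that loop, here is a minimal sketch of how thumbs-up/down feedback could be aggregated into a preference-style dataset for reward-model training. The field names and structure are assumptions for illustration, not OpenAI's actual pipeline.

```python
# Hypothetical sketch: turning logged thumbs-up/down feedback into
# preference pairs (chosen vs. rejected) for a reward model.
from collections import defaultdict

feedback_log = [
    {"prompt": "Explain RLHF briefly.",
     "response": "RLHF fine-tunes a model using human preference signals.",
     "rating": "up"},
    {"prompt": "Explain RLHF briefly.",
     "response": "RLHF is when robots vote on laws.",
     "rating": "down"},
]

# Group rated responses by prompt, then pair liked against disliked answers.
by_prompt = defaultdict(lambda: {"up": [], "down": []})
for item in feedback_log:
    by_prompt[item["prompt"]][item["rating"]].append(item["response"])

preference_pairs = [
    {"prompt": p, "chosen": good, "rejected": bad}
    for p, groups in by_prompt.items()
    for good in groups["up"]
    for bad in groups["down"]
]

print(preference_pairs[0])
```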
RLHF is done after the pre-training process. I believe this is referring to including examples like these in the pre-training data itself.
Though in broad strokes I agree with you. It’s not inconceivable to me that they’re turning (or will turn) their ChatGPT data into training data for future models using this concept of corrected mistakes.
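To make the "corrected mistakes" idea concrete, here is a small sketch of formatting a logged model error plus its correction as a single pre-training document. The record fields, the template, and the example values are all hypothetical.

```python
# Hypothetical sketch: a logged ChatGPT mistake and its correction
# serialized as one plain-text document for a pre-training corpus.
def to_pretraining_example(record):
    """Format a mistake + correction as a single training document."""
    return (
        f"Prompt: {record['prompt']}\n"
        f"Model answer (incorrect): {record['model_answer']}\n"
        f"User correction: {record['user_correction']}\n"
        f"Corrected answer: {record['corrected_answer']}\n"
    )

logged = [
    {
        "prompt": "What is 17 * 24?",
        "model_answer": "398",
        "user_correction": "That's wrong, recompute it.",
        "corrected_answer": "408",
    },
]

corpus = [to_pretraining_example(r) for r in logged]
print(corpus[0])
```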