To solve this problem you would need a very large dataset of mistakes made by LLMs and their true continuations. [...] This dataset is unlikely to ever exist, given that it would need to be many times larger than the entire internet.
I had assumed that creating that dataset was a major reason for doing a public release of ChatGPT. “Was this a good response?” [thumbs-up] / [thumbs-down] → dataset → more RLHF. Right?
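As a rough illustration of that loop, here is a minimal sketch of how thumbs-up/down feedback could be aggregated into a preference-style dataset for reward-model training. The field names and structure are assumptions for illustration, not OpenAI's actual pipeline.

```python
# Hypothetical sketch: turning logged thumbs-up/down feedback into
# preference pairs (chosen vs. rejected) for a reward model.
from collections import defaultdict

feedback_log = [
    {"prompt": "Explain RLHF briefly.",
     "response": "RLHF fine-tunes a model using human preference signals.",
     "rating": "up"},
    {"prompt": "Explain RLHF briefly.",
     "response": "RLHF is when robots vote on laws.",
     "rating": "down"},
]

# Group rated responses by prompt, then pair liked against disliked answers.
by_prompt = defaultdict(lambda: {"up": [], "down": []})
for item in feedback_log:
    by_prompt[item["prompt"]][item["rating"]].append(item["response"])

preference_pairs = [
    {"prompt": p, "chosen": good, "rejected": bad}
    for p, groups in by_prompt.items()
    for good in groups["up"]
    for bad in groups["down"]
]

print(preference_pairs[0])
```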
RLHF is done after the pre-training process. I believe this is referring to including examples like these in the pre-training data itself.
Though in broad strokes I agree with you. It’s not inconceivable to me that they’re turning (or will turn) their ChatGPT data into training data for future models using this concept of corrected mistakes.
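To make the "corrected mistakes" idea concrete, here is a small sketch of formatting a logged model error plus its correction as a single pre-training document. The record fields, the template, and the example values are all hypothetical.

```python
# Hypothetical sketch: a logged ChatGPT mistake and its correction
# serialized as one plain-text document for a pre-training corpus.
def to_pretraining_example(record):
    """Format a mistake + correction as a single training document."""
    return (
        f"Prompt: {record['prompt']}\n"
        f"Model answer (incorrect): {record['model_answer']}\n"
        f"User correction: {record['user_correction']}\n"
        f"Corrected answer: {record['corrected_answer']}\n"
    )

logged = [
    {
        "prompt": "What is 17 * 24?",
        "model_answer": "398",
        "user_correction": "That's wrong, recompute it.",
        "corrected_answer": "408",
    },
]

corpus = [to_pretraining_example(r) for r in logged]
print(corpus[0])
```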