Insub comments on Pretraining Language Models with Human Preferences

Insub 23 Feb 2023 5:31 UTC
6 points
1
I’m also morbidly curious what the model would do in <|bad|> mode.
I’m guessing that poison-pilling the <|bad|> sentences would have a negative effect on the <|good|> capabilities as well? I.e., it seems like the post is saying that the whole reason you need to include the <|bad|> sentences in the training dataset at all is that the model needs them in order to generalize correctly, even when predicting <|good|> sentences.
- Tomek Korbak 23 Feb 2023 18:17 UTC
  1 point
  0
  
  I’m guessing that poison-pilling the <|bad|> sentences would have a negative effect on the <|good|> capabilities as well?
  
  That would be my guess too.