Charlie Steiner comments on Scaling Laws for Reward Model Overoptimization

Charlie Steiner 21 Oct 2022 6:26 UTC
Did you notice any qualitative trends in responses as you optimized harder for the models of the gold RM? Like, anything aside from just “sounding kind of like instruct-GPT”?
- leogao 26 Oct 2022 1:48 UTC
  There’s an example in the appendix but we didn’t do a lot of qualitative analysis.