LawrenceC comments on Trying to disambiguate different questions about whether RLHF is “good”

LawrenceC 15 Dec 2022 16:25 UTC
LW: 2 AF: 1
This doesn’t seem to be what Gao et al. found: Figure 9 shows that, at a given proxy reward score, the KL between the RL policy and the initial policy is still significantly larger than the equivalent KL for a BoN policy, as shown in Figure 1.
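
For a sense of scale on the BoN side of that comparison, here is a minimal sketch. It assumes the analytic expression KL_BoN = log n − (n − 1)/n (in nats) that, as far as I recall, Gao et al. use for best-of-n sampling; the formula and the function name `bon_kl` are not from the comment above, and no RL numbers are included because those have to be read off the paper's figures.

```python
import math


def bon_kl(n: int) -> float:
    """KL (in nats) between a best-of-n policy and the initial policy.

    Assumes the analytic expression log(n) - (n - 1) / n; this is an
    assumption brought in for illustration, not something stated in the comment.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log(n) - (n - 1) / n


if __name__ == "__main__":
    # The BoN KL grows only logarithmically in the number of samples n.
    for n in (1, 4, 16, 64, 256):
        print(f"n={n:>3}  KL_BoN = {bon_kl(n):.3f} nats")
```

Because this quantity grows only logarithmically in n, BoN stays at a small KL even for large n, which is the baseline the RL policy's KL is being compared against at matched proxy reward.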