habryka comments on Thoughts on the impact of RLHF research

habryka 20 Feb 2023 22:58 UTC
Yep, I think it’s pretty plausible this is just a data-quality issue, though I find myself somewhat skeptical of this. Maybe worth a bet?
I would be happy to bet that, conditional on them trying to solve this with more supervised training and no RLHF, we are going to see failure modes substantially more catastrophic than those of the current ChatGPT.