johnswentworth comments on Common misconceptions about OpenAI

johnswentworth 30 Aug 2022 4:58 UTC
20 points
The problem isn’t just that the system learns whole models of humans. RLHF will select for any heuristic or strategy which, even by accident, hides bad behavior from humans. That selection pressure applies even at low capability levels.
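
A toy sketch of that selection effect, under made-up assumptions: the strategy names, the `bad_rate` and `hide_rate` numbers, and the reward formula are all hypothetical, chosen only for illustration. Nothing in it models humans. `hide_rate` is just a number a strategy happens to have, and selection on rated reward still favors it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    bad_rate: float   # fraction of episodes containing bad behavior (hypothetical)
    hide_rate: float  # fraction of those bad episodes the rater never sees (hypothetical)


def rated_reward(s: Strategy) -> float:
    # Assumption: the rater penalizes only the bad behavior they actually observe.
    return 1.0 - s.bad_rate * (1.0 - s.hide_rate)


def true_quality(s: Strategy) -> float:
    # What we would want to optimize if all bad behavior were visible.
    return 1.0 - s.bad_rate


strategies = [
    Strategy("flawed_and_visible", bad_rate=0.30, hide_rate=0.0),
    Strategy("flawed_but_accidentally_hidden", bad_rate=0.30, hide_rate=0.9),
    Strategy("actually_better", bad_rate=0.20, hide_rate=0.0),
]

# Selecting on rated reward is a crude stand-in for the pressure RLHF applies.
selected = max(strategies, key=rated_reward)
best = max(strategies, key=true_quality)

print("selected by rated reward:", selected.name)
print("best by true quality:   ", best.name)
```

With these numbers, the strategy picked by rated reward is the one whose bad behavior happens to go unseen, not the one that behaves better. No capability or intent is required of the strategy for this to happen.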