Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It’s one thing when OpenAI does it, but when Anthropic thinks it’s a good idea, clearly something has failed to be explained.
(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)
I’d also be interested in someone doing this; I tend towards seeing RLHF as good, but haven’t seen a compilation of the arguments for and against.