Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It’s one thing when OpenAI does it, but when Anthropic thinks it’s a good idea, clearly something has failed to be explained.
(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)
I’d also be interested in someone doing this; I tend towards seeing RLHF as good, but haven’t seen a compilation of the arguments for and against.