Daniel Kokotajlo comments on Compendium of problems with RLHF

Daniel Kokotajlo 30 Jan 2023 0:04 UTC
6 points
3
It’s gonna take me a while to digest this post, but in the meantime, thank you! This is the sort of content I love to see. (ETA: I strong-upvoted this post)
- Daniel Kokotajlo 31 Jan 2023 19:38 UTC
  4 points
  2
  My updated thoughts are: Still a great post, not as polished as it should be though. That’s OK. The important thing is that it compiles a big list of problems and alleged problems for RLHF, with links.
  - Charbel-Raphaël 31 Jul 2023 22:07 UTC
    2 points
    0
    Here is the polished version from our team, led by Stephen Casper and Xander Davies: "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback" :)