As one specific example: has RLHF, which the post below suggests may have initially been intended as a safety technique, been a net negative for AI safety?
https://www.alignmentforum.org/posts/LqRD7sNcpkA9cmXLv/open-problems-and-fundamental-limitations-of-rlhf