Noosphere89 comments on On A List of Lethalities

Noosphere89 14 Sep 2024 18:52 UTC
4 points
I’ve responded to the original post, A List of Lethalities, here. My response covers why synthetic data is awesome for alignment, why alignment generalizes further than capabilities (the opposite of Nate Soares’s model), why you can’t update from the null string like Eliezer did, and much, much more:
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD