I’ve responded to the original post, A List of Lethalities, here. My response covers why synthetic data is awesome for alignment, why alignment generalizes further than capabilities (the opposite of Nate Soares’s model), why you can’t update from the null string like Eliezer did, and much, much more:
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD