DavidW comments on Deceptive Alignment is <1% Likely by Default

DavidW 24 Apr 2023 14:29 UTC
5 points
I have a whole section on the key assumptions about the training process and why I expect them to be the default. Those assumptions are all in line with what’s already happening, so the labs don’t have to do anything special to prevent deceptive alignment. Did I miss anything important in that section?