I have a whole section on the key assumptions about the training process and why I expect them to be the default. It’s all in line with what’s already happening, and the labs don’t have to do anything special to prevent deceptive alignment. Did I miss anything important in that section?