I think the post "Deceptive Alignment is <1% Likely by Default" attempts to argue that deceptive alignment is very unlikely given the training setup that Ajeya lays out.