I don’t think so. That’s one breaking point for alignment, but I’m saying in that post that even if we avoid a sharp left turn and make it to an aligned, superintelligent AGI, that its alignment may drift away from human values as it continues to learn. Learning may necessarily shift the meanings of existing concepts, including values.
Maybe the alignment stability problem is the same thing as the sharp left turn?
I don’t think so. That’s one breaking point for alignment, but I’m saying in that post that even if we avoid a sharp left turn and make it to an aligned, superintelligent AGI, that its alignment may drift away from human values as it continues to learn. Learning may necessarily shift the meanings of existing concepts, including values.