I don’t think this is the plan? The hope is that, as capabilities grow, so does alignment, whatever this “alignment” thing is. The reality is different, of course.
Edited the post to rename “intrinsically aligned AI” to “intrinsically kind AI” for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But I know of no major plan for developing capabilities that are hard-linked to control/kindness/whatever in a way that can't easily be removed. (I have heard an idea or two, though, and am planning to write a post about it soon.)