Edited post to rename “intrinsically aligned AI” to “intrinsically kind AI” for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But I know of no major plan for a capability-development process that is hard-linked to control/kindness/whatever in a way that can’t easily be removed. (I have heard an idea or two, though, and am planning to write a post about it soon.)