I agree! I contributed to and endorse this Corrigibility plan by Max Harms (MIRI researcher): Corrigibility as Singular Target
(See also posts by Seth Herd)
I think CAST offers much better safety under higher capabilities and more agentic workflows.
I agree! I contributed to and endorse this Corrigibility plan by Max Harms (MIRI researcher): Corrigibility as Singular Target
(See also posts by Seth Herd)
I think CAST offers much better safety under higher capabilities and more agentic workflows.