Strong upvote for tackling alignment from a more deontological framework than is typically taken in AI safety. I think there’s a lot of value to be found along this line of thought. However, building LFAI seems like a strictly harder problem than building an off-switch-corrigible AI. Off-switch corrigibility can be formulated as a “law,” after all.
As I understand things, MIRI’s results suggest that corrigibility is contrary to the consequentialist cognition needed to effectively accomplish goals in the real world. Do you view that as an issue for LFAI? If not, why not?