One argument for alignment difficulty is that corrigibility is “anti-natural” in a certain sense. I’ve tried to write out my understanding of this argument, and would be curious whether anyone can add to or improve on it.
I’d be equally interested in any attempts at succinctly stating other arguments for/against alignment difficulty.