Thanks. I agree with your first four bullet points. I disagree that the post is quibbling. Weak man or not, the-coherence-argument-as-I-stated-it was prominent on LW for a long time. And figuring out the truth here matters. If the coherence argument doesn’t work, we can (try to) use incomplete preferences to keep agents shutdownable. As I write elsewhere:
The List of Lethalities’ mention of ‘Corrigibility is anti-natural to consequentialist reasoning’ points to Corrigibility (2015) and notes that MIRI failed to find a formula for a shutdownable agent. MIRI failed because they only considered agents with complete preferences. Useful agents with complete (and transitive and option-set-independent) preferences will often have some preference regarding the pressing of the shutdown button, as this theorem shows. MIRI thought that they had to assume completeness, because of coherence arguments. But coherence arguments are mistaken: there are no theorems which imply that agents must have complete preferences in order to avoid pursuing dominated strategies. So we can relax the assumption of completeness and use this extra leeway to find a formula for a corrigible consequentialist. That formula is what I purport to give in this post.
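To make the dominated-strategy claim concrete, here is a minimal sketch of the standard "single-sweetening" money pump against incomplete preferences. The option names (A, A_PLUS, B), the STRICT relation, and the rule "never take an option strictly dispreferred to one you already turned down" are all illustrative assumptions; this is not the formula from the post, only a toy showing that an agent can keep incomplete preferences without being led to a dominated outcome.

```python
# Hypothetical options for a single-sweetening money pump:
# A_PLUS is strictly preferred to A; B is incomparable to both A and A_PLUS.
STRICT = {("A_PLUS", "A")}  # (x, y) in STRICT means x is strictly preferred to y


def strictly_prefers(x, y):
    return (x, y) in STRICT


def reachable_outcomes(start, offers, remember_turned_down):
    """Return every final holding the agent could end up with.

    At each offer the agent either keeps what it holds or trades for the
    offered option. A choice is ruled out if it is strictly dispreferred to
    the option it gives up or, when remember_turned_down is True (an assumed
    policy, for illustration only), to any option it turned down earlier.
    Otherwise both choices are allowed: incomparability leaves the agent
    free to go either way, so its preferences stay incomplete.
    """
    outcomes = set()

    def step(holding, turned_down, i):
        if i == len(offers):
            outcomes.add(holding)
            return
        offered = offers[i]
        # Two possible choices: keep the current holding, or trade for the offer.
        for choice, given_up in ((holding, offered), (offered, holding)):
            rivals = {given_up} | (turned_down if remember_turned_down else set())
            if any(strictly_prefers(r, choice) for r in rivals):
                continue  # this choice is worse than something the agent had
            step(choice, turned_down | {given_up}, i + 1)

    step(start, frozenset(), 0)
    return outcomes


offers = ["B", "A"]  # first offer B in exchange for A_PLUS, then A in exchange
print(sorted(reachable_outcomes("A_PLUS", offers, remember_turned_down=False)))
# ['A', 'A_PLUS', 'B']
print(sorted(reachable_outcomes("A_PLUS", offers, remember_turned_down=True)))
# ['A_PLUS', 'B']
```

The agent that only compares the current pair can be walked from A_PLUS to B to A, ending with an option strictly worse than one it started with. The agent that also refuses anything strictly dispreferred to an option it already turned down can still end with either A_PLUS or B, so nothing forces it to complete its preferences, yet the dominated outcome A is never reachable.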
I feel that coherence arguments, broadly construed, are a reason to be skeptical of such proposals, but debating coherence arguments because of this seems backward. Instead, we should just be discussing your proposal directly. Since I haven’t read your proposal yet, I don’t have an opinion, but some coherence-inspired questions I would ask are:
Can you define an incomplete-preferences AIXI consistent with this proposal?
Is there an incomplete-preferences version of RL regret bound theory consistent with this proposal?
What happens when your agent is constructing a new agent? Does the new agent inherit the same incomplete preferences?