What is the point of trying to figure out what your friendly AI will choose in each standard difficult moral choice situation, if in each case the answer will be "how dare you disagree with it, since it is so much smarter and more moral than you?" If the point is that your design of this AI will depend on how well various proposed designs agree with your moral intuitions in specific cases, well then the rest of us have great cause for concern about how much to trust your specific intuitions.
James is right; you only need one moment of “weakness” to approve a protection against all future moments of weakness, so it is not clear there is an asymmetric problem here.