My immediate reaction is: you should definitely update as far as you can and do this investigation! But no matter how much you investigate the learned preferences, you should still deploy your AI with some residual uncertainty, because it's unlikely you can update it "all the way". Two reasons why this might be the case (a toy sketch of what deploying under residual uncertainty looks like follows the two reasons):
1. Some of the data you would need to update all the way will require the superintelligent agent's help to collect. For example, collecting human preferences about the specifics of far-future interstellar colonization seems impossible right now, because we don't know what is technologically feasible.
2. You might decide that the human preferences we really care about are the outcomes of some very long-running process like the Long Reflection; then you can't investigate the learned preferences ahead of time, but in the meantime you still want to create superintelligences that safeguard the Long Reflection until it completes.
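To make "deploy with residual uncertainty" a bit more concrete, here is a minimal sketch under an assumed toy Bayesian setup (none of the hypothesis names, likelihoods, or payoffs come from the discussion above): the agent keeps a posterior over a few candidate preference models, updates it on whatever preference data we can collect today, and then scores actions by expected value under the remaining uncertainty instead of committing to its single best guess.

```python
import numpy as np

# Illustrative hypotheses about what humans value, with a uniform prior.
hypotheses = ["values_A", "values_B", "values_C"]
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# Likelihood of the preference data we *can* collect now, per hypothesis.
# Investigation shifts the posterior, but (per the two reasons above)
# the available data is not enough to drive it to certainty.
likelihood_of_observed_data = np.array([0.70, 0.25, 0.05])

posterior = prior * likelihood_of_observed_data
posterior /= posterior.sum()  # roughly [0.70, 0.25, 0.05]

# Value of each candidate action under each hypothesis (rows: actions).
action_values = np.array([
    [10.0,  9.0,   8.0],   # robust action: decent under every hypothesis
    [12.0,  0.0, -50.0],   # action that is great only if values_A is right
])

# A fully updated agent would just optimize its best-guess hypothesis;
# an agent deployed with residual uncertainty averages over the posterior.
expected_values = action_values @ posterior
best_action = int(np.argmax(expected_values))
print(posterior, expected_values, best_action)  # picks the robust action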
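```

In this toy example the posterior favors values_A, but the agent still avoids the action that is catastrophic under the less likely hypotheses; that is the behavioral difference residual uncertainty is meant to buy you.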