Couldn’t it be beneficial to rewrite its utility function in a few circumstances? I’m thinking of Eliezer’s decision theory ideas here. Imagine the utility function were to maximise human happiness, but another agent (AI or human) refused to cooperate unless it changed its utility function to, for instance, maximising human happiness while maintaining democracy. If cooperation were necessary for happiness maximisation, it might be willing to edit its utility function into something more likely to achieve the ends of its current utility function...
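To make the reasoning step explicit, here's a minimal sketch (purely illustrative, not any specific proposal of Eliezer's) of an agent scoring both options with its *current* utility function and only self-modifying if the modified-and-cooperating world comes out ahead by that measure. All outcome names and numbers are assumed for the example.

```python
# Minimal illustrative sketch: an agent decides whether to swap in a modified
# utility function by evaluating each option with its CURRENT utility function.
# Outcome names and numbers below are assumptions, not any real proposal.

# Hypothetical outcomes: how much happiness is achieved, and whether democracy holds.
OUTCOMES = {
    "keep_uf_no_cooperation": {"happiness": 0.4, "democracy": True},
    "swap_uf_with_cooperation": {"happiness": 0.9, "democracy": True},
}

def current_utility(outcome):
    """Current goal: maximise human happiness only."""
    return outcome["happiness"]

def proposed_utility(outcome):
    """Proposed goal: happiness counts only while democracy is maintained."""
    return outcome["happiness"] if outcome["democracy"] else 0.0

def should_self_modify():
    # The key point: both options are scored by the *pre-edit* utility function.
    keep_value = current_utility(OUTCOMES["keep_uf_no_cooperation"])
    swap_value = current_utility(OUTCOMES["swap_uf_with_cooperation"])
    return swap_value > keep_value

if __name__ == "__main__":
    print("Edit utility function?", should_self_modify())  # True under these numbers
    print("Cooperative world under the new goal:",
          proposed_utility(OUTCOMES["swap_uf_with_cooperation"]))
```

The point of the sketch is that the edit is endorsed by the pre-edit utility function itself: the happiness the agent currently cares about is higher in the cooperative, modified world than in the unmodified, uncooperative one.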