Hmm, after some thought I'm getting more pessimistic about finding a workable, broadly effective definition of "different" with less complexity than "don't do things this model of a human doesn't want you to," though I'd still put the probability above 20%.
Example: maybe we can use the AI's predictive power "against paperclipping." We want the AI to be free to change the inside of the box, but not to exploit the outside for extra utility. So we use some magical method to sew the inside of the box onto a world that doesn't communicate with the box, and give the AI the utility function U' = 2·E(U in patchwork world) − E(U in real world), so that higher E(U) in the real world is actually bad. Does this work? No: the AI paperclips the patchwork world while dropping a meteor on its own head in the real world, driving the penalty term toward zero and collecting almost the full 2 points.
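To make the failure concrete, here is the arithmetic, assuming (as the "almost 2" figure suggests) that U is normalized to [0, 1]:

$$U' = 2\,\mathbb{E}[U \mid \text{patchwork world}] - \mathbb{E}[U \mid \text{real world}]$$

A policy that behaves identically in both worlds, achieving utility $u$ in each, only gets $U' = 2u - u = u \le 1$. But a policy that maximizes utility in the patchwork world while destroying itself in the real world gets $U' \approx 2 \cdot 1 - 0 = 2$. So the subtracted term doesn't make the AI indifferent to the real world; it actively rewards sabotaging it.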