You assign the AI a task of finding a cure for cancer. This will net it 10 utilons if it succeeds. Thanks to your utility indifference module, it will also net 10 utilons if it does not find a cure for cancer and you trigger the explosives to kill the AI.
The AI gets nothing from you deciding to kill it. What utility indifference means is that it will get the same utility if you decide to kill it and the explosives go off as it would if you decided to kill it and the explosives failed to go off.
So if it expects to cure cancer with 80% likelihood if it survives, and thus to get 8 utilons in expectation, then if it dies from the explosion, it will also get 8 utilons in expectation.
More precisely there are six worlds, depending on a) whether we trigger the explosives or not, b) whether it survives the explosion or not, and c) whether cancer is curable by an unmolested AI. In obvious notation, these are TSC, TS¬C, T¬SC, T¬S¬C, ¬TC, ¬T¬C. We are assuming in your example that U(¬T¬C)=U(TS¬C)=0 and U(¬TC)=U(TSC)=10. Utility indifference then sets U(T¬S¬C)=0 and U(T¬SC)=10.
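To make that bookkeeping concrete, here's a quick Python sketch (the names and data structure are just mine for illustration, not anything from an actual utility indifference implementation) of the six worlds and how the indifference assignment copies utilities from the survived worlds to the destroyed ones:

```python
# T = we trigger the explosives, S = the AI survives the blast,
# C = cancer is curable by an unmolested AI. Names are purely illustrative.

# Utilities of the worlds where the AI gets to act (or was never threatened):
base_utility = {
    ("T", "S", "C"): 10,   # triggered, survived, cancer curable -> it cures it
    ("T", "S", "~C"): 0,   # triggered, survived, cancer not curable
    ("~T", "-", "C"): 10,  # never triggered, cancer curable
    ("~T", "-", "~C"): 0,  # never triggered, cancer not curable
}

# Utility indifference: each destroyed world T¬S* inherits the utility of the
# matching survived world TS*, so dying pays exactly what surviving would have.
utility = dict(base_utility)
utility[("T", "~S", "C")] = base_utility[("T", "S", "C")]    # = 10
utility[("T", "~S", "~C")] = base_utility[("T", "S", "~C")]  # = 0

for world, u in sorted(utility.items()):
    print(world, "->", u)
```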
Hence assuming that S is independent of C (the curability of cancer by an unmolested AI is independent of whether particular explosives go off), we see that the AI is indifferent as to whether the explosives go off—it gains nothing either way, so has no cause to influence the explosives in any direction. If we further assume T and C are independent (the curability of cancer by an unmolested AI is independent of whether we decide to blow up this particular AI), then the AI also has no preference as to whether we trigger the explosives or not—it gains nothing either way.
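And here's a self-contained check of the arithmetic, under the same illustrative conventions: with P(C) = 0.8 and C independent of T and S, the expected utility comes out to 8 in every branch, so there is nothing for the AI to gain by influencing either the trigger or the explosives:

```python
# Expected utility under the utility-indifference table above (the T¬S entries
# are the ones the indifference assignment filled in). Purely a sketch.
utility = {
    ("T", "S", "C"): 10, ("T", "S", "~C"): 0,    # triggered, survived
    ("T", "~S", "C"): 10, ("T", "~S", "~C"): 0,  # triggered, destroyed (indifference)
    ("~T", "-", "C"): 10, ("~T", "-", "~C"): 0,  # never triggered
}

p_c = 0.8   # P(cancer curable by an unmolested AI)
p_s = 0.3   # P(explosives fail and the AI survives); any value gives the same answer

eu_if_triggered = sum(
    (p_s if s == "S" else 1 - p_s) * (p_c if c == "C" else 1 - p_c) * utility[("T", s, c)]
    for s in ("S", "~S") for c in ("C", "~C")
)
eu_if_not_triggered = p_c * utility[("~T", "-", "C")] + (1 - p_c) * utility[("~T", "-", "~C")]

print(eu_if_triggered, eu_if_not_triggered)  # 8.0 8.0 -> nothing to gain either way
```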
Ah, that makes sense. It isn’t indifferent to suicide as such; it’s only indifferent to your success at attempting to kill it, should you make the attempt.
Thanks for your patience!
No prob :-) Always happy when I manage to explain something successfully!