What if I say “maximize x for just a little while, then talk to me for further instructions”? A human can understand that without difficulty, so it should be easy for a superintelligent AI, right?
I think it depends on what you mean by “a little while”, but it’s quite possible that by the time it stopped, the world would contain safeguards against further changes, or would simply no longer contain you (or a version of “you” that shares your goals).
(Also, millennia of subjective torture (or whatever) might be a high price for the experiment, even if it got reset.)