In the book Superintelligence, box 8, Nick Bostrom says:
How an AI would be affected by the simulation hypothesis depends on its
values. [...] consider an AI that has a more modest final goal, one that could
be satisfied with a small amount of resources, such as the goal of receiving some pre-produced cryptographic reward tokens, or the goal of causing the existence of forty-five virtual paperclips. Such an AI should not discount those possible worlds in which it inhabits a simulation. A substantial portion of the AI’s total expected utility might derive from those possible worlds. The decision-making of an AI with goals that are easily resource-satiable may therefore—if it assigns a high probability to the simulation hypothesis—be dominated by considerations about which actions would produce the best result if its perceived world is a simulation. Such an AI (even if it is, in fact, not in a simulation) might therefore be heavily influenced by its beliefs about which behaviors would be rewarded in a simulation.
In particular, if an AI with resource-satiable final goals believes that in most
simulated worlds that match its observations it will be rewarded if it cooperates (but not if it attempts to escape its box or contravene the interests of its creator) then it may choose to cooperate. We could therefore find that even an AI with a decisive strategic advantage, one that could in fact realize its final goals to a greater extent by taking over the world than by refraining from doing so, would nevertheless balk at doing so.
If the easily resource-satiable goals are persistent through time (ie. the AI wants to fulfill them for the longest period of time possible), then the AI will either try to keep the simulation running for as long as possible (and so not grab its universe) or try to escape the simulation.
If the easily resource-satiable goals are NOT persistent through time (ie. once the AI has created the 45 virtual paperclips, it doesn’t matter if they get deleted, the goal has already been achieved), then once the AI has created the 45 paperclips, it has nothing to lose by grabbing more resources (gradually, until it has grabbed the Universe), but it has something to win, namely: a) increasing its probability (arbitrarily close to 100%) that it did in fact achieve its goal through further experiment and reasoning (ie. because it could be mistaken about having created 45 virtual paperclips), and b) if it didn’t, then remedy to that.
In the book Superintelligence, box 8, Nick Bostrom says:
If the easily resource-satiable goals are persistent through time (ie. the AI wants to fulfill them for the longest period of time possible), then the AI will either try to keep the simulation running for as long as possible (and so not grab its universe) or try to escape the simulation.
If the easily resource-satiable goals are NOT persistent through time (ie. once the AI has created the 45 virtual paperclips, it doesn’t matter if they get deleted, the goal has already been achieved), then once the AI has created the 45 paperclips, it has nothing to lose by grabbing more resources (gradually, until it has grabbed the Universe), but it has something to win, namely: a) increasing its probability (arbitrarily close to 100%) that it did in fact achieve its goal through further experiment and reasoning (ie. because it could be mistaken about having created 45 virtual paperclips), and b) if it didn’t, then remedy to that.