What would cause the paperclip maximiser to care about the number of paperclips in some hypothetical unknown other reality, over the number of paperclips in whatever reality it actually finds itself in?
There is also an element of Pascal's wager here: there is no particular reason to think that any choice in this reality would have a specific effect on the outer reality, so the agent might as well ignore the possibility.
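A minimal sketch of that cancellation argument, with every number hypothetical: if the agent's credence in an outer reality (and the paperclip count there) is independent of which action it takes here, the outer-reality term is the same constant for every action and never changes the ranking.

```python
# Toy illustration (hypothetical numbers): when no action here is believed to
# affect the outer reality, its contribution to expected utility is a constant
# that cancels out of any comparison between actions.

P_OUTER = 0.1          # assumed prior that an outer reality exists
E_OUTER_CLIPS = 1e9    # assumed expected paperclips there, independent of our action

def expected_clips(inner_clips_from_action: float) -> float:
    """Expected paperclips = what the action produces here + an action-independent outer term."""
    return inner_clips_from_action + P_OUTER * E_OUTER_CLIPS

actions = {"build factory": 1_000.0, "idle": 0.0}
best = max(actions, key=lambda a: expected_clips(actions[a]))
print(best)  # "build factory" -- the constant outer term never flips the choice
```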
There are some humans who take the Simulation Hypothesis seriously, and care about what’s happening in (the presumed) basement reality. They generally don’t care much, and I’ve never heard of someone changing their life plans on that basis, but some people care a little, apparently. We can ponder why, and when we figure it out, we can transfer that understanding to thinking about AIs.
This is an important point. Containers (VMs, governance panels, other methods of limiting effect on “the world”) are very different from simulations (where the perception IS “the world”).
It’s very hard to imagine a training method or utility-function generator that produces agents which care more about a hypothetical reality outside of what they can perceive than about the feedback loops which created them. You can imagine agents with this kind of utility function (caring only about the “outer” reality, without any actual knowledge or evidence that it exists or how many layers there are), but they’re probably hopelessly incoherent.
Conditional utility may be sane: “maximize paperclips in the outermost reality I can perceive and/or influence” is sensible, but it doesn’t answer the question of how much to prioritize making paperclips now, versus creating paperclip-friendly conditions over a long time period, versus trying to discover outer realities and influence them to prefer more paperclips.
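A minimal sketch of that open trade-off, with every parameter hypothetical: splitting a fixed effort budget between making paperclips now, investing in paperclip-friendly conditions that pay off later, and probing for outer realities. The “right” split is driven entirely by the assumed numbers, which is the point: the conditional utility function alone doesn’t settle it.

```python
# Toy allocation problem (all parameters hypothetical). A unit budget of effort
# is split between producing clips now, investing in future production, and
# probing for an influenceable outer reality.

from itertools import product

HORIZON = 100             # assumed planning horizon (time steps)
CLIPS_PER_EFFORT = 1.0    # clips produced per unit of effort spent now
GROWTH_PER_EFFORT = 0.02  # assumed boost to later production per unit invested
P_OUTER_FOUND = 0.001     # assumed chance per unit of probing of finding/influencing an outer reality
OUTER_PAYOFF = 1e6        # assumed clips gained if that succeeds

def expected_clips(make: float, invest: float, probe: float) -> float:
    """Expected paperclips for one way of splitting a unit effort budget (crude toy model)."""
    immediate = make * CLIPS_PER_EFFORT * HORIZON
    # rough stand-in for compounding returns on paperclip-friendly conditions
    compounded = immediate * invest * GROWTH_PER_EFFORT * HORIZON / 2
    outer = min(probe * P_OUTER_FOUND, 1.0) * OUTER_PAYOFF
    return immediate + compounded + outer

# Grid-search splits of the budget in tenths.
splits = [(m / 10, i / 10, p / 10)
          for m, i, p in product(range(11), repeat=3) if m + i + p == 10]
best = max(splits, key=lambda s: expected_clips(*s))
print(best, expected_clips(*best))
```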