This is an important point. Containers (VMs, governance panels, other methods of limiting an agent’s effect on “the world”) are very different from simulations (where the perception IS “the world”).
It’s very hard to imagine a training method or utility-function-generator that results in agents caring more about a hypothetical reality outside the one they can perceive than about the feedback loops which created them. You can imagine agents with this kind of utility function (caring only about the “outer” reality, without any actual knowledge or evidence that it exists or of how many layers there are), but they’re probably hopelessly incoherent.
Conditional utility may be sane: “maximize paperclips in the outermost reality I can perceive and/or influence” is sensible, but it doesn’t answer the question of how much effort to put into creating paperclips now, vs. creating paperclip-friendly conditions over a long time period, vs. trying to discover outer realities and influence them to prefer more paperclips.
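For concreteness, here’s a toy sketch of the allocation problem such a conditional maximizer faces: split effort between making clips now, building clip-friendly infrastructure, and probing for an outer reality. All the numbers and functional forms are made-up assumptions for illustration, not anything implied by the argument above.

```python
# Toy model (hypothetical numbers and structure) of the effort-allocation problem
# a "conditional" paperclip maximizer faces: (a) make paperclips now,
# (b) build paperclip-friendly infrastructure, (c) probe for an outer reality.

def expected_paperclips(now: float, infra: float, probe: float,
                        horizon: float = 100.0,
                        p_outer: float = 0.01,
                        outer_payoff: float = 1e9) -> float:
    """Toy expected-value model; every parameter here is an assumption."""
    assert abs(now + infra + probe - 1.0) < 1e-9, "effort shares must sum to 1"
    direct = now * horizon                  # clips made directly over the horizon
    compounding = infra * horizon ** 1.5    # infrastructure assumed to pay off superlinearly
    outer = probe * p_outer * outer_payoff  # small chance of a huge outer-reality payoff
    return direct + compounding + outer

# Search a coarse grid of (now, infra, probe) splits.
best = max(
    [(n / 10, i / 10, 1 - n / 10 - i / 10) for n in range(11) for i in range(11 - n)],
    key=lambda s: expected_paperclips(*s),
)
print("best (now, infra, probe) split under these assumed priors:", best)
```

Even in this toy version, the optimum hinges entirely on priors the agent has no way to ground (how likely an outer layer is, how big its payoff would be), which is exactly why the conditional utility underdetermines the policy.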