In your pattern philosophy of identity, what counts as a pattern? In particular, a simulation of our world (of the kind we are likely to run) doesn’t contain all the information needed to map it to our (simulating) world. Some of the information that describes this mapping resides in the brains of those who look at and interpret the simulation.
It’s not obvious to me that there couldn’t be equally valid mappings from the same simulation to different worlds, and perhaps in one of those different worlds there is a copy of you being tortured. Or perhaps there is a mapping of our own world to itself that would produce such a thing.
Is there some sort of result that says this is very improbable given sufficiently complex patterns, or something of the kind, that you rely on?

Yes, Solomonoff’s Lightsaber: the usual interpretations need much shorter decoder programs.

Why? How do we know this?
Know in what sense? If you’re asking for a formal proof, of course there isn’t one because Kolmogorov complexity is incomputable. But if you take a radically skeptical position about that, you have no basis for using induction at all, which in turn means you have no basis for believing you know anything whatsoever; Solomonoff’s lightsaber is the only logical justification anyone has ever come up with for using experience as a guide instead of just acting entirely at random.
I’m not arguing with Solomonoff as a means for learning and understanding the world. But when we’re talking about patterns representing selves, the issue isn’t just to identify the patterns represented and the complexity of their interpretation, but also to assign utility to these patterns.
Suppose that I’m choosing whether to run a new simulation. It will have a simple (‘default’) interpretation, which I have identified, and which has positive utility to me. It also has alternative interpretations, whose decoder complexities are much higher (but still lower than the complexity of specifying the simulation itself). It would be computationally intractable for me to identify all of them. These alternatives may well have highly negative utility to me.
To choose whether to run the simulation, I need to sum the utilities of these alternatives. More complex interpretations will carry lower weight. But what is the guarantee that my utility function is built in such a way that the total utility will still be positive?
I’m guessing this particular question has probably been answered in the context of analyzing the behavior of utility functions. I haven’t read all of that material, and a specific pointer would be helpful.
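To make this concrete, here is a minimal sketch of the kind of sum I have in mind; the interpretations, decoder lengths, and utilities are invented purely for illustration, since in practice I couldn’t enumerate the alternatives at all:

```python
# Hypothetical illustration: complexity-weighted utility of running a simulation.
# Each entry is (extra decoder bits beyond the default interpretation, utility to me).
interpretations = [
    (0, +10.0),     # the 'default' reading I intend and have identified
    (40, -500.0),   # a weird reading in which, say, someone is suffering
    (60, -1.0e6),   # an even longer, even worse reading
]

def weighted_total(interps):
    """Sum the utilities, each discounted by 2**-(extra decoder bits)."""
    return sum(2.0 ** -bits * utility for bits, utility in interps)

print(weighted_total(interpretations))
# Positive here -- but only because the bad utilities grow more slowly than
# 2**bits, which is exactly the property I see no guarantee of in general.
```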
The reason this whole discussion arises is that we’re talking about running simulations that can’t be interacted with. You say that you assign utility to the mere existence of patterns, even non-interacting. A simpler utility function specified only in terms of affecting our single physical world would not have that difficulty.
ETA: as Nisan helped me understand in comments below, I myself in practical situations do accept the ‘default’ interpretation of a simulation. I still think non-human agents could behave differently.
These are interesting questions. They might also apply to a utility function that only cares about things affecting our physical world.
If there were a person in a machine, isolated from the rest of the world and suffering, would we try to rescue it, or would we be satisfied with ensuring that the person never interacts with the real world?
I understood the original stipulation that the simulation doesn’t interact with our world to mean that we can’t affect it to rescue the suffering person.
Let’s consider your alternative scenario: the person in the simulation can’t affect our universe usefully (the simulating machine is well-wrapped and looks like a uniform black body from the outside), and we can’t observe it directly, but we know there’s a suffering person inside and we can choose to break in and modify (or stop) the simulation.
In this situation I would indeed choose to intervene to stop the suffering. Your question is a very good one. Why do I choose here to accept the ‘default’ interpretation which says that inside the simulation is a suffering person?
The simple answer is that I’m human, and I don’t have an explicit or implicit-and-consistent utility function anyway. If people around me tell me there’s a suffering person inside the simulation, I’d be inclined to accept this view.
How much effort or money would I be willing to spend to help that suffering simulated person? Probably zero or near zero. There are many real people alive today who are suffering and I’ve never done anything to explicitly help anyone anonymously.
In my previous comments I was thinking about utility functions in general—what is possible, self-consistent, and optimizes something—rather than human utility functions or my own. As far as I personally am concerned, I do indeed accept the ‘default’ interpretation of a simulation (when forced to make a judgement) because it’s easiest to operate that way and my main goal (in adjusting my utility function) is to achieve my supergoals smoothly, rather than to achieve some objectively correct super-theory of morals. Thanks for helping me see that.
In Solomonoff induction, the weight of a program falls off exponentially with its length: a program of length L gets prior weight proportional to 2^(-L). (I have an argument that says this doesn’t need to be assumed a priori, it can be derived, though I don’t have a formal proof of this.) Given that, it’s easy to see that the total weight of all the weird interpretations is negligible compared to that of the normal interpretation.
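As a quick sanity check of the arithmetic, here is a minimal sketch under the simplifying assumptions that every weird interpretation needs at least k0 extra decoder bits and that there is at most one weird interpretation at each extra length:

```python
# With prior weight 2**-(length), an interpretation whose decoder needs k extra
# bits carries 2**-k times the default's weight. Summing one such interpretation
# at every extra length k >= k0 gives a total relative weight of 2**-(k0 - 1).
k0 = 20
total_relative_weight = sum(2.0 ** -k for k in range(k0, 1000))
print(total_relative_weight)   # ~1.9e-06, a vanishing fraction of the default's 1.0
print(2.0 ** -(k0 - 1))        # matches the closed form 2**-(k0 - 1)
```

Whether the weighted utilities, as opposed to the weights alone, also stay negligible is a separate question, which the replies below take up.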
It’s true that some things become easier when you try to restrict your attention to “our single physical world”, but other things become less easy. Anyway, that’s a metaphysical question, so let’s leave it aside; in which case, to be consistent, we should also forget about the notion of simulations and look at an at least potentially physical scenario.
Suppose the copy took the form of a physical duplicate of our solar system, with the non-interaction requirement met by flinging same over the cosmic event horizon. Now do you think it makes sense to assign this a positive utility?
> Given that, it’s easy to see that the total weight of all the weird interpretations is negligible compared to that of the normal interpretation.
I don’t see why. My utility function could also assign to (some, not necessarily all) ‘weird’ interpretations a negative utility whose magnitude scales exponentially with the bit-length of the interpretation’s decoder.
Is there a proof that this is inconsistent? If I understand correctly, you’re saying that any utility function that assigns very large-magnitude negative utility to alternate interpretations of patterns in simulations is directly incompatible with Solomonoff induction. That’s a pretty strong claim.
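A toy version of what I mean, with invented numbers:

```python
# If the disutility of a 'weird' interpretation that is k bits longer grows like
# -(c**k) while its weight shrinks like 2**-k, the weighted sum is a geometric
# series in c/2 and blows up (negatively) for any c >= 2.
c = 3.0
for n in (10, 20, 30):
    print(n, sum(2.0 ** -k * -(c ** k) for k in range(1, n + 1)))
# The partial sums head toward -infinity: the 2**-k discount alone is no
# guarantee that the total comes out positive.
```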
> Suppose the copy took the form of a physical duplicate of our solar system, with the non-interaction requirement met by flinging same over the cosmic event horizon. Now do you think it makes sense to assign this a positive utility?
I don’t assign positive utility to it myself. Not above the level of “it might be a neat thing to do”. But I find your utility function much more understandable (as well as more similar to that of many other people) when you say you’d like to create physical clone worlds. It’s quite different from assigning utility to simulated patterns requiring certain interpretations.
Well, not exactly; I’m saying Solomonoff induction has implications for what degree of reality (weight, subjective probability, magnitude, measure, etc.) we should assign certain worlds (interpretations, patterns, universes, possibilities, etc.).
Utility is a different matter. You are perfectly free to have a utility function that assigns Ackermann(4,4) units of disutility to each penguin that exists in a particular universe, whereupon the absence of penguins will presumably outweigh all other desiderata. I might feel this utility function is unreasonable, but I can’t claim it to be inconsistent.