Your problem setup contains a contradiction. You said that X and X*_i are identical copies, and then you said that they have different utility functions. This happened because you defined the utility function over the wrong domain; you specified it as (world-history, identity)=>R when it should be (world-history)=>R.
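To make the two domains concrete, here is a minimal sketch using Python type hints; `WorldHistory` and `AgentId` are placeholder names invented for this illustration, not anything from the post.

```python
from typing import Callable

# Placeholder types invented for this sketch.
WorldHistory = str  # stands in for a complete description of a world-history
AgentId = str       # stands in for an agent's identity ("X", "X*_1", ...)

# The domain argued for here: utility depends only on the world-history.
UtilityFn = Callable[[WorldHistory], float]

# The domain being criticized: utility also depends on "who is asking",
# which is what lets two physically identical copies have "different" utilities.
IndexicalUtilityFn = Callable[[WorldHistory, AgentId], float]
```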
How I interpreted the problem: it’s not that identical agents have different utility functions, it’s just that different things happen to them. In reality, what’s behind the door is behind the door, while the simulation rewards X with something else. X is only unaware of whether or not he’s in a simulation before he presses the button; obviously, once he actually receives the utility he can tell the difference. That said, the fact that nobody else has stated this makes me unsure. OP, can you clarify a little bit more?
Yes, this is how I view the problem as well.
If that is the only way utility functions can be defined, then anthropic egotism is incoherent.
What’s the point of utility functions if you can’t even in principle know their value for the universe you’re actually in? Utility functions are supposed to guide decisions. A utility function that can’t be approximated, even a little, even with infinite computing power, can’t be linked to a decision theory or used in any other way.
I’m generally inclined to agree with you, because a lot of issues come up with anthropics, but in order to drop the matter altogether you would need to genuinely dissolve the question.
The steelman response to your point is this: For each possible strategy you could choose, you can evaluate a probability distribution over which “you” you actually are. You can then evaluate the utility values conditional on each possible self, and calculate the expected value over that probability distribution of selves. As such, it is clearly possible to approximate and calculate utilities for such functions, and use them to make decisions.
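A minimal sketch of that expected-value calculation, using the utility numbers from the simplified game later in this thread; the uniform 1/1001 self-probabilities are an illustrative assumption, not something the post specifies.

```python
# Expected utility over a probability distribution of "selves".
# The 1/1001 probabilities are an illustrative assumption; the utilities
# (1.0 for the original choosing "sim", 0.2 per simulated copy, 0.9 for
# "don't sim") come from the simplified game described below.

def expected_utility(selves):
    """selves: list of (probability I am this self, utility for that self)."""
    return sum(p * u for p, u in selves)

# If I choose "sim", I might be the original or any of the 1000 copies.
sim = [(1 / 1001, 1.0)] + [(1 / 1001, 0.2)] * 1000
# If I choose "don't sim", no copies exist and I am certainly the original.
dont_sim = [(1.0, 0.9)]

print(expected_utility(sim))       # ~0.2008
print(expected_utility(dont_sim))  # 0.9
# Whether this per-self averaging is the meaningful quantity (rather than the
# summed utility computed later in the thread) is exactly what is disputed.
```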
The question is not whether or not you can do the calculations, the question is whether or not those calculations correspond to something meaningful.
A simpler version of the original post is this. Let there be a single, consistent utility function shared by all copies of the agent (X and all Xi). It assigns these utility values:
X chooses “sim”, and then N instances of Xi choose “sim” and 1000-N instances choose “don’t sim” → 1.0 + 0.2N + 0.1(1000-N)
X chooses “don’t sim”, no Xi gets created → 0.9
Of course, the post’s premise is that the only actually possible universe in category 1 is the one where all 1000 Xi instances choose “sim” (because they can’t tell whether they’re in the simulation or not), so the total utility is then 1 + 0.2*1000 = 201.
This is a simple demonstration of TDT giving the answer that maximizes utility (“sim”) while CDT doesn’t (I think?)
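To make the arithmetic explicit, here is a short sketch of the simplified game, under the stated premise that every copy runs the same algorithm and therefore mirrors the original’s choice; the function name is mine, not the post’s.

```python
# Total utility of each strategy in the simplified game, assuming all copies
# make the same choice as the original (the post's premise).

def total_utility(choice, n_copies=1000):
    if choice == "sim":
        n_sim, n_dont = n_copies, 0            # every copy also picks "sim"
        return 1.0 + 0.2 * n_sim + 0.1 * n_dont
    else:
        return 0.9                             # no copies are ever created

print(total_utility("sim"))        # 201.0
print(total_utility("don't sim"))  # 0.9
```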
What didn’t make sense to me was saying X and Xi somehow have “different” utility functions. Maybe this was just confusion generated by imprecise use of words, and not any real difference.
The post then says:
For every agent it is true that she does not gain anything from the utility of another agent despite the fact she and the other agents are identical!
I’m not sure if this is intended to change the situation. Once you have a utility function that gives out actual numbers, you don’t care how it works on the inside, or whether it takes into account another agent’s utility or anything else.
The idea is that they have the same utility function, but the utility function takes values over anthropic states (values of “I”).
U(I am X and X chooses sim) = 1
U(I am Xi and Xi chooses sim) = 0.2 etc.
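Written out as a lookup table, under the assumption that the remaining entry follows the simplified game above:

```python
# Utility defined over anthropic states ("who I am") paired with choices.
# The first two values are the ones quoted above; the third is taken from
# the simplified game earlier in the thread.
U = {
    ("I am X",  "sim"):       1.0,
    ("I am Xi", "sim"):       0.2,
    ("I am X",  "don't sim"): 0.9,
}
```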
I don’t like it, but I also don’t see an obvious way to reject the idea.
Thanks for mentioning this. I know this wasn’t put very nicely.
Imagine you were a very selfish person X, caring only about yourself. If I make a really good copy of X, call it X*, and place it 100 meters away from X, then this copy X* only cares about the spatiotemporal dots of what we define as X*. Both agents, X and X*, are identical if we formalize their algorithms using indexical information. If we don’t do that, then a disparity remains, namely that X differs from X* in that, intrinsically, X only cares about the set of spatiotemporal dots constituting X, and the same goes for X* accordingly. But this semantic issue doesn’t seem to be relevant for the decision problem itself. The kind of similarity that is of interest here seems to be the one that determines similar behavior in such games. (Probably you could set up games where the non-indexical formalizations of the agents X and X* are relevantly different; I merely claim that this game is not one of them.)
It should be (world-history, identity)=>R. Different agents have different goals, which give different utility values to actions.
You’ve then incorporated identity twice: once when you gave each agent its own goals, and again inside those goals. If an agent’s goals have a dangling identity-pointer inside, then they won’t stay consistent (or well-defined) in the case of self-copying, so by the same argument that says agents should stop their utility functions from drifting over time, the agent should replace that pointer with a specific value.
So, in other words: If I am D and all I want is to be king of the universe, then before stepping into a copying machine I should self-modify so that my utility function will say “+1000 if D is king of the universe” rather than “+1000 if I am king of the universe”, because then my copy D2 will have a utility function of “+1000 if D is king of the universe”, and that maximises my chances of being king of the universe.
That is what you mean, right?
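A minimal sketch of the pointer-replacement idea; the function names and the toy world dictionary are invented for illustration.

```python
# "Dangling identity-pointer" vs. a pointer replaced with a specific value.

def indexical_utility(world, me):
    # "+1000 if I am king of the universe": 'me' is resolved by whichever
    # copy happens to be evaluating the function.
    return 1000 if world["king"] == me else 0

def de_indexed_utility(world):
    # "+1000 if D is king of the universe": the pointer was replaced with a
    # fixed value before copying, so every copy pursues the same goal.
    return 1000 if world["king"] == "D" else 0

world = {"king": "D2"}
print(indexical_utility(world, "D"))   # 0    -- D and the copy D2 now disagree
print(indexical_utility(world, "D2"))  # 1000
print(de_indexed_utility(world))       # 0    -- same answer for every copy
```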
I guess the anthropic counter is this: What if, after stepping into the machine, I end up being D2 instead of D!? If I were to self-modify to care only about D, then I wouldn’t end up being king of the universe; D would!
The agent, and the utility function’s implementation in the agent, are already part of the world and its world-history. If two agents in two universes cannot be distinguished by any observation in their universes, then they must exhibit identical behavior. I claim it makes no sense to say two agents have different goals or different utility functions if they are physically identical.
There is a difference between X and Xi: the original X can choose to simulate copies of herself, which exist in the world_history and are legitimate subjects to assign utility to.
A copy X_i can’t create further copies (pressing “sim” does nothing in the simulation), so her utility for the action is different.