Consider the case where you are trying to value (a) just yourself versus (b) the set of all future yous that satisfy the constraint of not going into negative utility.
The Shannon information of set (b) could be (and probably would be) lower than that of (a). To see this, note that the complexity (information) of the set of all future yous is just the information required to specify (you, now), because to compute the time evolution of the set you only need the initial condition, whereas the complexity (information) of just you is a series of snapshots: (you, now), (you, 1 microsecond from now), and so on. This is like the difference between a JPEG and an MPEG. The complexity of the constraint probably won't make up for this.
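One rough way to write this down (a sketch; the notation K(·) for description length, T_0 for (you, now), f for the time-evolution rule, and D_0, D_1, … for the inputs is mine, not from the comment):

```latex
% The whole set of continuations needs only the initial condition and the rule:
\[
K(\{\text{all continuations of } T_0\}) \le K(T_0) + K(f) + O(1)
\]
% A single continuation also needs the inputs that pick it out (the "MPEG"):
\[
K(T_0, T_1, \ldots, T_n) \le K(T_0) + K(f) + K(D_0, \ldots, D_{n-1}) + O(1)
\]
```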
If the constraint of not going into negative utility is particularly complex, one could pick a simple subset of nonnegative-utility future yous, for example by specifying relatively simple constraints that ensure that the vast majority of yous satisfying those constraints don't go into negative utility.
This is problematic because it means that you would assign less value to a large set of happy future yous than to just one future you.
This is very disturbing. But I don't think the set of all possible future yous has no information. You seem to be assuming it's a discrete distribution, with one copy of each possible future you. I expect the distribution to be uneven, with many copies clustered near each other in possible-you-space. The distribution, being a function over possible yous, contains even more information than a single you.
Why more?
In your new example, (b) is unrelated to the original question. For (b), a simulation of multiple diverging copies is required in order to create this set of all future yous. In your original example, however, the copies don't statistically diverge.
The entropy of (a) would be the information required to specify you at state t_0 plus the entropy of the random input distribution used to generate the set of all possible t_1 states. In the original example, the simulations of the copies are closed (otherwise you couldn't keep them identical), so the information contained in the single possible t_1 cannot be any higher than the information in t_0.
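A sketch of that claim in my own notation (not the commenter's): the update is deterministic, t_1 = f(t_0, D_0), and H(·) denotes Shannon entropy.

```latex
% t_1 is a function of (t_0, D_0), so
\[
H(t_1) \le H(t_0, D_0) \le H(t_0) + H(D_0)
\]
% Closed simulation (D_0 fixed and known): H(D_0) = 0, hence
\[
H(t_1) \le H(t_0).
\]
% With a random source R_0 in place of D_0:
\[
H(t_1) \le H(t_0) + H(R_0).
\]
```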
Sorry I don’t understand this.
Which part(s) don’t you understand?
It is possible that we are using different unstated assumptions. Do you agree with these assumptions:
1) An uploaded copy running in a simulation is Turing-complete (as JoshuaZ points out, this should be Turing-equivalent). Because of this, state t_{n+1} of a given simulation can be determined by the value of t_n and the value of the input D_n at that state. (The sequence D is not random, so I can always calculate the value of D_n; in the easiest case D_n = 0 for all values of n.) Similarly, if I have multiple copies of the simulation at the same state t_n and all of them have the same input D_n, they should all have the same value for t_{n+1}. In the top-level post, having multiple identical copies means that they all start at the same state t_0 and are passed the same inputs D_0, D_1, etc. as they run, in order to force them to remain identical. Because no new information is gained as we run the simulation, the entropy (and thus the value) remains the same no matter how many copies are being run.
2) For examples (a) and (b), you are talking about replacing the input sequence D with a random number generator R. The value of t_1 depends on t_0 and the output of R. Since R is no longer predictable, there is information being added at each stage. This means the entropy of this new simulation depends on the entropy of R. (A toy sketch of both assumptions follows below.)
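For concreteness, here is a minimal toy sketch of assumptions 1) and 2) (my own illustration: the step rule, the run helper, and all the constants are invented, not anything from the post):

```python
import random


def step(state: int, inp: int) -> int:
    """Toy deterministic update rule standing in for one tick of the simulation."""
    return (state * 31 + inp) % 2**32


def run(t0: int, inputs) -> list:
    """Run one copy forward from state t0 on the given input sequence."""
    states = [t0]
    for d in inputs:
        states.append(step(states[-1], d))
    return states


# Assumption 1: identical copies fed the identical input sequence D never diverge,
# so running more copies adds no new information.
D = [0, 0, 0, 0, 0]
assert run(42, D) == run(42, D)

# Assumption 2: replace D with a random source R; the copies diverge, and the
# entropy of the resulting set of states depends on the entropy of R.
r1, r2 = random.Random(1), random.Random(2)
print(run(42, [r1.getrandbits(8) for _ in range(5)]) !=
      run(42, [r2.getrandbits(8) for _ in range(5)]))  # almost surely True
```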
That is not what Turing complete means. Roughly speaking, something is Turing complete if it can simulate any valid Turing machine. What you are talking about is simply that the state change in question is determined by input data and state. This says nothing about the Turing completeness of the class of simulations, or even whether the class of simulations can be simulated on Turing machines. For example, if the physical laws of the universe actually require real numbers, then you might need a Blum-Shub-Smale machine to model the simulation.
Oops, I should have said Turing-equivalent. I tend to treat the two concepts as the same because they are the same from a practical perspective. I’ve updated the post.
Ok, let me see if you agree on something simple. What is the complexity (information content) of a randomly chosen integer of length N binary digits? About N bits, right?
What is the information content of the set of all 2^N integers of length N binary digits, then? Do you think it is N*2^N?
I agree with the first part. In the second part, where is the randomness in the information? The set of all N-bit integers is completely predictable for a given N.
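Worked out explicitly (my framing of the exchange above, not the commenters' own notation):

```latex
% A single integer X drawn uniformly from the N-bit strings carries N bits:
\[
H(X) = \log_2 2^N = N \text{ bits.}
\]
% The set S_N = \{0, 1, \ldots, 2^N - 1\} of all N-bit integers is fixed once N
% is known, so describing it costs only about \log_2 N bits, not N \cdot 2^N:
\[
K(S_N) \le \log_2 N + O(1).
\]
```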
Exactly. So, the same phenomenon occurs when considering the set of all possible continuations of a person. Yes?
For the set of all possible inputs (and thus all possible continuations), yes.
So the set of all possible continuations of a person has less information content than just the person.
And the complexity of the set of happy or positive utility continuations is determined by the complexity of specifying a boundary. Rather like the complexity of the set of all integers of binary length ≤ N digits that also satisfy property P is really the same as the complexity of property P.
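In the same spirit (a sketch in my notation: P is the predicate playing the role of the boundary, N the length bound, S the constrained set):

```latex
% A short program can enumerate 0, ..., 2^N - 1 and keep exactly the x with P(x), so
\[
K(\{\, x < 2^N : P(x) \,\}) \le K(P) + \log_2 N + O(1),
\]
% i.e. the cost of the set is essentially the cost of specifying the boundary P.
```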
When you say “just the person”, do you mean just the person at T_n (with information H(T_n)) or a specific continuation of the person (with information H(specific T_{n+1}))? I would say H(T_n) < H(all possible T_{n+1}) < H(specific T_{n+1}).
I agree with the second part.
“More can be said of one apple than of all the apples in the world”. (I can’t find the quote I’m paraphrasing...)
Escape the underscores to block their markup effect: to get A_i, type “A\_i”.