What about using compressibility as a way of determining the value of the set of copies?
In computer science, there is a concept known as deduplication (http://en.wikipedia.org/wiki/Data_deduplication) which is related to determining the value of copies of data. Normally, if you have 100MB of incompressible data (e.g. an image or an upload of a human), it will take up 100MB on a disk. If you make a copy of that file, a standard computer system will require a total of 200MB to track both files on disk. A smart system that uses deduplication will see that they are the same file and discard the redundant data so that only 100MB is actually required. However, this is done transparently, so the user will see two files and think that there is 200MB of data. This can be done with N copies: the user will think there is N*100MB of data, but the file system is smart enough to use up only 100MB of disk space as long as no one modifies the files.
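To make the bookkeeping concrete, here is a minimal, hypothetical Python sketch of content-addressed deduplication. The `DedupStore` class and its method names are invented for illustration and are not how any particular file system is implemented; identical contents are stored once, keyed by their hash, while every file name still appears in the namespace.

```python
import hashlib
import os

class DedupStore:
    """Toy content-addressed store: identical file contents are stored only once."""

    def __init__(self):
        self.blobs = {}   # sha256 digest -> bytes (what the disk actually holds)
        self.files = {}   # file name -> digest (what the user sees)

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # stored only if not already present
        self.files[name] = digest

    def logical_size(self):
        # Every file counted at full size, as the user perceives it.
        return sum(len(self.blobs[d]) for d in self.files.values())

    def physical_size(self):
        # Each unique blob counted once, as the disk actually stores it.
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
upload = os.urandom(1024 * 1024)   # stand-in for the 100MB upload, shrunk to 1MB
for i in range(5):                 # N = 5 identical copies
    store.write(f"copy_{i}", upload)

print(store.logical_size())        # 5 * 1MB: what the user thinks is stored
print(store.physical_size())       # 1MB: what is actually stored
```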
For the case of an upload, you have N copies of a human of X MB each, which will only require X MB on the disk even though the end user sees N*X MB of data being processed. As long as the copies never diverge, running N copies of an upload will never require more than X MB of space (though it will require more time, since each process is still being run).
In the case where the copies /do/ diverge, you can use COW optimization (http://en.wikipedia.org/wiki/Copy_on_write) to determine the amount of resources used. In the first example, if you change the first 1MB of one of the two 100MB files but leave the rest untouched, a smart computer will only use 101MB of disk space: 99MB for the shared data, 1MB for the first file’s unique data, and 1MB for the second file’s unique data. So in this case, the resources for the two copies are 1% more than the resources used for the single copy.
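And a similarly toy sketch of block-level copy-on-write, again with invented names, which reproduces the 101MB arithmetic above: a copy initially shares every block with the original, and only a block that is actually modified gets its own storage.

```python
import os

class CowFile:
    """Toy block-level copy-on-write file; one block stands in for 1MB."""

    def __init__(self, blocks):
        self.blocks = list(blocks)        # references to block contents

    def copy(self):
        return CowFile(self.blocks)       # the copy shares every block

    def write_block(self, index, data):
        self.blocks[index] = data         # only a modified block gets its own storage

def physical_blocks(files):
    # Count distinct stored blocks, not per-file references to them.
    return len({id(block) for f in files for block in f.blocks})

original = CowFile([os.urandom(16) for _ in range(100)])   # "100MB" in 100 blocks
clone = original.copy()
print(physical_blocks([original, clone]))   # 100: everything is shared

clone.write_block(0, os.urandom(16))        # change the first "1MB" of the copy
print(physical_blocks([original, clone]))   # 101: 99 shared + 1 + 1
```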
From a purely theoretical perspective, deduplication and COW will give you an efficiency equivalent to what you would get if you tried to compress an upload or a bunch of uploads (in practice it depends on the type of data). So the value of N copies is equal to the Shannon entropy (alternatively, you could probably use the Kolmogorov complexity) of the data that is the same in all copies plus the unique data in each copy. I figure that any supercomputer designed to run multiple copies of an upload would use these types of compression by default, since all modern high-end file storage systems use dedup and COW to save on costs.
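The claim that dedup/COW behaves like compression can be eyeballed with a general-purpose compressor. This is only a rough sketch: lzma is a crude stand-in for Shannon entropy or Kolmogorov complexity, and the sizes are shrunk from 100MB to 1MB per copy to keep it fast, but it shows the direction of the effect.

```python
import lzma
import os

upload = os.urandom(1_000_000)    # stand-in for one incompressible upload

identical = upload * 5            # five byte-identical copies
# Five copies sharing 99% of their data, each with a unique 1% tail.
diverged = b"".join(upload[:990_000] + os.urandom(10_000) for _ in range(5))

print(len(lzma.compress(upload)))     # roughly 1,000,000: one copy is incompressible
print(len(lzma.compress(identical)))  # close to one copy, nowhere near five
print(len(lzma.compress(diverged)))   # about one copy plus the five unique tails
```

The point is just directional: N identical copies cost roughly one copy's worth of information, and diverged copies cost one copy plus their unique parts.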
Note that this calculation of value is different from the case where you make a backup of yourself to guard against disaster. For a backup, you would normally run the second copy in an environment isolated from the first, which would make deduplication impossible. E.g. you would have one upload running in California and another running in Australia. That way, if the computer in California falls into the ocean, you still have a working copy in Australia. In this case, the value of the two copies is greater than the value of just one copy, because the second copy adds a measure of redundancy even though it adds no new information.
P.S. While we’re on the topic, this is a good time to back up your own computer if you haven’t done so recently. If your hard drive crashes, you will fully comprehend the value of a copy :)
Consider the case where you are trying to value (a) just yourself versus (b) the set of all future yous that satisfy the constraint of not going into negative utility.
The Shannon information of set (b) could be (probably would be) lower than that of (a). To see this, note that the complexity (information) of the set of all future yous is just the info required to specify (you, now) (because to compute the time evolution of the set, you just need the initial condition), whereas the complexity (information) of just you is a series of snapshots: (you, now), (you, 1 microsecond from now), … This is like the difference between a JPEG and an MPEG. The complexity of the constraint probably won’t make up for this.
If the constraint of going into negative utility is particularly complex, one could pick a simple subset of nonnegative utility future yous, for example by specifying relatively simple constraints that ensure that the vast majority of yous satisfying those constraints don’t go into negative utility.
This is problematic because it means that you would assign less value to a large set of happy future yous than to just one future you.
This is very disturbing. But I don’t think the set of all possible future yous has no information. You seem to be assuming it’s a discrete distribution, with one copy of each possible future you. I expect the distribution to be uneven, with many copies clustered near each other in possible-you-space. The distribution, being a function over possible yous, contains even more information than a you.
Why more?
In your new example, (b) is unrelated to the original question. For (b), a simulation of multiple diverging copies is required in order to create this set of all future yous. However, in your original example, the copies never diverge.
The entropy of (a) would be the information required to specify you at state t_0 plus the entropy of the random distribution of inputs used to generate the set of all possible t_1s. In the original example, the simulations of the copies are closed (otherwise you couldn’t keep them identical), so the information contained in the single possible t_1 cannot be any higher than the information in t_0.
Sorry I don’t understand this.
Which part(s) don’t you understand?
It is possible that we are using different unstated assumptions. Do you agree with these assumptions:
1) An uploaded copy running in a simulation is Turing-complete (as JoshuaZ points out, the copy should also be Turing-equivalent). Because of this, state t_n+1 of a given simulation can be determined by the value of t_n and the value of the input D_n at that step. (The sequence D is not random, so I can always calculate the value of D_n; in the easiest case D_n = 0 for all values of n.) Similarly, if I have multiple copies of the simulation at the same state t_n and all of them have the same input D_n, they should all have the same value for t_n+1. In the top level post, having multiple identical copies means that they all start at the same state t_0 and are passed the same inputs D_0, D_1, etc. as they run, in order to force them to remain identical. Because no new information is gained as we run the simulation, the entropy (and thus the value) remains the same no matter how many copies are being run.
2) For examples (a) and (b), you are talking about replacing the input sequence D with a random number generator R. The value of t_1 depends on t_0 and the output of R. Since R is no longer predictable, there is information being added at each stage. This means the entropy of this new simulation depends on the entropy of R. (A toy sketch of both assumptions is below.)
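To make the two assumptions concrete, here is a toy sketch with a hash-based update standing in for the (vastly more complicated) upload dynamics; the step function and the particular inputs are invented purely for illustration.

```python
import hashlib
import os

def step(state, inp):
    """Deterministic update: t_{n+1} is a function of t_n and the input D_n only."""
    return hashlib.sha256(state + inp).digest()

t_0 = os.urandom(32)                 # shared initial state of every copy

# Assumption 1: identical copies fed the same fixed input sequence D never diverge.
D = [bytes([n]) for n in range(10)]  # a precomputable (here trivial) input sequence
copies = [t_0] * 3
for d in D:
    copies = [step(t, d) for t in copies]
print(len(set(copies)))              # 1: all copies are still byte-identical

# Assumption 2: replace D with a random source R and each step adds new entropy.
copies = [t_0] * 3
for _ in range(10):
    copies = [step(t, os.urandom(1)) for t in copies]   # each copy draws its own R
print(len(set(copies)))              # almost certainly 3: the copies have diverged
```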
That is not what Turing complete means. Roughly speaking, something is Turing complete if it can simulate any valid Turing machine. What you are talking about is simply that the state change in question is determined by input data and state. This says nothing about Turing completeness of the class of simulations, or even whether the class of simulations can be simulated on Turing machines. For example, if the physical laws of the universe actually require real numbers then you might need a Blum-Shub-Smale machine to model the simulation.
Oops, I should have said Turing-equivalent. I tend to treat the two concepts as the same because they are the same from a practical perspective. I’ve updated the post.
Ok, let me see if you agree on something simple. What is the complexity (information content) of a randomly chosen integer of length N binary digits? About N bits, right?
What is the information content of the set of all 2^N integers of length N binary digits, then? Do you think it is N*2^N ?
I agree with the first part. In the second part, where is the randomness in the information? The set of all N-bit integers is completely predictable for a given N.
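For what it’s worth, the same point in code, reading “information content” in the rough Kolmogorov sense: the whole set has a description not much longer than the description of N itself, while a particular element takes about N bits to pin down.

```python
import random

N = 16

# The set of all 2^N integers with N binary digits has a short description:
# this program plus the number N itself (about log2(N) bits).
all_n_bit = set(range(2 ** N))

# A randomly chosen element, by contrast, needs about N bits to write down;
# nothing much shorter than its own digits pins it down.
x = random.getrandbits(N)

print(len(all_n_bit))         # 65536 elements, generated from a few lines of code
print(format(x, f"0{N}b"))    # N binary digits: roughly N bits of information
```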
Exactly. So, the same phenomenon occurs when considering the set of all possible continuations of a person. Yes?
For the set of all possible inputs (and thus all possible continuations), yes.
So the set of all possible continuations of a person has less information content than just the person.
And the complexity of the set of happy or positive utility continuations is determined by the complexity of specifying a boundary. Rather like the complexity of the set of all integers of binary length ≤ N digits that also satisfy property P is really the same as the complexity of property P.
When you say “just the person” do you mean just the person at T_n or a specific continuation of the person from T_n? I would say H(T_n) < H(all possible T_n+1) < H(specific T_n+1).
I agree with the second part.
“More can be said of one apple than of all the apples in the world”. (I can’t find the quote I’m paraphrasing...)
Escape the underscores to block their markup effect: to get A_i, type “A\_i”.
Note that Wei Dai also had this idea.
I don’t quite understand sigmaxipi’s idea, but from what I can tell, it’s not the same as mine.
In my proposal, your counter-example isn’t a problem, because something that is less complex (easier to specify) is given a higher utility bound.
Oh, I see, so your proposal is actually the opposite of sigmaxipi’s. He wants lower complexity to correspond to lower utility.