Thinking about this a bit more, and assuming CEV operates on the humans that exist at the time of its application: Why would CEV operate on humans that do exist, and not on humans that could exist? It seems this is what Dr. Evil is taking advantage of by densely populating identity-space around himself and crowding out the rest of humanity. But this could occur for many other reasons: certain cultures encouraging high birth rates, certain technologies or memes that affect the wiring of the human brain being popular at the time of CEV-activation, or certain historical turns that shape the direction of mankind.

A more imaginative scenario: what if another scientist, who knows nothing about FAI and CEV, finds it useful to attack a problem by copying himself into trillions of branches, each examining a certain hypothesis, with all the branches merged or discarded when the answer is found? Let’s further say that CEV t-zero occurs while the scientist is deep in a problem-solving cycle. Would the FAI count each branch as a separate human/vote? This scenario involves no intent to defraud the system. Nor is it manipulation of a proxy: there is a real definitional problem here whose answer is not easily apparent to a human. Applying CEV to all potential humans that could have existed in identity-space would deal with this, but it pushes CEV further and further into uncomputable territory.
Why would CEV operate on humans that do exist, and not on humans that could exist?
To do the latter, you would need a definition of “human” that not only distinguishes existing humans from existing non-humans, but also picks out all human minds from the space of all possible minds. I don’t see how to specify such a definition. (Is this problem not obvious to everyone else?)
For example, we might specify a prototypical human mind, and say that “human” is any mind which is less than a certain distance from the prototypical mind in design space. But then the CEV of this “humankind” is entirely dependent on the prototype that we pick. If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of “humanity” come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
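To make the prototype-dependence concrete, here is a toy sketch in Python. Everything in it, the vector encoding of minds, the distance cutoff, the numbers, is a made-up assumption for illustration, not anything CEV actually specifies:

```
import numpy as np

# Hypothetical sketch: minds as 8-dimensional feature vectors, and
# "human" defined as anything within eps of a chosen prototype mind.

def is_human(mind, prototype, eps=2.5):
    # Membership test: distance to the prototype below the cutoff.
    return np.linalg.norm(mind - prototype) < eps

rng = np.random.default_rng(0)
population = rng.normal(0.0, 1.0, size=(1000, 8))  # existing minds

# Two different prototype choices yield two different "humankinds":
designer_pick = np.zeros(8)              # chosen by programmer fiat
average_pick = population.mean(axis=0)   # chosen as the population mean
print(sum(is_human(m, designer_pick) for m in population))
print(sum(is_human(m, average_pick) for m in population))

# The attack on the averaged prototype: flood the population with
# near-copies of one mind, and the average is dragged toward it.
evil = np.full(8, 3.0)
copies = evil + 0.01 * rng.normal(size=(100_000, 8))
stuffed_average = np.vstack([population, copies]).mean(axis=0)
print(np.linalg.norm(stuffed_average - evil))  # now close to zero
```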
The problem is indeed there, but if the goal is to find out humanity’s coherent extrapolated volition, then a definition of “human” is necessary.
If we have no way of picking out human minds from the space of all possible minds, then we don’t really know what we’re optimizing for. We can’t rule out the possibility that a human mind will come into existence that will not be (perfectly) happy with the way things turn out.* This may well be an inherent problem in CEV. And if the FAI prevents such humans from coming into existence, then it has in effect enforced its own definition of “human” on humanity.
But let’s try to salvage it. What if you were to use existing humans as a training set for an AI to learn what a human is and is not (assuming you can indeed carve reality/mind-space at the joints, which I am unsure about)? You could then use this learned definition to pick out the possible human minds from mind-space and calculate their coherent extrapolated volition.
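A minimal sketch of what I mean, again under the (big) assumption that minds can be embedded as feature vectors at all. Fit a density model to the existing humans, then call any point of mind-space above a likelihood cutoff “human”:

```
import numpy as np

# Hypothetical sketch: fit a density model to existing human minds
# (the training set), then classify any point of mind-space as "human"
# iff it lies above a likelihood threshold learned from that set.

rng = np.random.default_rng(1)
existing_humans = rng.normal(0.0, 1.0, size=(5000, 4))  # training set

mu = existing_humans.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(existing_humans, rowvar=False))

def mahalanobis_sq(mind):
    # Squared Mahalanobis distance to the training distribution.
    d = mind - mu
    return d @ cov_inv @ d

# Cutoff chosen so that ~99% of the training set counts as human.
cutoff = np.quantile([mahalanobis_sq(m) for m in existing_humans], 0.99)

def is_possible_human(mind):
    # A membership test over all of mind-space, not just existing minds.
    return mahalanobis_sq(mind) <= cutoff

print(is_possible_human(mu))                # a central mind: True
print(is_possible_human(np.full(4, 10.0)))  # a distant mind: False
```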
This would be resistant to identity-space stuffing like what you describe, but not resistant to systematic wiping out of certain genes/portions of identity-space before CEV-application.
But the wiping out of genes and the introduction of new ones is the very definition of evolution. We would then need to differentiate the intentional wiping out of genes by certain humans from the natural wiping out of genes by reality/evolution, a rabbit-hole I can’t see the way out of, and possibly a category error. If we can’t do that, we have to accept the gene-pool at the time of CEV-activation as evolution’s verdict on what a human is, which leaves a window open to gaming by genocide.
Perhaps taking as the starting point the time the idea of CEV was introduced would prevent the possibility of manipulation, or perhaps trying to infer whether there was any intentional gaming of CEV would also work. Actually, the latter would deal with both genocide and stuffing without any additional steps. But this assumes rewinding time and global knowledge of all human thoughts and memories as capabilities. Great fun :)
*Come to think of it, what guarantees that the result of CEV will not be something that some of us simply do not want? If such clusters exist, will the FAI create separate worlds for each one?
EDIT: Do you think there would be a noticeable difference between 1900AD!CEV and 2000AD!CEV?
If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of “humanity” come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
Who ever said that CEV is about taking the average utility of all existing humans? The method of aggregating personal utilities should be determined by the extrapolation, on the basis of human cognitive architecture, and not by programmer fiat.
So how about taking all humans that do exist, determining the boundary humans, and using the entire section of identity-space delineated by them? That is still vulnerable to Dr. Evil killing everyone, but not to the trillion near-copy strategy. No?
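A toy illustration of why a boundary-delineated region resists stuffing but not genocide. The vector picture of identity-space and the choice of convex hull as the region are, of course, assumptions made purely for the sketch:

```
import numpy as np
from scipy.spatial import ConvexHull  # region spanned by boundary points

# Hypothetical sketch: the "human" region is the convex hull of all
# existing minds, so only the boundary humans define it.

rng = np.random.default_rng(2)
humanity = rng.normal(0.0, 1.0, size=(500, 3))
hull = ConvexHull(humanity)

# Trillion-near-copy strategy: the copies cluster around one existing
# mind, fall inside the hull, and the delineated region barely moves.
copies = humanity[0] + 0.001 * rng.normal(size=(10_000, 3))
stuffed = ConvexHull(np.vstack([humanity, copies]))
print(hull.volume, stuffed.volume)  # nearly identical

# Killing everyone outside one small cluster, however, collapses the
# region, which is the remaining vulnerability noted above.
print(ConvexHull(copies).volume)    # tiny
```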
Why would CEV operate on humans that do exist, and not on humans that could exist?
Because for every possible human mind, there’s one with the sign flipped on its utility function. Unless your definition of “human” describes their utility function, in which case …
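In toy form, with made-up utility numbers:

```
# If the space of "all possible human minds" is closed under negating
# the utility function, aggregate volition cancels exactly.

utilities = {"paperclips": 3.0, "flourishing": 7.0, "wireheading": -2.0}
flipped = {k: -v for k, v in utilities.items()}  # the sign-flipped twin

for outcome, u in utilities.items():
    print(outcome, u + flipped[outcome])  # 0.0 everywhere: no signal left
```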
So how about taking all humans that do exist, determining the boundary humans, and using the entire section of identity-space delineated by them? That is still vulnerable to Dr. Evil killing everyone, but not to the trillion near-copy strategy. No?
Yep. This is also, arguably, why cryonics doesn’t work.
I don’t follow your logic. What is the connection to cryonics?
If the point is to use the volitions of “people in general” as opposed to “people alive now”, it might also turn out to be correct to ignore “people present now” altogether, including cryonauts, in constructing the future. So this is not just an argument for cryonics not working, but also for the people alive at the time not contributing/surviving in any special sense (in other words, everyone survives to an equal degree, irrespective of whether they opted in on cryonics).
Conditional on the development of FAI. The fact that humanity arises and produces FAI capable of timeless trade at a certain rate would cause the human-volition regarding behaviors of all trading super-intelligences and of all human friendly AIs in the multiverse.
Can’t parse this statement. The fact that a FAI could appear causes “human-volition” about all AGIs? What’s human-volition, what does it mean to cause human-volition, how is the fact that FAI could appear relevant?
I’m stumped. Can you elaborate?
Do you mean that cryonics doesn’t work, or that cryonics isn’t worth doing?