Does it rely on true meanings of words specifically? Why not on concepts? Individually, “vibrations of air” and “auditory experiences” can each be coherent.
I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes and one may wonder why it has not been thrown away already. But they may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.
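As a rough sketch of what I mean (the questions, answers, and node structure below are purely illustrative assumptions, not an actual research plan), such a tree of open questions of fact could be written down explicitly and reasoned about:

```python
from dataclasses import dataclass, field

@dataclass
class QuestionNode:
    """A question of fact whose answer selects the next step of the plan."""
    question: str
    # Maps each possible answer to a follow-up QuestionNode or a concrete action.
    branches: dict = field(default_factory=dict)

# Hypothetical example; the real questions would come out of future research.
plan = QuestionNode(
    question="Do universal extrapolated wishes exist?",
    branches={
        "yes": QuestionNode(
            question="Is the extrapolation process unique?",
            branches={"yes": "run unanimous CEV",
                      "no": "decide how to choose among processes"},
        ),
        "no": "fall back to a contingency plan",
    },
)
```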
Why is it important that it be uncontroversial?
I’m not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.
Ok, you’re right in that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where “sufficiently” is not too high a threshold?
What I’m trying to do is find some way to fix the goalposts. Find a set of conditions on CEV that would suffice. Whether such a CEV actually exists and how to build it are questions for later. Let’s just pile up constraints until a sufficient set is reached. So, let’s assume that:
“Unanimous” CEV exists,
And is unique,
And is definable via some easy, obviously correct, and unique process, to be discovered in the future,
And it basically does what I want it to do (fulfil universal wishes of people, minimize interference otherwise),
would you say that running it is uncontroversial? If not, what other conditions are required?
I value the universe with my friend in it more than one without her.
Ok, but do you grant that running a FAI with “unanimous CEV” is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing—if I’m wrong about my hypothesis?
People are happy, by definition, if their actual values are fulfilled
Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they’re wrong, and it’s actually the red box that contains the diamond, then what would actually make them happy—giving them the blue or the red box? And would you say giving them the red box is making them suffer?
Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: the blue box contains a horrible violent death), it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.
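To make the distinction concrete, here is a minimal toy sketch of the blue-box example (the dictionaries and the preference rule are illustrative assumptions): the stated preference is computed from the agent’s possibly mistaken belief, while the extrapolated preference is computed from the actual state of the world.

```python
# Toy model of the blue-box/red-box example. The world records which box
# actually holds the diamond; the agent's belief may be mistaken.
world = {"diamond_in": "red"}
belief = {"diamond_in": "blue"}

def preferred_box(model):
    # The agent wants the diamond, so they prefer whichever box their
    # model of the world says contains it.
    return model["diamond_in"]

stated_preference = preferred_box(belief)       # "blue" -- current, mistaken value
extrapolated_preference = preferred_box(world)  # "red"  -- value given true knowledge
```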
VHEMT supports human extinction primarily because, in the group’s view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.
Obviously, human extinction is not their terminal value.
I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.
But he assumes that it is worse for me because it is bad for my friend to have died. Whereas, in fact, it is worse for me directly.
People sometimes respond that death isn’t bad for the person who is dead. Death is bad for the survivors. But I don’t think that can be central to what’s bad about death. Compare two stories.
Story 1. Your friend is about to go on the spaceship that is leaving for 100 Earth years to explore a distant solar system. By the time the spaceship comes back, you will be long dead. Worse still, 20 minutes after the ship takes off, all radio contact between the Earth and the ship will be lost until its return. You’re losing all contact with your closest friend.
Story 2. The spaceship takes off, and then 25 minutes into the flight, it explodes and everybody on board is killed instantly.
Story 2 is worse. But why? It can’t be the separation, because we had that in Story 1. What’s worse is that your friend has died. Admittedly, that is worse for you, too, since you care about your friend. But that upsets you because it is bad for her to have died.

Actually, I think the universe is better for me with my friend being alive in it, even if I won’t ever see her. My utility function is defined over world states, not over my sensory inputs.
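A minimal sketch of what I mean by that (the world representation and the numbers are made up for illustration): after radio contact is lost, the two stories are observationally identical to me, yet a utility function over world states still distinguishes them, while one over sensory inputs cannot.

```python
# Two world states that look identical to me once radio contact is lost,
# but differ in whether my friend is alive.
story_1 = {"friend_alive": True,  "contact_possible": False}
story_2 = {"friend_alive": False, "contact_possible": False}

def utility_over_world_states(world):
    # Cares about the friend's existence even when it is unobservable.
    return 1.0 if world["friend_alive"] else 0.0

def utility_over_sensory_inputs(world):
    # Cares only about what I can actually observe: contact or no contact.
    return 1.0 if world["contact_possible"] else 0.0

assert utility_over_world_states(story_1) > utility_over_world_states(story_2)
assert utility_over_sensory_inputs(story_1) == utility_over_sensory_inputs(story_2)
```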
For extrapolation to be conceptually plausible, I imagine “knowledge” and “intelligence level” to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.
Yes, many religious people wouldn’t want their beliefs erased, but only because they believe them to be true. They wouldn’t oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved if it were known that true beliefs were better in all respects, including individual happiness.
Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things...
Yes, I agree with this. But I believe there exist wishes universal for (extrapolated) humans, among which I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.
Paperclipping is also self-consistent in that limit. That doesn’t make me want to include it in the CEV.
Then we can label paperclipping as a “true” value too. However, I still prefer true human values to be maximized, not true clippy values.
Evidence please. There’s a long long leap from ordinary gaining knowledge and intelligence through human life, to “the limit of infinite knowledge and intelligence”. Moreover we’re considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they’d never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they’d be very unhappy.
As I said before, if someone’s mind is that incompatible with truth, I’m ok with ignoring their preferences in the actual world. They can be made happy in a simulation, or wireheaded, or whatever the combined other people’s CEV thinks best.
So you’re saying it wouldn’t modify the world to fit their new evolved values until they actually evolved those values?
No, I’m saying the extrapolated values would probably estimate the optimal speed for their own optimization. You’re right, though, it is all speculation, and the burden of proof is on me. Or on whoever will actually define CEV.
What makes you give them such a label as “true”?
They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.
In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized.
But people would change—gaining knowledge and intelligence—and thus would become happier and happier with time. And I think CEV would try to synchronize this with the timing of its optimization process.
why extrapolate values at all
Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.
they will suffer in the CEV future
This does not follow.
Errr. This is a question of simple fact, which is either true or false. I believe it’s true, and build the plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.
Dunno… propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don’t expect this to happen.
No, because “does CEV fulfill....?” is not a well-defined or fully specified question. But I think, if you asked whether it is possible to build FAI+CEV in such a way that it fulfills the wish(es) of literally everyone while affecting everything else the least, they would say they do not know.
I’d think someone’s playing a practical joke on me.
If it extrapolates coherently, then it’s a single concept, otherwise it’s a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word “sound” appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of “sound”, and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together; then it is a global optimization problem.
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
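A rough sketch of the syntactic version, assuming scikit-learn and a toy corpus (the corpus, vectorizer settings, and number of clusters are placeholder assumptions; a real run would use a huge corpus as described above):

```python
# Word-sense induction for "sound" by clustering the contexts it appears in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = [
    "the sound of the violin filled the hall",
    "a loud sound woke the neighbours",
    "the argument is sound and its premises are true",
    "a sound proof of the theorem",
    "the ship sailed through the sound between the islands",
]

# Represent each mention of "sound" by the words co-occurring with it.
vectors = TfidfVectorizer(stop_words="english").fit_transform(contexts)

# Cluster the contexts; each cluster approximates one meaning of "sound".
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for context, label in zip(contexts, labels):
    print(label, context)
```

Doing many words at once would turn the per-word clustering into the global optimization problem mentioned above.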