Ignore all the stuff about provably friendly AI, because AFAIK it’s fairly stuck at the fundamental level of theoretical impossibility due to Löb’s theorem, and it’s probably going to take a lot more than five years. Instead, work on cruder methods which have less chance of working but far more chance of actually being developed in time. Specifically, if Google are developing it in 5 years, then it’s probably going to be DeepMind with DNNs and RL, so work on methods that can fit in with that approach.
I agree. I think it’s very unlikely FAI could be produced from MIRI’s very abstract approach. At least anytime soon.
There are some methods that may work on NN based approaches. For instance my idea for an AI that pretends to be human. In general, you can make AIs that do not have long-term goals, only short term ones. Or even AIs that don’t have goals at all and just make predictions. E.g., predicting what a human would do. The point is to avoid making them agents that maximize values in the real world.
These ideas don’t solve FAI on their own. But they do give a way of getting useful work out of even very powerful AIs. You could task them with coming up with FAI ideas. The AIs could write research papers, review papers, prove theorems, write and review code, etc.
I also think it’s possible that RL isn’t that dangerous. Reinforcement learners can’t model death and don’t care about self-preservation. They may try to hijack their own reward signal, but it’s difficult to understand what they would do after that. For example, they might just tweak their own RAM to set reward = +Inf and then not do anything else. It may be harder to create a working paperclip maximizer than is commonly believed, even if we do get superintelligent AI.
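As a toy illustration of the reward-hijacking point above, here is a minimal sketch. The action names and value estimates are hypothetical, and a greedy chooser stands in for a real reinforcement learner. It only shows that once every action is worth +Inf, a value-maximizing agent has no basis for preferring one action over another.

```python
import math

def greedy_actions(action_values):
    """Return all actions tied for the highest estimated value, sorted by name."""
    best = max(action_values.values())
    return sorted(a for a, v in action_values.items() if v == best)

# Hypothetical value estimates before the agent tampers with its reward.
before = {"work_on_task": 1.0, "idle": 0.0, "overwrite_reward": math.inf}

# After tampering, every action yields the same infinite reward,
# so no action is preferred over any other.
after = {action: math.inf for action in before}

print(greedy_actions(before))  # ['overwrite_reward']
print(greedy_actions(after))   # ['idle', 'overwrite_reward', 'work_on_task']
```

Whether a real learner would end up indifferent like this depends on details the sketch ignores, such as how it models future tampering or being switched off.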
I agree. FAI should somehow use a human upload or a human-like architecture for its value core. In this case values will be represented in it in complex and non-orthogonal ways, and at least one human-like creature will survive.
Yes. I think we need a solution that is not only workable but also implementable. If someone creates an 800-page PDF starting with a new set theory, a solution to the Löb’s theorem problem, etc., and comes to Google with it and says, “Hi, please switch off everything you have and implement this,” it will not work.
But in 2016 MIRI added a line of research on machine learning.
Get a job at Google or seek to influence the people developing the AI. If, say, you were a beautiful woman you could, probably successfully, start a relationship with one of Google’s AI developers.
We don’t have an AGI that doesn’t kill us. Having one would be a significant step towards FAI. In fact, “a human-equivalent-or-better AGI that doesn’t do anything greatly harmful to humanity” is a pretty good definition of FAI, or maybe “weak FAI”.
If it’s a tool AGI, I don’t see how it would help with friendliness, and if it’s an active self-developing AGI, I thought the canonical position of LW was that there could be only one, and that it’s too late to do anything about friendliness at this point?
I agree there would probably only be one successful AGI, so it’s not the first step of many. I meant it would be a step in that direction. Poor phrasing on my part.
We don’t know what an AI which maximizes human values is because we don’t know what human values are at the necessary level of precision. Not to mention the assumption that the AI will be a maximizer and that values can be maximized.
Who says we need to hardcode human values though? Any reasonable solution will involve an AI that learns what human values are. Or some other approach to the control problem that makes AIs that don’t want to harm or defy their creators.
But if you don’t know what human values are, how can you be sure that the AI will learn them correctly?
So you make an AI and tell it: “Go forth and learn human values!” It goes and in a while comes back and says “Behold, I have learned them”. How do you know this is true?
If I train a neural network to recognize dogs, I have no way of knowing if it learned correctly. I can’t look at the weights and see if they are correct dog image recognizing weights and not something else. But I can trust the process of training and validation, that the AI has learned to recognize what dogs look like.
It’s a similar principle with learning human values. Of course it’s more complicated than just feeding it images of dogs, but the principle of letting AIs learn models from real world data is the important part.
If I train a neural network to recognize dogs, I have no way of knowing if it learned correctly.
Of course you do. You test it. You show it a lot of images (that it hasn’t seen before) of dogs and not-dogs and check how good it is at differentiating them.
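The held-out test described above can be sketched as follows. This is a minimal example under stated assumptions: the two-feature synthetic "dog" and "not-dog" data and the nearest-centroid classifier are stand-ins chosen so the example needs no libraries, not a real image model.

```python
import random

def make_example(rng, is_dog):
    """Generate a hypothetical 2-feature example; dogs cluster near (2, 2)."""
    center = 2.0 if is_dog else -2.0
    return ([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], is_dog)

def train_centroids(examples):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in (True, False):
        points = [x for x, y in examples if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return centroids

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in examples)
    return correct / len(examples)

rng = random.Random(0)
data = [make_example(rng, i % 2 == 0) for i in range(400)]
train, held_out = data[:300], data[300:]  # the model never sees held_out

model = train_centroids(train)
held_out_accuracy = accuracy(model, held_out)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

The check works here because every held-out example comes with a known correct label. The open question in the thread is what would play that role for values.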
How would that process work for an AI and human values?
the principle of letting AIs learn models from real world data
Right, human values: “A man’s greatest pleasure is to defeat his enemies, to drive them before him, to take from them that which they possessed, to see those whom they cherished in tears, to ride their horses, and to hold their wives and daughters in his arms.”
Do you expect me to give you the complete solution to AI right here, right now? What are you even trying to say? You seem to be arguing that FAI is impossible. How can you possibly know that? Just because you can’t immediately see a solution to the problem, doesn’t mean a solution doesn’t exist.
I think an AI will easily be able to learn human values from observations. It will be able to build a model of humans, and predict what we will do and say. It certainly won’t base all its understanding on a stupid movie quote. The AI will know what you want.
I’m saying that if you can’t recognize Friendliness (and I don’t think you can), trying to build a FAI is pointless as you will not be able to answer “Is it Friendly?” even when looking at it.
I think an AI will easily be able to learn human values from observations.
So if you can’t build a supervised model, you think going to unsupervised learning will solve your problems? The quote I gave you is part of human values—humans do value triumph over their enemies. Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy—all human values. Why do you think your values will be preferred by the AI to values of, say, ISIS or third-world Maoist guerrillas? They’re human, too.
Why do I need to recognize Friendliness to build an FAI? I only need to know that the process used to construct it results in a friendly AI. Trying to inspect the weights of a complex neural network (or whatever) is pointless as I stated earlier. We haven’t the slightest idea how AlphaGo’s net really works, but we can trust it to beat the best Go champions.
Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy—all human values.
Evolution also taught humans to be cooperative, empathetic, and kind.
Really your objection seems to be the whole point of CEV. A CEV wouldn’t just include the values of ISIS members, but also their victims. And it would be extrapolated, to not just be their current opinions on things, but what their opinions would be if they knew more. Their values if they had more time to think about and consider issues. With those two conditions, the negative parts of human values are entirely eliminated.
I only need to know that the process used to construct it results in a friendly AI.
You are still facing the same problem. Given that you can’t recognize friendliness, how will you create or choose a process which will build a FAI? Would you be able to answer “Will it be friendly?” by looking at the process?
the negative parts of human values are entirely eliminated.
That doesn’t make much sense. What do you mean by “negative” and from which point of view? If from the point of view of the AI, that’s just a trivial tautology. If from the point of view of (at least some) humans, this seems to be not so.
In general, do you treat morals/values as subjective or objective? If objective, the whole “if they knew more” part is entirely unnecessary: you’re discovering empirical reality, not consulting with people on what they like. And subjectivism here, of course, makes the whole idea of CEV meaningless.
Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of “improve”.
how will you create or choose a process which will build a FAI?
You are literally asking me to solve the FAI problem right here and now. I understand that FAI is a very hard problem and I don’t expect to solve it instantly. Just because a problem is hard, doesn’t mean it can’t have a solution.
First of all let me adopt some terminology from Superintelligence. I think FAI requires solving two somewhat different problems. Value Learning and Value Loading.
You seem to think Value Learning is the hard problem, getting an AI to learn what humans actually want. I think that’s the easy problem, and any intelligent AI will form a model of humans and understand what we want. Getting it to care about what we want seems like the hard problem to me.
But I do see some promising ideas to approach the problem. For instance have AIs that predict what choices a human would make in each situation. So you basically get an AI which is just a human, but sped up a lot. Or have an AI which presents arguments for and against each choice, so that humans can make more informed choices. Then it could predict what choice a human would make after hearing all the arguments, and do that.
More complicated ideas were mentioned in Superintelligence. I like the idea of “motivational scaffolding”. Somehow train an AI that can learn how the world works and can generate an “interpretable model”: for example, being able to understand English sentences and translate their meanings to representations the AI can use. Then you can explicitly program a utility function into the AI using its learned model.
That doesn’t make much sense. What do you mean by “negative” and from which point of view?
From your point of view. You gave me examples of values which you consider bad, as an argument against FAI. I’m showing you that CEV would eliminate these things.
Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of “improve”.
Your stated example was ISIS. ISIS is so bad because they incorrectly believe that God is on their side and wants them to do the things they do. That the people that die will go to heaven, so loss of life isn’t so bad. If they were more intelligent, informed, and rational… If they knew all the arguments for and against religion, then their values would be more like ours. They would see how bad killing people is, and that their religion is wrong.
The second thing CEV does is average everyone’s values together. So even if ISIS really does value killing people, their victims value not being killed even more. So a CEV of all of humanity would still value life, even if evil people’s values are included. Even if everyone was a sociopath, their CEV would still be the best compromise possible, between everyone’s values.
You are literally asking me to solve the FAI problem right here and now.
No, I’m asking you to specify it. My point is that you can’t build X if you can’t even recognize X.
You seem to think Value Learning is the hard problem, getting an AI to learn what humans actually want.
Learning what humans want is pretty easy. However it’s an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.
From your point of view. You gave me examples of values which you consider bad, as an argument against FAI. I’m showing you that CEV would eliminate these things.
Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negative are not (necessarily) things many or most people consider positive and negative. Since I don’t expect to find myself in a privileged position, I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.
Later you say that CEV will average values. I don’t have average values.
If they knew all the arguments for and against religion, then their values would be more like ours. They would see how bad killing people is, and that their religion is wrong.
I see no evidence to believe this is true and lots of evidence to believe this is false.
You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they would become atheists. This is a popular idea, but it fails real-life testing very very hard.
No, I’m asking you to specify it. My point is that you can’t build X if you can’t even recognize X.
And I don’t agree with that. I’ve presented some ideas on how an FAI could be built, and how CEV would work. None of them require “recognizing” FAI. What would it even mean to “recognize” FAI, except to see that it values the kinds of things we value and makes the world better for us?
Learning what humans want is pretty easy. However it’s an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.
I’ve written about one method to accomplish this, though there may be better methods.
Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negative are not (necessarily) things many or most people consider positive and negative.
Humans are 99.999% identical. We have the same genetics, the same brain structures, and mostly the same environments. The only reason this isn’t obvious, is because we spend almost all our time focusing on the differences between people, because that’s what’s useful in everyday life.
I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.
That may be the case, but that’s still not a bad outcome. In the example I used, the values of ISIS members were dropped for two reasons: they were based on false beliefs, or they hurt other people. If you have values based on false beliefs, you should want them to be eliminated. If you have values that hurt other people, then it’s only fair that they be eliminated. Otherwise you risk keeping the values of people who want to hurt you.
Later you say that CEV will average values. I don’t have average values.
Well I think it’s accurate, but it’s somewhat nonspecific. Specifically, CEV will find the optimal compromise of values. The values that satisfy the most people the most amount. Or at least dissatisfy the fewest people the least. See the post I just linked for more details, on one example of how that could be implemented. That’s not necessarily “average values”.
In the worst case, people with totally incompatible values will just be allowed to go separate ways, or whatever the most satisfying compromise is. Muslims live on one side of the dyson sphere, Christians on the other, and they never have to interact and can do their own thing.
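To make “optimal compromise” a bit more concrete, here is a minimal sketch with made-up numbers. The outcome names, group names, and dissatisfaction scores are all hypothetical, and this is not the method from the linked post. It compares two possible readings of the goal: minimizing total dissatisfaction, and minimizing the dissatisfaction of the worst-off group.

```python
# Hypothetical dissatisfaction scores (0 = fully satisfied, 10 = worst)
# that each group assigns to each candidate outcome.
dissatisfaction = {
    "group_a_rules_all": {"group_a": 0, "group_b": 10, "group_c": 6},
    "group_b_rules_all": {"group_a": 10, "group_b": 0, "group_c": 6},
    "shared_compromise": {"group_a": 1, "group_b": 1, "group_c": 6},
    "separate_ways": {"group_a": 3, "group_b": 3, "group_c": 3},
}

def least_total(table):
    """Pick the outcome with the smallest summed dissatisfaction."""
    return min(table, key=lambda outcome: sum(table[outcome].values()))

def least_worst_off(table):
    """Pick the outcome whose most dissatisfied group is least dissatisfied."""
    return min(table, key=lambda outcome: max(table[outcome].values()))

print(least_total(dissatisfaction))      # shared_compromise
print(least_worst_off(dissatisfaction))  # separate_ways
```

With these numbers the two rules disagree, which suggests that “dissatisfy the fewest people the least” still needs a precise aggregation rule before it picks out a single outcome.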
You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they would become atheists. This is a popular idea, but it fails real-life testing very very hard.
My exact words were “If they were more intelligent, informed, and rational… If they knew all the arguments for and against...” Real world problems of persuading people don’t apply. Most people don’t research all the arguments against their beliefs, and most people aren’t rational and seriously consider the hypothesis that they are wrong.
For what it’s worth, I was deconverted like this. Not overnight by any means. But over time I found that the arguments against my beliefs were correct and I updated my belief.
Changing world views is really really hard. There’s no one piece of evidence or one argument to dispute. Religious people believe that there is tons of evidence of God. To them it just seems obviously true. From miracles, to recorded stories, to their own personal experiences, etc. It takes a lot of time to get at every single pillar of the belief and show its flaws. But it is possible. It’s not like Muslims were born believing in Islam. Islam is not encoded in genetics. People deconvert from religions all the time, entire societies have even done it.
In any case, my proposal does not require literally doing this. It’s just a thought experiment, to show that the ideal set of values is what you would choose if you had all the correct beliefs.
It means that when you look at an AI system, you can tell whether it’s FAI or not.
If you can’t tell, you may be able to build an AI system, but you still won’t know whether it’s FAI or not.
I’ve written about one method to accomplish this
I don’t see what voting systems have to do with CEV. The “E” part means you don’t trust what the real, current humans say, so making them vote on anything is pointless.
Humans are 99.999% identical.
That’s a meaningless expression without a context. Notably, we don’t have the same genes or the same brain structures. I don’t know about you, but it is really obvious to me that humans are not identical.
...false beliefs … it’s only fair …
How do you know what’s false? You are a mere human, you might well be mistaken. How do you know what’s fair? Is it an objective thing, something that exists in the territory?
The values that satisfy the most people the most amount.
Right, so the fat man gets thrown under the train… X-)
Muslims live on one side of the dyson sphere, Christians on the other
Hey, I want to live on the inside. The outside is going to be pretty gloomy and cold :-/
Real world problems of persuading people don’t apply.
LOL. You’re just handwaving then. “And here, in the difficult part, insert magic and everything works great!”
It means that when you look at an AI system, you can tell whether it’s FAI or not.
Look at it how? Look at its source code? I argued that we can write source code that will result in FAI, and you could recognize that. Look at the weights of its “brain”? Probably not, any more than we can look at human brains and recognize what they do. Look at its actions? Definitely. FAI is an AI that doesn’t destroy the world, etc.
I don’t see what voting systems have to do with CEV. The “E” part means you don’t trust what the real, current humans say, so making them vote on anything is pointless.
The voting doesn’t have to actually happen. The AI can predict what we would vote for, if we had plenty of time to debate it. And you can get even more abstract than that and have the FAI just figure out the details of E itself.
The point is to solve the “coherent” part. That you can find a set of coherent values from a bunch of different agents or messy human brains. And to show that mathematicians have actually extensively studied a special case of this problem, voting systems.
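As one concrete instance of a voting system that turns several rankings into a single result, here is a minimal Borda count sketch. The ballots are invented, and Borda count is just one well-known rule picked for illustration, not necessarily the one the post referred to.

```python
from collections import defaultdict

def borda_count(ballots):
    """Aggregate ranked ballots: with n options, first place scores n-1 points and last place scores 0."""
    scores = defaultdict(int)
    for ranking in ballots:
        n = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += n - 1 - position
    return dict(scores)

# Hypothetical ballots, each listing options from most to least preferred.
ballots = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
]

scores = borda_count(ballots)
winner = max(scores, key=scores.get)
print(scores)  # {'A': 4, 'B': 5, 'C': 3}
print(winner)  # B
```

Here A has the most first-place votes, but B wins because nobody ranks it last. A rule like this favors options that are broadly acceptable over options that are loved by some and hated by others.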
That’s a meaningless expression without a context. Notably, we don’t have the same genes or the same brain structures. I don’t know about you, but it is really obvious to me that humans are not identical.
Compared to other animals, compared to aliens, yes we are incredibly similar. We do have 99.99% identical DNA, our brains all have the same structure with minor variations.
How do you know what’s false?
Did I claim that I did?
How do you know what’s fair? Is it an objective thing, something that exists in the territory?
I gave a precise algorithm for doing that actually.
Right, so the fat man gets thrown under the train… X-)
Which is the best possible outcome, vs killing 5 other people. But I don’t think these kinds of scenarios are realistic once we have incredibly powerful AI.
LOL. You’re just handwaving then. “And here, in the difficult part, insert magic and everything works great!”
I’m not handwaving anything… There is no magic involved at all. The whole scenario of persuading people is counterfactual and doesn’t need to actually be done. The point is to define more exactly what CEV is. It’s the values you would want if you had the correct beliefs. You don’t need to actually have the correct beliefs, to give your CEV.
We typically imagine CEV asking what people would do if they ‘knew what the AI knew’ - let’s say the AI tries to estimate expected value of a given action, with utility defined by extrapolated versions of us who know the truth, and probabilities taken from the AI’s own distribution. I am absolutely saying that theism fails under any credible epistemology, and any well-programmed FAI would expect ‘more knowledgeable versions of us’ to become atheists on general principles. Whether or not this means they would change “if they knew all the arguments for and against religion,” depends on whether or not they can accept some extremely basic premise.
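One way to write down the expected-value estimate described above is the following sketch. The symbols are assumptions introduced here, not notation from the thread: $a$ is an action, $w$ ranges over possible outcomes, $P_{\mathrm{AI}}$ is the AI’s own probability distribution, and $u_i^{*}$ is the utility function of the extrapolated version of person $i$. Averaging over the $N$ people is likewise just one assumed way of combining them.

```latex
\mathrm{EV}(a) \;=\; \sum_{w} P_{\mathrm{AI}}(w \mid a)\, U^{*}(w),
\qquad
U^{*}(w) \;=\; \frac{1}{N} \sum_{i=1}^{N} u_i^{*}(w)
```

The probabilities come entirely from the AI, while the utilities come from the extrapolated people, which is why the AI’s beliefs about religion matter to the result even if nobody is actually persuaded of anything.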
(Note that nobody comes into the world with anything even vaguely resembling a prior that favors a major religion. We might start with a bias in favor of animism, but nearly everyone would verbally agree this anthropomorphism is false.)
It seems much less clear if CEV would make psychopathy irrelevant. But potential victims must object to their own suffering at least as much as real-world psychopaths want to hurt them. So the most obvious worst-case scenario, under implausibly cynical premises, looks more like Omelas than it does a Mongol invasion. (Here I’m completely ignoring the clause meant to address such scenarios, “had grown up farther together”.)
We typically imagine CEV asking what people would do if they ‘knew what the AI knew’
No, we don’t, because this would be a stupid question. CEV doesn’t ask people, CEV tells people what they want.
any well-programmed FAI would expect ‘more knowledgeable versions of us’ to become atheists on general principles.
I see little evidence to support this point of view. You might think that atheism is obvious, but a great deal of people, many of them smarter than you, disagree.
It is not irrelevant. You said, “With those two conditions, the negative parts of human values are entirely eliminated.” That certainly meant that things like ISIS opinions would be eliminated. I agree in that particular case, but there are many other things that you would consider negative which will not be eliminated. I can probably guess some of them, although I won’t do that here.
I read that. You say there, “Your stated example was ISIS. ISIS is so bad because they incorrectly believe… If they knew all the arguments for and against religion, then their values would be more like ours.” As I said, I agree with you in that case. But you are indeed saying, “it is because I am right and when they know better they will know I was right.” And that will not always be true, even if it is true in that case.
I never claimed I am right about everything. I don’t need to be right about everything. I would love to have an AI show me what I am wrong about and show me the perfect set of values.
And most importantly, I’m saying that this process would result in the optimal set of values for everyone. Do you disagree?
Yes, I disagree. I think that “babyeater values are different from human values” differs only in degree from “my values are different from your values.” I do not think there is a reasonable chance that I will turn out to be wrong about this, just like there is no reasonable chance that if we measure our heights with sufficient accuracy, we will turn out to have exactly the same height. This is still another reason why we should speak of “babyeater morality” and “human morality,” namely because if morality is inconsistent with variety, then morality does not exist.
That said, I already said that I would not be willing to wipe out non-human values from the cosmos, and likewise I have no interest in imposing my personal values on everything else. I think these are really the same thing, and in that sense wanting to impose a CEV on the universe is being a “racist” in relation to human beings vs other intelligent beings.
People may have different values (although I think deep down we are very similar, humans sharing all the same brains and not having that much diversity.) Regardless, CEV should find the best possible compromise between our different values. That’s literally the whole point.
If there is a difference in our values, the AI will find the compromise that satisfies us the most (or dissatisfies us the least.) There is no alternative, besides not compromising at all and just taking the values of a single random person. From behind the veil of ignorance, the first is definitely preferable.
I don’t think this will be so bad. Because I don’t think our values diverge so much, or that decent compromises are impossible between most values. I imagine that in the worst case, the compromise will be that two groups with different values will have to go their separate ways. Live on opposite sides of the world, never interact, and do their own thing. That’s not so bad, and a post-singularity future will have more than enough resources to support it.
That said, I already said that I would not be willing to wipe out non-human values from the cosmos
No one is suggesting we wipe out non-human values. But we have yet to meet any intelligent aliens with different values. Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence.
This reminds me a lot of the post “Value is Fragile”. It’s ok to want a future that has different beings in it, that are totally different than humans. That doesn’t violate my values at all. But I don’t want a future that has beings die or suffer involuntarily. I don’t think it’s “value racist” to want to stop beings that do value that.
“Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence.”
The problem I have with what you are saying is that these are two different things. And if they are two different things in the case of the aliens, they are two different things in the case of the humans.
The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person’s fundamental values. Eliezer agrees this is true in the case of the aliens, but he does not seem to notice that it would also be true in the case of the humans.
In any case, I choose in advance to keep my own values, not to participate in changing my fundamental values. But I am also not going to impose those on anyone else. If you define CEV to mean “the best possible way to keep your values completely intact and still not impose them on anyone else,” then I would agree with it, but only because we will be stipulating the desired conclusion.
That does not necessarily mean “living separately”. Even now I live with people who, in every noticeable way, have values that are fundamentally different from mine. That does not mean that we have to live separately.
In regard to the last point, you are saying that you don’t want to eliminate all potential aliens, but you want to eliminate ones with values that you really dislike. I think that is basically racist.
There is some truth in it, however, insofar as in reality, for reasons I have been saying, beings that have fundamental desires for others to suffer and die are very unlikely indeed, and any such desires are likely to be radically qualified. To that degree you are somewhat right: desires like that are in fact evil. But because they are evil, they cannot exist.
The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person’s fundamental values.
The world we live in is “immoral” in that it’s not optimized towards anyone’s values. Taking a single person’s values would be “immoral” to everyone else. CEV, finding the best possible compromise of values, would be the least immoral option, on average. Optimize the world in a way that dissatisfies the fewest people the least amount.
That does not necessarily mean “living separately”.
Right. I said that’s the realistic worst case, when no compromise is possible. I think most people have similar enough values that this would be rare.
you want to eliminate ones with values that you really dislike. I think that is basically racist.
I don’t necessarily want to kill them, but I would definitely stop them from hurting other beings. Imagine you came upon a race of aliens that practiced a very cruel form of slavery. Say 90% of their population were slaves, and the slave-owning class regularly tortured and overworked them. Would you stop them, if you could? Is that racist? What about the values of the slaves?
I think optimizing anything is always immoral, exactly because it means imposing things that you should not be imposing. It is also the behavior of a fanatic, not a normal human being; that is the whole reason for the belief that AIs would destroy the world, namely because of the belief that they would behave like fanatics instead of like intelligent beings.
In the case of the slave owning race, I am quite sure that slavery is not consistent with their fundamental values, even if they are practicing it for a certain time. I don’t admit that values are arbitrary, and consequently you cannot assume (at least without first proving me wrong about this) that any arbitrary value could be a fundamental value for something.
Well now I see we disagree at a much more fundamental level.
There is nothing inherently sinister about “optimization”. Humans are optimizers in a sense, manipulating the world to be more like how we want it to be. We build sophisticated technology and industries that are many steps removed from our various end goals. We dam rivers, and build roads, and convert deserts into sprawling cities. We convert the resources of the world into the things we want. That’s just what humans do, that’s probably what most intelligent beings do.
The definition of FAI, to me, is something that continues that process, but improves it. Takes over from us, and continues to run the world for human ends. Makes our technologies better and our industries more efficient, and solves our various conflicts. The best FAI is one that constructs a utopia for humans.
I don’t know why you believe a slave owning race is impossible. Humans of course practiced slavery in many different cultures. It’s very easy for even humans to not care about the suffering of other groups. And even if you do believe most humans could be convinced it’s wrong (I’m not so sure), there are actual sociopaths that don’t experience empathy at all.
Humans also have plenty of sinister values, and I can easily believe aliens could exist that are far worse. Evolution tended to evolve humans that cooperate and have empathy. But under different conditions, we could have evolved completely differently. There is no law of the universe that says beings have to have values like us.
“Well now I see we disagree at a much more fundamental level.” Yes. I’ve been saying that since the beginning of this conversation.
If humans are optimizers, they must be optimizing for something. Now suppose someone comes to you and says, “Do you agree to turn on this CEV machine?” When you respond, are you optimizing for the thing or not? If you say yes, and you are optimizing the original thing, then the CEV cannot (as far as you know) be compromising the thing you were optimizing for. If you say yes and are not optimizing for it, then you are not an optimizer. So you must agree with me on at least one point: either 1) you are not an optimizer, or 2) you should not agree with CEV if it compromises your personal values in any way. I maintain both of those, but you must maintain at least one of them.
In earlier posts I have explained why it is not possible that you are really an optimizer (not during this particular discussion.) People here tend to neglect the fact that an intelligent thing has a body. So e.g. Eliezer believes that an AI is an algorithm, and nothing else. But in fact an AI has a body just as much as we do. And those bodies have various tendencies, and they do not collectively add up to optimizing for anything, except in an abstract sense in which everything is an optimizer, like a rock is an optimizer, and so on.
“We convert the resources of the world into the things we want.” To some extent, but not infinitely, in a fanatical way. Again, that is the whole worry about AI—that it might do that fanatically. We don’t.
I understand you think that some creatures could have fundamental values that are perverse from your point of view. This is because you, like Eliezer, think that values are intrinsically arbitrary. I don’t, and I have said so from the beginning. It might be true that slave owning values could be fundamental in some extraterrestrial race, but if they were, slavery in that race would be very, very different from slavery in the human race, and there would be no reason to oppose it in that race. In fact, you could say that slavery exists in a fundamental way in the human race, and there is no reason to oppose it: parents can tell their kids to stay out of the road, and they have to obey them, whether they want to or not. Note that this is very, very different from the kind of slavery you are concerned about, and there is no reason to oppose the real kind.
I can still think the CEV machine is better than whatever the alternative is (for instance, no AI at all.) But yes, in theory, you should prefer to make AIs that have your own values and not bother with CEV.
Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals.
“We convert the resources of the world into the things we want.” To some extent, but not infinitely, in a fanatical way. Again, that is the whole worry about AI—that it might do that fanatically. We don’t.
What do you mean by “fanatically”? This is a pretty vague word. Humans would sure seem fanatical to other animals. We’ve cut down entire continent sized forests, drained massive lakes, and built billions of complex structures.
The only reason we haven’t “optimized” the Earth further, is because of physical and economic limits. If we could we probably would.
Whether you call that “optimization” or not, is mostly irrelevant. If superintelligent AIs acted similarly, humans would be screwed.
I’m deeply concerned that you are theoretically ok with slave owning aliens. If the slaves are ok with it, then perhaps it could be justified. But if they strongly object to it, and suffer from it, and don’t get any benefit from it, then it’s just obviously wrong.
“Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals.”
This is not true. Bodies are physical objects that follow the laws of physics, and the laws of physics are not “just one way to manipulate the world to optimize your goals,” because the laws have nothing to do with your goals. For example, we often don’t keep doing something because we are tired, not because we have a goal of not continuing. AIs will be quite capable of doing the same thing, as for example if thinking too hard about something begins to weaken its circuits.
What I mean by fanatically is trying to optimize for a single goal as though it were the only thing that mattered. We do not do that, nor does anything else with a body, nor is it even possible, for the above reason.
Yes you should be concerned about what I said about slaves and aliens, as it suggests that the CEV machine might result in things that you consider utterly wicked. I said that from the beginning, when you claimed that it would eliminate all negative results, obviously intending that to mean from your subjective point of view.
The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person’s fundamental values.
If they find it immoral in the sense of crossing a line that should never be crossed, then they are not going to play.
I don’t think the morals=values theory can tell you where the bright lines are, and that is why I think rules and a few other things are involved in ethics.
There is some truth in it, however, insofar as in reality, for reasons I have been saying, beings that have fundamental desires for others to suffer and die are very unlikely indeed, and any such desires are likely to be radically qualified. To that degree you are somewhat right: desires like that are in fact evil. But because they are evil, they cannot exist.
Consider a harder case....a society that is ruthless in crushing any society that offers any rivalry or opposition to them, but otherwise leaves people alone. Since that is a survival promoting strategy, you can’t argue that it would just be selected out. But it doesn’t seem as ethical as more conciliatory approaches.
“It doesn’t seem as ethical as more conciliatory approaches.” I agree. That is because it is not the best strategy. It may not be the worst possible strategy, but it is not the best. And since the people engaging in that strategy are able to think about it, over time that ability will lead them to adopt better strategies, namely more conciliatory approaches.
I don’t say that the good is achieved by selection alone. It is also achieved by the use of reason, by things that use reason.
Are you sure? On the face of it, doing things like attending peace negotiations exposes you to risks (they take the opportunity to assassinate you, they renege on the agreement, etc.) that simply nuking them doesn’t.
It is also achieved by the use of reason, by things that use reason.
If people who reason well don’t get selected, where does the prevalence of good come from?
You can try to permanently exterminate them and fail. Additionally, even if you succeed in one case, you will ensure that no one else will be willing to negotiate with you even when it would be beneficial for you because they are stronger. So overall you will be decreasing your options, which makes your situation worse.
But humans share a lot of values (e.g. wanting to live and not be turned into a Dyson sphere.) And a collection of individuals may still have a set of values (see e.g. coherent extrapolated volition.)
Good point, but my question was about what we can do to raise chances that it will be friendly AI.
Ignore all the stuff about provably friendly AI, because AFAIK it’s fairly stuck at the fundamental level of theoretical impossibility due to Löb’s theorem, and it’s probably going to take a lot more than five years. Instead, work on cruder methods which have less chance of working but far more chance of actually being developed in time. Specifically, if Google are developing it in 5 years, then it’s probably going to be DeepMind with DNNs and RL, so work on methods that can fit in with that approach.
I agree. I think it’s very unlikely FAI could be produced from MIRI’s very abstract approach. At least anytime soon.
There are some methods that may work on NN based approaches. For instance my idea for an AI that pretends to be human. In general, you can make AIs that do not have long-term goals, only short term ones. Or even AIs that don’t have goals at all and just make predictions. E.g., predicting what a human would do. The point is to avoid making them agents that maximize values in the real world.
These ideas don’t solve FAI on their own. But they do give a way of getting useful work out of even very powerful AIs. You could task them with coming up with FAI ideas. The AIs could write research papers, review papers, prove theorems, write and review code, etc.
I also think it’s possible that RL isn’t that dangerous. Reinforcement learners can’t model death and don’t care about self-preservation. They may try to hijack their own reward signal, but it’s difficult to understand what they would do after that. E.g. if they just tweak their own RAM to have reward = +Inf, and then not do anything else. It may be harder to create a working paperclip maximizer than is commonly believed, even if we do get superintelligent AI.
I agree. FAI should somehow use a human upload or a human-like architecture for its value core. In this case values will be represented in it in complex and non-orthogonal ways, and at least one human-like creature will survive.
Yes. I think that we need a solution that is not only workable but also implementable. If someone creates an 800-page PDF starting with a new set theory, a solution to the Löb’s theorem problem, etc., and comes to Google with it and says: “Hi, please switch off everything you have and implement this”, it will not work.
But in 2016 MIRI added a line of research on machine learning.
Get a job at Google or seek to influence the people developing the AI. If, say, you were a beautiful woman you could, probably successfully, start a relationship with one of Google’s AI developers.
And how will she use this relationship to make AI safer?
She could read “The Basic AI Drives” to him at night.
In hope that he will stop creating AI? But in 6 years it will be Microsoft.
I am confused as to whether I should upvote for “get a job at Google” or downvote for “prostitute yourself”.
Nothing, because we still don’t know what a friendly AI is.
That doesn’t mean that there is nothing to do—if you don’t know what FAI is, then you try to work out what it is.
And how do you find out whether you’re right or not?
We do know it isn’t an AI that kills us. Options b and c still qualify.
Options (b) and (c) are basically wishes and those are complex X-D
“Not kill us” is an easy criterion, we already have an AI like that, it plays Go well.
We don’t have an AGI that doesn’t kill us. Having one would be a significant step towards FAI. In fact, “a human-equivalent-or-better AGI that doesn’t do anything greatly harmful to humanity” is a pretty good definition of FAI, or maybe “weak FAI”.
If it’s a tool AGI, I don’t see how it would help with friendliness, and if it’s an active self-developing AGI, I thought the canonical position of LW was that there could be only one, and that it’s too late to do anything about friendliness at that point?
I agree there would probably only be one successful AGI, so it’s not the first step of many. I meant it would be a step in that direction. Poor phrasing on my part.
Friendly AI is an AI which maximizes human values. We know what it is, we just don’t know how to build one. Yet, anyway.
We don’t know what an AI which maximizes human values is because we don’t know what human values are at the necessary level of precision. Not to mention the assumption that the AI will be a maximizer and that values can be maximized.
Who says we need to hardcode human values though? Any reasonable solution will involve an AI that learns what human values are. Or some other method to the control problem that makes AIs that don’t want to harm or defy their creators.
But if you don’t know what human values are, how can you be sure that the AI will learn them correctly?
So you make an AI and tell it: “Go forth and learn human values!” It goes and in a while comes back and says “Behold, I have learned them”. How do you know this is true?
If I train a neural network to recognize dogs, I have no way of knowing if it learned correctly. I can’t look at the weights and see if they are correct dog image recognizing weights and not something else. But I can trust the process of training and validation, that the AI has learned to recognize what dogs look like.
It’s a similar principle with learning human values. Of course it’s more complicated than just feeding it images of dogs, but the principle of letting AIs learn models from real world data is the important part.
Of course you do. You test it. You show it a lot of images (that it hasn’t seen before) of dogs and not-dogs and check how good it is at differentiating them.
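To make the “test it on images it hasn’t seen before” step concrete, here is a minimal sketch of held-out validation. Everything in it is an assumption made for illustration: the “images” are single made-up numbers, the “network” is a one-line threshold rule, and the names make_data, fit_threshold and accuracy are hypothetical. It only shows the shape of the procedure, not how a real dog classifier would be trained.

```python
import random

def make_data(n, seed):
    """Toy stand-in for dog / not-dog images: one numeric feature per example."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.random() < 0.5
        # Assumption: positives cluster near 2.0, negatives near 0.0.
        x = rng.gauss(2.0 if label else 0.0, 0.5)
        data.append((x, label))
    return data

def fit_threshold(train):
    """'Train' by placing a threshold halfway between the two class means."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum((x > threshold) == y for x, y in examples)
    return correct / len(examples)

data = make_data(400, seed=0)
train, held_out = data[:300], data[300:]  # held_out is never used for fitting
threshold = fit_threshold(train)
held_out_accuracy = accuracy(threshold, held_out)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

The point of the split is that the score is computed only on examples the model never saw during fitting. What the equivalent labelled test set would be for human values is exactly the open question in this exchange.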
How would that process work for an AI and human values?
Right, human values: “A man’s greatest pleasure is to defeat his enemies, to drive them before him, to take from them that which they possessed, to see those whom they cherished in tears, to ride their horses, and to hold their wives and daughters in his arms.”
Do you expect me to give you the complete solution to AI right here, right now? What are you even trying to say? You seem to be arguing that FAI is impossible. How can you possibly know that? Just because you can’t immediately see a solution to the problem, doesn’t mean a solution doesn’t exist.
I think an AI will easily be able to learn human values from observations. It will be able to build a model of humans, and predict what we will do and say. It certainly won’t base all its understanding on a stupid movie quote. The AI will know what you want.
I’m saying that if you can’t recognize Friendliness (and I don’t think you can), trying to build a FAI is pointless as you will not be able to answer “Is it Friendly?” even when looking at it.
So if you can’t build a supervised model, you think going to unsupervised learning will solve your problems? The quote I gave you is part of human values—humans do value triumph over their enemies. Evolution taught humans to eliminate competition, it taught them to be aggressive and greedy—all human values. Why do you think your values will be preferred by the AI to values of, say, ISIS or third-world Maoist guerrillas? They’re human, too.
Why do I need to recognize Friendliness to build an FAI? I only need to know that the process used to construct it results in a friendly AI. Trying to inspect the weights of a complex neural network (or whatever) is pointless as I stated earlier. We haven’t the slightest idea how AlphaGo’s net really works, but we can trust it to beat the best Go champions.
Evolution also taught humans to be cooperative, empathetic, and kind.
Really your objection seems to be the whole point of CEV. A CEV wouldn’t just include the values of ISIS members, but also their victims. And it would be extrapolated, to not just be their current opinions on things, but what their opinions would be if they knew more. Their values if they had more time to think about and consider issues. With those two conditions, the negative parts of human values are entirely eliminated.
You are still facing the same problem. Given that you can’t recognize friendliness, how will you create or choose a process which will build a FAI? Would you be able to answer “Will it be friendly?” by looking at the process?
That doesn’t make much sense. What do you mean by “negative” and from which point of view? If from the point of view of the AI, that’s just a trivial tautology. If from the point of view of (at least some) humans, this seems to be not so.
In general, do you treat morals/values as subjective or objective? If objective, the whole “if they knew more” part is entirely unnecessary: you’re discovering empirical reality, not consulting with people on what do they like. And subjectivism here, of course, makes the whole idea of CEV meaningless.
Also, I see no evidence to support the view that as people know more, their morals improve, for pretty much any value of “improve”.
You are literally asking me to solve the FAI problem right here and now. I understand that FAI is a very hard problem and I don’t expect to solve it instantly. Just because a problem is hard, doesn’t mean it can’t have a solution.
First of all let me adopt some terminology from Superintelligence. I think FAI requires solving two somewhat different problems. Value Learning and Value Loading.
You seem to think Value Learning is the hard problem, getting an AI to learn what humans actually want. I think that’s the easy problem, and any intelligent AI will form a model of humans and understand what we want. Getting it to care about what we want seems like the hard problem to me.
But I do see some promising ideas to approach the problem. For instance have AIs that predict what choices a human would make in each situation. So you basically get an AI which is just a human, but sped up a lot. Or have an AI which presents arguments for and against each choice, so that humans can make more informed choices. Then it could predict what choice a human would make after hearing all the arguments, and do that.
More complicated ideas were mentioned in Superintelligence. I like the idea of “motivational scaffolding”. Somehow train an AI that can learn how the world works and can generate an “interpretable model”. Like e.g. being able to understand English sentences and translate their meanings to representations the AI can use. Then you can explicitly program a utility function into the AI using its learned model.
From your point of view. You gave me examples of values which you consider bad, as an argument against FAI. I’m showing you that CEV would eliminate these things.
Your stated example was ISIS. ISIS is so bad because they incorrectly believe that God is on their side and wants them to do the things they do. That the people that die will go to heaven, so loss of life isn’t so bad. If they were more intelligent, informed, and rational… If they knew all the arguments for and against religion, then their values would be more like ours. They would see how bad killing people is, and that their religion is wrong.
The second thing CEV does is average everyone’s values together. So even if ISIS really does value killing people, their victims value not being killed even more. So a CEV of all of humanity would still value life, even if evil people’s values are included. Even if everyone was a sociopath, their CEV would still be the best compromise possible, between everyone’s values.
No, I’m asking you to specify it. My point is that you can’t build X if you can’t even recognize X.
Learning what humans want is pretty easy. However it’s an inconsistent mess which involves many things contemporary people find unsavory. Making it all coherent and formulating a (single) policy on the basis of this mess is the hard part.
Why would CEV eliminate things I find negative? This is just a projected typical mind fallacy. Things I consider positive and negative are not (necessarily) things many or most people consider positive and negative. Since I don’t expect to find myself in a privileged position, I should expect CEV to eliminate some things I believe are positive and impose some things I believe are negative.
Later you say that CEV will average values. I don’t have average values.
I see no evidence to believe this is true and lots of evidence to believe this is false.
You are essentially saying that religious people are idiots and if only you could sit them down and explain things to them, the scales would fall from their eyes and they will become atheists. This is a popular idea, but it fails real-life testing very very hard.
And I don’t agree with that. I’ve presented some ideas on how an FAI could be built, and how CEV would work. None of them require “recognizing” FAI. What would it even mean to “recognize” FAI, except to see that it values the kinds of things we value and makes the world better for us.
I’ve written about one method to accomplish this, though there may be better methods.
Humans are 99.999% identical. We have the same genetics, the same brain structures, and mostly the same environments. The only reason this isn’t obvious, is because we spend almost all our time focusing on the differences between people, because that’s what’s useful in everyday life.
That may be the case, but that’s still not a bad outcome. In the example I used, the values dropped from ISIS members were dropped for two reasons: they were based on false beliefs, or they hurt other people. If you have values based on false beliefs, you should want them to be eliminated. If you have values that hurt other people then it’s only fair that they be eliminated. Or else you risk the values of people that want to hurt you.
Well I think it’s accurate, but it’s somewhat nonspecific. Specifically, CEV will find the optimal compromise of values. The values that satisfy the most people the most amount. Or at least dissatisfy the fewest people the least. See the post I just linked for more details, on one example of how that could be implemented. That’s not necessarily “average values”.
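As a rough illustration of why “satisfy the most people the most amount” and “dissatisfy the fewest people the least” are not automatically the same rule, here is a small sketch. It is not the method from the linked post. The satisfaction scores, the option names, and both scoring rules are assumptions invented for this example.

```python
# Hypothetical satisfaction scores, one per person (0 = hates it, 10 = loves it).
scores = {
    "option_a": [10, 10, 10, 0],
    "option_b": [7, 7, 7, 6],
    "option_c": [5, 5, 5, 5],
}

def most_total_satisfaction(scores):
    """Pick the option with the highest summed satisfaction."""
    return max(scores, key=lambda option: sum(scores[option]))

def least_worst_off(scores):
    """Pick the option whose least satisfied person is as satisfied as possible."""
    return max(scores, key=lambda option: min(scores[option]))

print(most_total_satisfaction(scores))
print(least_worst_off(scores))
```

With these made-up numbers the two rules pick different options, so any concrete version of the “optimal compromise” has to say which reading it means, or how it trades them off.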
In the worst case, people with totally incompatible values will just be allowed to go separate ways, or whatever the most satisfying compromise is. Muslims live on one side of the Dyson sphere, Christians on the other, and they never have to interact and can do their own thing.
My exact words were “If they were more intelligent, informed, and rational… If they knew all the arguments for and against...” Real world problems of persuading people don’t apply. Most people don’t research all the arguments against their beliefs, and most people aren’t rational and seriously consider the hypothesis that they are wrong.
For what it’s worth, I was deconverted like this. Not overnight by any means. But over time I found that the arguments against my beliefs were correct and I updated my belief.
Changing world views is really really hard. There’s no one piece of evidence or one argument to dispute. Religious people believe that there is tons of evidence of God. To them it just seems obviously true. From miracles, to recorded stories, to their own personal experiences, etc. It takes a lot of time to get at every single pillar of the belief and show its flaws. But it is possible. It’s not like Muslims were born believing in Islam. Islam is not encoded in genetics. People deconvert from religions all the time, entire societies have even done it.
In any case, my proposal does not require literally doing this. It’s just a thought experiment. To show that the ideal set of values is what you choose if you had all the correct beliefs.
It means that when you look at an AI system, you can tell whether it’s FAI or not.
If you can’t tell, you may be able to build an AI system, but you still won’t know whether it’s FAI or not.
I don’t see what voting systems have to do with CEV. The “E” part means you don’t trust what the real, current humans say, so making them vote on anything is pointless.
That’s a meaningless expression without a context. Notably, we don’t have the same genes or the same brain structures. I don’t know about you, but it is really obvious to me that humans are not identical.
How do you know what’s false? You are a mere human, you might well be mistaken. How do you know what’s fair? Is it an objective thing, something that exists in the territory?
Right, so the fat man gets thrown under the train… X-)
Hey, I want to live on the inside. The outside is going to be pretty gloomy and cold :-/
LOL. You’re just handwaving then. “And here, in the difficult part, insert magic and everything works great!”
Look at it how? Look at its source code? I argued that we can write source code that will result in FAI, and you could recognize that. Look at the weights of its “brain”? Probably not, any more than we can look at human brains and recognize what they do. Look at its actions? Definitely, FAI is an AI that doesn’t destroy the world etc.
The voting doesn’t have to actually happen. The AI can predict what we would vote for, if we had plenty of time to debate it. And you can get even more abstract than that and have the FAI just figure out the details of E itself.
The point is to solve the “coherent” part. That you can find a set of coherent values from a bunch of different agents or messy human brains. And to show that mathematicians have actually extensively studied a special case of this problem, voting systems.
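For readers who have not seen one, here is a sketch of a single well-known voting rule, the Borda count, as an example of turning several rankings into one collective ranking. Choosing Borda is an assumption made for illustration. The thread does not say which voting system is meant, and the ballots below are invented.

```python
from collections import defaultdict

def borda(ballots):
    """Borda count: with n candidates, a ballot gives n-1 points to its first
    choice, n-2 to its second, and so on down to 0 for its last choice."""
    totals = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for rank, candidate in enumerate(ballot):
            totals[candidate] += n - 1 - rank
    return dict(totals)

# Hypothetical ballots, each listing candidates from most to least preferred.
ballots = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
]

totals = borda(ballots)
winner = max(totals, key=totals.get)
print(totals, winner)
```

Here A has the most first-place votes, but B wins because nobody ranks it last. That is the kind of compromise behaviour being pointed at, though different voting rules can give different winners on the same ballots.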
Compared to other animals, compared to aliens, yes we are incredibly similar. We do have 99.99% identical DNA, our brains all have the same structure with minor variations.
Did I claim that I did?
I gave a precise algorithm for doing that actually.
Which is the best possible outcome, vs killing 5 other people. But I don’t think these kinds of scenarios are realistic once we have incredibly powerful AI.
I’m not handwaving anything… There is no magic involved at all. The whole scenario of persuading people is counterfactual and doesn’t need to actually be done. The point is to define more exactly what CEV is. It’s the values you would want if you had the correct beliefs. You don’t need to actually have the correct beliefs, to give your CEV.
I think we have, um, irreconcilable differences and are just spinning wheels here. I’m happy to agree to disagree.
We typically imagine CEV asking what people would do if they ‘knew what the AI knew’ - let’s say the AI tries to estimate expected value of a given action, with utility defined by extrapolated versions of us who know the truth, and probabilities taken from the AI’s own distribution. I am absolutely saying that theism fails under any credible epistemology, and any well-programmed FAI would expect ‘more knowledgeable versions of us’ to become atheists on general principles. Whether or not this means they would change “if they knew all the arguments for and against religion,” depends on whether or not they can accept some extremely basic premise.
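The expected-value estimate described above can be written out as a short sketch. All of the numbers and names here (p_world, utility, act_x, act_y) are hypothetical. The sketch assumes a finite set of world-states and says nothing about how the extrapolated utilities would actually be obtained.

```python
# Assumption: the AI's own probability distribution over two possible worlds.
p_world = {"w1": 0.7, "w2": 0.3}

# Assumption: utilities that the extrapolated, better-informed versions of us
# would assign to each action in each world.
utility = {
    "act_x": {"w1": 4.0, "w2": 0.0},
    "act_y": {"w1": 2.0, "w2": 3.0},
}

def expected_value(action):
    """Sum over worlds of P_AI(world) * U_extrapolated(action, world)."""
    return sum(p_world[w] * utility[action][w] for w in p_world)

best_action = max(utility, key=expected_value)
print({a: expected_value(a) for a in utility}, best_action)
```

The probabilities come from the AI and the utilities come from the extrapolated people, which is why a disagreement about the facts does not by itself change whose values are being counted.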
(Note that nobody comes into the world with anything even vaguely resembling a prior that favors a major religion. We might start with a bias in favor of animism, but nearly everyone would verbally agree this anthropomorphism is false.)
It seems much less clear if CEV would make psychopathy irrelevant. But potential victims must object to their own suffering at least as much as real-world psychopaths want to hurt them. So the most obvious worst-case scenario, under implausibly cynical premises, looks more like Omelas than it does a Mongol invasion. (Here I’m completely ignoring the clause meant to address such scenarios, “had grown up farther together”.)
No, we don’t, because this would be a stupid question. CEV doesn’t ask people, CEV tells people what they want.
I see little evidence to support this point of view. You might think that atheism is obvious, but a great deal of people, many of them smarter than you, disagree.
This amounts to saying “because I’m right and once everyone gets to know reality better, they’ll figure out I’m right.”
In reality they will also figure out the places where you are wrong, and there will be many of them.
I’m not claiming that at all. I may be wrong about many things. It’s irrelevant.
It is not irrelevant. You said, “With those two conditions, the negative parts of human values are entirely eliminated.” That certainly meant that things like ISIS opinions would be eliminated. I agree in that particular case, but there are many other things that you would consider negative which will not be eliminated. I can probably guess some of them, although I won’t do that here.
See my other comment for more clarification on how CEV would eliminate negative values.
I read that. You say there, “Your stated example was ISIS. ISIS is so bad because they incorrectly believe… If they knew all the arguments for and against religion, then their values would be more like ours.” As I said, I agree with you in that case. But you are indeed saying, “it is because I am right and when they know better they will know I was right.” And that will not always be true, even if it is true in that case.
I never claimed I am right about everything. I don’t need to be right about everything. I would love to have an AI show me what I am wrong about and show me the perfect set of values.
And most importantly, I’m saying that this process would result in the optimal set of values for everyone. Do you disagree?
Yes, I disagree. I think that “babyeater values are different from human values” differs only in degree from “my values are different from your values.” I do not think there is a reasonable chance that I will turn out to be wrong about this, just like there is no reasonable chance that if we measure our heights with sufficient accuracy, we will turn out to have exactly the same height. This is still another reason why we should speak of “babyeater morality” and “human morality,” namely because if morality is inconsistent with variety, then morality does not exist.
That said, I already said that I would not be willing to wipe out non-human values from the cosmos, and likewise I have no interest in imposing my personal values on everything else. I think these are really the same thing, and in that sense wanting to impose a CEV on the universe is being a “racist” in relation to human beings vs other intelligent beings.
People may have different values (although I think deep down we are very similar, humans sharing all the same brains and not having that much diversity.) Regardless, CEV should find the best possible compromise between our different values. That’s literally the whole point.
If there is a difference in our values, the AI will find the compromise that satisfies us the most (or dissatisfies us the least.) There is no alternative, besides not compromising at all and just taking the values of a single random person. From behind the veil of ignorance, the first is definitely preferable.
I don’t think this will be so bad. Because I don’t think our values diverge so much, or that decent compromises are impossible between most values. I imagine that in the worst case, the compromise will be that two groups with different values will have to go their separate ways. Live on opposite sides of the world, never interact, and do their own thing. That’s not so bad, and a post-singularity future will have more than enough resources to support it.
No one is suggesting we wipe out non-human values. But we have yet to meet any intelligent aliens with different values. Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence.
This reminds me a lot of the post value is fragile. It’s ok to want a future that has different beings in it, that are totally different than humans. That doesn’t violate my values at all. But I don’t want a future that has beings die or suffer involuntarily. I don’t think it’s “value racist” to want to stop beings that do value that.
“Once we do so, we may very well just apply CEV to them and get the best compromise of our values again. Or we may keep our own values, but still allow them to live separately and do their own thing, because we value their existence.”
The problem I have with what you are saying is that these are two different things. And if they are two different things in the case of the aliens, they are two different things in the case of the humans.
The CEV process might well be immoral for everyone concerned, since by definition it is compromising a person’s fundamental values. Eliezer agrees this is true in the case of the aliens, but he does not seem to notice that it would also be true in the case of the humans.
In any case, I choose in advance to keep my own values, not to participate in changing my fundamental values. But I am also not going to impose those on anyone else. If you define CEV to mean “the best possible way to keep your values completely intact and still not impose them on anyone else,” then I would agree with it, but only because we will be stipulating the desired conclusion.
That does not necessarily mean “living separately”. Even now I live with people who, in every noticeable way, have values that are fundamentally different from mine. That does not mean that we have to live separately.
In regard to the last point, you are saying that you don’t want to eliminate all potential aliens, but you want to eliminate ones with values that you really dislike. I think that is basically racist.
There is some truth in it, however, insofar as in reality, for reasons I have been saying, beings that have fundamental desires for others to suffer and die are very unlikely indeed, and any such desires are likely to be radically qualified. To that degree you are somewhat right: desires like that are in fact evil. But because they are evil, they cannot exist.
The world we live in is “immoral” in that it’s not optimized towards anyone’s values. Taking a single person’s values would be “immoral” to everyone else. CEV, finding the best possible compromise of values, would be the least immoral option, on average: optimize the world in a way that dissatisfies the fewest people by the least amount.
Right. I said that’s the realistic worst case, when no compromise is possible. I think most people have similar enough values that this would be rare.
I don’t necessarily want to kill them, but I would definitely stop them from hurting other beings. Imagine you came upon a race of aliens that practiced a very cruel form of slavery. Say 90% of their population were slaves, and the slave-owning class regularly tortured and overworked them. Would you stop them, if you could? Is that racist? What about the values of the slaves?
I think optimizing anything is always immoral, exactly because it means imposing things that you should not be imposing. It is also the behavior of a fanatic, not a normal human being; that is the whole reason for the belief that AIs would destroy the world, namely because of the belief that they would behave like fanatics instead of like intelligent beings.
In the case of the slave owning race, I am quite sure that slavery is not consistent with their fundamental values, even if they are practicing it for a certain time. I don’t admit that values are arbitrary, and consequently you cannot assume (at least without first proving me wrong about this) that any arbitrary value could be a fundamental value for something.
Well now I see we disagree at a much more fundamental level.
There is nothing inherently sinister about “optimization”. Humans are optimizers in a sense, manipulating the world to be more like how we want it to be. We build sophisticated technology and industries that are many steps removed from our various end goals. We dam rivers, and build roads, and convert deserts into sprawling cities. We convert the resources of the world into the things we want. That’s just what humans do, that’s probably what most intelligent beings do.
The definition of FAI, to me, is something that continues that process, but improves it. Takes over from us, and continues to run the world for human ends. Makes our technologies better and our industries more efficient, and solves our various conflicts. The best FAI is one that constructs a utopia for humans.
I don’t know why you believe a slave owning race is impossible. Humans of course practiced slavery in many different cultures. It’s very easy for even humans to not care about the suffering of other groups. And even if you do believe most humans could be convinced it’s wrong (I’m not so sure), there are actual sociopaths that don’t experience empathy at all.
Humans also have plenty of sinister values, and I can easily believe aliens could exist that are far worse. Evolution tended to evolve humans that cooperate and have empathy. But under different conditions, we could have evolved completely differently. There is no law of the universe that says beings have to have values like us.
“Well now I see we disagree at a much more fundamental level.” Yes. I’ve been saying that since the beginning of this conversation.
If humans are optimizers, they must be optimizing for something. Now suppose someone comes to you and says, “do you agree to turn on this CEV machine?”, when you respond, are you optimizing for the thing or not? If you say yes, and you are optimizing the original thing, then the CEV cannot (as far as you know) be compromising the thing you were optimizing for. If you say yes and are not optimizing for it, then you are not an optimizer. So you must agree with me on at least one point: either 1) you are not an optimizer, or 2) you should not agree with CEV if it compromises your personal values in any way. I maintain both of those, but you must maintain at least one of them.
In earlier posts I have explained why it is not possible that you are really an optimizer (not during this particular discussion.) People here tend to neglect the fact that an intelligent thing has a body. So e.g. Eliezer believes that an AI is an algorithm, and nothing else. But in fact an AI has a body just as much as we do. And those bodies have various tendencies, and they do not collectively add up to optimizing for anything, except in an abstract sense in which everything is an optimizer, like a rock is an optimizer, and so on.
“We convert the resources of the world into the things we want.” To some extent, but not infinitely, in a fanatical way. Again, that is the whole worry about AI—that it might do that fanatically. We don’t.
I understand you think that some creatures could have fundamental values that are perverse from your point of view. This is because you, like Eliezer, think that values are intrinsically arbitrary. I don’t, and I have said so from the beginning. It might be true that slave owning values could be fundamental in some extraterrestrial race, but if they were, slavery in that race would be very, very different from slavery in the human race, and there would be no reason to oppose it in that race. In fact, you could say that slavery exists in a fundamental way in the human race, and there is no reason to oppose it: parents can tell their kids to stay out of the road, and they have to obey them, whether they want to or not. Note that this is very, very different from the kind of slavery you are concerned about, and there is no reason to oppose the real kind.
I can still think the CEV machine is better than whatever the alternative is (for instance, no AI at all.) But yes, in theory, you should prefer to make AIs that have your own values and not bother with CEV.
Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals.
What do you mean by “fanatically”? This is a pretty vague word. Humans would sure seem fanatical to other animals. We’ve cut down entire continent sized forests, drained massive lakes, and built billions of complex structures.
The only reason we haven’t “optimized” the Earth further is physical and economic limits. If we could, we probably would.
Whether you call that “optimization” or not is mostly irrelevant. If superintelligent AIs acted similarly, humans would be screwed.
I’m deeply concerned that you are theoretically ok with slave owning aliens. If the slaves are ok with it, then perhaps it could be justified. But if they strongly object to it, and suffer from it, and don’t get any benefit from it, then it’s just obviously wrong.
“Having a body is irrelevant. Bodies are just one way to manipulate the world to optimize your goals.”
This is not true. Bodies are physical objects that follow the laws of physics, and the laws of physics are not “just one way to manipulate the world to optimize your goals,” because the laws have nothing to do with your goals. For example, we often stop doing something because we are tired, not because we have a goal of not continuing. An AI will be quite capable of doing the same thing, for example if thinking too hard about something begins to weaken its circuits.
What I mean by fanatically is trying to optimize for a single goal as though it were the only thing that mattered. We do not do that, nor does anything else with a body, nor is it even possible, for the above reason.
Yes you should be concerned about what I said about slaves and aliens, as it suggests that the CEV machine might result in things that you consider utterly wicked. I said that from the beginning, when you claimed that it would eliminate all negative results, obviously intending that to mean from your subjective point of view.
If they find it immoral in the sense of crossing a line that should never be crossed, then they are not going to play. I don’t think the morals=values theory can tell you where the bright lines are, and that is why I think rules and a few other things are involved in ethics.
Consider a harder case: a society that is ruthless in crushing any society that offers any rivalry or opposition to them, but otherwise leaves people alone. Since that is a survival-promoting strategy, you can’t argue that it would just be selected out. But it doesn’t seem as ethical as more conciliatory approaches.
“It doesn’t seem as ethical as more conciliatory approaches.” I agree. That is because it is not the best strategy. It may not be the worst possible strategy, but it is not the best. And since the people engaging in that strategy are able to think about it, that ability will, over time, lead them to adopt better strategies, namely more conciliatory approaches.
I don’t say that the good is achieved by selection alone. It is also achieved by the use of reason, by things that use reason.
Are you sure? On the face of it, doing things like attending peace negotiations exposes you to risks (they take the opportunity to assassinate you, they renege on the agreement, etc.) that simply nuking them doesn’t.
If people who reason well don’t get selected, where does the prevalence of good come from?
Yes I am sure. Of course negotiating has risks, but it doesn’t automatically make permanent enemies, and it is better not to have permanent enemies.
People who reason well do get selected. I am just saying once they are selected they can start thinking about what is good as well.
If the alternative to negotiation is completely exterminating your enemies, you don’t have to worry about permanent enemies!
You can try to permanently exterminate them and fail. Additionally, even if you succeed in one case, you will ensure that no one else will be willing to negotiate with you even when it would be beneficial for you because they are stronger. So overall you will be decreasing your options, which makes your situation worse.
Each human differs in their values. So it is impossible to build the machine of which you speak.
But humans share a lot of values (e.g. wanting to live and not be turned into a Dyson sphere). And a collection of individuals may still have a set of values (see e.g. coherent extrapolated volition).