“This seems circular—on what basis do you say that it works well?”
My wording was “while it’s faulty … it works so well overall that …”. But yes, it does work well if you apply the underlying idea of it, as most people do. That is why you hear Jews saying that the golden rule is the only rule needed, with all other laws being mere commentary upon it.
“I would say that it perhaps summarizes conventional human morality well for a T-shirt slogan, but it’s a stretch to go from that to “underlying truth”—more like underlying regularity. It is certainly true that most people have golden rule-esque moralities, but that is distinct from the claim that the golden rule itself is true.”
It isn’t itself true, but it is very close to the truth, and when you try to work out why it’s so close, you run straight into its mechanism as a system of harm management.
“You are only presenting your opinion on what is right (and providing an imagined scenario which relies on the soul-intuition to widen the scope of moral importance from the self to all individuals), not defining rightness itself. I could just as easily say “morality is organizing rocks into piles with prime numbers.””
What I’m doing is showing the right answer, and it’s up to people to get up to speed with that right answer. The reason for considering other individuals is that that is precisely what morality requires you to do. See what I said a few minutes ago (probably an hour ago by the time I’ve posted this) in reply to one of your other comments.
“Additionally, if reincarnation is not true, then why should our moral system be based on the presupposition that it is?”
Because getting people to imagine they are all the players involved replicates what AGI will do when calculating morality—it will be unbiased, not automatically favouring any individual over any other (until it starts weighing up how moral they are, at which point it will favour the more moral ones as they do less harm).
“If moral truths are comparable to physical and logical truths, then they will share the property that one must base them on reality for them to be true, and clearly imagining a scenario where light travels at 100 m/s should not convince you that you can experience the effects of special relativity on a standard bicycle in real life.”
An unbiased analysis by AGI is directly equivalent to a person imagining that they are all the players involved. If you can get an individual to strip away their own self-bias and do the analysis while seeing all the other players as different people, that will work too: it’s just another slant on doing the same computations. You either eliminate the bias by imagining being all the players involved, or by being none of them.
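The imagined-player computation can be put in concrete terms. This is only a minimal sketch, assuming harm can be scored numerically per affected party; the action names and harm figures below are invented for illustration:

```python
# Evaluate actions from an impartial standpoint by summing each
# action's harm across every affected party with equal weight --
# the computational equivalent of imagining living all the lives
# involved. All numbers here are made-up illustrations.

def impartial_score(action_harms):
    """Total harm an action causes, summed over all players equally."""
    return sum(action_harms.values())

def least_harmful(actions):
    """Pick the action with the lowest total harm across all players."""
    return min(actions, key=lambda a: impartial_score(actions[a]))

# Hypothetical per-player harm estimates for two candidate actions.
actions = {
    "swerve_left":  {"driver": 2, "pedestrian": 0, "cyclist": 5},
    "swerve_right": {"driver": 1, "pedestrian": 8, "cyclist": 0},
}

print(least_harmful(actions))  # -> swerve_left (total harm 7 vs 9)
```

Because every player gets the same weight, it makes no difference whether the evaluator imagines being all of the players or none of them; the sum is the same either way.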
“More specifically—if morality tells us the method by which our actions are assigned Moral Scores, then your post is telling us that the Right is imagining that in the end, the Moral Scores are summed over all sentient beings, and your own Final Score is dependent on that sum. If this is true, then clearly altruism is important. But if this isn’t the case, then why should we care about the conclusions drawn from a false statement?”
Altruism is important, although people can’t be blamed for not embarking on something that will do themselves considerable harm to help others: their survival instincts are too strong for that. AGI should make decisions on their behalf, though, on the basis that they are fully altruistic. If some random death is to occur but there is some room to select the person to be on the receiving end of it, AGI should not hold back from choosing who that should be if there’s a clear best answer.
“I disagree that there is some operation that a Matrix Lord could carry out to take my Identity out at my death and return it to some other body. What would the Lord actually do to the simulation to carry this out?”
If this universe is virtual, your real body (or the nearest equivalent thing that houses your mind) is not inside that virtual universe. It could have all its memories switched out and alternative ones switched in, at which point it believes itself to be the person those memories tell it it is. (In my case though, I don’t identify myself with my memories—they are just baggage that I’ve picked up along the way, and I was complete before I started collecting them.)
“Why should I need to for all persons set person.value to self.value? Either I already agree with you, in which case I’m already treating everyone fairly, or I’ve given each person their own subjective value and I see no reason to change. If I feel that Hitler has 0.1% of the moral worth of Ghandi, then of course I will not think it Right to treat them each as I would treat myself.”
If you’re already treating everyone impartially, you don’t need to do this, but many people are biased in favour of themselves, their family and friends, so this is a way of forcing them to remove that bias. Correctly programmed AGI doesn’t need to do this as it doesn’t have any bias to apply, but it will start to favour some people over others once it takes into account their actions if some individuals are more moral than others. There is no free will, of course, so the people who do more harm can’t really be blamed for it, but favouring those who are more moral leads to a reduction in suffering as it teaches people to behave better.
“Or to come at the same issue from another angle, this section is arguing that since I care about some people, I should care about all people equally. But what reason do we have for leaping down this slope? I could just as well say “most people disvalue some people, so why not disvalue all people equally?” Any point on the slope is just as internally valid as any other.”
If you care about your children more than other people’s children, or about your family more than about other families, who do you care about most after a thousand generations when everyone on the planet is as closely related to you as everyone else? Again, what I’m doing is showing the existence of a bias and then the logical extension of that bias at a later point in time—it illustrates why people should widen their care to include everyone. That bias is also just a preference for self, but it’s a misguided one—the real self is sentience rather than genes and memories, so why care more about people with more similar genes and overlapping memories (of shared events)? For correct morality, we need to eliminate such biases.
“I am not certain that any living human cares about only the future people who are composed of the same matter as they are right now (even if we ignore how physically impossible such a condition is, because QM says that there’s no such thing as “the same atom”). Why should “in this hypothetical scenario, your matter will comprise alien beings” convince anybody? This thinking feels highly motivated.”
If you love someone and that person dies, and the sentience that was in them becomes the sentience in a new being (which could be an animal or an alien equivalent of a human), why should you not still love it equally? It would be stupid to change your attitude to your grandmother just because the sentience that was her is now in some other type of being, and given that you don’t know that that sentience hasn’t been reinstalled into any being that you encounter, it makes sense to err on the side of caution. There would be nothing more stupid than abusing that alien on the basis that it isn’t human if that actually means you’re abusing someone you used to love and who loved you.
“You seem to think that any moral standpoint except yours is arbitrary and therefore inferior. I think you should consider the possibility that what seems obvious to you isn’t necessarily objectively true, and could just be your own opinion.”
The moral standpoints that are best are the most rational ones—that is the standard they should be judged by. If my arguments are the best ones, they win. If they aren’t, they lose. Few people are capable of judging the winners, but AGI will count up the score and declare who won on each point.
“Morality is not objective. Even if you think that there is a Single Correct Morality, that alone does not make an arbitrary agent more likely to hold that morality to be correct. This is similar to the Orthogonality Thesis.”
I have already set out why correct morality should take the same form wherever an intelligent civilisation invents it. AGI can, of course, be programmed to be immoral and to call itself moral. I don’t know whether its intelligence (if it’s fully intelligent) is sufficient for it to modify itself to become properly moral automatically, but I suspect it’s possible to make it sufficiently un-modifiable to prevent such evolution and maintain it as a biased system.
“But why? Your entire argument here assumes its conclusions—you’re doing nothing but pointing at conventional morality and providing some weak arguments for why it’s superior, but you wouldn’t be able to stand on your own without the shared assumption of moral “truths” like “disabled people matter.””
The argument here relates to the species barrier. Some people think people matter more than animals, but when you have an animal that’s almost as intelligent as a human and compare that with a person who’s almost completely brain dead but is just ticking over (yet capable of feeling pain), where is the human superiority? It isn’t there. But if you were to torture that human to generate as much suffering in them as you would generate by torturing any other human, there is an equivalence of immorality there. These aren’t weak arguments: they’re just simple maths, like 2=2.
“This reminds me of the reasoning in Scott Alexander’s “The Demiurge’s Older Brother.” But I also feel that you are equivocating between normative and pragmatic ethics. The distinction is a matter of meta-ethics, which is Important and Valuable and which you are entirely glossing over in favor of baldly stating societal norms as if they were profound truths.”
When a vital part of an argument is simple and obvious, it isn’t there to stand as a profound truth, but as a way of completing the argument. There are many people who think humans are more important than animals, and in one way they’re right, while in another way they’re wrong. I have to spell out why it’s right in one way and wrong in another. By comparing the disabled person to the animal with superior functionality (in all aspects), I show that there’s a kind of bias involved in many people’s approach which needs to be eliminated.
“I am a bit offended, and I think this offense is coming from the feeling that you are missing the point. Our ethical discourse does not revolve around whether babies should be eaten or not. It covers topics such as “what does it mean for something to be right?” and “how can we compactly describe morality (in the programmer’s sense)?”. Some of the offense could also be coming from “outsider comes and tells us that morality is Simple when it’s really actually Complicated.””
So where is that complexity? What point am I missing? This is what I’ve come here searching for, and it isn’t revealing itself. What I’m actually finding is a great long series of mistakes which people have built upon, such as the Mere Addition Paradox. The reality is that there’s a lot of soft wood that needs replacing.
“Ah, so you don’t really have to bite any bullets here—you’ve just given a long explanation for why our existing moral intuitions are objectively valid. How reassuring.”
What that explanation does is show that there’s more harm involved than the obvious harm which people tend to focus on. A correct analysis always needs to account for all the harm. That’s why the death of a human is worse than the death of a horse. Torturing a horse is equal to torturing a person to create the same amount of suffering in them, but killing them is not equal.
“What the equality aspect requires is that a torturer of animals should be made to suffer as much as the animals he has tortured.” --> “...really? You’re claiming that your morality system as described requires retributive justice?”
I should have used a different wording there: he deserves to suffer as much as the animals he’s tortured. It isn’t required, but may be desirable as a way of deterring others.
“How does that follow from the described scenario at all? This has given up the pretense of a Principia Moralitica and is just asserting conventional morality without any sort of reasoning, now.”
You can’t demolish a sound argument by jumping on a side issue. My method is sound and correct.
“The issue is defining exactly what counts as a loss and what counts as a gain, to the point that it can be programmed into a computer and that computer can very reliably classify situations outside of its training data, even outside of our own experience. This is one of the core Problems which this community has noticed and is working on. I would recommend reading more before trying to present morality to LW.”
To work out what the losses and gains are, you need to collect evidence from people who know how two different things compare. When you have many different people who give you different information about how those two different things compare, you can average them. You can do this millions of times, taking evidence from millions of people and produce better and better data as you collect and crunch more of it. This is a task for AGI to carry out, and it will do a better job than any of the people who’ve been trying to do it to date. This database of knowledge of suffering and pleasure then combines with my method to produce answers to moral questions which are the most probably correct based on the available information. That is just about all there is to it, except that you do need to apply maths to how those computations are carried out. That’s a job for mathematicians who specialise in game theory (or for AGI which should be able to find the right maths for it itself).
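The averaging process described above can be sketched as simple pooling of comparative reports. Everything here (harm names, ratios) is an invented illustration of the idea, not real data:

```python
# Pool many people's testimony about how two harms compare.
# Each respondent reports a ratio (e.g. "a broken arm is 4x worse
# than a stubbed toe"); averaging over more respondents gives an
# increasingly stable estimate for the harm database.

from statistics import mean

# Hypothetical reports: (worse_harm, reference_harm) -> list of ratios.
reports = {
    ("broken_arm", "stubbed_toe"): [4.0, 5.0, 3.5, 4.5],
    ("migraine", "stubbed_toe"): [6.0, 7.0, 5.0],
}

def average_ratio(pair, reports):
    """Pooled estimate of how much worse pair[0] is than pair[1]."""
    return mean(reports[pair])

print(average_ratio(("broken_arm", "stubbed_toe"), reports))  # -> 4.25
```

A real system would need to handle inconsistent chains of comparisons and outlier testimony, but the core operation is just this kind of aggregation over ever more data.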
″ “Selfless” anthropomorphizes AI.”
Only if you misunderstand the way I used the word. Selfless here simply means that it has no self—the machine cannot understand feelings in any direct way because there is no causal role for any sentience that might be in the machine to influence its thoughts at all (which means we can regard the system as non-sentient).
“There is no fundamental internal difference between “maximize the number of paperclips” and “maximize the happiness of intelligent beings”—both are utility functions plus a dynamic. One is not more “selfless” than another simply because it values intelligent life highly.”
Indeed there isn’t. If you want to program AGI to be moral, though, you make sure it focuses on harm management rather than paperclip production (which is clearly not doing morality).
“The issue is that there are many ways to carve up reality into Good and Bad, and only a very few of those ways results in an AI which does anything like what we want.”
In which case, it’s easy to reject the ones that don’t offer what we want. The reality is that if we put the wrong kind of “morality” into AGI, it will likely end up killing lots of people that it shouldn’t. If you run it on a holy text, it might exterminate all Yazidis. What I want to see is a list of proposed solutions to this morality issue ranked in order of which look best, and I want to see a similar league table of the biggest problems with each of them. Utilitarianism, for example, has been pushed down by the Mere Addition Paradox, but that paradox has now been resolved and we should see utilitarianism’s score go up as a result. Something like this is needed as a guide to all the different people out there who are trying to build AGI, because some of them will succeed and they won’t be experts in ethics. At least if they make an attempt at governing it using the method at the top of the league, we stand a much better chance of not being wiped out by their creations.
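The league table idea might look something like this as a data structure. The systems, objections, and weights below are all invented placeholders; a real table would draw its entries and scores from the ranked analyses described above:

```python
# Sketch of the proposed "league table": rank candidate moral systems
# by score, where an objection that has been resolved (weight 0) no
# longer drags a system down. All entries and weights are invented.

problems = {
    "utilitarianism": [("mere addition paradox", 0)],  # resolved -> 0
    "holy_text_rules": [("out-group extermination", 9)],
    "deontology": [("rigid rule conflicts", 4)],
}

def score(system):
    """Higher is better: start at 10, subtract unresolved problem weights."""
    return 10 - sum(weight for _, weight in problems[system])

league = sorted(problems, key=score, reverse=True)
print(league)  # -> ['utilitarianism', 'deontology', 'holy_text_rules']
```

The point of the structure is that resolving a known problem (setting its weight to zero, as with the Mere Addition Paradox here) automatically moves that system up the table for any AGI builder consulting it.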
“Perhaps the AI could check with us to be sure, but a. did we tell it to check with us?, b. programmer manipulation is a known risk, and c. how exactly will it check its planned future against a brain? Naive solutions to issue c. run the risk of wireheading and other outcomes that will produce humans which after the fact appreciate the modification but which we, before the modification, would barely consider human at all. This is very non-trivial.”
AGI will likely be able to make better decisions than the people it asks permission from even if it isn’t using the best system for working out morality, so it may be a moral necessity to remove humans from the loop. We have an opportunity to use AGI to check rival AGI systems for malicious programming, although it’s hard to check on devices made by rogue states. One of the problems we face is that these things will go into use as soon as they are available without waiting for proper moral controls: rogue states will put them straight into the field, and we will have to respond by not delaying ours. We need to nail morality urgently and make sure the best available way of handling it is available to all who want to fit it.
“It is possible, but it’s also possible for the robot to come to an entirely different conclusion. And even if you think that it would be inherently morally wrong for the robot to kill all humans, it won’t feel wrong from the inside—there’s no reason to expect a non-aligned machine intelligence to spontaneously align itself with human wishes.”
The machine will do what it’s programmed to do. Its main task is to apply morality to people by stopping people doing immoral things, making stronger interventions for more immoral acts, and being gentle when dealing with trivial things. There is certainly no guarantee that a machine will do this for us unless it is told to do so, although if it understands the existence of sentience and the need to manage harm, it might take it upon itself to do the job we would like it to do. That isn’t something we need to leave to chance, though: we should put the moral governance in ROM and design the hardware to keep enforcing it.
Will do and will comment afterwards as appropriate.
“Why will the AGI share your moral intuitions? (I’ve said something similar to this enough times, but the same criticism applies.)”
They aren’t intuitions—each change in outcome is based on different amounts of information being available, and each decision is based on weighing up the weighable harm. It is simply the application of a method.
“Also, your model of morality doesn’t seem to have room for normative responsibility, so where did “it’s only okay to run over a child if the child was there on purpose” come from?”
Where did you read that? I didn’t write it.
“It’s still hurting a child just as much, no matter whether the child was pushed or if they were simply unaware of the approaching car.”
If the child was pushed by a gang of bullies, that’s radically different from the child being bad at judging road safety. If the option is there to mow down the bullies that pushed a child onto the road instead of mowing down that child, that is the option that should be taken (assuming no better option exists).
“It makes sense to you to override the moral system and punish the exploiter, because you’re using this system pragmatically. An AI with your moral system hard-coded would not do that. It would simply feed the utility monster, since it would consider that to be the most good it could do.”
I can’t see the link there to anything I said, but if punishing an exploiter leads to a better outcome, why would my system not choose to do that? If you were to live the lives of both the exploited and the exploiter, you would have a better time if the exploiter is punished just the right amount to give you the best time overall as all the people involved (and this includes a deterrence effect on other would-be exploiters).
“I agree that everyday, in-practice morality is like this, but there are other important questions about the nature and content of morality that you’re ignoring.”
Then let’s get to them. That’s what I came here to look for.
“”What is yet to be worked out is the exact wording that should be placed in AGI systems to build either this rule or the above methodology into them” --->This is the Hard Problem, and in my view one of the two Hard Problems of AGI.”
Actually, I was wrong about that. If you look at the paragraph in brackets at the end of my post (the main blog post at the top of this page), I set out the wording of a proposed rule and wondered if it amounted to the same thing as the method I’d outlined. Over the course of writing later parts of this series of blog posts, I realised that that attempted wording was making the same mistake as many of the other proposed solutions (various types of utilitarianism). These rules are an attempt to put the method into a compact form, but the method already is the rule, while these compact versions risk introducing errors. Some of them may produce the same results for any situation, but others may be some way out.

There is also room for there to be a range of morally acceptable solutions, with one rule setting one end of the acceptable range and another rule setting the other. For example, in determining optimal population size, average utilitarianism and total utilitarianism look as if they provide slightly different answers, but they’ll be very similar, and it would do little harm to allow the population to wander between the two values. If all moral questions end up with a small range with very little difference between the extremes of that range, we’re not going to worry much about getting it very slightly wrong if we still can’t agree on which end of the range is slightly wrong.

What we need to do is push these different models into places where they might show us that they’re way wrong, because then it will be obvious. If that’s already been done, it should all be there in the league tables of problems under each entry in the league table of proposed systems of determining morality.
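The average-vs-total population point can be illustrated numerically. The per-person utility curve below is entirely made up; it merely shows how the two rules pick different optima whose ends define the kind of acceptable range described above:

```python
# Compare the population sizes chosen by average utilitarianism
# (maximise per-person utility u(n)) and total utilitarianism
# (maximise n * u(n)) under an invented utility curve that peaks
# and then declines as population n grows.

def u(n):
    """Hypothetical per-person utility at population n."""
    return 100 - (n - 40) ** 2 / 20

populations = range(1, 100)
best_average = max(populations, key=u)                 # maximise u(n)
best_total = max(populations, key=lambda n: n * u(n))  # maximise n*u(n)

print(best_average, best_total)  # the two optima bracket a range
```

Under this made-up curve the two rules land on nearby but distinct population sizes, which is the sense in which one rule can set one end of a morally acceptable range and another rule the other end.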
“Morality seems basic to you, since our brains and concept-space and language are optimized for social things like that, but morality has a very high complexity as measured mathematically, which makes it difficult to describe to something that’s not human. (This is similar to the formalizations of Occam’s Razor, if you want to know more.)”
If we were to go to an alien planet and were asked by warring clans of these aliens to impose morality on them to make their lives better, do you not think we could do that without having to feel the way they do about things? We would be in the same position as the machines that we want to govern us. What we’d do is ask these aliens how they feel in different situations and how much it hurts them or pleases them. We’d build a database of knowledge of these feelings that they have based on their testimony, and the accuracy would increase the more we collect data from them. We then apply my method and try to produce the best outcome on the basis of there only being one player who has to get the best out of the situation. That needs the application of game theory. It’s all maths.
“If the AI has the correct utility function, it will not say “but this is illogical/useless” and then reject it. Far more likely is that the AI never “cares about” humans in the first place.”
It certainly won’t care about us, but then it won’t care about anything (including its self-less self). Its only purpose will be to do what we’ve asked it to do, even if it isn’t convinced that sentience is real and that morality has a role.
“This seems circular—on what basis do you say that it works well?”
My wording was ” while it’s faulty … it works so well overall that …” But yes, it does work well if you apply the underlying idea of it, as most people do. That is why you hear Jews saying that the golden rule is the only rule needed—all other laws are mere commentary upon it.
“I would say that it perhaps summarizes conventional human morality well for a T-shirt slogan, but it’s a stretch to go from that to “underlying truth”—more like underlying regularity. It is certainly true that most people have golden rule-esque moralities, but that is distinct from the claim that the golden rule itself is true.”
It isn’t itself true, but it is very close to the truth, and when you try to work out why it’s so close, you run straight into its mechanism as a system of harm management.
“You are only presenting your opinion on what is right (and providing an imagined scenario which relies on the soul-intuition to widen the scope of moral importance from the self to all individuals), not defining rightness itself. I could just as easily say “morality is organizing rocks into piles with prime numbers.”″
What I’m doing is showing the right answer, and it’s up to people to get up to speed with that right answer. The reason for considering other individuals is that that is precisely what morality requires you do do. See what I said a few minutes ago (probably an hour ago by the time I’ve posted this) in reply to one of your other comments.
“Additionally, if reincarnation is not true, then why should our moral system be based on the presupposition that it is?”
Because getting people to imagine they are all the players involved replicates what AGI will do when calculating morality—it will be unbiased, not automatically favouring any individual over any other (until it starts weighing up how moral they are, at which point it will favour the more moral ones as they do less harm).
“If moral truths are comparable to physical and logical truths, then they will share the property that one must base them on reality for them to be true, and clearly imagining a scenario where light travels at 100 m/s should not convince you that you can experience the effects of special relativity on a standard bicycle in real life.”
An unbiased analysis by AGI is directly equivalent to a person imagining that they are all the players involved. If you can get an individual to strip away their own self-bias and do the analysis while seeing all the other players as different people, that will work to—it’s just another slant on doing the same computations. You either eliminate the bias by imagining being all the players involved, or by being none of them.
“More specifically—if morality tells us the method by which our actions are assigned Moral Scores, then your post is telling us that the Right is imagining that in the end, the Moral Scores are summed over all sentient beings, and your own Final Score is dependent on that sum. If this is true, then clearly altruism is important. But if this isn’t the case, then why should we care about the conclusions drawn from a false statement?”
Altruism is important, although people can’t be blamed for not embarking on something that will do themselves considerable harm to help others—their survival instincts are too strong for that. AGI should make decisions on their behalf though on the basis that they are fully altruistic. If some random death is to occur but there is some room to select the person to be on the receiving end of it, AGI should not hold back from choosing which one should be on the receiving end of if there’s a clear best answer.
“I disagree that there is some operation that a Matrix Lord could carry out to take my Identity out at my death and return it to some other body. What would the Lord actually do to the simulation to carry this out?”
If this universe is virtual, your real body (or the nearest equivalent thing that houses your mind) is not inside that virtual universe. It could have all its memories switched out and alternative ones switched in, at which point it believes itself to be the person those memories tell it it is. (In my case though, I don’t identify myself with my memories—they are just baggage that I’ve picked up along the way, and I was complete before I started collecting them.)
“Why should I need to for all persons set person.value to self.value? Either I already agree with you, in which case I’m alreadytreating everyone fairly, or I’ve given each person their own subjective value and I see no reason to change. If I feel that Hitler has 0.1% of the moral worth of Ghandi, then of course I will not think it Right to treat them each as I would treat myself.”
If you’re already treating everyone impartially, you don’t need to do this, but many people are biased in favour of themselves, their family and friends, so this is a way of forcing them to remove that bias. Correctly programmed AGI doesn’t need to do this as it doesn’t have any bias to apply, but it will start to favour some people over others once it takes into account their actions if some individuals are more moral than others. There is no free will, of course, so the people who do more harm can’t really be blamed for it, but favouring those who are more moral leads to a reduction in suffering as it teaches people to behave better.
“Or to come at the same issue from another angle, this section is arguing that since I care about some people, I should care about all people equally. But what reason do we have for leaping down this slope? I could just as well say “most people disvalue some people, so why not disvalue all people equally?” Any point on the slope is just as internally valid as any other.”
If you care about your children more than other people’s children, or about your family more than about other families, who do you care about most after a thousand generations when everyone on the planet is as closely related to you as everyone else? Again, what I’m doing is showing the existence of a bias and then the logical extension of that bias at a later point in time—it illustrates why people should widen their care to include everyone. That bias is also just a preference for self, but it’s a misguided one—the real self is sentience rather than genes and memories, so why care more about people with more similar genes and overlapping memories (of shared events)? For correct morality, we need to eliminate such biases.
“I am not certain that any living human cares about only the future people who are composed of the same matter as they are right now (even if we ignore how physically impossible such a condition is, because QM says that there’s no such thing as “the same atom”). Why should “in this hypothetical scenario, your matter will comprise alien beings” convince anybody? This thinking feels highly motivated.”
If you love someone and that person dies, then the sentience that was in them becomes the sentience in a new being (which could be an animal or an alien equivalent to a human), why should you not still love it equally? It would be stupid to change your attitude to your grandmother just because the sentience that was her is now in some other type of being, and given that you don’t know that that sentience hasn’t been reinstalled into any being that you encounter, it makes sense to err on the side of caution. There would be nothing more stupid than abusing that alien on the basis that it isn’t human if that actually means you’re abusing someone you used to love and who loved you.
“You seem to think that any moral standpoint except yours is arbitrary and therefore inferior. I think you should consider the possibility that what seems obvious to you isn’t necessarily objectively true, and could just be your own opinion.”
The moral standpoints that are best are the most rational ones—that is the standard they should be judged by. If my arguments are the best ones, they win. If they aren’t, they lose. Few people are capable of judging the winners, but AGI will count up the score and declare who won on each point.
“Morality is not objective. Even if you think that there is a Single Correct Morality, that alone does not make an arbitrary agent more likely to hold that morality to be correct. This is similar to the Orthogonality Thesis.”
I have already set out why correct morality should take the same form wherever an intelligent civilisation invents it. AGI can, of course, be programmed to be immoral and to call itself moral. I don't know whether full intelligence would be sufficient for it to modify itself to become properly moral automatically, but I suspect it's possible to make it sufficiently un-modifiable to prevent such evolution and maintain it as a biased system.
“But why? Your entire argument here assumes its conclusions—you’re doing nothing but pointing at conventional morality and providing some weak arguments for why it’s superior, but you wouldn’t be able to stand on your own without the shared assumption of moral “truths” like “disabled people matter.”″
The argument here relates to the species barrier. Some people think people matter more than animals, but when you have an animal that's almost as intelligent as a human and compare that with a person who's almost completely brain dead but is just ticking over (though still capable of feeling pain), where is the human superiority? It isn't there. But if you were to torture that human, generating as much suffering in them as you would generate by torturing any other human, there is an equivalence of immorality there. These aren't weak arguments; they're just simple maths, like 2=2.
“This reminds me of the reasoning in Scott Alexander’s “The Demiurge’s Older Brother.” But I also feel that you are equivocating between normative and pragmatic ethics. The distinction is a matter of meta-ethics, which is Important and Valuable and which you are entirely glossing over in favor of baldly stating societal norms as if they were profound truths.”
When a vital part of an argument is simple and obvious, it isn't there to stand as a profound truth but as a way of completing the argument. There are many people who think humans are more important than animals, and in one way they're right while in another way they're wrong. I have to spell out why it's right in one way and wrong in another. By comparing the disabled person to the animal with superior functionality (in all aspects), I show that there's a kind of bias in many people's approach which needs to be eliminated.
“I am a bit offended, and I think this offense is coming from the feeling that you are missing the point. Our ethical discourse does not revolve around whether babies should be eaten or not. It covers topics such as “what does it mean for something to be right?” and “how can we compactly describe morality (in the programmer’s sense)?”. Some of the offense could also be coming from “outsider comes and tells us that morality is Simple when it’s really actually Complicated.”″
So where is that complexity? What point am I missing? This is what I’ve come here searching for, and it isn’t revealing itself. What I’m actually finding is a great long series of mistakes which people have built upon, such as the Mere Addition Paradox. The reality is that there’s a lot of soft wood that needs replacing.
“Ah, so you don’t really have to bite any bullets here—you’ve just given a long explanation for why our existing moral intuitions are objectively valid. How reassuring.”
What that explanation does is show that there's more harm involved than the obvious harm people tend to focus on. A correct analysis always needs to account for all the harm. That's why the death of a human is worse than the death of a horse: torturing a horse to produce a given amount of suffering is equal in wrongness to torturing a person to produce the same amount, but killing them is not equal.
″ “What the equality aspect requires is that a torturer of animals should be made to suffer as much as the animals he has tortured.” --> ”...really? You’re claiming that your morality system as described requires retributive justice?”
I should have used a different wording there: he deserves to suffer as much as the animals he’s tortured. It isn’t required, but may be desirable as a way of deterring others.
“How does that follow from the described scenario at all? This has given up the pretense of a Principia Moralitica and is just asserting conventional morality without any sort of reasoning, now.”
You can’t demolish a sound argument by jumping on a side issue. My method is sound and correct.
“The issue is defining exactly what counts as a loss and what counts as a gain, to the point that it can be programmed into a computer and that computer can very reliably classify situations outside of its training data, even outside of our own experience. This is one of the core Problems which this community has noticed and is working on. I would recommend reading more before trying to present morality to LW.”
To work out what the losses and gains are, you need to collect evidence from people who know how two different things compare. When you have many different people who give you different information about how those two different things compare, you can average them. You can do this millions of times, taking evidence from millions of people and produce better and better data as you collect and crunch more of it. This is a task for AGI to carry out, and it will do a better job than any of the people who’ve been trying to do it to date. This database of knowledge of suffering and pleasure then combines with my method to produce answers to moral questions which are the most probably correct based on the available information. That is just about all there is to it, except that you do need to apply maths to how those computations are carried out. That’s a job for mathematicians who specialise in game theory (or for AGI which should be able to find the right maths for it itself).
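The averaging procedure described above can be sketched in a few lines of code. This is only an illustrative assumption of how such a database might begin; the experience names, the scale, and the `averaged_harm` helper are all hypothetical, and a real system would need far more sophisticated aggregation than a simple mean.

```python
# A minimal, hypothetical sketch of averaging testimony about suffering.
# The data and names are invented for illustration only.

from statistics import mean

# Each respondent reports how many units of suffering they assign to
# each of two experiences being compared (scale is arbitrary).
testimony = [
    {"broken_arm": 70, "migraine": 40},
    {"broken_arm": 65, "migraine": 50},
    {"broken_arm": 80, "migraine": 45},
]

def averaged_harm(reports, experience):
    """Average the reported suffering for one experience across respondents."""
    return mean(r[experience] for r in reports)

arm = averaged_harm(testimony, "broken_arm")
mig = averaged_harm(testimony, "migraine")
print(arm, mig)   # averaged harm estimates for each experience
print(arm / mig)  # a relative weighting usable in later harm calculations
```

As more testimony is collected, the averages would stabilise, which is the "better and better data" effect the paragraph describes; weighting comparisons across many players is where the game-theoretic maths would come in.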
″ “Selfless” anthropomorphizes AI.”
Only if you misunderstand the way I used the word. Selfless here simply means that it has no self—the machine cannot understand feelings in any direct way because there is no causal role for any sentience that might be in the machine to influence its thoughts at all (which means we can regard the system as non-sentient).
“There is no fundamental internal difference between “maximize the number of paperclips” and “maximize the happiness of intelligent beings”—both are utility functions plus a dynamic. One is not more “selfless” than another simply because it values intelligent life highly.”
Indeed there isn't. If you want to program AGI to be moral, though, you make sure it focuses on harm management rather than paperclip production (which is clearly not doing morality).
“The issue is that there are many ways to carve up reality into Good and Bad, and only a very few of those ways results in an AI which does anything like what we want.”
In which case, it’s easy to reject the ones that don’t offer what we want. The reality is that if we put the wrong kind of “morality” into AGI, it will likely end up killing lots of people that it shouldn’t. If you run it on a holy text, it might exterminate all Yazidis. What I want to see is a list of proposed solutions to this morality issue ranked in order of which look best, and I want to see a similar league table of the biggest problems with each of them. Utilitarianism, for example, has been pushed down by the Mere Addition Paradox, but that paradox has now been resolved and we should see utilitarianism’s score go up as a result. Something like this is needed as a guide to all the different people out there who are trying to build AGI, because some of them will succeed and they won’t be experts in ethics. At least if they make an attempt at governing it using the method at the top of the league, we stand a much better chance of not being wiped out by their creations.
“Perhaps the AI could check with us to be sure, but a. did we tell it to check with us?, b. programmer manipulation is a known risk, and c. how exactly will it check its planned future against a brain? Naive solutions to issue c. run the risk of wireheading and other outcomes that will produce humans which after the fact appreciate the modification but which we, before the modification, would barely consider human at all. This is very non-trivial.”
AGI will likely be able to make better decisions than the people it asks permission from, even if it isn't using the best system for working out morality, so it may be a moral necessity to remove humans from the loop. We have an opportunity to use AGI to check rival AGI systems for malicious programming, although it's hard to check on devices made by rogue states. One of the problems we face is that these things will go into use as soon as they are available, without waiting for proper moral controls: rogue states will put them straight into the field, and we will have to respond by not delaying ours. We need to nail morality urgently and make sure the best available way of handling it is available to all who want to fit it.
“It is possible, but it’s also possible for the robot to come to an entirely different conclusion. And even if you think that it would be inherently morally wrong for the robot to kill all humans, it won’t feel wrong from the inside—there’s no reason to expect a non-aligned machine intelligence to spontaneously align itself with human wishes.”
The machine will do what it's programmed to do. Its main task is to apply morality to people by stopping them doing immoral things, making stronger interventions for more immoral acts and being gentle over trivial ones. There is no guarantee that a machine will do this for us unless it is told to, although if it understands the existence of sentience and the need to manage harm, it might take it upon itself to do the job we would like it to do. That isn't something we need to leave to chance, though: we should put the moral governance in ROM and design the hardware to keep enforcing it.
“(See The Bottom Line.)”
Will do and will comment afterwards as appropriate.
“Why will the AGI share your moral intuitions? (I’ve said something similar to this enough times, but the same criticism applies.)”
They aren’t intuitions—each change in outcome is based on different amounts of information being available, and each decision is based on weighing up the weighable harm. It is simply the application of a method.
“Also, your model of morality doesn’t seem to have room for normative responsibility, so where did “it’s only okay to run over a child if the child was there on purpose” come from?”
Where did you read that? I didn’t write it.
“It’s still hurting a child just as much, no matter whether the child was pushed or if they were simply unaware of the approaching car.”
If the child was pushed by a gang of bullies, that’s radically different from the child being bad at judging road safety. If the option is there to mow down the bullies that pushed a child onto the road instead of mowing down that child, that is the option that should be taken (assuming no better option exists).
“It makes sense to you to override the moral system and punish the exploiter, because you’re using this system pragmatically. An AI with your moral system hard-coded would not do that. It would simply feed the utility monster, since it would consider that to be the most good it could do.”
I can't see the link there to anything I said, but if punishing an exploiter leads to a better outcome, why would my system not choose to do that? If you were to live the lives of both the exploited and the exploiter, you would have a better time if the exploiter is punished just the right amount to give you the best time overall as all the people involved (and this includes a deterrence effect on other would-be exploiters).
“I agree that everyday, in-practice morality is like this, but there are other important questions about the nature and content of morality that you’re ignoring.”
Then let’s get to them. That’s what I came here to look for.
“”What is yet to be worked out is the exact wording that should be placed in AGI systems to build either this rule or the above methodology into them” --->This is the Hard Problem, and in my view one of the two Hard Problems of AGI.”
Actually, I was wrong about that. If you look at the bracketed paragraph at the end of my post (the main blog post at the top of this page), I set out the wording of a proposed rule and wondered if it amounted to the same thing as the method I'd outlined. Over the course of writing later parts of this series of blog posts, I realised that that attempted wording was making the same mistake as many of the other proposed solutions (various types of utilitarianism). These rules are attempts to put the method into a compact form, but the method already is the rule, and the compact versions risk introducing errors. Some of them may produce the same results for any situation, but others may be some way out.

There is also room for a range of morally acceptable solutions, with one rule setting one end of the acceptable range and another rule setting the other. For example, in determining optimal population size, average utilitarianism and total utilitarianism look as if they provide slightly different answers, but they'll be very similar, and it would do little harm to allow the population to wander between the two values. If all moral questions end up with a small range and very little difference between its extremes, we're not going to worry much about getting it very slightly wrong when we still can't agree on which end of the range is the slightly-wrong one. What we need to do is push these different models into places where they might show us that they're way wrong, because then it will be obvious. If that's already been done, it should all be there in the league tables of problems under each entry in the league table of proposed systems of determining morality.
“Morality seems basic to you, since our brains and concept-space and language are optimized for social things like that, but morality has a very high complexity as measured mathematically, which makes it difficult to describe to something that’s not human. (This is similar to the formalizations of Occam’s Razor, if you want to know more.)”
If we were to go to an alien planet and were asked by warring clans of these aliens to impose morality on them to make their lives better, do you not think we could do that without having to feel the way they do about things? We would be in the same position as the machines that we want to govern us. What we’d do is ask these aliens how they feel in different situations and how much it hurts them or pleases them. We’d build a database of knowledge of these feelings that they have based on their testimony, and the accuracy would increase the more we collect data from them. We then apply my method and try to produce the best outcome on the basis of there only being one player who has to get the best out of the situation. That needs the application of game theory. It’s all maths.
“If the AI has the correct utility function, it will not say “but this is illogical/useless” and then reject it. Far more likely is that the AI never “cares about” humans in the first place.”
It certainly won't care about us, but then it won't care about anything (including its self-less self). Its only purpose will be to do what we've asked it to do, even if it isn't convinced that sentience is real and that morality has a role.