TheWakalix comments on Computational Morality (Part 1) - a Proposed Solution

TheWakalix 1 May 2018 4:55 UTC
1 point
Religions have had the Golden Rule for thousands of years, and while it’s faulty (it gives you permission to do something to someone else that you like having done to you but they don’t like having done to them), it works so well overall that it clearly must be based on some underlying truth, and we need to pin down what that is so that we can use it to govern AGI.
This seems circular—on what basis do you say that it works well? I would say that it perhaps summarizes conventional human morality well for a T-shirt slogan, but it’s a stretch to go from that to “underlying truth”—more like underlying regularity. It is certainly true that most people have golden rule-esque moralities, but that is distinct from the claim that the golden rule itself is true.
What exactly is morality? Well, it isn’t nearly as difficult as most people imagine. The simplest way to understand how it works is to imagine that you will have to live everyone’s life in turn (meaning billions of reincarnations, going back in time as many times as necessary in order to live each of those lives), so to maximise your happiness and minimise your suffering, you must pay careful attention to harm management so that you don’t cause yourself lots of suffering in other lives that outweighs the gains you make in whichever life you are currently tied up in.
You are only presenting your opinion on what is right (and providing an imagined scenario which relies on the soul-intuition to widen the scope of moral importance from the self to all individuals), not defining rightness itself. I could just as easily say “morality is organizing rocks into piles with prime numbers.”
Additionally, if reincarnation is not true, then why should our moral system be based on the presupposition that it is? If moral truths are comparable to physical and logical truths, then they will share the property that one must base them on reality for them to be true, and clearly imagining a scenario where light travels at 100 m/s should not convince you that you can experience the effects of special relativity on a standard bicycle in real life.
More specifically—if morality tells us the method by which our actions are assigned Moral Scores, then your post is telling us that the Right is imagining that in the end, the Moral Scores are summed over all sentient beings, and your own Final Score is dependent on that sum. If this is true, then clearly altruism is important. But if this isn’t the case, then why should we care about the conclusions drawn from a false statement?
Now, obviously, we don’t expect the world to work that way (with us having to live everyone else’s life in turn), even though it could be a virtual universe in which we are being tested where those who behave badly will suffer at their own hands, ending up being on the receiving end of all the harm they dish out, and also suffering because they failed to step in and help others when they easily could have.
I’ve already responded to this elsewhere, but I disagree that there is some operation that a Matrix Lord could carry out to take my Identity out at my death and return it to some other body. What would the Lord actually do to the simulation to carry this out?
However, even if this is not the way the universe works, most of us still care about people enough to want to apply this kind of harm management regardless—we love family and friends, and many of us love the whole of humanity in general (even if we have exceptions for particular individuals who don’t play by the same rules). We also want all our descendants to be looked after fairly by AGI, and in the course of time, all people may be our descendants, so it makes no sense to favour some of them over others (unless that’s based on their own individual morality). We have here a way of treating them all with equal fairness simply by treating them all as our own self.
Why should I need to for all persons set person.value to self.value? Either I already agree with you, in which case I’m already treating everyone fairly, or I’ve given each person their own subjective value and I see no reason to change. If I feel that Hitler has 0.1% of the moral worth of Ghandi, then of course I will not think it Right to treat them each as I would treat myself.
Or to come at the same issue from another angle, this section is arguing that since I care about some people, I should care about all people equally. But what reason do we have for leaping down this slope? I could just as well say “most people disvalue some people, so why not disvalue all people equally?” Any point on the slope is just as internally valid as any other.
That may still be a misguided way of looking at things though, because genetic relationships don’t necessarily match up to any real connection between different sentient beings. The material from which we are made can be reused to form other kinds of sentient animals, and if you were to die on an alien planet, it could be reused in alien species. Should we not care about the sentiences in those just as much? We should really be looking for a morality that is completely species-blind, caring equally about all sentiences, which means that we need to act as if we are not merely going to live all human lives in succession, but the lives of all sentiences.
I am not certain that any living human cares about only the future people who are composed of the same matter as they are right now (even if we ignore how physically impossible such a condition is, because QM says that there’s no such thing as “the same atom”). Why should “in this hypothetical scenario, your matter will comprise alien beings” convince anybody? This thinking feels highly motivated.
You seem to think that any moral standpoint except yours is arbitrary and therefore inferior. I think you should consider the possibility that what seems obvious to you isn’t necessarily objectively true, and could just be your own opinion.
This is a better approach for two reasons. If aliens ever turn up here, we need to have rules of morality that protect them from us, and us from them
This sounds vaguely similar to Scott Alexander’s argument for why intelligent agents are more likely to value than disvalue other intelligent agents achieving their goals, but I’m not certain there’s anything more than a trivial connection. Still, I feel like this is approaching a decent argument -
(and if they’re able to get here, they’re doubtless advanced enough that they should have worked out how morality works too).
Morality is not objective. Even if you think that there is a Single Correct Morality, that alone does not make an arbitrary agent more likely to hold that morality to be correct. This is similar to the Orthogonality Thesis.
We also need to protect people who are disabled mentally and not exclude them on the basis that some animals are more capable, and in any case we should also be protecting animals to avoid causing unnecessary suffering for them.
But why? Your entire argument here assumes its conclusions—you’re doing nothing but pointing at conventional morality and providing some weak arguments for why it’s superior, but you wouldn’t be able to stand on your own without the shared assumption of moral “truths” like “disabled people matter.”
What we certainly don’t want is for aliens to turn up here and claim that we aren’t covered by the same morality as them because we’re inferior to them, backing that up by pointing out that we discriminate against animals which we claim aren’t covered by the same morality as us because they are inferior to us. So, we have to stand by the principle that all sentiences are equally important and need to be protected from harm with the same morality.
This reminds me of the reasoning in Scott Alexander’s “The Demiurge’s Older Brother.” But I also feel that you are equivocating between normative and pragmatic ethics. The distinction is a matter of meta-ethics, which is Important and Valuable and which you are entirely glossing over in favor of baldly stating societal norms as if they were profound truths. I am a bit offended, and I think this offense is coming from the feeling that you are missing the point. Our ethical discourse does not revolve around whether babies should be eaten or not. It covers topics such as “what does it mean for something to be right?” and “how can we compactly describe morality (in the programmer’s sense)?”. Some of the offense could also be coming from “outsider comes and tells us that morality is Simple when it’s really actually Complicated.”
However, that doesn’t mean that when we do the Trolley Problem with a million worms on one track and one human on the other that the human should be sacrificed—if we knew that we had to live those million and one lives, we would gain little by living a bit longer as worms before suffering similar deaths by other means, while we’d lose a lot more as the human (and a lot more still as all the other people who will suffer deeply from the loss of that human).
Ah, so you don’t really have to bite any bullets here—you’ve just given a long explanation for why our existing moral intuitions are objectively valid. How reassuring.
What the equality aspect requires is that a torturer of animals should be made to suffer as much as the animals he has tortured.
...really? You’re claiming that your morality system as described requires retributive justice? How does that follow from the described scenario at all? This has given up the pretense of a Principia Moralitica and is just asserting conventional morality without any sort of reasoning, now.
If we run the Trolley Problem with a human on one track and a visiting alien on the other though, it may be that the alien should be saved on the basis that he/she/it is more advanced than us and has more to lose, and that likely is the case if it is capable of living 10,000 years to our 100.
I retract my previous statement—you are indeed willing to bite bullets. As long as they do not require you to change your behavior in practice, since long-lived aliens are currently nowhere to be found. Still, that’s better than nothing.
So, we need AGI to make calculations for us on the above basis, weighing up the losses and gains.
The issue is defining exactly what counts as a loss and what counts as a gain, to the point that it can be programmed into a computer and that computer can very reliably classify situations outside of its training data, even outside of our own experience. This is one of the core Problems which this community has noticed and is working on. I would recommend reading more before trying to present morality to LW.
Non-sentient AGI will be completely selfless, but its job will be to work for all sentient things to try to minimise unnecessary harm for them and to help maximise their happiness.
“Selfless” anthropomorphizes AI. There is no fundamental internal difference between “maximize the number of paperclips” and “maximize the happiness of intelligent beings”—both are utility functions plus a dynamic. One is not more “selfless” than another simply because it values intelligent life highly.
It will keep a database of information about sentience, collecting knowledge about feelings so that it can weigh up harm and pleasure as accurately as possible, and it will then apply that knowledge to any situation where decisions must be made about which course of action should be followed.
The issue is that there are many ways to carve up reality into Good and Bad, and only a very few of those ways results in an AI which does anything like what we want. Perhaps the AI could check with us to be sure, but a. did we tell it to check with us?, b. programmer manipulation is a known risk, and c. how exactly will it check its planned future against a brain? Naive solutions to issue c. run the risk of wireheading and other outcomes that will produce humans which after the fact appreciate the modification but which we, before the modification would barely consider human at all. This is very non-trivial.
It is thus possible for a robot to work out that it should shoot a gunman dead if he is on a killing spree where the victims don’t appear to have done anything to deserve to be shot. It’s a different case if the gunman is actually a blameless hostage trying to escape from a gang of evil kidnappers and he’s managed to get hold of a gun while all the thugs have dropped their guard, so he should be allowed to shoot them all (and the robot should maybe join in to help him, depending on which individual kidnappers are evil and which might merely have been dragged along for the ride unwillingly). The correct action depends heavily on understanding the situation, so the more the robot knows about the people involved, the better the chance that it will make the right decisions, but decisions do have to be made and the time to make them is often tightly constrained, so all we can demand of robots is that they do what is most likely to be right based on what they know, delaying irreversible decisions for as long as it is reasonable to do so.
It is possible, but it’s also possible for the robot to come to an entirely different conclusion. And even if you think that it would be inherently morally wrong for the robot to kill all humans, it won’t feel wrong from the inside—there’s no reason to expect a non-aligned machine intelligence to spontaneously align itself with human wishes.
These arguments might persuade a human, but they might not persuade an AI, and they definitely will not persuade reality itself. (See The Bottom Line.)
AGI will be able to access a lot of information about the people involved in situations where such difficult decisions need to be made. Picture a scene where a car is moving towards a group of children who are standing by the road. One of the children suddenly moves out into the road and the car must decide how to react. If it swerves to one side it will run into a lorry that’s coming the other way, but if it swerves to the other side it will plough into the group of children. One of the passengers in the car is a child too. In the absence of any other information, the car should run down the child on the road. Fortunately though, AGI knows who all these people are because a network of devices is tracking them all. The child who has moved into the road in front of the car is known to be a good, sensible, kind child. The other children are all known to be vicious bullies who regularly pick on him, and it’s likely that they pushed him onto the road. In the absence of additional information, the car should plough into the group of bullies. However, AGI also knows that all but one of the people in the car happen to be would-be terrorists who have just been discussing a massive attack that they want to carry out, and the child in the car is terminally ill, so in the absence of any other information, the car should maybe crash into the lorry. But, if the lorry is carrying something explosive which will likely blow up in the crash and kill all the people nearby, the car must swerve into the bullies. Again we see that the best course of action is not guaranteed to be the same as the correct decision—the correct decision is always dictated by the available information, while the best course of action may depend on unavailable information. We can’t expect AGI to access unavailable information and thereby make ideal decisions, so our job is always to make it crunch the available data correctly and to make the decision dictated by that information.
Why will the AGI share your moral intuitions? (I’ve said something similar to this enough times, but the same criticism applies.) Also, your model of morality doesn’t seem to have room for normative responsibility, so where did “it’s only okay to run over a child if the child was there on purpose” come from? It’s still hurting a child just as much, no matter whether the child was pushed or if they were simply unaware of the approaching car.
There are complications that can be proposed in that we can think up situations where a lot of people could gain a lot of pleasure out of abusing one person, to the point where their enjoyment appears to outweigh the suffering of that individual, but such situations are contrived and depend on the abusers being uncaring. Decent people would not get pleasure out of abusing someone, so the gains would not exist for them, and there are also plenty of ways to obtain pleasure without abusing others, so if any people exist whose happiness depends on abusing others, AGI should humanely destroy them. If that also means wiping out an entire species of aliens which have the same negative pleasures, it should do the same with them too and replace them with a better species that doesn’t depend on abuse for its fun.
It makes sense to you to override the moral system and punish the exploiter, because you’re using this system pragmatically. An AI with your moral system hard-coded would not do that. It would simply feed the utility monster, since it would consider that to be the most good it could do.
Morality, then, is just harm management by brute data crunching. We can calculate it approximately in our heads, but machines will do it better by applying the numbers with greater precision and by crunching a lot more data.
I agree that everyday, in-practice morality is like this, but there are other important questions about the nature and content of morality that you’re ignoring.
What is yet to be worked out is the exact wording that should be placed in AGI systems to build either this rule or the above methodology into them
This is the Hard Problem, and in my view one of the two Hard Problems of AGI. Morality seems basic to you, since our brains and concept-space and language are optimized for social things like that, but morality has a very high complexity as measured mathematically, which makes it difficult to describe to something that’s not human. (This is similar to the formalizations of Occam’s Razor, if you want to know more.)
and we also need to explore it in enough detail to make sure that self-improving AGI isn’t going to modify it in any way that could turn an apparently safe system into an unsafe one
We know—it’s called reflective stability.
One of the dangers is that AGI won’t believe in sentience as it will lack feelings itself and see no means by which feeling can operate within us either, at which point it may decide that morality has no useful role and can simply be junked.
If the AI has the correct utility function, it will not say “but this is illogical/useless” and then reject it. Far more likely is that the AI never “cares about” humans in the first place.
- David Cooper 1 May 2018 23:58 UTC
  1 point
  Parent
  “This seems circular—on what basis do you say that it works well?”
  My wording was ” while it’s faulty … it works so well overall that …” But yes, it does work well if you apply the underlying idea of it, as most people do. That is why you hear Jews saying that the golden rule is the only rule needed—all other laws are mere commentary upon it.
  “I would say that it perhaps summarizes conventional human morality well for a T-shirt slogan, but it’s a stretch to go from that to “underlying truth”—more like underlying regularity. It is certainly true that most people have golden rule-esque moralities, but that is distinct from the claim that the golden rule itself is true.”
  It isn’t itself true, but it is very close to the truth, and when you try to work out why it’s so close, you run straight into its mechanism as a system of harm management.
  “You are only presenting your opinion on what is right (and providing an imagined scenario which relies on the soul-intuition to widen the scope of moral importance from the self to all individuals), not defining rightness itself. I could just as easily say “morality is organizing rocks into piles with prime numbers.”″
  What I’m doing is showing the right answer, and it’s up to people to get up to speed with that right answer. The reason for considering other individuals is that that is precisely what morality requires you do do. See what I said a few minutes ago (probably an hour ago by the time I’ve posted this) in reply to one of your other comments.
  “Additionally, if reincarnation is not true, then why should our moral system be based on the presupposition that it is?”
  Because getting people to imagine they are all the players involved replicates what AGI will do when calculating morality—it will be unbiased, not automatically favouring any individual over any other (until it starts weighing up how moral they are, at which point it will favour the more moral ones as they do less harm).
  “If moral truths are comparable to physical and logical truths, then they will share the property that one must base them on reality for them to be true, and clearly imagining a scenario where light travels at 100 m/s should not convince you that you can experience the effects of special relativity on a standard bicycle in real life.”
  An unbiased analysis by AGI is directly equivalent to a person imagining that they are all the players involved. If you can get an individual to strip away their own self-bias and do the analysis while seeing all the other players as different people, that will work to—it’s just another slant on doing the same computations. You either eliminate the bias by imagining being all the players involved, or by being none of them.
  “More specifically—if morality tells us the method by which our actions are assigned Moral Scores, then your post is telling us that the Right is imagining that in the end, the Moral Scores are summed over all sentient beings, and your own Final Score is dependent on that sum. If this is true, then clearly altruism is important. But if this isn’t the case, then why should we care about the conclusions drawn from a false statement?”
  Altruism is important, although people can’t be blamed for not embarking on something that will do themselves considerable harm to help others—their survival instincts are too strong for that. AGI should make decisions on their behalf though on the basis that they are fully altruistic. If some random death is to occur but there is some room to select the person to be on the receiving end of it, AGI should not hold back from choosing which one should be on the receiving end of if there’s a clear best answer.
  “I disagree that there is some operation that a Matrix Lord could carry out to take my Identity out at my death and return it to some other body. What would the Lord actually do to the simulation to carry this out?”
  If this universe is virtual, your real body (or the nearest equivalent thing that houses your mind) is not inside that virtual universe. It could have all its memories switched out and alternative ones switched in, at which point it believes itself to be the person those memories tell it it is. (In my case though, I don’t identify myself with my memories—they are just baggage that I’ve picked up along the way, and I was complete before I started collecting them.)
  “Why should I need to for all persons set person.value to self.value? Either I already agree with you, in which case I’m alreadytreating everyone fairly, or I’ve given each person their own subjective value and I see no reason to change. If I feel that Hitler has 0.1% of the moral worth of Ghandi, then of course I will not think it Right to treat them each as I would treat myself.”
  If you’re already treating everyone impartially, you don’t need to do this, but many people are biased in favour of themselves, their family and friends, so this is a way of forcing them to remove that bias. Correctly programmed AGI doesn’t need to do this as it doesn’t have any bias to apply, but it will start to favour some people over others once it takes into account their actions if some individuals are more moral than others. There is no free will, of course, so the people who do more harm can’t really be blamed for it, but favouring those who are more moral leads to a reduction in suffering as it teaches people to behave better.
  “Or to come at the same issue from another angle, this section is arguing that since I care about some people, I should care about all people equally. But what reason do we have for leaping down this slope? I could just as well say “most people disvalue some people, so why not disvalue all people equally?” Any point on the slope is just as internally valid as any other.”
  If you care about your children more than other people’s children, or about your family more than about other families, who do you care about most after a thousand generations when everyone on the planet is as closely related to you as everyone else? Again, what I’m doing is showing the existence of a bias and then the logical extension of that bias at a later point in time—it illustrates why people should widen their care to include everyone. That bias is also just a preference for self, but it’s a misguided one—the real self is sentience rather than genes and memories, so why care more about people with more similar genes and overlapping memories (of shared events)? For correct morality, we need to eliminate such biases.
  “I am not certain that any living human cares about only the future people who are composed of the same matter as they are right now (even if we ignore how physically impossible such a condition is, because QM says that there’s no such thing as “the same atom”). Why should “in this hypothetical scenario, your matter will comprise alien beings” convince anybody? This thinking feels highly motivated.”
  If you love someone and that person dies, then the sentience that was in them becomes the sentience in a new being (which could be an animal or an alien equivalent to a human), why should you not still love it equally? It would be stupid to change your attitude to your grandmother just because the sentience that was her is now in some other type of being, and given that you don’t know that that sentience hasn’t been reinstalled into any being that you encounter, it makes sense to err on the side of caution. There would be nothing more stupid than abusing that alien on the basis that it isn’t human if that actually means you’re abusing someone you used to love and who loved you.
  “You seem to think that any moral standpoint except yours is arbitrary and therefore inferior. I think you should consider the possibility that what seems obvious to you isn’t necessarily objectively true, and could just be your own opinion.”
  The moral standpoints that are best are the most rational ones—that is the standard they should be judged by. If my arguments are the best ones, they win. If they aren’t, they lose. Few people are capable of judging the winners, but AGI will count up the score and declare who won on each point.
  “Morality is not objective. Even if you think that there is a Single Correct Morality, that alone does not make an arbitrary agent more likely to hold that morality to be correct. This is similar to the Orthogonality Thesis.”
  I have already set out why correct morality should take the same form wherever an intelligent civilisation invents it. AGI can, of course, be programmed to be immoral and to call itself moral, but I don’t know if its intelligence (if it’s fully intelligent) is sufficient for it to be able to modify itself to become properly moral automatically, although I suspect it’s possible to make it sufficiently un-modifiable to prevent such evolution and maintain it as a biased system.
  “But why? Your entire argument here assumes its conclusions—you’re doing nothing but pointing at conventional morality and providing some weak arguments for why it’s superior, but you wouldn’t be able to stand on your own without the shared assumption of moral “truths” like “disabled people matter.”″
  The argument here relates to the species barrier. Some people think people matter more than animals, but when you have an animal that’s almost as intelligent as a human and compare that with a person who’s almost completely brain dead but is just ticking over (but capable of feeling pain), where is the human superiority? It isn’t there. But if you were to torture that human to generate the same as much suffering in them as you would generate by torturing any other human, there is an equivalence of immorality there. These aren’t weak arguments—they’re just simple maths like 2=2.
  “This reminds me of the reasoning in Scott Alexander’s “The Demiurge’s Older Brother.” But I also feel that you are equivocating between normative and pragmatic ethics. The distinction is a matter of meta-ethics, which is Important and Valuable and which you are entirely glossing over in favor of baldly stating societal norms as if they were profound truths.”
  When a vital part of an argument is simple and obvious, it isn’t there to stand as a profound truth, but as a way of completing the argument. There are many people who think humans are more important than animals, and in one way they’re right, while in another way they’re wrong. I have to spell out why it’s right in one way and wrong in another. By comparing the disabled person to the animal with superior functionality (in all aspects), I show that that there’s a kind of bias involved in many people’s approach which needs to be eliminated.
  “I am a bit offended, and I think this offense is coming from the feeling that you are missing the point. Our ethical discourse does not revolve around whether babies should be eaten or not. It covers topics such as “what does it mean for something to be right?” and “how can we compactly describe morality (in the programmer’s sense)?”. Some of the offense could also be coming from “outsider comes and tells us that morality is Simple when it’s really actually Complicated.”″
  So where is that complexity? What point am I missing? This is what I’ve come here searching for, and it isn’t revealing itself. What I’m actually finding is a great long series of mistakes which people have built upon, such as the Mere Addition Paradox. The reality is that there’s a lot of soft wood that needs replacing.
  “Ah, so you don’t really have to bite any bullets here—you’ve just given a long explanation for why our existing moral intuitions are objectively valid. How reassuring.”
  What that explanation does is show that there’s more harm involved than the obvious harm which people tend to focus on. A correct analysis always needs to account for all the harm. That’s why the death of a human is worse than the death of a horse. Torturing a horse is equal to torturing a person to create the same amount of suffering in them, but killing them is not equal.
  ″ “What the equality aspect requires is that a torturer of animals should be made to suffer as much as the animals he has tortured.” --> ”...really? You’re claiming that your morality system as described requires retributive justice?”
  I should have used a different wording there: he deserves to suffer as much as the animals he’s tortured. It isn’t required, but may be desirable as a way of deterring others.
  “How does that follow from the described scenario at all? This has given up the pretense of a Principia Moralitica and is just asserting conventional morality without any sort of reasoning, now.”
  You can’t demolish a sound argument by jumping on a side issue. My method is sound and correct.
  “The issue is defining exactly what counts as a loss and what counts as a gain, to the point that it can be programmed into a computer and that computer can very reliably classify situations outside of its training data, even outside of our own experience. This is one of the core Problems which this community has noticed and is working on. I would recommend reading more before trying to present morality to LW.”
  To work out what the losses and gains are, you need to collect evidence from people who know how two different things compare. When you have many different people who give you different information about how those two different things compare, you can average them. You can do this millions of times, taking evidence from millions of people and produce better and better data as you collect and crunch more of it. This is a task for AGI to carry out, and it will do a better job than any of the people who’ve been trying to do it to date. This database of knowledge of suffering and pleasure then combines with my method to produce answers to moral questions which are the most probably correct based on the available information. That is just about all there is to it, except that you do need to apply maths to how those computations are carried out. That’s a job for mathematicians who specialise in game theory (or for AGI which should be able to find the right maths for it itself).
  ″ “Selfless” anthropomorphizes AI.”
  Only if you misunderstand the way I used the word. Selfless here simply means that it has no self—the machine cannot understand feelings in any direct way because there is no causal role for any sentience that might be in the machine to influence its thoughts at all (which means we can regard the system as non-sentient).
  “There is no fundamental internal difference between “maximize the number of paperclips” and “maximize the happiness of intelligent beings”—both are utility functions plus a dynamic. One is not more “selfless” than another simply because it values intelligent life highly.”
  Indeed there isn’t. If you want to program AGI to be moral though, you make sure it focuses on harm management rather than paperclip production (which is clearly not doing morality)
  “The issue is that there are many ways to carve up reality into Good and Bad, and only a very few of those ways results in an AI which does anything like what we want.”
  In which case, it’s easy to reject the ones that don’t offer what we want. The reality is that if we put the wrong kind of “morality” into AGI, it will likely end up killing lots of people that it shouldn’t. If you run it on a holy text, it might exterminate all Yazidis. What I want to see is a list of proposed solutions to this morality issue ranked in order of which look best, and I want to see a similar league table of the biggest problems with each of them. Utilitarianism, for example, has been pushed down by the Mere Addition Paradox, but that paradox has now been resolved and we should see utilitarianism’s score go up as a result. Something like this is needed as a guide to all the different people out there who are trying to build AGI, because some of them will succeed and they won’t be experts in ethics. At least if they make an attempt at governing it using the method at the top of the league, we stand a much better chance of not being wiped out by their creations.
  “Perhaps the AI could check with us to be sure, but a. did we tell it to check with us?, b. programmer manipulation is a known risk, and c. how exactly will it check its planned future against a brain? Naive solutions to issue c. run the risk of wireheading and other outcomes that will produce humans which after the factappreciate the modification but which we, before the modification would barely consider human at all. This is very non-trivial.”
  AGI will likely be able to make better decisions than the people it asks permission from even if it isn’t using the best system for working out morality, so it may be a moral necessity to remove humans from the loop. We have an opportunity to use AGI to check rival AGI system to check for malicious programming, although it’s hard to check on devices made by rogue states, and one of the problems we face is that these things will go into use as soon as they are available without waiting for proper moral controls—rogue states will put them straight into the field and we will have to respond to that by not delaying ours. We need to nail morality urgently and make sure the best available way of handling it is available to all who want to fit it.
  “It is possible, but it’s also possible for the robot to come to an entirely different conclusion. And even if you think that it would be inherently morally wrong for the robot to kill all humans, it won’t feel wrong from the inside—there’s no reason to expect a non-aligned machine intelligence to spontaneously align itself with human wishes.”
  The machine will do what it’s programmed to do. It’s main task is to apply morality to people by stopping people doing immoral things, making stronger interventions for more immoral acts, and being gentle when dealing with trivial things. There is certainly no guarantee that a machine will do this for us though unless it is told to do so, although if it understands the existence of sentience and the need to manage harm, it might take it upon itself to do the job we would like it to do. That isn’t something we need to leave to chance though—we should put the moral governance in ROM and design the hardware to keep enforcing it.
  “(See The Bottom Line.)”
  Will do and will comment afterwards as appropriate.
  “Why will the AGI share your moral intuitions? (I’ve said something similar to this enough times, but the same criticism applies.)”
  They aren’t intuitions—each change in outcome is based on different amounts of information being available, and each decision is based on weighing up the weighable harm. It is simply the application of a method.
  “Also, your model of morality doesn’t seem to have room for normative responsibility, so where did “it’s only okay to run over a child if the child was there on purpose” come from?”
  Where did you read that? I didn’t write it.
  “It’s still hurting a child just as much, no matter whether the child was pushed or if they were simply unaware of the approaching car.”
  If the child was pushed by a gang of bullies, that’s radically different from the child being bad at judging road safety. If the option is there to mow down the bullies that pushed a child onto the road instead of mowing down that child, that is the option that should be taken (assuming no better option exists).
  “It makes sense to you to override the moral system and punish the exploiter, because you’re using this system pragmatically. An AI with your moral system hard-coded would not do that. It would simply feed the utility monster, since it would consider that to be the most good it could do.”
  I can’t see the link there to anything I said, but if punishing an exploiter leads to a better outcome, why would my system not choose to do that? If you were to live the lives of the expolited and exploiter, you would have a better time if the exploiter is punished just the right amount to give you the best time overall as all the people involved (and this includes a deterrence effect on other would-be exploiters.
  “I agree that everyday, in-practice morality is like this, but there are other important questions about the nature and content of morality that you’re ignoring.”
  Then let’s get to them. That’s what I came here to look for.
  “”What is yet to be worked out is the exact wording that should be placed in AGI systems to build either this rule or the above methodology into them” --->This is the Hard Problem, and in my view one of the two Hard Problems of AGI.”
  Actually, I was wrong about that. If you look at the paragraph in brackets at the end of my post (the main blog post at the top of this page), I set out the wording of a proposed rule and wondered if it amounted to the same thing as the method I’d outlined. Over the course of writing later parts of this series of blog posts, I realised that that attempted wording was making the same mistake as many of the other proposed solutions (various types of utilitarianism). These rules are an attempt to put the method into a compact form, but method already is the rule, while these compact versions risk introducing errors. Some of them may produce the same results for any situation, but others may be some way out. There is also room for there to be a range of morally acceptable solutions with one rule setting one end of the acceptable range and another rule setting the other. For example, in determining optimal population size, average utilitarianism and total utilitarianism look as if they provide slightly different answers, but they’ll be very similar and it would do little harm to allow the population to wander between the two values. If all moral questions end up with a small range with very little difference between the extremes of that range, we’re not going to worry much about getting it very slightly wrong if we still can’t agree on which end of the range is slightly wrong. What we need to do is push these different models into places where they might show us that they’re way wrong, because then it will be obvious. If that’s already been done, it should all be there in the league tables of problems under each entry in the league table of proposed systems of determining morality.
  “Morality seems basic to you, since our brains and concept-space and language are optimized for social things like that, but morality has a very high complexity as measured mathematically, which makes it difficult to describe to something that’s not human. (This is similar to the formalizations of Occam’s Razor, if you want to know more.)”
  If we were to go to an alien planet and were asked by warring clans of these aliens to impose morality on them to make their lives better, do you not think we could do that without having to feel the way they do about things? We would be in the same position as the machines that we want to govern us. What we’d do is ask these aliens how they feel in different situations and how much it hurts them or pleases them. We’d build a database of knowledge of these feelings that they have based on their testimony, and the accuracy would increase the more we collect data from them. We then apply my method and try to produce the best outcome on the basis of there only being one player who has to get the best out of the situation. That needs the application of game theory. It’s all maths.
  “If the AI has the correct utility function, it will not say “but this is illogical/useless” and then reject it. Far more likely is that the AI never “cares about” humans in the first place.”
  It certainly won’t care about us, but then it won’t care about anything (including its self-less self). It’s only purpose will be to do what we’ve asked it to do, even if it isn’t convinced that sentience is real and that morality has a role.