1. Some varieties of moral thinking whose diversity doesn’t seem to me to be captured by your eye-for-eye/golden-rule/max-flourish/min-suffer schema:
For some people, morality is all about results (“consequentialists”). For some, it’s all about following some moral code (“deontologists”). For some, it’s all about what sort of person you are (“virtue ethicists”). Your Minnie and Maxie are clearly consequentialists; perhaps Ivan is a deontologist; it’s hard to be sure what Goldie is; but these different outlooks can coexist with a wide variety of object-level moral preferences and your four certainly don’t cover all the bases here.
Your four all focus on moral issues surrounding _harming and benefiting_ people. Pretty much everyone does care about those things, but other very different things are important parts of some people’s moral frameworks. For instance, some people believe in a god or gods and think _devotion to their god(s)_ more important than anything else; some people attach tremendous importance to various forms of sexual restraint (only within marriage! only between a man and a woman! only if it’s done in a way that could in principle lead to babies! etc.); some people (perhaps this is part of where Ivan is coming from, but you can be quite Ivan-like by other means) have moral systems in which _honour_ is super-important and e.g. if someone insults you then you have to respond by taking them down as definitively as possible.
2. (You’re answering with “Because …” but I don’t see what “why?” question I asked, either implicitly or explicitly, so at least one of us has misunderstood something here.) (a) I agree that there are lots of different ways in which convergence could happen, but I don’t see why that in any way weakens the point that, one way or another, it _could_ happen. (b) It is certainly true that Maxie and Minnie, as they are now, disagree about some important things; again, that isn’t news. The point I was trying to make is that it might turn out that as you give Maxie and Minnie more information, a deeper understanding of human nature, more opportunities to talk to one another, etc., they stop disagreeing, and if that happens then we might do OK to follow whatever system they end up with.
3. I’m not sure what you mean about “values being eliminated from existence”; it’s ambiguous. Do you mean (i) there stop being people around who have those values or (ii) the world proceeds in a way that doesn’t, whether or not anyone cares, tend to satisfy those values? Either way, note that “that range” was the _normal range of respected human values_. Right now, there are no agents around (that we know of) whose values are entirely outside the range of human values, and we’re getting on OK. There are agents (e.g., some psychopaths, violent religious zealots, etc.) whose values are within the range of human values but outside the range of _respected human values_, and by and large we try to give them as little influence as possible. To be clear, I’m not proposing “world ruled by an entity whose values are similar to those of some particular human being generally regarded as decent” as a _triumphant win for humanity_, but it’s not an _obvious catastrophe_ either and so far as I can tell the sort of issue you’re raising presents no obstacle to that sort of outcome.
1a. Deontology/virtue ethics is a special case of consequentialism. The reason for following deontological rules is that the consequences of following them almost always tend to be better than the consequences of not following them; the exceptions where it is wiser not to follow them are rare.
1b. Those are social mores, not morals. If a human is brainwashed into shutting down the forces of empathy and caring within themselves, then they can be argued into treating any social more as a moral rule.
2. Sorry, I should have started that paragraph by repeating what you said, just to make it clear what I was responding to. I don’t think the four moralities converge when everyone has more information, because…
I will also note that while Ivan might adopt Maximize Flourishing and/or Minimize Suffering on pragmatic (a.k.a. instrumental) grounds, Ivan is a human, and humans don’t really have terminal values. If Ivan were instead an AI programmed with Eye-for-an-Eye, it might temporarily adopt Maximize Flourishing and/or Minimize Suffering as an instrumental goal, and then go back to Eye-for-an-Eye later.
3a. “Suppose that hope turns out to be illusory and there’s no such thing as a single set of values that can reasonably claim to be in any sense the natural extrapolation of everyone’s values.” Those were your exact words. Now, if there is no such thing as a single set of values that are the natural extrapolation of everyone’s values, then choosing a subset of everyone’s values which are in the normal range of respected human values for the AI to optimize for would mean that all the human values that are not in the normal range would be eliminated. If the AI doesn’t have a term for something in its utility function, it has no reason to let that thing waste resources and space it can use for things that are actually in its utility function. And that’s assuming that a value like “freedom and self-determination for humans” is something that can actually be correctly programmed into an AI, which I’m pretty sure it can’t because it would mean that the AI would have to value the act of doing nothing most of the time and only activating when things are about to go drastically wrong. And that wouldn’t be an optimization process.
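As a toy illustration of that “no term in the utility function” point (purely my own sketch; the weights and value names are made up), here is what happens when an optimizer splits a fixed resource budget according to its utility function: anything the function doesn’t mention gets exactly nothing.

```python
# Toy sketch only: an optimizer divides a fixed resource budget to maximize
#   U = sum_i w_i * log(x_i)
# (diminishing returns on each value). The optimum allocates x_i in
# proportion to w_i, so any value with no term in the utility function
# (w_i = 0) receives exactly nothing. Names and weights are made up.

def allocate(weights, budget=100.0):
    """Optimal split for U = sum_i w_i*log(x_i): proportional to the weights."""
    total = sum(weights.values())
    return {name: budget * w / total for name, w in weights.items()}

values = {
    "maximize_flourishing": 1.0,
    "minimize_suffering": 1.0,
    "niche_human_value": 0.0,  # no term in the utility function
}

print(allocate(values))
# -> {'maximize_flourishing': 50.0, 'minimize_suffering': 50.0, 'niche_human_value': 0.0}
```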
3b. “Either way, note that “that range” was the _normal range of respected human values_. Right now, there are no agents around (that we know of) whose values are entirely outside the range of human values, and we’re getting on OK.”
You just switched from “outside the normal range of respected human values” to “entirely outside the range of human values”. Those are not at all the same thing. Furthermore, the scenario you described as “pretty good” was one where it still turns out possible to make a superintelligence whose values are, and remain, within the normal range of generally-respected human values.
Within the normal range of generally-respected human values. NOT within the entire range of human values. If we were instead talking about a superintelligence that was programmed with the entire range of human values, rather than only a subset of them, then that would be a totally different scenario and would require an entirely different argument to support it than the one you were making.
1. Neither deontology nor virtue ethics is a special case of consequentialism. Some people really, truly do believe that sometimes the action with worse consequences is better. There are, to be sure, ways for consequentialists sometimes to justify deontologists’ rules, or indeed their policy of rule-following, on consequentialist grounds—and for that matter there are ways to construct rule-based systems that justify consequentialism. (“The one moral rule is: Do whatever leads to maximal overall flourishing!”) They are still deeply different ways of thinking about morality.
You consider questions of sexual continence, honour, etc., “social mores, not morals”, but I promise you there are people who think of such things as morals. You think such people have been “brainwashed”, and perhaps they’d say the same about you; that’s what moral divergence looks like.
2. I think that if what you wrote was intended to stand after “I think there is no convergence of moralities because …” then it’s missing a lot of steps. I should maybe repeat that I’m not asserting that there is convergence; quite likely there isn’t. But I don’t think anything you’ve said offers any strong reason to think that there isn’t.
3. Once again, I think you are not being clear about the distinction between the things I labelled (i) and (ii), and I think it matters. And, more generally, it feels as if we are talking past one another: I get the impression that either you haven’t understood what I’m saying, or you think I haven’t understood what you’re saying.
Let’s be very concrete here. Pick some human being whose values you find generally admirable. Imagine that we put that person in charge of the world. We’ll greatly increase their intelligence and knowledge, and fix any mental deficits that might make them screw up more than they need to, and somehow enable them to act consistently according to those admirable values (rather than, e.g., turning completely selfish once granted power, as real people too often do). Would you see that as an outcome much better than many of the nightmare misaligned-AI scenarios people worry about?
I would; while there’s no human being I would altogether trust to be in charge of the universe, no matter how they might be enhanced, I think putting almost any human being in charge of the universe would (if they were also given the capacity to do the job without being overwhelmed) likely be a big improvement over (e.g.) tiling the universe with paperclips or little smiley human-looking faces, or over many scenarios where a super-powerful AI optimizes some precisely-specified-but-wrong approximation to one aspect of human values.
I would not expect such a person in that situation to eliminate people with different values from theirs, or to force everyone to live according to that person’s values. I would not expect such a person in that situation to make a world in which a lot of things I find essential have been eliminated. (Would you? Would you find such behaviour generally admirable?)
And my point here is that nothing in your arguments shows any obstacle to doing essentially that. You argue that we can’t align an AI’s values with those of all of humanity because “all of humanity” has too many divergent values, and that’s true; but there remains the possibility that we could align them with those of some of humanity, to something like the extent that any individual’s values are aligned with those of some of humanity. Even if that’s the best we can hope for, the difference between that and (what might be the default, if we ever make any sort of superintelligent AI) aligning its values with those of none of humanity is immense.
(Why am I bothering to point that out? Because it looks to me as if you’re trying to argue that worrying about “value alignment” is likely a waste of time because there can be no such thing as value alignment; I say, on the contrary, that even though some notions of value alignment are obviously unachievable and some others may be not-so-obviously unachievable, still others are almost certainly achievable in principle and still valuable. Of course, I may have misunderstood what you’re actually arguing for: that’s the risk you take when you choose to speak in parables without explaining them.)
I feel I need to defend myself on one point. You say “You switched from X to Y” as if you think I either failed to notice the change or else was trying to pull some sort of sneaky bait-and-switch. Neither is the case, and I’m afraid I think you didn’t understand the structure of my argument. I wanted to argue “we could do Thing One, and that would be OK”. I approached this indirectly, by first of all arguing that we already have Thing Two, which is somewhat like Thing One, and is OK, and then addressing the difference between Thing One and Thing Two. But you completely ignored the bit where I addressed the difference, and just said “oi, there’s a difference” as if I had no idea (or was pretending to have no idea) that there is one.
1. On the deontology/virtue-ethics vs consequentialism thing, you’re right; I don’t know how I missed that. Thanks!
1a. I’ll have to think about that a bit more.
2. Well, going just off the four moralities I described, I already named two examples where two of them are unable to converge: a pure flourishing maximizer wouldn’t want to mercy-kill the human species, but a pure suffering minimizer would; and a pure flourishing maximizer would be willing to have one person tortured forever if that were a necessary prerequisite for uplifting the rest of the human species into a transhumanist utopia, while a suffering minimizer would not. Even if the four moralities I described cover only a small fraction of moral behaviors, wouldn’t that still be a hard counterexample to the idea that there is convergence?
3. I think when you said “within the normal range of generally-respected human values”, I took that literally: I thought it excluded values which are not in the normal range and not generally respected, even if they are things like “reading adult My Little Pony fanfiction”. Not every value which isn’t well respected or in the normal range would make the world a better place through its removal. I thought that would be self-evident to everyone here, and so I didn’t explain it.

And then it looked to me like you were trying to justify the removal of all values which aren’t generally respected or within the normal range as being “okay”. So when you said “Right now, there are no agents around (that we know of) whose values are entirely outside the range of human values, and we’re getting on OK”, I thought it was intended to support the removal of all values which aren’t well respected or in the normal range. But if you’re trying to support the removal of niche values in particular, then pointing out that current humans are getting along fine with their whole current range of values, which presumably includes the niche values, doesn’t make sense.
About to fall asleep, I’ll write more of my response later.
2. Again, there are plenty of counterexamples to the idea that human values have already converged. The idea behind e.g. “coherent extrapolated volition” is that (a) they might converge given more information, clearer thinking, and more opportunities for those with different values to discuss, and (b) we might find the result of that convergence acceptable even if it doesn’t quite match our values now.
3. Again, I think there’s a distinction you’re missing when you talk about “removal of values” etc. Let’s take your example: reading adult MLP fanfiction. Suppose the world is taken over by some being that doesn’t value that. (As, I think, most humans don’t.) What are the consequences for those people who do value it? Not necessarily anything awful, I suggest. Not valuing reading adult MLP fanfiction doesn’t imply (e.g.) an implacable war against those who do. Why should it? It suffices that the being that takes over the world cares about people getting what they want; in that case, if some people like to write adult MLP fanfiction and some people like to read it, our hypothetical superpowerful overlord will likely prefer to let those people get on with it.
But, I hear you say, aren’t those fanfiction works made of—or at least stored in—atoms that the Master of the Universe can use for something else? Sure, they are, and if there’s literally nothing in the MotU’s values to stop it repurposing them then it will. But there are plenty of things that can stop the MotU repurposing those atoms other than its own fondness for adult MLP fanfiction—such as, I claim, a preference for people to get what they want.
There might be circumstances in which the MotU does repurpose those atoms: perhaps there’s something else it values vastly more that it can’t get any other way. But the same is true right here in this universe, in which we’re getting on OK. If your fanfiction is hosted on a server that ends up in a war zone, or a server owned by a company that gets sold to Facebook, or a server owned by an individual in the US who gets a terrible health problem and needs to sell everything to raise funds for treatment, then that server is probably toast, and if no one else has a copy then the fanfiction is gone. What makes a superintelligent AI more dangerous here, it seems to me, is that maybe no one can figure out how to give it even humanish values. But that’s not a problem that has much to do with the divergence within the range of human values: again, “just copy Barack Obama’s values” (feel free to substitute someone whose values you like better, of course) is a counterexample, because most likely even an omnipotent Barack Obama would not feel the need to take away your guns^H^H^H^Hfanfiction.
To reiterate the point I think you’ve been missing: giving supreme power to (say) a superintelligent AI doesn’t remove from existence all those people who value things it happens not to care about, and if it cares about their welfare then we should not expect it to wipe them out or to wipe out the things they value.
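To make that concrete with the same kind of toy model as in the sketch above (again my own construction, with made-up names and numbers): if the overlord’s utility function has no direct term for the niche value but does have a term for people getting what they want, then the niche value ends up with a small but nonzero share of resources rather than nothing.

```python
# Toy sketch only, reusing the proportional allocator from before. The
# overlord has *no* direct term for the niche value, but it does weight
# "people getting what they want", and some people want the niche thing.
# Effective weight on each value:
#     w_effective = w_direct + w_welfare * (how much people want it)
# so the niche value gets a small but nonzero share, even though the
# overlord never cares about it directly. Names and numbers are made up.

def allocate(weights, budget=100.0):
    total = sum(weights.values())
    return {name: budget * w / total for name, w in weights.items()}

direct = {"grand_projects": 1.0, "niche_fanfiction_archive": 0.0}
demand = {"grand_projects": 0.5, "niche_fanfiction_archive": 0.01}
w_welfare = 1.0  # the overlord's care for people getting what they want

effective = {k: direct[k] + w_welfare * demand[k] for k in direct}
print(allocate(effective))
# -> niche_fanfiction_archive gets about 0.66 out of 100, not zero
```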