1. Neither deontology nor virtue ethics is a special case of consequentialism. Some people really, truly do believe that sometimes the action with worse consequences is better. There are, to be sure, ways for consequentialists sometimes to justify deontologists’ rules, or indeed their policy of rule-following, on consequentialist grounds—and for that matter there are ways to construct rule-based systems that justify consequentialism. (“The one moral rule is: Do whatever leads to maximal overall flourishing!”) They are still deeply different ways of thinking about morality.
You consider questions of sexual continence, honour, etc., “social mores, not morals”, but I promise you there are people who think of such things as morals. You think such people have been “brainwashed”, and perhaps they’d say the same about you; that’s what moral divergence looks like.
2. I think that if what you wrote was intended to stand after “I think there is no convergence of moralities because …” then it’s missing a lot of steps. I should maybe repeat that I’m not asserting that there is convergence; quite likely there isn’t. But I don’t think anything you’ve said offers any strong reason to think that there isn’t.
3. Once again, I think you are not being clear about the distinction between the things I labelled (i) and (ii), and I think it matters. And, more generally, it feels as if we are talking past one another: I get the impression that either you haven’t understood what I’m saying, or you think I haven’t understood what you’re saying.
Let’s be very concrete here. Pick some human being whose values you find generally admirable. Imagine that we put that person in charge of the world. We’ll greatly increase their intelligence and knowledge, and fix any mental deficits that might make them screw up more than they need to, and somehow enable them to act consistently according to those admirable values (rather than, e.g., turning completely selfish once granted power, as real people too often do). Would you see that as an outcome much better than many of the nightmare misaligned-AI scenarios people worry about?
I would; while there’s no human being I would altogether trust to be in charge of the universe, no matter how they might be enhanced, I think putting almost any human being in charge of the universe would (if they were also given the capacity to do the job without being overwhelmed) likely be a big improvement over (e.g.) tiling the universe with paperclips or little smiley human-looking faces, or over many scenarios where a super-powerful AI optimizes some precisely-specified-but-wrong approximation to one aspect of human values.
I would not expect such a person in that situation to eliminate people with different values from theirs, or to force everyone to live according to that person’s values. I would not expect such a person in that situation to make a world in which a lot of things I find essential have been eliminated. (Would you? Would you find such behaviour generally admirable?)
And my point here is that nothing in your arguments shows any obstacle to doing essentially that. You argue that we can’t align an AI’s values with those of all of humanity because “all of humanity” has too many divergent values, and that’s true, but there remains the possibility that we could align them with those of some of humanity, to something like the extent that any individual’s values are aligned with those of some of humanity. And even if that’s the best we can hope for, the difference between that and (what might be the default, if we ever make any sort of superintelligent AI) aligning its values with those of none of humanity is immense.
(Why am I bothering to point that out? Because it looks to me as if you’re trying to argue that worrying about “value alignment” is likely a waste of time because there can be no such thing as value alignment; I say, on the contrary, that even though some notions of value alignment are obviously unachievable and some others may be not-so-obviously unachievable, still others are almost certainly achievable in principle and still valuable. Of course, I may have misunderstood what you’re actually arguing for: that’s the risk you take when you choose to speak in parables without explaining them.)
I feel I need to defend myself on one point. You say “You switched from X to Y” as if you think I either failed to notice the change or else was trying to pull some sort of sneaky bait-and-switch. Neither is the case, and I’m afraid I think you didn’t understand the structure of my argument. I wanted to argue “we could do Thing One, and that would be OK”. I approached this indirectly, by first of all arguing that we already have Thing Two, which is somewhat like Thing One, and is OK, and then addressing the difference between Thing One and Thing Two. But you completely ignored the bit where I addressed the difference, and just said “oi, there’s a difference” as if I had no idea (or was pretending to have no idea) that there is one.
1. On the deontology/virtue ethics vs consequentialism thing, you’re right; I don’t know how I missed that, thanks!
1a. I’ll have to think about that a bit more.
2. Well, if we were just going off the four moralities I described, then I already named two examples where two of those moralities are unable to converge: a pure flourishing maximizer wouldn’t want to mercy-kill the human species, but a pure suffering minimizer would. A pure flourishing maximizer would be willing to have one person tortured forever if that were a necessary prerequisite for uplifting the rest of the human species into a transhumanist utopia; a suffering minimizer would not. Even if the four moralities I described only cover a small fraction of moral behaviors, wouldn’t that still be a hard counterexample to the idea that there is convergence?
3. I think when you said “within the normal range of generally-respected human values”, I took that literally, meaning I thought it excluded values which are not in the normal range and not generally respected, even if they are things like “reading Adult My Little Pony fanfiction”. Not every value which isn’t well respected or in the normal range would make the world a better place through its removal. I thought that would be self-evident to everyone here, and so I didn’t explain it. And then it looked to me like you were trying to justify the removal of all values which aren’t generally respected or within the normal range as being “okay”. So when you said “Right now, there are no agents around (that we know of) whose values are entirely outside the range of human values, and we’re getting on OK”, I thought it was intended to support the removal of all values which aren’t well respected or in the normal range. But if you’re trying to support the removal of niche values in particular, it doesn’t make sense to point out that current humans are getting along fine with their current whole range of values, which one would presume must include those niche values.
About to fall asleep, I’ll write more of my response later.
2. Again, there are plenty of counterexamples to the idea that human values have already converged. The idea behind e.g. “coherent extrapolated volition” is that (a) they might converge given more information, clearer thinking, and more opportunities for those with different values to discuss, and (b) we might find the result of that convergence acceptable even if it doesn’t quite match our values now.
3. Again, I think there’s a distinction you’re missing when you talk about “removal of values” etc. Let’s take your example: reading adult MLP fanfiction. Suppose the world is taken over by some being that doesn’t value that. (As, I think, most humans don’t.) What are the consequences for those people who do value it? Not necessarily anything awful, I suggest. Not valuing reading adult MLP fanfiction doesn’t imply (e.g.) an implacable war against those who do. Why should it? It suffices that the being that takes over the world cares about people getting what they want; in that case, if some people like to write adult MLP fanfiction and some people like to read it, our hypothetical superpowerful overlord will likely prefer to let those people get on with it.
But, I hear you say, aren’t those fanfiction works made of—or at least stored in—atoms that the Master of the Universe can use for something else? Sure, they are, and if there’s literally nothing in the MotU’s values to stop it repurposing them then it will. But there are plenty of things that can stop the MotU repurposing those atoms other than its own fondness for adult MLP fanfiction—such as, I claim, a preference for people to get what they want.
There might be circumstances in which the MotU does repurpose those atoms: perhaps there’s something else it values vastly more that it can’t get any other way. But the same is true right here in this universe, in which we’re getting on OK. If your fanfiction is hosted on a server that ends up in a war zone, or a server owned by a company that gets sold to Facebook, or a server owned by an individual in the US who gets a terrible health problem and needs to sell everything to raise funds for treatment, then that server is probably toast, and if no one else has a copy then the fanfiction is gone. What makes a superintelligent AI more dangerous here, it seems to me, is that maybe no one can figure out how to give it even humanish values. But that’s not a problem that has much to do with the divergence within the range of human values: again, “just copy Barack Obama’s values” (feel free to substitute someone whose values you like better, of course) is a counterexample, because most likely even an omnipotent Barack Obama would not feel the need to take away your guns^H^H^H^Hfanfiction.
To reiterate the point I think you’ve been missing: giving supreme power to (say) a superintelligent AI doesn’t remove from existence all those people who value things it happens not to care about, and if it cares about their welfare then we should not expect it to wipe them out or to wipe out the things they value.