I started looking through some of the papers and so far I don’t feel enlightened.
I’ve never been able to tell whether I don’t understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects for a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.
But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.
For example, it seems reasonable to have maxims that apply only to certain portions of the population: “I, who am a policeman, will lock up this bank robber awaiting trial in my county jail” generalizes to “Other policemen will also lock up bank robbers awaiting trial in their county jails” if you’re a human moral philosopher who knows how these things are supposed to work.
But I don’t see what’s stopping a robot from coming up with “Everyone will lock up everyone else” or “All the world’s policemen will descend upon this one bank robber and try to lock him up in their own county jails”. After all, Kant universalizes “I will deceive this murderer so he can’t find his victim” to “Everyone will deceive everyone else all the time” and not to “Everyone will deceive murderers when a life is at stake”. So if a robot were to propose “I, a robot, will kill all humans”, why should we expect it to universalize it to “Everyone will kill everyone else” rather than “Other robots will also kill all humans”, which just means the robot gets help?
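To make that worry concrete, here is a toy sketch, entirely my own construction and not anything from the Powers paper (the class and slot names are invented), of why a naive universalization procedure is underdetermined: the output depends on which slots of the maxim you generalize and how broadly you draw their categories, and nothing in the formalism says which to pick.

```python
# Toy illustration (my own, not from the Powers paper): a maxim reduced to
# agent / act / patient slots. The point is that "universalize" isn't one
# operation; the result depends on which slots you generalize and how broadly
# you draw their categories.
from dataclasses import dataclass
from itertools import product

@dataclass
class Maxim:
    agent: str    # e.g. "I, a policeman"
    act: str      # e.g. "lock up"
    patient: str  # e.g. "this bank robber awaiting trial"

def universalize(m, widened_agent=None, widened_patient=None):
    """Generalize whichever slots we're told to, leaving the rest alone."""
    agent = widened_agent or m.agent
    patient = widened_patient or m.patient
    return f"{agent} will {m.act} {patient}"

m = Maxim("I, a policeman", "lock up", "this bank robber awaiting trial")
for a, p in product([None, "all policemen", "everyone"],
                    [None, "all bank robbers awaiting trial", "everyone else"]):
    print(universalize(m, a, p))
# The nine outputs include the "intended" reading ("all policemen will lock up
# all bank robbers awaiting trial") and the absurd ones ("everyone will lock up
# everyone else"), with nothing in the code to prefer one over the other.
```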
And even if it does universalize correctly, in the Friendly AI context it need not be a contradiction! If this is a superintelligent AI we’re talking about, then even in the best-case scenario where everything goes right, the maxim “I will try to kill all humans” will universalize to “Everyone will try to kill everyone else”. Kant said this was contradictory in that every human will then be dead and none of them will gain the deserts of their murder—but in an AI context this isn’t contradictory at all: the superintelligence will succeed at killing everyone else, the actions of the puny humans will be irrelevant, and the AI will be just fine.
(Actually, just getting far enough to make either of those objections involves hand-waving away about thirty other intractable problems; but these two seemed like the most pertinent.)
I’ll look through some of the other papers later, but so far I’m not seeing anything to make me think Eliezer’s opinion of the state of the field was overly pessimistic.
Allen—Prolegomena to Any Future Artificial Moral Agent places a lot of emphasis on figuring out whether a machine can be truly moral, in various metaphysical senses like “has the capacity to disobey the law, but doesn’t” and “deliberates in a certain way”. Not only is it possible that these are meaningless, but in a superintelligence the metaphysical implications should really take second place to the not-getting-turned-into-paperclips implications.
He proposes a moral Turing Test, where we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard/let him out of the box. In fact, every unfriendly superintelligence with a basic knowledge of human culture and a motive would pass.
Utilitarianism considered difficult to implement because it’s computationally impossible to predict all consequences. Given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good this module is, that seems like a non-problem. It wouldn’t be perfect, but it would do better than humans, at least.
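To put that point in pseudocode (my own framing, not from any of the papers): once you have the predictive world model, the utilitarian evaluation is a thin layer on top of it.

```python
# Minimal sketch: expected-utility action selection reusing the same predictive
# model the AI would need for ordinary planning anyway. Names are placeholders.

def choose_action(actions, predict_outcomes, utility):
    """actions: candidate actions.
    predict_outcomes(a): iterable of (probability, outcome) pairs from the world model.
    utility(outcome): a number; the 'ethics' lives entirely in this function."""
    def expected_utility(a):
        return sum(p * utility(outcome) for p, outcome in predict_outcomes(a))
    return max(actions, key=expected_utility)
```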
Deontology, same problem as the last one. Virtue ethics seems problematic depending on the AI’s motivation—if it were motivated to turn the universe to paperclips, would it be completely honest about it, kill humans quickly and painlessly and with a flowery apology, and declare itself to have exercised the virtues of honesty, compassion, and politeness? Evolution would give us something at best as moral as humans and probably worse—see the Sequence post about the tanks in cloudy weather.
Still not impressed.
Mechanized Deontic Logic is pretty okay, despite the dread I had because of the name. I’m no good at formal systems, but as far as I can tell, it looks like a logic for proving some simple results about morality: the example they give is “If you should see to it that X, then you should see to it that you should see to it that X.”
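One way to render that quoted example in a deontic “sees to it that” style notation, where the circle is the obligation operator (this is my gloss of the English sentence; the paper’s own symbols may differ):

```latex
% "If you should see to it that X, then you should see to it that
%  you should see to it that X."  (my rendering, not necessarily the paper's)
\[
  \bigcirc[\alpha \ \mathrm{stit}\!:\, X]
  \;\rightarrow\;
  \bigcirc[\alpha \ \mathrm{stit}\!:\, \bigcirc[\alpha \ \mathrm{stit}\!:\, X]]
\]
```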
I can’t immediately see a way this would destroy the human race, but that’s only because it’s nowhere near the point where it involves what humans actually think of as “morality” yet.
Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of “owner health” in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like “turn on stove” and “administer glucose”) and a very specific list of owner health indicators (like “hunger” and “blood glucose level”), but it’s not very relevant to the broader Friendly AI program.
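As a toy sketch of what that might look like (the indicator names, weights, and functions are all invented here, not taken from the Utilibot paper):

```python
# Invented toy example: "owner health" as a weighted score over a couple of
# measurable indicators, with actions chosen by their predicted effect on it.

HEALTH_WEIGHTS = {"hunger": -1.0, "blood_glucose_deviation": -2.0}

def health_score(indicators):
    """indicators: dict mapping indicator name -> current (non-negative) level."""
    return sum(weight * indicators[name] for name, weight in HEALTH_WEIGHTS.items())

def best_action(indicators, actions, predict):
    """predict(indicators, action) -> predicted indicator levels after the action."""
    return max(actions, key=lambda a: health_score(predict(indicators, a)))
```

Which also makes the limitation obvious: everything hinges on the indicator list and the prediction function covering every way an action could matter, and that is exactly what you can’t do once the action space gets large.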
Having read as many papers as I have time to before dinner, my provisional conclusion is that Vladimir Nesov hit the nail on the head.
I don’t disagree with much of anything you’ve said here, by the way.
Remember that I’m writing a book that, for most of its length, will systematically explain why the proposed solutions in the literature won’t work.
The problem is that SIAI is not even engaging in that discussion. Where is the detailed explanation of why these proposed solutions won’t work? I don’t get the impression someone like Yudkowsky has even read these papers, let alone explained why the proposed solutions won’t work. SIAI is just talking a different language than the professional machine ethics community is.
Most of the literature on machine ethics is not that useful, but that’s true of almost any subject. The point of a literature hunt is to find the gems here and there that genuinely contribute to the important project of Friendly AI. Another point is to interact with the existing literature and explain to people why it’s not going to be that easy.
My sentiment about the role of engaging existing literature on machine ethics is analogous to what you describe in a recent post on your blog. Particularly this:
Oh God, you think. That’s where the level of discussion is, on this planet.
You either push the boundaries, or fight the good fight. And the good fight is best fought by writing textbooks and opening schools, not by public debates with distinguished shamans. But it’s not entirely fair, since some of machine ethics addresses a reasonable problem of making good-behaving robots, which just happens to have the same surface feature of considering moral valuation of decisions of artificial reasoners, but on closer inspection is mostly unrelated to the problem of FAI.
Sure. One of the hopes of my book is, as stated earlier, to bring people up to where Eliezer Yudkowsky was circa 2004.
Also, I worry that something is being overlooked by the LW / SIAI community because the response to suggestions in the literature has been so quick and dirty. I’m on the prowl for something that’s been missed because nobody has done a thorough literature search and detailed rebuttal. We’ll see what turns up.
BTW, I so identify with this quote:
In fact, I’ve said the same thing myself, in slightly different words.
Every sufficiently smart person who thinks about Kantian ethics comes up with this objection. I don’t believe it’s possible to defend against it entirely. However...
After all, Kant universalizes “I will deceive this murderer so he can’t find his victim” to “Everyone will deceive everyone else all the time” and not to “Everyone will deceive murderers when a life is at stake”.
That may be what Kant actually says (does he?), but if he does then I think he’s wrong about his own theory. As I understand it, what you’re supposed to do is look at the bit of reasoning which is actually causing you to want to do X and see whether that generalizes, not cast around for a bit of reasoning which would (or in this case, would not) generalize, and then pretend to be basing your action on that.
In the example you mention, you should only generalize to “everyone will deceive everyone all the time” if what you’re considering doing is deceiving this person simply because he’s a person. If you want to deceive him because of his intention to commit murder, and would not want to otherwise, then the thing you generalize must have this feature.
Similarly, I might try to justify lying to someone this morning on the basis that it generalizes to “I, who am AlephNeil, always lie on the morning of the 13th of March 2011 if it is to my advantage”, which is both consistent and advantageous (to me). But really I would be lying purely because it’s to my advantage—the date and time, and the fact that I am AlephNeil, don’t enter into the computation.
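Put in pseudo-procedural terms (my own toy formalization of the point, not anything from Kant or the papers): the maxim you test is assembled from exactly the conditions that actually drove the decision, so incidental facts like the date or who you happen to be never make it in.

```python
# Toy formalization: build the maxim to be universalized from only those
# conditions that actually entered the decision, discarding incidental facts.

def maxim_to_test(action, operative_reasons, incidental_facts):
    """operative_reasons: conditions the decision actually depended on.
    incidental_facts: true of the situation, but causally idle; deliberately dropped."""
    del incidental_facts  # by construction, these never appear in the maxim
    conditions = " and ".join(sorted(operative_reasons)) or "always"
    return f"anyone may {action} whenever {conditions}"

print(maxim_to_test("deceive this person",
                    operative_reasons={"he intends murder", "a life is at stake"},
                    incidental_facts={"it is the morning of 13 March 2011",
                                      "I am AlephNeil"}))
# -> "anyone may deceive this person whenever a life is at stake and he intends murder"
```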
For Googleability, I’ll note that this objection is called the problem of maxim specification.
That currently has no Google results besides your post.
Yes, sorry. “Maxim specification” won’t give you much, but variations on it will. People don’t usually write “the problem of maxim specification” but instead things like “…specifying the maxim…” or “the maxim… specified…” and so on. In general it isn’t as easily Googled as “is-ought gap” is.
But here is one use.