Allen—Prolegomena to Any Future Moral Agent places a lot of emphasis on figuring out whether a machine can be truly moral, in various metaphysical senses like “has the capacity to disobey the law, but doesn’t” and “deliberates in a certain way”. Not only might these senses be meaningless, but in a superintelligence the metaphysical implications should take second place to the not-getting-turned-into-paperclips implications.
He proposes a moral Turing Test, in which we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard or let him out of the box. In fact, any unfriendly superintelligence with a basic knowledge of human culture and a motive to deceive would pass.
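A minimal sketch of how such a comparative test might be run; the judge, question set, and pass threshold here are hypothetical stand-ins of mine, not anything specified in the paper:

```python
import random

def moral_turing_test(machine_answer, human_answer, judge, questions, trials=100):
    """Toy comparative moral Turing test: on each trial the judge sees two
    answers to a moral question and guesses which one came from the machine.
    The machine 'passes' if the judge does no better than chance."""
    correct = 0
    for _ in range(trials):
        question = random.choice(questions)
        answers = [("machine", machine_answer(question)),
                   ("human", human_answer(question))]
        random.shuffle(answers)              # hide which answer is which
        guess = judge(question, answers[0][1], answers[1][1])  # returns 0 or 1
        if answers[guess][0] == "machine":
            correct += 1
    return correct / trials <= 0.6           # "near chance"; threshold is arbitrary
```

Note that the objection is implementation-independent: a deceptive superintelligence optimizing for release would simply produce whatever answers keep the judge’s accuracy near chance.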
Utilitarianism is considered difficult to implement because it’s computationally impossible to predict all consequences. But given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good that module is, this seems like a non-problem. The module wouldn’t be perfect, but it would do better than humans, at least.
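To make the point concrete, here is a toy sketch of how a consequence-prediction module would slot into a utilitarian action selector (the names predict_outcomes and utility are my own placeholders, not anything from the papers); the hard part is specifying utility, not running the loop:

```python
def choose_action(actions, world_state, predict_outcomes, utility):
    """Pick the action with the highest expected utility under the AI's own
    (imperfect) world model. predict_outcomes(state, action) returns a list
    of (probability, outcome_state) pairs."""
    def expected_utility(action):
        return sum(p * utility(outcome)
                   for p, outcome in predict_outcomes(world_state, action))
    return max(actions, key=expected_utility)
```

Any weakness in predict_outcomes that makes the moral calculus unreliable also makes the AI less dangerous, which is the sense in which the objection cancels itself out.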
Deontology has the same problem as the last one. Virtue ethics seems problematic depending on the AI’s motivation—if it were motivated to turn the universe into paperclips, would it be completely honest about it, kill humans quickly and painlessly and with a flowery apology, and declare itself to have exercised the virtues of honesty, compassion, and politeness? Evolution would give us something at best as moral as humans and probably worse—see the Sequence post about the tanks in cloudy weather.
Still not impressed.
Mechanized Deontic Logic is pretty okay, despite the dread the name inspired in me. I’m no good at formal systems, but as far as I can tell it’s a logic for proving some simple results about morality; the example they give is “If you should see to it that X, then you should see to it that you should see to it that X.”
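If I’m reading the notation right, that example comes out in standard deontic/“sees to it that” (stit) notation roughly as below; this is my own transcription, not a formula copied from the paper:

```latex
% O(...)      : "it ought to be that ..."
% [a stit: X] : "agent a sees to it that X"
O[\alpha\ \mathrm{stit}\colon X] \;\rightarrow\; O[\alpha\ \mathrm{stit}\colon O[\alpha\ \mathrm{stit}\colon X]]
```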
I can’t immediately see a way this would destroy the human race, but that’s only because it’s nowhere near the point where it involves what humans actually think of as “morality” yet.
The Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of “owner health” in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like “turn on stove” and “administer glucose”) and a very specific list of owner health indicators (like “hunger” and “blood glucose level”), but it’s not very relevant to the broader Friendly AI program.
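As a toy illustration of the scale this works at (the indicator values, action effects, and utility weights below are invented for the example, not taken from the paper):

```python
# Hypothetical toy model: two health indicators, three actions.
ACTIONS = {
    # action: predicted change to (hunger, blood_glucose)
    "do_nothing":         (0.0, 0.0),
    "cook_meal":          (-0.5, +0.2),
    "administer_glucose": (0.0, +0.4),
}

def owner_health_utility(hunger, blood_glucose):
    """Higher is better: penalize hunger and deviation from a normal glucose level of 1.0."""
    return -abs(hunger) - abs(blood_glucose - 1.0)

def best_action(hunger, blood_glucose):
    def value(action):
        d_hunger, d_glucose = ACTIONS[action]
        return owner_health_utility(hunger + d_hunger, blood_glucose + d_glucose)
    return max(ACTIONS, key=value)

print(best_action(hunger=0.8, blood_glucose=0.7))  # -> "cook_meal"
```

The whole thing only stays safe because the action set and the utility function are tiny and hand-specified; nothing here generalizes to an agent choosing among open-ended actions.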
Having read as many papers as I have time for before dinner, I provisionally conclude that Vladimir Nesov hit the nail on the head