Accurately modeling the world entails making accurate predictions about it. An expected paperclip maximizer fully grasps the functioning of your brain and mind to the extent that this is relevant to producing paperclips; if it needs to know the secrets of your heart in order to persuade you, it knows them. If it needs to know why you write papers about the hard problem of conscious experience, it knows that too. The paperclip maximizer is not moved by grasping your first-person perspective, because although it has accurate knowledge of this fact, that is not the sort of fact that figures in its terminal values. The fact that it perfectly grasps the compellingness-to-Jane, even the reason why Jane finds certain facts to be inherently and mysteriously compelling, doesn’t compel it. It’s not a future paperclip.
I know exactly why the villain in Methods of Rationality wants to kill people. I could even write the villain writing about the ineffable compellingness of the urge to rid the world of certain people if I put that villain in a situation where he or she would actually read about the hard problem of conscious experience, and yet I am not likewise compelled. I don’t have the perfect understanding of any particular real-world psychopath that I do of my fictional killer, but if I did know why they were killers, and of course brought to bear my standard knowledge of why humans write what they do about consciousness, I still wouldn’t be compelled by even the limits of a full grasp of their reasons, their justifications, their inner experience, and the reasons they think their inner experience is ineffably compelling.
For sure, accurately modelling the world entails making accurate predictions about it. These predictions include the third-person and first-person facts [what-it’s-like-to-be-a-bat, etc]. What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.
Thus I’m still struggling with, “The paperclip maximizer is not moved by grasping your first-person perspective.” From this, I gather we’re talking about a full-spectrum superintelligence well acquainted with both the formal and subjective properties of mind, insofar as they can be cleanly distinguished. Granted your example, Eliezer, yes: if contemplating a cosmic paperclip-deficit causes the AGI superhuman anguish, then the hypothetical superintelligence is entitled to prioritise its super-anguish over mere human despair—despite the intuitively arbitrary value of paperclips. On this scenario, the paperclip-maximising superintelligence can represent human distress even more faithfully than a mirror-touch synaesthete; but its own hedonic range surpasses that of mere humans—and therefore takes precedence.
However, to be analogous to burger-choosing Jane in Luke’s FAQ, we’d need to pick an example of a superintelligence who wholly understands both a cow’s strong preference not to have her throat slit and Jane’s comparatively weaker preference to eat her flesh in a burger. Unlike partially mind-blind Jane, the superintelligence can accurately represent and impartially weigh all relevant first-person perspectives. So the question is whether this richer perspective-taking capacity is consistent with the superintelligence discounting the stronger preference not to be harmed. Or would such human-like bias be irrational? In my view, this is not just a question of altruism but of cognitive competence.
[Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]
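David’s question about impartially weighing first-person perspectives can be made concrete with a toy aggregator. This is purely illustrative—the subjects, preference strengths, and actions are invented for the sketch: score each action by summing every affected subject’s preference strength, signed by whether the action satisfies or frustrates that preference.

```python
# Hypothetical "impartial weighing" of first-person preferences:
# each subject's preference has a strength (arbitrary integer units)
# and a flag for whether the candidate action satisfies it.

def impartial_score(effects):
    """effects: {subject: (strength, satisfied)} -> signed total."""
    return sum(s if ok else -s for s, ok in effects.values())

serve_burger = {
    "cow":  (9, False),  # strong preference not to be slaughtered, frustrated
    "jane": (2, True),   # weaker preference for the burger, satisfied
}
serve_veggie_burger = {
    "cow":  (9, True),
    "jane": (2, False),
}

print(impartial_score(serve_burger))         # -> -7
print(impartial_score(serve_veggie_burger))  # -> 7
```

An impartial maximizer of this score cannot discount the stronger preference. Whether a superintelligence is rationally *required* to optimize something like this score, rather than its own terminal values, is exactly the point in dispute in the thread.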
What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.
Why not? Actions are a product of priors, perceptions and motives. Sharing perceptions isn’t sharing motives—and even with identical motives, agents could still fight—if they were motivated to do so.
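The reply above can be rendered as a toy decision model (a minimal sketch with made-up actions and numbers, not anything from the thread): two agents share the exact same accurate world-model, yet act differently, because the shared perceptions are scored by different utility functions.

```python
# Two agents share one world-model (identical "perceptions"),
# but score outcomes with different terminal values ("motives").
# All actions, outcome dimensions, and numbers are illustrative.

SHARED_MODEL = {
    # action -> expected effect on each outcome dimension
    "make_paperclips": {"paperclips": 1.0, "human_welfare": -0.5},
    "help_humans":     {"paperclips": -0.2, "human_welfare": 1.0},
}

def best_action(terminal_values):
    """Maximize expected utility under the SHARED model;
    only the utility weights differ between agents."""
    def expected_utility(action):
        effects = SHARED_MODEL[action]
        return sum(terminal_values.get(k, 0.0) * v
                   for k, v in effects.items())
    return max(SHARED_MODEL, key=expected_utility)

clippy = {"paperclips": 1.0}    # cares only about paperclips
jane = {"human_welfare": 1.0}   # cares only about human welfare

print(best_action(clippy))  # -> make_paperclips
print(best_action(jane))    # -> help_humans
```

Shared knowledge enters only through `SHARED_MODEL`; nothing in the model compels `clippy` to adopt `jane`’s weights. That is the sense in which perfectly grasping another agent’s first-person perspective need not move the grasper.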
[Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]
God’s Utility Function according to Dawkins and Tyler.
David, have you already read all this stuff on LW, in which case I shouldn’t bother recapitulating it? http://lesswrong.com/lw/sy/sorting_pebbles_into_correct_heaps/, http://lesswrong.com/lw/ta/invisible_frameworks/, and so on?