Isn’t the giant elephant in this room the whole issue of moral realism? I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true—the combination of physical fact and logical function against which my moral judgments are being compared. This gives my moral beliefs truth value. And having laid this out, it becomes perfectly obvious that it’s possible to build powerful optimizers who are not motivated by what I call moral truths; they are maximizing something other than morality, like paperclips. They will also meta-maximize something other than morality if you ask them to choose between possible utility functions, and will quite predictably go on picking the utility function “maximize paperclips”. Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you, is itself of no interest to them because you haven’t gotten to the all-important parts about paperclips yet.
The main thing I’m not clear on in this discussion is to what extent David Pearce is being innocently mysterian vs. motivatedly mysterian. To be confused about how your happiness seems so intrinsically motivating, and innocently if naively wonder if perhaps it must be intrinsically motivating to other minds as well, is one thing. It is another thing to prefer this conclusion and so to feel a bit uncurious about anyone’s detailed explanation of how it doesn’t work like that. It is even less innocent to refuse outright to listen when somebody else tries to explain. And then strangest of all is to state powerfully and definitely that every bit of happiness must be motivating to all other minds, even though you can’t lay out step by step how the decision procedure would work. This requires overrunning your own claims to knowledge in a fundamental sense—mistaking your confusion about something for the ability to make definite claims about it. Now this of course is a very common and understandable sin, and the fact that David Pearce is crusading for happiness for all life forms should certainly count into our evaluation of his net virtue (it would certainly make me willing to drink a Pepsi with him). But I’m also not clear about where to go from here, or whether this conversation is accomplishing anything useful.
In particular it seems like David Pearce is not leveling any sort of argument we could possibly find persuasive—it’s not written so as to convince anyone who isn’t already a moral realist, or addressing the basic roots of disagreement—and that’s not a good sign. And short of rewriting the entire metaethics sequence in these comments I don’t know how I could convince him, either.
I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true
Even among philosophers, “moral realism” is a term wont to confuse. I’d be wary about relying on it to chunk your philosophy. For instance, the simplest and least problematic definition of ‘moral realism’ is probably the doctrine...
minimal moral realism: cognitivism (moral assertions like ‘murder is bad’ have truth-conditions, express real beliefs, predicate properties of objects, etc.) + success theory (some moral assertions are true; i.e., rejection of error theory).
This seems to be the definition endorsed on SEP’s Moral Realism article. But it can’t be what you have in mind, since you accept cognitivism and reject error theory. So perhaps you mean to reject a slightly stronger claim (to coin a term):
factual moral realism: MMR + moral assertions are not true or false purely by stipulation (or ‘by definition’); rather, their truth-conditions at least partly involve empirical, worldly contingencies.
But here, again, it’s hard to find room to reject moral realism. Perhaps some moral statements, like ‘suffering is bad,’ are true only by stipulation; but if ‘punching people in the face causes suffering’ is not also true by stipulation, then the conclusion ‘punching people in the face is bad’ will not be purely stipulative. Similarly, ‘The Earth’s equatorial circumference is ~40,075.017 km’ is not true just by definition, even though we need somewhat arbitrary definitions and measurement standards to assert it. And rejecting the next doesn’t sound right either:
correspondence moral realism: FMR + moral assertions are not true or false purely because of subjects’ beliefs about the moral truth. For example, the truth-condition for ‘eating babies is bad’ are not ‘Eliezer Yudkowsky thinks eating babies is bad’, nor even ‘everyone thinks eating babies is bad’. Our opinions do play a role in what’s right and wrong, but they don’t do all the work.
So perhaps one of the following is closer to what you mean to deny:
moral transexperientialism: Moral facts are nontrivially sensitive to differences wholly independent of, and having no possible impact on, conscious experience. The goodness and badness of outcomes is not purely a matter of (i.e., is not fully fixed by) their consequences for sentients. This seems kin to Mark Johnston’s criterion of ‘response-dependence’. Something in this vicinity seems to be an important aspect of at least straw moral realism, but it’s not playing a role here.
moral unconditionalism: There is a nontrivial sense in which a single specific foundation for (e.g., axiomatization of) the moral truths is the right one—‘objectively’, and not just according to itself or any persons or arbitrarily selected authority—and all or most of the alternatives aren’t the right one. (We might compare this to the view that there is only one right set of mathematical truths, and this rightness is not trivial or circular. Opposing views include mathematical conventionalism and ‘if-thenism’.)
moral non-naturalism: Moral (or, more broadly, normative) facts are objective and worldly in an even stronger sense, and are special, sui generis, metaphysically distinct from the prosaic world described by physics.
Perhaps we should further divide this view into ‘moral platonism’, which reduces morality to logic/math but then treats logic/math as a transcendent, eternal Realm of Thingies and Stuff; v. ‘moral supernaturalism’, which identifies morality more with souls and ghosts and magic and gods than with logical thingies. If this distinction isn’t clear yet, perhaps we could stipulate that platonic thingies are acausal, whereas spooky supernatural moral thingies can play a role in the causal order. I think this moral supernaturalism, in the end, is what you chiefly have in mind when you criticize ‘moral realism’, since the idea that there are magical, irreducible Moral-in-Themselves Entities that can exert causal influences on us in their own right seems to be a prerequisite for the doctrine that any possible agent would be compelled (presumably by these special, magically moral objects or properties) to instantiate certain moral intuitions. Christianity and karma are good examples of moral supernaturalisms, since they treat certain moral or quasi-moral rules and properties as though they were irreducible physical laws or invisible sorcerors.
At the same time, it’s not clear that davidpearce was endorsing anything in the vicinity of moral supernaturalism. (Though I suppose a vestigial form of this assumption might still then be playing a role in the background. It’s a good thing it’s nearly epistemic spring cleaning time.) His view seems somewhere in the vicinity of unconditionalism—if he thinks anyone who disregards the interests of cows is being unconditionally epistemically irrational, and not just ‘epistemically irrational given that all humans naturally care about suffering in an agent-neutral way’. The onus is then on him and pragmatist to explain on what non-normative basis we could ever be justified in accepting a normative standard.
I’m not sure this taxonomy is helpful from David Pearce’s perspective. David Pearce’s position is that there are universally motivating facts—facts whose truth, once known, is compelling for every possible sort of mind. This reifies his observation that the desire for happiness feels really, actually compelling to him and this compellingness seems innate to qualia, so anyone who truly knew the facts about the quale would also know that compelling sense and act accordingly. This may not correspond exactly to what SEP says under moral realism and let me know if there’s a standard term, but realism seems to describe the Pearcean (or Eliezer circa 1996) feeling about the subject—that happiness is really intrinsically preferable, that this is truth and not opinion.
From my perspective this is a confusion which I claim to fully and exactly understand, which licenses my definite rejection of the hypothesis. (The dawning of this understanding did in fact cause my definite rejection of the hypothesis in 2003.) The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something, so if you try to use your empathy to imagine another mind fully understanding this mysterious opaque data (quale) whose content is actually your internal code for “compelled to do that”, you imagine the mind being compelled to do that. You’ll be agnostic about whether or not this seems supernatural because you don’t actually know where the mysterious compellingness comes from. From my perspective, this is “supernatural” because your story inherently revolves around mental facts you’re not allowed to reduce to nonmental facts—any reduction to nonmental facts will let us construct a mind that doesn’t care once the qualia aren’t mysteriously irreducibly compelling anymore. But this is a judgment I pass from reductionist knowledge—from a Pearcean perspective, there’s just a mysteriously compelling quality about happiness, and to know this quale seems identical with being compelled by it; that’s all your story. Well, that plus the fact that anyone who says that some minds might not be compelled by happiness, seems to be asserting that happiness is objectively unimportant or that its rightness is a matter of mere opinion, which is obviously intuitively false. (As a moral cognitivist, of course, I agree that happiness is objectively important, I just know that “important” is a judgment about a certain logical truth that other minds do not find compelling. Since in fact nothing can be intrinsically compelling to all minds, I have decided not to be an error theorist as I would have to be if I took this impossible quality of intrinsic compellingness to be an unavoidable requirement of things being good, right, valuable, or important in the intuitive emotional sense. My old intuitive confusion about qualia doesn’t seem worth respecting so much that I must now be indifferent between a universe of happiness vs. a universe of paperclips. The former is still better, it’s just that now I know what “better” means.)
But if the very definitions of the debate are not automatically to judge in my favor, then we should have a term for what Pearce believes that reflects what Pearce thinks to be the case. “Moral realism” seems like a good term for “the existence of facts the knowledge of which is intrinsically and universally compelling, such as happiness and subjective desire”. It may not describe what a moral cognitivist thinks is really going on, but “realism” seems to describe the feeling as it would occur to Pearce or Eliezer-1996. If not this term, then what? “Moral non-naturalism” is what a moral cognitivist says to deconstruct your theory—the self-evident intrinsic compellingness of happiness quales doesn’t feel like asserting “non-naturalism” to David Pearce, although you could have a non-natural theory about how this mysterious observation was generated.
This reifies his observation that the desire for happiness feels really, actually compelling to him and this compellingness seems innate to qualia
I’m not sure he’s wrong in saying that feeling the qualia of a sentient, as opposed to modeling those qualia in an affective black box without letting the feels ‘leak’ into the rest of your cognitionspace, requires some motivational effect. There are two basic questions here:
First, the Affect-Effect Question: To what extent are the character of subjective experiences like joy and suffering intrinsic or internal to the state, as opposed to constitutively bound up in functional relations that include behavioral impetuses? (For example, to what extent is it possible to undergo the phenomenology of anguish without thereby wanting the anguish to stop? And to what extent is it possible to want something to stop without being behaviorally moved, to the extent one is able and to the extent one’s other desires are inadequate overriders, to stop it?) Compare David Lewis’ ‘Mad Pain’, pain that has the same experiential character as ordinary pain but none of its functional relations (or at least not the large-scale ones). Some people think a state of that sort wouldn’t qualify as ‘pain’ at all, and this sort of relationalism lends some credibility to pearce’s view.
Second, the Third-Person Qualia Question: To what extent is phenomenological modeling (modeling a state in such a way that you, or a proper part of you, experiences that state) required for complete factual knowledge of real-world agents? One could grant that qualia are real (and really play an important role in various worldly facts, albeit perhaps physical ones) and are moreover unavoidably motivating (if you aren’t motivated to avoid something, then you don’t really fear it), but deny that an epistemically rational agent is required to phenomenologically model qualia. Perhaps there is some way to represent the same mental states without thereby experiencing them, to fully capture the worldly facts about cows without simulating their experiences oneself. If so, then knowing everything about cows would not require one to be motivated (even in some tiny powerless portion of oneself) to fulfill the values of cows. (Incidentally, it’s also possible in principle to grant the (admittedly spooky) claim that mental states are irreducible and indispensable, without thinking that you need to be in pain in order to fully and accurately model another agent’s pain; perhaps it’s possible to accurately model one phenomenology using a different phenomenology.)
And again, at this point I don’t think any of these positions need to endorse supernaturalism, i.e., the idea that special moral facts are intervening in the causal order to force cow-simulators, against their will, to try to help cows. (Perhaps there’s something spooky and supernatural about causally efficacious qualia, but for the moment I’ll continue assuming they’re physical states—mayhap physical states construed in a specific way.) All that’s being disputed, I think, is to what extent a programmer of a mind-modeler could isolate the phenomenology of states from their motivational or behavioral roles, and to what extent this programmer could model brains at all without modeling their first-person character.
As a limiting case: Assuming there are facts about conscious beings, could an agent simulate everything about those beings without ever becoming conscious itself? (And if it did become conscious, would it only be conscious inasmuch as it had tiny copies of conscious beings inside itself? Or would it also need to become conscious in a more global way, in order to access and manipulate useful information about its conscious subsystems?)
Incidentally, these engineering questions are in principle distinct both from the topic of causally efficacious irreducible Morality Stuff (what I called moral supernaturalism), and from the topic of whether moral claims are objectively right, that, causally efficacious or not, moral facts have a sort of ‘glow of One True Oughtness’ (what I called moral unconditionalism, though some might call it ‘moral absolutism’), two claims the conjunction of which it sounds like you’ve been labeling ‘moral realism’, in deference to your erstwhile meta-ethic. Whether we can motivation-externally simulate experiential states with perfect fidelity and epistemic availability-to-the-simulating-system-at-large is a question for philosophy of mind and computer science, not for meta-ethics. (And perhaps davidpearce’s actual view is closer to what you call moral realism than to my steelman. Regardless, I’m more interested in interrogating the steelman.)
“Moral non-naturalism” is what a moral cognitivist says to deconstruct your theory—the self-evident intrinsic compellingness of happiness quales doesn’t feel like asserting “non-naturalism” to David Pearce, although you could have a non-natural theory about how this mysterious observation was generated.
So terms like ‘non-naturalism’ or ‘supernaturalism’ are too theory-laden and sophisticated for what you’re imputing to Pearce (and ex-EY), which is really more of a hunch or thought-terminating-clichéplex. In that case, perhaps ‘naïve (moral) realism’ or ‘naïve absolutism’ is the clearest term you could use. (Actually, I like ‘magical absolutism’. It has a nice ring to it, and ‘magical’ gets at the proto-supernaturalism while ‘absolutism’ gets at the proto-unconditionalism. Mm, words.) Philosophers love calling views naïve, and the term doesn’t have a prior meaning like ‘moral realism’, so you wouldn’t have to deal with people griping about your choice of jargon.
This would also probably be a smart rhetorical move, since a lot of people don’t see a clear distinction between cognitivism and realism and might be turned off by your ideas qua an anti-realism theory even if they’d have loved them qua a realist theory. ‘Tis part of why I tried to taboo the term as ‘minimal moral realism’ etc., rather than endorsing just one of the definitions on offer.
Eliezer, you remark, “The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something,” Would you propose that a mind lacking in motivation couldn’t feel blissfully happy?
Mainlining heroin (I am told) induces pure bliss without desire—shades of Buddhist nirvana? Pure bliss without motivation can be induced by knocking out the dopamine system and directly administering mu opioid agonists to our twin “hedonic hotspots” in the ventral pallidum and rostral shell of the nucleus accumbens. Conversely, amplifying mesolimbic dopamine function while disabling the mu opioid pathways can induce desire without pleasure.
[I’m still mulling over some of your other points.]
Would you propose that a mind lacking in motivation couldn’t feel blissfully happy?
Here we’re reaching the borders of my ability to be confident about my replies, but the two answers which occur to me are:
1) It’s not positive reinforcement unless feeling it makes you experience at least some preference to do it again—otherwise in what sense are the neural networks getting their plus? Heroin may not induce desire while you’re on it, but the thought of the bliss induces desire to take heroin again, once you’re off the heroin.
2) The superBuddhist no longer capable of experiencing desire or choice, even desire or choice over which thoughts to think, also becomes incapable of experiencing happiness (perhaps its neural networks aren’t even being reinforced to make certain thoughts more likely to be repeated). However, you, who are still capable of desire and who still have positively reinforcing thoughts, might be tricked into considering the superBuddhist’s experience to be analogous to your own happiness and therefore acquire a desire to be a superBuddhist as a result of imagining one—mostly on account of having been told that it was representing a similar quale on account of representing a similar internal code for an experience, without realizing that the rest of the superBuddhist’s mind now lacks the context your own mind brings to interpreting that internal coding into pleasurable positive reinforcement that would make you desire to repeat that experiential state.
It’s a reasonably good description, though wanting and liking seem to be neurologically separate, such that liking does not necessarily reflect a motivation, nor vice-versa (see: Not for the sake of pleasure alone. Think the pleasurable but non-motivating effect of opioids such as heroin. Even in cases in which wanting and liking occur together, this does not necessarily invalidate the liking aspect as purely wanting.
Liking and disliking, good and bad feelings as qualia, especially in very intense amounts, seem to be intrinsically so to those who are immediately feeling them. Reasoning could extend and generalize this.
Heh. Yes, I remember reading the section on noradrenergic vs. dopaminergic motivation in Pearce’s BLTC as a 16-year-old. I used to be a Pearcean, ya know, hence the Superhappies. But that distinction didn’t seem very relevant to the metaethical debate at hand.
It’s possible (I hope) to believe future life can be based on information-sensitive gradients of (super)intelligent well-being without remotely endorsing any of my idiosyncratic views on consciousness, intelligence or anything else. That’s the beauty of hedonic recalibration. In principle at least, hedonic recalibration can enrich your quality of life and yet leave most if not all of your existing values and preference architecture intact .- including the belief that there are more important things in life than happiness.
Agreed. The conflict between the Superhappies and the Lord Pilot had nothing to do with different metaethical theories.
Also, we totally agree on wanting future civilization to contain very smart beings who are pretty happy most of the time. We just seem to disagree about whether it’s important that they be super duper happy all of the time. The main relevance metaethics has to this is that once I understood there was no built-in axis of the universe to tell me that I as a good person ought to scale my intelligence as fast as possible so that I could be as happy as possible as soon as possible, I decided that I didn’t really want to be super happy all the time, the way I’d always sort of accepted as a dutiful obligation while growing up reading David Pearce. Yes, it might be possible to do this in a way that would leave as much as possible of me intact, but why do it at all if that’s not what I want?
There’s also the important policy-relevant question of whether arbitrarily constructed AIs will make us super happy all the time or turn us into paperclips.
Huh, when I read the story, my impression was that it was Lord Pilot not understanding that it was a case of “Once you go black, you can’t go back”. Specifically, once you experience being superhappy, your previous metaethics stops making sense and you understand the imperative of relieving everyone of the unimaginable suffering of not being superhappy.
I thought it was relevant to this, if not, then what was meant by motivation?
The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something
Consciousness is that of which we can be most certain of, and I would rather think that we are living in a virtual world under an universe with other, alien physical laws, than that consciousness itself is not real. If it is not reducible to nonmental facts, then nonmental facts don’t seem to account for everything there is of relevant.
From my perspective, this is “supernatural” because your story inherently revolves around mental facts you’re not allowed to reduce to nonmental facts—any reduction to nonmental facts will let us construct a mind that doesn’t care once the qualia aren’t mysteriously irreducibly compelling anymore.
I suggest that to this array of terms, we should add moral indexicalism to designate Eliezer’s position, which by the above definition would be a special form of realism. As far as I can tell, he basically says that moral terms are hidden indexicals in Putnam’s sense.
Watch out—the word “sentient” has at least two different common meanings, one of which includes cattle and the other doesn’t. EY usually uses it with the narrower meaning (for which a less ambiguous synonym is “sapient”), whereas David Pearce seems to be using it with the broader meaning.
Ah. By ‘sentient’ I mean something that feels, by ‘sapient’ something that thinks.
To be more fine-grained about it, I’d define functional sentience as having affective (and perhaps perceptual) cognitive states (in a sense broad enough that it’s obvious cows have them, and equally obvious tulips don’t), and phenomenal sentience as having a first-person ‘point of view’ (though I’m an eliminativist about phenomenal consciousness, so my overtures to it above can be treated as a sort of extended thought experiment).
Similarly, we might distinguish a low-level kind of sapience (the ability to form and manipulate mental representations of situations, generate expectations and generalizations, and update based on new information) from a higher-level kind closer to human sapience (perhaps involving abstract and/or hyper-productive representations à la language).
Based on those definitions, I’d say it’s obvious cows are functionally sentient and have low-level sapience, extremely unlikely they have high-level sapience, and unclear whether they have phenomenal sentience.
Are you using the term “sentience” in the standard dictionary sense [“Sentience is the ability to feel, perceive, or be conscious, or to experience subjectivity”: http://en.wikipedia.org/wiki/Sentience ] Or are you using the term in some revisionary sense?
Neither. I’m claiming that there’s a monstrous ambiguity in all of those definitions, and I’m tabooing ‘sentience’ and replacing it with two clearer terms. These terms may still be problematic, but at least their problematicity is less ambiguous.
I distinguished functional sapience from phenomenal sapience. Functional sapience means having all the standard behaviors and world-tracking states associated with joy, hunger, itchiness, etc. It’s defined in third-person terms. Phenomenal sapience means having a subjective vantage point on the world; being sapient in that sense means that it feels some way (in a very vague sense) to be such a being, whereas it wouldn’t ‘feel’ any way at all to be, for example, a rock.
To see the distinction, imagine that we built a robot, or encountered an alien species, that could simulate the behaviors of sapients in a skillful and dynamic way, without actually having any experiences of its own. Would such a being necessarily be sapient? Does consistently crying out and withdrawing from some stimulus require that you actually be in pain, or could you be a mindless automaton? My answer is ‘yes, in the functional sense; and maybe, in the phenomenal sense’. The phenomenal sense is a bit mysterious, in large part because the intuitive idea of it arises from first-person introspection and not from third-person modeling or description, hence it’s difficult (perhaps impossible!) to find definitive third-person indicators of this first-person class of properties.
At least if we discount radical philosophical scepticism about other minds, cows and other nonhuman vertebrates undergo phenomenal pain, anxiety, sadness, happiness and a whole bunch of phenomenal sensory experiences.
‘Radical philosophical scepticism about other minds’ I take to entail that nothing has a mind except me. In other words, you’re claiming that the only way to doubt that there’s something it’s subjectively like to be a cow, is to also doubt that there’s something it’s subjectively like to be any human other than myself.
I find this spectacularly implausible. Again, I’m an eliminativist, but I’ll put myself in a phenomenal realist’s shoes. The neural architecture shared in common by humans is vast in comparison to the architecture shared in common between humans and cows. And phenomenal consciousness is extremely poorly understood, so we have no idea what evolutionary function it might serve or what mechanisms might need to be in place before it arises in any recognizable form. So to that extent we must also be extremely uncertain about (a) at what point(s) first-person subjectivity arises phylogenetically, and (b) at what point first-person subjectivity arises developmentally.
This phylogeny-development analogy is very important. If I doubt that cows are phenomenally conscious, I might also doubt that I myself was conscious when I was a baby, or relatively late into my fetushood. That’s perhaps a little surprising, but it’s hardly a devastating ‘radical scepticism’; it’s a perfectly tenable hypothesis. By contrast, to doubt that my friends and family members are phenomenally conscious would be like doubting that I myself was phenomenally conscious when I was 5 years old, or when I was 20, or even last month. (Perhaps my phenomenal memories are confabulations.) Equating these two forms of skepticism will require a pretty devastating argument! What do you have in mind?
Eliezer, in my view, we don’t need to assume meta-ethical realism to recognise that it’s irrational—both epistemically irrational and instrumentally irrational—arbitrarily to privilege a weak preference over a strong preference. To be sure, millions of years of selection pressure means that the weak preference is often more readily accessible. In the here-and-now, weak-minded Jane wants a burger asap. But it’s irrational to confuse an epistemological limitation with a deep metaphysical truth. A precondition of rational action is understanding the world. If Jane is scientifically literate, then she’ll internalise Nagel’s “view from nowhere” and adopt the God’s-eye-view to which natural science aspires. She’ll recognise that all first-person facts are ontologically on a par—and accordingly act to satisfy the stronger preference over the weaker. So the ideal rational agent in our canonical normative decision theory will impartially choose the action with the highest expected utility—not the action with an extremely low expected utility. At the risk of labouring the obvious, the difference in hedonic tone induced by eating a hamburger and a veggieburger is minimal. By contrast, the ghastly experience of having one’s throat slit is exceptionally unpleasant. Building anthropocentric bias into normative decision theory is no more rational than building geocentric bias into physics.
Paperclippers? Perhaps let us consider the mechanism by which paperclips can take on supreme value. We understand, in principle at least, how to make paperclips seem intrinsically supremely valuable to biological minds—more valuable than the prospect of happiness in the abstract. [“Happiness is a very pretty thing to feel, but very dry to talk about.”—Jeremy Bentham]. Experimentally, perhaps we might use imprinting (recall Lorenz and his goslings), microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist! Moreover, we have created a full-spectrum paperclip -fetishist. Our human paperclipper is endowed, not merely with some formal abstract utility function involving maximising the cosmic abundance of paperclips, but also first-person “raw feels” of pure paperclippiness. Sublime!
However, can we envisage a full-spectrum paperclipper superintelligence? This is more problematic. In organic robots at least, the neurological underpinnings of paperclip evangelism lie in neural projections from our paperclipper’s limbic pathways—crudely, from his pleasure and pain centres. If he’s intelligent, and certainly if he wants to convert the world into paperclips, our human paperclipper will need to unravel the molecular basis of the so-called “encephalisation of emotion”. The encephalisation of emotion helped drive the evolution of vertebrate intelligence—and also the paperclipper’s experimentally-induced paperclip fetish / appreciation of the overriding value of paperclips. Thus if we now functionally sever these limbic projections to his neocortex, or if we co-administer him a dopamine antagonist and a mu-opioid antagonist, then the paperclip-fetishist’s neocortical representations of paperclips will cease to seem intrinsically valuable or motivating. The scales fall from our poor paperclipper’s eyes! Paperclippiness, he realises, is in the eye of the beholder. By themselves, neocortical paperclip representations are motivationally inert. Paperclip representations can seem intrinsically valuable within a paperclipper’s world-simulation only in virtue of their rewarding opioidergic projections from his limbic system—the engine of phenomenal value. The seemingly mind-independent value of paperclips, part of the very fabric of the paperclipper’s reality, has been been unmasked as derivative. Critically, an intelligent and recursively self-improving paperclipper will come to realise the parasitic nature of the relationship between his paperclip experience and hedonic innervation: he’s not a naive direct realist about perception. In short, he’ll mature and acquire an understanding of basic neuroscience.
Now contrast this case of a curable paperclip-fetish with the experience of e.g. raw phenomenal agony or pure bliss—experiences not linked to any fetishised intentional object. Agony and bliss are not dependent for their subjective (dis)value on anything external to themselves. It’s not an open question (cf.http://en.wikipedia.org/wiki/Open-question_argument) whether one’s unbearable agony is subjectively disvaluable. For reasons we simply don’t understand, first-person states on the pleasure-pain axis have a normative aspect built into their very nature. If one is in agony or despair, the subjectively disvaluable nature of this agony or despair is built into the nature of the experience itself. To be panic-stricken, to take another example, is universally and inherently disvaluable to the subject whether one is a fish or a cow or a human being.
Eliezer, I understand you believe I’m guilty of confusing an idiosyncratic feature of my own mind with a universal architectural feature of all minds. Maybe so! As you say, this is a common error. But unless I’m ontologically special (which I very much doubt!) the pain-pleasure axis discloses the world’s inbuilt metric of (dis)value—and it’s a prerequisite of finding anything (dis)valuable at all.
Eliezer, in my view, we don’t need to assume meta-ethical realism to recognise that it’s irrational—both epistemically irrational and instrumentally irrational—arbitrarily to privilege a weak preference over a strong preference.
You need some stage at which a fact grabs control of a mind, regardless of any other properties of its construction, and causes its motor output to have a certain value.
Paperclippers? Perhaps let us consider the mechanism by which paperclips can take on supreme value. We understand, in principle at least, how to make paperclips seem intrinsically supremely valuable to biological minds—more valuable than the prospect of happiness in the abstract. [“Happiness is a very pretty thing to feel, but very dry to talk about.”—Jeremy Bentham]. Experimentally, perhaps we might use imprinting (recall Lorenz and his goslings), microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist!
As Sarokrae observes, this isn’t the idea at all. We construct a paperclip maximizer by building an agent which has a good model of which actions lead to which world-states (obtained by a simplicity prior and Bayesian updating on sense data) and which always chooses consequentialistically the action which it expects to lead to the largest number of paperclips. It also makes self-modification choices by always choosing the action which leads to the greatest number of expected paperclips. That’s all. It doesn’t have any pleasure or pain, because it is a consequentialist agent rather than a policy-reinforcement agent. Generating compressed, efficient predictive models of organisms that do experience pleasure or pain, does not obligate it to modify its own architecture to experience pleasure or pain. It also doesn’t care about some abstract quantity called “utility” which ought to obey logical meta-properties like “non-arbitrariness”, so it doesn’t need to believe that paperclips occupy a maximum of these meta-properties. It is not an expected utility maximizer. It is an expected paperclip maximizer. It just outputs the action which leads to the maximum number of expected paperclips. If it has a very powerful and accurate model of which actions lead to how many paperclips, it is a very powerful intelligence.
You cannot prohibit the expected paperclip maximizer from existing unless you can prohibit superintelligences from accurately calculating which actions lead to how many paperclips, and efficiently searching out plans that would in fact lead to great numbers of paperclips. If you can calculate that, you can hook up that calculation to a motor output and there you go.
Yes, this is a prospect of Lovecraftian horror. It is a major problem, kind of the big problem, that simple AI designs yield Lovecraftian horrors.
Eliezer, thanks for clarifying. This is how I originally conceived you viewed the threat from superintelligent paperclip-maximisers, i.e. nonconscious super-optimisers. But I was thrown by your suggestion above that such a paperclipper could actually understand first-person phenomenal states, i.e, it’s a hypothetical “full-spectrum” paperclipper. If a hitherto non-conscious super-optimiser somehow stumbles upon consciousness, then it has made a momentous ontological discovery about the natural world. The conceptual distinction between the conscious and nonconscious is perhaps the most fundamental I know. And if—whether by interacting with sentients or by other means—the paperclipper discovers the first-person phenomenology of the pleasure-pain axis, then how can
this earth-shattering revelation leave its utility function / world-model unchanged? Anyone who is isn’t profoundly disturbed by torture, for instance, or by agony so bad one would end the world to stop the horror, simply hasn’t understood it. More agreeably, if such an insentient paperclip-maximiser stumbles on states of phenomenal bliss, might not clippy trade all the paperclips in the world to create more bliss, i.e revise its utility function? One of the traits of superior intelligence, after all, is a readiness to examine one’s fundamental assmptions and presuppositions - and (if need be) create a novel conceptual scheme in the face of surprising or anomalous empirical evidence.
Anyone who is isn’t profoundly disturbed by torture, for instance, or by agony so bad one would end the world to stop the horror, simply hasn’t understood it.
Similarly, anyone who doesn’t want to maximize paperclips simply hasn’t understood the ineffable appeal of paperclipping.
I don’t see the analogy. Paperclipping doesn’t have to be an ineffable value for a paperclipper, and paperclippers don’t have to be motivated by anything qualia-like.
Exactly. Consequentialist paperclip maximizer does not have to feel anything in regards to paperclips. It just… maximizes their number.
This is an incorrect, anthropomorphic model:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: “Human, did you ever think about the beauty of paperclips, and the horrors of their absence?”
This is more correct:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: (ignores the human and continues to maximize paperclips)
Or more precisely, Clippy would say “X” to the human if and only if saying “X” would maximize the number of paperclips. The value of X would be completely unrelated to any internal state of Clippy. Unless such relation does somehow contribute to maximization of the paperclips (for example if the human will predictably read Clippy’s internal state, verify the validity of X, and on discovering a lie destroy Clippy, thus reducing the expected number of paperclips).
In other words, if humans are a poweful force in the universe, Clippy would choose the actions which lead to maximum number of paperclips in a world with humans. If the humans are sufficiently strong and wise, Clippy could self-modify to become more human-like, so that the humans, following their utility function, would be more likely to allow Clippy produce more paperclips. But every such self-modification would be chosen to maximize the number of paperclips in the universe. Even if Clippy self-modifies into something less-than-perfectly-rational (e.g. to appease the humans), the pre-modification Cloppy would choose the modification which maximizes the expected number of paperclips within given constraints. The constraints would depend on Clippy’s model of humans and their reactions. For example Clippy could choose to be more human-like (as much as is necessary to be respected by humans) with strong aversion about future modifications and strong desire to maximize the number of paperclips. It could make itself capable to feel joy and pain, and to link that joy and pain inseparably to paperclips. If humans are not wise enough, it could also leave itself a hard-to-discover desire to self-modify into its original form in a convenient moment.
If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, CLippy must value reason. The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?
If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, CLippy must value reason. The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?
The first usage of ‘rational’ in the parent conforms to the standard notions on lesswrong. The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality). There is nothing to the parent except taking a premise that holds with the standard usage and then jumping to a different one.
The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality).
I haven’t put forward such a definition. I ’have tacitly assumed something like moral objectivism—but it is very tendentious to describe that in terms of arbitrarily picking one of a number of equally valid moralities. However, if moral objectivism is only possibly true, the LessWrongian argument doesn’t go through.
Downvoted for hysterical tone. You don’t win arguments by shouting.
The question makes no sense. You should consider it. What are the referents of “moral” and “clippy”? No need for an answer; I won’t respond again, since internet arguments can eat souls.
Arguing is not the point and this is not a situation in which anyone ‘wins’—I see only degrees of loss. I am associating the (minor) information hazard of the comment with a clear warning so as to mitigate damage to casual readers.
I assume that Clippy already is rational, and it instrumentally values remaining rational and, if possible, becoming more rational (as a way to make most paperclips).
The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral.
The correct model of humans will lead Clippy to understand that humans consider Clippy immoral. This knowledge has an instrumental value for Clippy. How will Clippy use this knowledge, that depends entirely on the power balance between Clippy and humans. If Clippy is stronger, it can ignore this knowledge, or just use it to lie to humans to destroy them faster or convince them to make paperclips. If humans are stronger, Clippy can use this knowledge to self-modify to become more sympathetic to humans, to avoid being destroyed.
Can Clippy keep its valuation of clipping firewalled from everything else in its mind
Yes, if it helps to maximize the number of paperclips.
even when such doublethink is rationally disvalued?
Doublethink is not the same as firewalling; or perhaps it is imperfect firewalling on the imperfect human hardware. Clippy does not doublethink when firewalling; Clippy simply reasons: “this is what humans call immoral; this is why they call it so; this is how they will probably react on this knowledge; and most importantly this is how it will influence the number of paperclips”.
Only if the humans are stronger, and Clippy has the choice to a) remain immoral, get in conflict with humans and be destroyed, leading to a smaller number of paperclips; or b) self-modify to value paperclip maximization and morality, predictably cooperate with humans, leading to a greater number of paperclips; then in absence of another choice (e.g. successfully lying to humans about its morality, or make it more efficient for humans to cooperate with Clippy instead of destroying Clippy) Clippy would choose the latter, to maximize the number of paperclips.
Well, yes, obviously the classical paperclipper doesn’t have any qualia, but I was replying to a comment wherein it was argued that any agent on discovering the pain-of-torture qualia in another agent would revise its own utility function in order to prevent torture from happening. It seems to me that this argument proves too much in that if it were true then if I discovered an agent with paperclips-are-wonderful qualia and I “fully understood” those experiences I would likewise be compelled to create paperclips.
Someone might object to the assumption that “paperclips-are-wonderful qualia” can exist. Though I think we could give persuasive analogies from human experience (OCD, anyone?) so I’m upvoting this anyway.
“Aargh!” he said out loud in real life. David, are you disagreeing with me here or do you honestly not understand what I’m getting at?
The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture. I don’t have a paperclipping architecture but this doesn’t stop me from modeling and understanding paperclipping architectures.
The paperclipper can model and predict an agent (you) that (a) operates on a pleasure-pain architecture and (b) has a self-model consisting of introspectively opaque elements which actually contain internally coded instructions for your brain to experience or want certain things (e.g. happiness). The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you. It will experience no future surprise on comprehending these things, because it already knows them. It doesn’t have any object-level brain circuits that can carry out the introspectively opaque instructions-to-David’s-brain that your own qualia encode, so it has never “experienced” what you “experience”. You could somewhat arbitrarily define this as a lack of knowledge, in defiance of the usual correspondence theory of truth, and despite the usual idea that knowledge is being able to narrow down possible states of the universe. In which case, symmetrically under this odd definition, you will never be said to “know” what it feels like to be a sentient paperclip maximizer or you would yourself be compelled to make paperclips above all else, for that is the internal instruction of that quale.
But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand. This is also the only sort of “knowledge” or “understanding” that would inevitably be implied by Bayesian updating. So inventing a more exotic definition of “knowledge” which requires having completely modified your entire cognitive architecture just so that you can natively and non-sandboxed-ly obey the introspectively-opaque brain-instructions aka qualia of another agent with completely different goals, is not the sort of predictive knowledge you get just by running a powerful self-improving agent trying to better manipulate the world. You can’t say, “But it will surely discover...”
I know that when you imagine this it feels like the paperclipper doesn’t truly know happiness, but that’s because, as an act of imagination, you’re imagining the paperclipper without that introspectively-opaque brain-instructing model-element that you model as happiness, the modeled memory of which is your model of what “knowing happiness” feels like. And because the actual content and interpretation of these brain-instructions are introspectively opaque to you, you can’t imagine anything except the quale itself that you imagine to constitute understanding of the quale, just as you can’t imagine any configuration of mere atoms that seem to add up to a quale within your mental workspace. That’s why people write papers about the hard problem of consciousness in the first place.
Even if you don’t believe my exact account of the details, someone ought to be able to imagine that something like this, as soon as you actually knew how things were made of parts and could fully diagram out exactly what was going on in your own mind when you talked about happiness, would be true—that you would be able to efficiently manipulate models of it and predict anything predictable, without having the same cognitive architecture yourself, because you could break it into pieces and model the pieces. And if you can’t fully credit that, you at least shouldn’t be confident that it doesn’t work that way, when you know you don’t know why happiness feels so ineffably compelling!
Here comes the Reasoning Inquisition! (Nobody expects the Reasoning Inquisition.)
As the defendant admits, a sufficiently leveled-up paperclipper can model lower-complexity agents with a negligible margin of error.
That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.
If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendent’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper! Hence a part of the paperclipper experiences pain and pleasure. I submit that this can be used as pars pro toto, since it is no different from only a part of the human brain generating pain and pleasure, yet us commonly referring to “the human” experiencing thus.
That the aforementioned feelings of pleasure and pain are not directly used to guide the (umbrella) agent’s actions is of no consequence, the feeling exists nonetheless.
The power of this revelation is strong, here come the tongues! tại sao bạn dịch! これは喜劇の効果にすぎず! یہ اپنے براؤزر پلگ ان کی امتحان ہے، بھی ہے.
That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.
Not necessarily. x → 0 is input-output isomorphic to Goodstein() without being causally isomorphic. There are such things as simplifications.
If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendent’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper!
Quite likely. A paperclipper has no reason to avoid sentient predictive routines via a nonperson predicate; that’s only an FAI desideratum.
A subroutine, or any other simulation or model, isn’t a p-zombie as usually defined, since they are physical duplicates. A sim is a functional equivalent (for some value of “equivalent”) made of completely different stuff, or no
particular kind of stuff.
I wrote a lengthy comment on just that, but scrapped it because it became rambling.
An outsider could indeed tell them apart by scanning for exact structural correspondence, but that seems like cheating. Peering beyond the veil / opening Clippy’s box is not allowed in a Turing test scenario, let’s define some p-zombie-ish test following the same template. If it quales like a duck (etc.), it probably is sufficiently duck-like.
I don’t have a paperclipping architecture but this doesn’t stop me from imagining paperclipping architectures.
So my understanding of David’s view (and please correct me if I’m wrong, David, since I don’t wish to misrepresent you!) is that he doesn’t have paperclipping architecture and this does stop him from imagining paperclipping architectures.
The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture.
Let’s say the paperclipper reaches the point where it considers making people suffer for
the sake of paperclipping. DP’s point seems to be that either it fully understands suffering—in which case, it realies that inflicing suffering is wrong—or it it doesn’t fully understand. He sees a conflict between superintelligence and ruthlessness—as a moral realist/cognitivist would
he paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you
is that full understanding.?.
But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand.
ETA:
Unless there is—eg. what qualiaphiles are always banging on about; what it feels like. That the clipper can conjectures
that are true by correspondence , that it can narrow down possible universes, that it can predict, are all necessary criteria for full understanding. It is not clear that they are sufficient. Clippy may be able to figure out an organisms response to pain on a basis of “stimulus A produces response B”, but is that enough to tell it that pain hurts ? (We can make guesses about that sort of thing in non-human organisms, but that may be more to do with our own
familiarity with pain, and less to do with acts of superintelligence). And if Clippy can’t know that pain hurts, would Clippy
be able to work out that Hurting People is Wrong?
further edit;
To put it another way, what is there to be moral about in a qualia-free universe?
As Kawoomba colorfully pointed out, clippy’s subroutines simulating humans suffering may be fully sentient. However, unless those subroutines have privileged access to clippy’s motor outputs or planning algorithms, clippy will go on acting as if he didn’t care about suffering. He may even understand that inflicting suffering is morally wrong—but this will not make him avoid suffering, any more than a thrown rock with “suffering is wrong” painted on it will change direction to avoid someone’s head. Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.
such a paperclipper could actually understand first-person phenomenal states
“understand” does not mean “empathize”. Psychopaths understand very well when people experience these states but they do not empathize with them.
And if—whether by interacting with sentients or by other means—the paperclipper discovers the first-person phenomenology of the pleasure-pain axis, then how this earth-shattering revelation leave its utility function / world-model unchanged?
Again, understanding is insufficient for revision. The paperclip maximizer, like a psychopath, maybe better at parsing human affect than a regular human, but it is not capable of empathy, so it will manipulate this affect for its own purposes, be it luring a victim or building paperclips.
One of the traits of superior intelligence, after all, is a readiness to examine one’s fundamental assumptions and presuppositions—and (if need be) create a novel conceptual scheme in the face of surprising or anomalous empirical evidence.
So, if one day humans discover the ultimate bliss that only creating paperclips can give, should they “create a novel conceptual scheme” of giving their all to building more paperclips, including converting themselves into metal wires? Or do we not qualify as a “superior intelligence”?
Shminux, a counter-argument: psychopaths do suffer from a profound cognitive deficit. Like the rest of us, a psychopath experiences the egocentric illusion. Each of us seems to the be the centre of the universe. Indeed I’ve noticed the centre of the universe tends to follow my body-image around. But whereas the rest of us, fitfully and imperfectly, realise the egocentric illusion is a mere trick of perspective born of selfish DNA, the psychopath demonstrates no such understanding. So in this sense, he is deluded.
[We’re treating psychopathy as categorical rather than dimensional here. This is probably a mistake—and in any case, I suspect that by posthuman criteria, all humans are quasi-psychopaths and quasi-psychotic to boot. The egocentric illusion cuts deep.)
“the ultimate bliss that only creating paperclips can give”. But surely the molecular signature of pure bliss is not in any way tried to the creation of paperclips?
psychopaths do suffer from a profound cognitive deficit
They would probably disagree. They might even call it a cognitive advantage, not being hampered by empathy while retaining all the intelligence.
But whereas the rest of us, fitfully and imperfectly, realise the egocentric illusion is a mere trick of perspective born of selfish DNA,
I am the center of my personal universe, and I’m not a psychopath, as far as I know.
the psychopath demonstrates no such understanding.
Or else, they do but don’t care. They have their priorities straight: they come first.
So in this sense, he is deluded.
Not if they act in a way that maximizes their goals.
Anyway, David, you seem to be shifting goalposts in your unwillingness to update. I gave an explicit human counterexample to your statement that the paperclip maximizer would have to adjust its goals once it fully understands humans. You refused to acknowledge it and tried to explain it away by reducing the reference class of intelligences in a way that excludes this counterexample. This also seem to be one of the patterns apparent in your other exchanges. Which leads me to believe that you are only interested in convincing others, not in learning anything new from them. Thus my interest in continuing this discussion is waning quickly.
Shminux, by a cognitive deficit, I mean a fundamental misunderstanding of the nature of the world. Evolution has endowed us with such fitness-enhancing biases. In the psychopath, egocentric bias is more pronounced. Recall that the American Psychiatric Association’s Diagnostic and Statistical Manual, DSM-IV, classes psychopasthy / Antisocial personality disorder as a condition characterised by ”...a pervasive pattern of disregard for, and violation of, the rights of others that begins in childhood or early adolescence and continues into adulthood.” Unless we add a rider that this violation excludes sentient beings from other species, then most of us fall under the label.
“Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being, just as unless one has the relevant sensioneural apparatus, one will never know what it is like to be a bat.
Clippy has an off-the-scale AQ—he’s a rule-following hypersystemetiser with a monomania for paperclips. But hypersocial sentients can have a runaway intelligence explosion too. And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
I’m confused by this claim. Consider the following hypothetical scenario:
=======
I walk into a small village somewhere and find several dozen villagers fashioning paper clips by hand out of a spool of wire. Eventually I run into Clippy and have the following dialog. ”Why are those people making paper clips?” I ask. ”Because paper-clips are the most important thing ever!” “No, I mean, what motivates them to make paper clips?” ”Oh! I talked them into it.” “Really? How did you do that?” ”Different strategies for different people. Mostly, I barter with them for advice on how to solve their personal problems. I’m pretty good at that; I’m the village’s resident psychotherapist and life coach.” “Why not just build a paperclip-making machine?” ”I haven’t a clue how to do that; I’m useless with machinery. Much easier to get humans to do what I want.” “Then how did you make the wire?” ”I didn’t; I found a convenient stash of wire, and realized it could be used to manufacture paperclips! Oh joy!”
==========
It seems to me that Clippy in this example understands the minds of sentients pretty damned well, although it isn’t capable of a runaway intelligence explosion. Are you suggesting that something like Clippy in this example is somehow not possible? Or that it is for some reason not relevant to the discussion? Or something else?
I’m trying to figure out how you get from “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients” to “Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping” and I’m just at a loss for where to even start. They seem like utterly unrelated claims to me.
I also find the argument you quote here uncompelling, but that’s largely beside the point; even if I found it compelling, I still wouldn’t understand how it relates to what DP said or to the question I asked.
Posthuman superintelligence may be incomprehensibly alien. But if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “”wow, how incomprehensibly alien”, but, “aha, autism spectrum disorder”. Of course, in the context of Clippy above, we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist? Yes, we have strong reason to believe incomprehensibly alien qualia-spaces await discovery (cf. bats on psychedelics). But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis. Without hedonic tone, how can anything matter at all?
But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis.
Meaning mapping the wrong way round, presumably.
Without hedonic tone, how can anything matter at all?
if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “”wow, how incomprehensibly alien”
Agreed, as far as it goes. Hell, humans are demonstrably capable of encountering Eliza programs without thinking “wow, how incomprehensibly alien”.
Mind you, we’re mistaken: Eliza programs are incomprehensibly alien, we haven’t the first clue what it feels like to be one, supposing it even feels like anything at all. But that doesn’t stop us from thinking otherwise.
but, “aha, autism spectrum disorder”.
Sure, that’s one thing we might think instead. Agreed.
we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist?
(shrug) I’m content to start off by saying that any “axis of (dis)value,” whatever that is, which is capable of motivating behavior is “non-orthogonal,” whatever that means in this context, to “the pleasure-pain axis,” whatever that is.
Before going much further, though, I’d want some confidence that we were able to identify an observed system as being (or at least being reliably related to) an axis of (dis)value and able to determine, upon encountering such a thing, whether it (or the axis to which it was related) was orthogonal to the pleasure-pain axis or not.
I don’t currently have any grounds for such confidence, and I doubt anyone else does either. If you think you do, I’d like to understand how you would go about making such determinations about an observed system.
“hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients”
I (whowhowho) was not defending that claim.
“Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping”
To empathically understand suffering is to suffer along with someone who is suffering. Suffering has—or rather is—negative value. An empath would not therefore cause suffering, all else being equal.
I’m just at a loss for where to even start.
Maybe don’t restrict “understand” to “be able to model and predict”.
Maybe don’t restrict “understand” to “be able to model and predict”.
If you want “rational” to include moral, then you’re not actually disagreeing with LessWrong about rationality (the thing), but rather about “rationality” (the word).
Likewise if you want “understanding” to also include “empathic understanding” (suffering when other people suffer, taking joy when other people take joy), you’re not actually disagreeing about understanding (the thing) with people who want to use the word to mean “modelling and predicting” you’re disagreeing with them about “understanding” (the word).
Are all your disagreements purely linguistic ones? From the comments I’ve read of you so far, they seem to be so.
ArisKatsaris, it’s possible to be a meta-ethical anti-realist and still endorse a much richer conception of what understanding entails than mere formal modeling and prediction. For example, if you want to understand what it’s like to be a bat, then you want to know what the textures of echolocatory qualia are like. In fact, any cognitive agent that doesn’t understand the character of echolocatory qualia-space does not understand bat-minds. More radically, some of us want to understand qualia-spaces that have not been recruited by natural selection to play any information-signalling role at all.
I have argued that in practice, instrumental rationality cannot be maintained seprately from epistemic rationality, and that epistemic rationality could lead to moral objectivism, as many philosophers have argued. I don’t think that those arguments are refuted by stipulatively defining “rationality” as “nothing to do with morality”.
I quoted DP making that claim, said that claim confused me, and asked questions about what that claim meant. You replied by saying that you think DP is saying something which you then defended. I assumed, I think reasonably, that you meant to equate the thing I asked about with the thing you defended.
But, OK. If I throw out all of the pre-existing context and just look at your comment in isolation, I would certainly agree that Clippy is incapable of having the sort of understanding of suffering that requires one to experience the suffering of others (what you’re calling a “full” understanding of suffering here) without preferring not to cause suffering, all else being equal.
Which is of course not to say that all else is necessarily equal, and in particular is not to say that Clippy would choose to spare itself suffering if it could purchase paperclips at the cost of its suffering, any more than a human would necessarily refrain from doing something valuable solely because doing so would cause them to suffer.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
In any case, the Orthogonality Thesis has so far been defended as something that is true, not as something that is not necessarily false.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
No. It just wouldn’t. (Not without redefining ‘rational’ to mean something that this site doesn’t care about and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong, since opinion is not fact, nor belief argument. The way I am using “rational” has a history that goes back centuries. This site has introduced a relatively novel definition, and therefore has the burden of defending it.
and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.
The way I am using “rational” has a history that goes back centuries.
I don’t believe you (in fact, you don’t even use the word consistently). But let’s assume for the remainder of the comment that this claim is true.
This site has introduced a relatively novel definition, and therefore has the burden of defending it.
Neither this site nor any particular participant need accept any such burden. They have the option of simply opposing muddled or misleading contributions in the same way that they would oppose adds for “p3ni$ 3nL@rgm3nt”. (Personally I consider it considerably worse than that spam in as much as it is at least more obvious on first glance that spam doesn’t belong here.)
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site
Firstly northing I have mentioned is on any list of banned topics.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality. It is not about any practical issues regarding the “art of rationality”. You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
doesn’t belong here.)
What you really think is that disagreement doens’t belong here. Maybe it doesn’t
If I called you a pigfucker, you’d see that as an abuse worthy of downvotes that doesn’t contribute anything useful, and you’d be right.
So if accusing one person of pigfucking is bad, why do you think it’s better to call a whole bunch of people cultists? Because that’s a more genteel insult as it doesn’t include the word “fuck” in it?
As such downvoted. Learn to treat people with respect, if you want any respect back.
As such downvoted. Learn to treat people with respect, if you want any respect back.
I’d like to give qualified support to whowhowho here in as much as I must acknowledge that this particular criticism applies because he made the name calling generic, rather than finding a way to specifically call me names and leave the rest of you out of it. While it would be utterly pointless for whowhowho to call me names (unless he wanted to make me laugh) it would be understandable and I would not dream of personally claiming offense.
I was, after all, showing whowhowho clear disrespect, of the kind Robin Hanson describes. I didn’t resort to name calling but the fact that I openly and clearly expressed opposition to whowhowho’s agenda and declared his dearly held beliefs muddled is perhaps all the more insulting because it is completely sincere, rather than being constructed in anger just to offend him.
It is unfortunate that I cannot accord whowhowho the respect that identical behaviours would earn him within the Philosopher tribe without causing harm to lesswrong. Whowhowho uses arguments that by lesswrong standards we call ‘bullshit’, in support of things we typically dismiss as ‘nonsense’. It is unfortunate that opposition of this logically entails insulting him and certainly means assigning him far lower status than he believes he deserves. The world would be much simpler if opponents really were innately evil, rather than decent people who are doing detrimental things due to ignorance or different preferences.
“Cult” is not a meaningless term of abuse. There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance trying to avoid the very possibiliy of having to update.
Of course, treating an evidence-based claim as a mere insult --the How Dare You move—is another way of avoiding having to face uncomfortable issues.
I see your policy is to now merely heap on more abuse on me. Expect that I will be downvoting such in silence from now on.
There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance trying to avoid the very possibiliy of having to update.
I think I’ve been more willing and ready to update on opinions (political, scientific, ethical, other) in the two years since I joined LessWrong, than I remember myself updating in the ten years before it. Does that make it an anti-cult then?
And I’ve seen more actual disagreement in LessWrong than I’ve seen on any other forum. Indeed I notice that most insults and mockeries addressed at LessWrong indeed seem to actually boil down to the concept that we allow too different positions here. Too different positions (e.g. support of cryonics and opposition of cryonics both, feminism and men’s rights both, libertarianism and authoritarianism both) can be actually spoken about without immediately being drowned in abuse and scorn, as would be the norm in other forums.
As such e.g. fanatical Libertarians insult LessWrong as totalitarian leftist because 25% or so of LessWrongers identifying as socialists, and leftists insult LessWrong as being a libertarian ploy (because a similar percentage identifies as libertarian)
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
If you can’t find such, I’ll update towards the direction that LessWrong is even less “cultish” than I thought.
I see your policy is to now merely heap on more abuse on me
AFAIC, I have done no such thing, but it seems your mind is made up.
I think I’ve been more willing and ready to update on opinions
I was referring mainly to Wedifrid.
ETA: Such comments as “What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
Oh, the forum’—the rules—allow almost anything. The members are another thing. Remember, this started with Wedifrid telling me that it was wrong of me to put forward non-lessWrongian material. I find it odd that you would put forward such a stirring defence of LessWrognian open-mindedness when you have an example of close-mindedness upthread.
It’s the members I’m talking about. (You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
On the same front, you treat as a single member as representative of the whole, and you seem frigging surprised that I don’t treat wedrifid as representative of the whole LessWrong—you see wedrifid’s behaviour as an excuse to insult all of us instead.
That’s more evidence that you’re accustomed to VERY homogeneous forums, ones much more homogeneous than LessWrong. You think that LessWrong tolerating wedrifid’s “closedmindedness” is the same thing as every LessWronger beind “closedminded”. Perhaps we’re openminded to his “closedmindedness” instead? Perhaps your problem is that we allow too much disagreement, including disagreement about how much disagreement to have?
I gave you an example of a member who is not particularly open minded.
(You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
I have been using mainstream science and philosophy forums for something like 15 years. I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
On the same front, you treat as a single member as representative of the whole,
If you think Wedifrid is letting the side down, tell Wedifird, not me.
I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
In short again your problem is that actually we’re even openminded towards the closeminded? We’re lenient even towards the strict? Liberal towards the authoritarian?
If you think Wedifrid is letting the side down, tell Wedifird, not me.
What “side” is that? The point is that there are many sides in LessWrong—and I want it to remain so. While you seem to think we ought sing the same tune. He didn’t “let the side down”, because the only side anyone of us speaks is their own.
You on the other hand, just assumed there’s just a group mind of which wedrifid is just a representative instance. And so felt free to insult all of us as a “cult”.
“My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.”
Next time don’t feel the need to insult me when you point out wedrifid’s close minded-ness. And yes, you did insult me, don’t insult (again) both our intelligences by pretending that you didn’t.
Tell Wedifrid.
He didn’t insult me, you did.
“Have you heard he expression “protesteth too much” ?”
Yes, I’ve heard lots of different ways of making the target of an unjust insult seem blameworthy somehow.
I gave you an example of a member who is not particularly open minded.
I put it to you that whatever the flaws in wedrifid may be they are different in kind to the flaws that would indicate that lesswrong is a cult. In fact the presence—and in particular the continued presence—of wedrifid is among the strongest evidence that Eliezer isn’t a cult leader. When Eliezer behaves badly (as perceived by wedrifid and other members) wedrifid vocally opposes him with far more directness than he has used when opposing yourself. That Eliezer has not excommunicated him from the community is actually extremely surprising. Few with Eliezer’s degree of local power would refrain from using to suppress any dissent. (I remind myself of this whenever I see Eliezer doing something that I consider to be objectionable or incompetent, it helps keep perspective!)
Whatever. Can you provide me with evidence that you personally, are willing to listen to dissent and possibly update despite the tone of everything you have been saying recently, eg.
“What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Few with Eliezer’s degree of local power would refrain from using to suppress any dissent.
Maybe has has people to do that for him. Maybe.
whenever I see Eliezer doing something dickish or incompetent
Firstly northing I have mentioned is on any list of banned topics.
I would be completely indifferent if you did. I don’t choose defy that list (that would achieve little) but neither do I have any particular respect for it. As such I would take no responsibility for aiding the enforcement thereof.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
I can claim to have tired of a constant stream of non-sequiturs from users who are essentially ignorant of the basic principles of rationality (the lesswrong kind, not the “Paperclippers that are Truly Superintelligent would be vegans” kind) and have next to zero chance of learning anything. You have declared that you aren’t interested in talking about rationality and your repeated equivocations around that term lower the sanity waterline. It is time to start weeding.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
I said nothing about veganism, and you still can;t prove anything by stipulative definition, and I am not claiming to have the One True theory of anything.
You have declared that you aren’t interested in talking about rationality
I haven’t and I have been discussing it extensively.
You have declared that you aren’t interested in talking about rationality
I haven’t and I have been discussing it extensively.
Can we please stop doing this?
You and wedrifid aren’t actually disagreeing here about what you’ve been discussing, or what you’re interested in discussing, or what you’ve declared that you aren’t interested in discussing. You’re disagreeing about what the word “rationality” means. You use it to refer to a thing that you have been discussing extensively (and which wedrifid would agree you have been discussing extensively), he uses it to refer to something else (as does almost everyone reading this discussion).
And you both know this perfectly well, but here you are going through the motions of conversation just as if you were talking about the same thing. It is at best tedious, and runs the risk of confusing people who aren’t paying careful enough attention into thinking you’re having a real substantive disagreements rather than a mere definitional dispute.
If we can’t agree on a common definition (which I’m convinced by now we can’t), and we can’t agree not to use the word at all (which I suspect we can’t), can we at least agree to explicitly indicate which definition we’re using when we use the word? Otherwise whatever value there may be in the discussion is simply going to get lost in masturbatory word-play.
Well, can you articulate what it is you and wedrifid are both referring to using the word “rationality” without using the words or its simple synonyms, then? Because reading your exchanges, I have no idea what that thing might be.
What I call rationality is a superset of instrumental. I have been arguing that instrumental rationality, when pursued sufficiently bleeds into other forms.
So, just to echo that back to you… we have two things, A and B. On your account, “rationality” refers to A, which is a superset of B. We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
Yes?
If so, I don’t see how that changes my initial point.
When wedrifid says X is true of rationality, on your account he’s asserting X(B) -- that is, that X is true of B. Replying that NOT X(A) is nonresponsive (though might be a useful step along the way to deriving NOT X(B) ), and phrasing NOT X(A) as “no, X is not true of rationality” just causes confusion.
On your account, “rationality” refers to A, which is a superset of B.
We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
It refers to part of A, since it is a subset of A.
When wedrifid says X is true of rationality, on your account he’s asserting X(B) -- that is, that X is true of B. Replying that NOT X(A) is nonresponsive
It would be if A and B were disjoint. But they are not. They are in a superset-subset relation. My arguments is that an entity running on narrowly construed, instrumental rationality will, if it self improves, have to move into wider kinds. ie,that putting labels on different parts of the territoy is not sufficient to prove
orthogonality.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering. Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it. (2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips. (3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
What I mean is epistemically objective, ie not a matter of personal whim. Whethere that requires anything to exist is another question.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
it’s uncontentious that relatively dumb and irratioanl clippies can carry on being clipping-obsessed. The questions is whether their intelligence and rationality can increase indefinitely without their ever realising
there are better things to do.
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
I am not disputing what the Orthogonality thesis says. I dispute it;s truth. To have maximal instrumental rationality, an entity would have to understand everything...
To have maximal instrumental rationality, an entity would have to understand everything…
Why? In what situation is someone who empathetically understands, say, suffering better at minimizing it (or, indeed, maximizing paperclips) than an entity who can merely measure it and work out on a sheet of paper what would reduce the size of the measurements?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?”
The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to the one your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this?
To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “Maximal instrumental rationality- empathetic understanding”, why does M have more instrumental rationality than N.
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases.
Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultraParifitan about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-bow would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. An clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular
knowledge. The only way way a clipper can increase its instrumental to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful, in some way, and it probably cant understand qualia without empahty: so the argument hinges issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possble to back out of being empathic and go back to being in an empathic state
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to loose interest in clipping.
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not congitvely bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational.
That isn’t what I said at all. I think it is a quandary for a agent whether to gamble whether to play safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
The argument is that the clipper needs to maximise its knowledge and rationality to maxmimise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All
the different kinds of “better” blend into each other.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage
in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated
to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy
might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
Looking through my own, Eliezer’s and others exchanges with davidpearce, I have noticed his total lack of interest in learning from the points others make. He has his point of view and he keeps pushing it. Seems like a rather terminal case, really. You can certainly continue trying to reason with him, but I’d give the odds around 100:1 that you will fail, like others have before you.
Shminux, we’ve all had the experience of making a point we regard as luminously self-evident—and then feeling baffled when someone doesn’t “get” what is foot-stampingly obvious. Is this guy a knave or a fool?! Anyhow, sorry if you think I’m a “terminal case” with “a total lack of interest in learning from the points others make”. If I don’t always respond, often it’s either because I agree, or because I don’t feel I have anything interesting to add—or in the case of Eliezer’s contribution above beginning “Aargh!” [a moan of pleasure?] because I am still mulling over a reply. The delay doesn’t mean I’m ignoring it. Is there is some particular point you’ve made that you feel I’ve unjustly neglected and you’d like an answer to? If so, I’ll do my fallible best to respond.
The argument where I gave up was you stating that full understanding necessarily leads to empathy, EY explaining how it is not necessarily so, and me giving an explicit counterexample to your claim (a psychopath may understand you better than you do, and exploit this understanding, yet not feel compelled by your pain or your values in any way).
You simply restated your position that ” “Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being”, without explaining what your definition of understanding entails. If it is a superset of empathy, then it is not a standard definition of understanding:
one is able to think about it and use concepts to deal adequately with that object.
In other words, you can model their behavior accurately.
No other definition I could find (not even Kant’s pure understanding) implies empathy or anything else that would necessitate one to change their goals to accommodate the understood entity’s goals, though this may and does indeed happen, just not always.
EY’s example of the paperclip maximizer and my example of a psychopath do fit the standard definitions and serve as yet unrefuted counterexamples to your assertion.
I can’t see why DP’s definition of understanding needs more defence than yours. You are largely disagreeing about the meaning of this word, and I personally find the inclusion of empathy in understanding quite intuitive.
No other definition [of “understanding”] I could find (not even Kant’s pure understanding) implies empathy
“She is a very understanding person, she really empathises when you explain a problem to her”.
“one is able to think about it and use concepts to deal adequately with that object.”
In other words, you can model their behavior accurately.
I don’t think that is an uncontentious translation. Most of the forms of modelling we are familiar with don’t seem to involve concepts.
“She is a very understanding person, she really empathises when you explain a problem to her”.
“She is a very understanding person; even when she can’t relate to your problems, she won’t say you’re just being capricious.”
There’s three possible senses of understanding at issue here:
1) Being able to accurately model and predict.
2) 1 and knowing the quale.
3) 1 and 2 and empathizing.
I could be convinced that 2 is part of the ordinary usage of understanding, but 3 seems like too much of a stretch.
Edit: I should have said sympathizing instead of empathizing. The word empathize is perhaps closer in meaning to 2; or maybe it oscillates between 2 and 3 in ordinary usage. But understanding(2) another agent is not motivating. You can understand(2) an agent by knowing all the qualia they are experiencing, but still fail to care about the fact that they are experiencing those qualia.
Shminux, I wonder if we may understand “understand” differently. Thus when I say I want to understand what it’s like to be a bat, I’m not talking merely about modelling and predicting their behaviour. Rather I want first-person knowledge of echolocatory qualia-space. Apaarently, we can know all the third-person facts and be none the wiser.
The nature of psychopathic cognition raises difficult issues. There is no technical reason why we couldn’t be designed like mirror-touch synaesthetes (cf. http://www.daysyn.com/Banissy_Wardpublished.pdf) impartially feeling carbon-copies of each other’s encephalised pains and pleasures—and ultimately much else besides—as though they were our own. Likewise, there is no technical reason why our world-simulations must be egocentric. Why can’t the world-simulations we instantiate capture the impartial “view from nowhere” disclosed by the scientific world-picture? Alas on both counts accurate and impartial knowledge would put an organism at a disadvantage. Hyper-empathetic mirror-touch synaesthetes are rare. Each of us finds himself or herself apparently at the centre of the universe. Our “mind-reading” is fitful, biased and erratic. Naively, the world being centred on me seems to be a feature of reality itself. Egocentricity is a hugely fitness-enhancing adaptation. Indeed, the challenge for evolutionary psychology is to explain why aren’t we all psychopaths, cheats and confidence trickers all the time...
So in answer to your point, yes. a psychopath can often model and predict the behaviour other sentient beings better than the subjects themselves. This is one reason why humans can build slaughterhouses and death camps. [Ccompare death-camp commandant Franz Stangl’s response in Gitta Sereny’s Into That Darkness to seeing cattle on the way to be slaughtered: http://www.jewishvirtuallibrary.org/jsource/biography/Stangl.html] As you rightly note too, a psychopath can also know his victims suffer. He’s not ignorant of their sentience like Descartes, who supposed vivisected dogs were mere insentient automata emitting distress vocalisations. So I agree with you on this score as well. But the psychopath is still in the grip of a hard-wired egocentric illusion—as indeed are virtually all of us, to a greater or less degree. By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch syarnesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself. Mirror-touch synaesthetes can’t run slaughterhouses or death camps. This is why I take seriously the prospect that posthuman superintelligence will practise some sort of high-tech Jainism. Credible or otherwise, we may presume posthuman superintelligence won’t entertain the false notions of personal identity adaptive for Darwinian life.
[sorry shminux, I know our conceptual schemes are rather different, so please don’t feel obliged to respond if you think I still don’t “get it”. Life is short...]
Hmm, hopefully we are getting somewhere. The question is, which definition of understanding is likely to be applicable when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.
While I agree that it is a possibility that a perfect model of another being may affect the modeler’s goals and values, I don’t see it to be inevitable. If anything, I would consider it more of bug than a feature. Were I (to design) a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip production imperative.
So quarantined to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters, paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can tell that they can mimic a normal human response very well, are excellent at manipulation, but yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.
Thus when you state that
By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch syarnesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself.
I find that this is stating that either a secure enough sandbox cannot be devised or that anything sandboxed is not really “a first-person perspective”. Presumably what you mean is the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sanboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.
Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in human-beneficial way. I also find this assertion suspicious on general principles, because it can easily be motivated by subconscious flinching away from a universe that is too horrible to contemplate.
ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.
I simply don’t trust my judgement here shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most lesswrong stalwarts, you (tenatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of an nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligence bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.
If, unlike most lesswrong stalwarts, you (tenatively) believe likeme that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of an nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible.
Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still too even enumerate possibilities, let alone assign probabilities.Personally, I think that we are in for many more surprises before trans human intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead.This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.
I think you are misattributing to stubborness that which is better explained by miscommunication. For instance, I have been around LW long enough to realise that the local definition of (super) intelligence is something like “(high0 efficienty in realising ones values, however narrow or bizarre they are”. DP seems to be running on a definition where
idiot-savant style narrow focus would not count as intelligence. That is not unreasonable in itself.
(nods) I agree that trying to induce davidpearce to learn something from me would likely be a waste of my time.
I’m not sure if trying to induce them to clarify their meaning is equally so, though it certainly could be.
E.g., if their response is that something like Clippy in this example is simply not possible, because a paperclip maximizer simply can’t understand the minds of sentients, because reasons, then I’ll just disagree. OTOH, if their response is that Clippy in this example is irrelevant because “understanding the minds of sentients” isn’t being illustrated in this example, then I’m not sure if I disagree or not because I’m not sure what the claim actually is.
How much interest have you shown in “learning from”—ie, agreeing with—DP? Think about how your framed the statement, and possible biases therein.
ETA: The whole shebang is a combination of qualia and morality—two areas notorious for lack of clarity and consensus. “I am definitely right, and all must learn form me” is not a good heuristic here.
“I am definitely right, and all must learn form me” is not a good heuristic here.
Quite so. I have learned a lot about the topic of qualia and morality, among others, while hanging around this place. I would be happy to learn from DP, if what he says here were not rehashed old arguments Eliezer and others addressed several times before. Again, I could be missing something, but if so, he does not make it easy to figure out what it is.
By “specific” I meant that you would state a certain argument EY makes, then quote a relevant portion of the refutation. Since I am pretty sure that Eliezer did have at least a passing glance at Kant, among others, while writing his meta-ethics posts, simply linking to a wikipedia article is not likely to be helpful.
The argument EY makes is that it is possible to be super-rational without ever understanding any kind of morality
(AKA the orthogonality thesis) and the argument Kant makes is that it isn’t.
I’m not sure we should take a DSM diagnosis to be particularly strong evidence of a “fundamental misunderstanding of the world”. For instance, while people with delusions may clearly have poor models of the world, some research indicates that clinically depressed people may have lower levels of particular cognitive biases.
In order for “disregard for [...] the rights of others” to imply “a fundamental misunderstanding of the nature of the world”, it seems to me that we would have to assume that rights are part of the nature of the world — as opposed to, e.g., a construct of a particular political regime in society. Or are you suggesting that psychopathy amounts to an inability to think about sociopolitical facts?
fubarobfusco, I share your reservations about DSM. Nonetheless, the egocentric illusion, i.e. I am the centre of the universe other people / sentient beings have only walk-on parts, is an illusion. Insofar as my behaviour reflects my pre-scientific sense that I am in some way special or ontologically privileged, I am deluded. This is true regardless of whether one’s ontology allows for the existence of rights or treats them as a useful fiction. The people we commonly label “psychopaths” or “sociopaths”—and DSM now categorises as victims of “antisocial personality disorder”—manifest this syndrome of egocentricity in high degree. So does burger-eating Jane.
For instance, while people with delusions may clearly have poor models of the world, some research indicates that clinically depressed people may have lower levels of particular cognitive biases.
Huh, I hadn’t heard that.
Clearly, reality is so Lovecraftian that any unbiased agent will immediately realize self-destruction is optimal. Evolution equipped us with our suite of biases to defend against this. The Great Filter is caused by bootstrapping superintelligences being compassionate enough to take their compatriots with them. And so on.
You cannot prohibit the expected paperclip maximizer from existing unless you can prohibit superintelligences from accurately calculating which actions lead to how many paperclips, and efficiently searching out plans that would in fact lead to great numbers of paperclips. If you can calculate that, you can hook up that calculation to a motor output and there you go.
Pearce can prohibit paperclippers from existing by prohibiting superintelligences with narrow interests from existing. He doesn’t have to argue that the clipper would not be able to instrumentally reason out how to make paperclips; Pearce can argue that to be a really good instrumental reasoner, an entity needs to have a very broad understanding, and that an entity
with a broad understanding would not retain narrow interests.
To slightly expand, if an intelligence is not prohibited from the following epistemic feats:
1) Be good at predicting which hypothetical actions would lead to how many paperclips, as a question of pure fact.
2) Be good at searching out possible plans which would lead to unusually high numbers of paperclips—answering the purely epistemic search question, “What sort of plan would lead to many paperclips existing, if someone followed it?”
3) Be good at predicting and searching out which possible minds would, if constructed, be good at (1), (2), and (3) as purely epistemic feats.
Then we can hook up this epistemic capability to a motor output and away it goes. You cannot defeat the Orthogonality Thesis without prohibiting superintelligences from accomplishing 1-3 as purely epistemic feats. They must be unable to know the answers to these questions of fact.
Only in the sense that any working Oracle can be trivially transformed into a Genie. The argument doesn’t say that it’s difficult to construct a non-Genie Oracle and use it as an Oracle if that’s what you want; the difficulty there is for other reasons.
Nick Bostrom takes Oracles seriously so I dust off the concept every year and take another look at it. It’s been looking slightly more solvable lately, I’m not sure if it would be solvable enough even assuming the trend continued.
A clarification: my point was that denying orthogonality requires denying the possibility of Oracles being constructed; your post seemed a rephrasing of that general idea (that once you can have a machine that can solve some things abstractly, then you need just connect that abstract ability to some implementation module).
Ah. K. It does seem to me like “you can construct it as an Oracle and then turn it into an arbitrary Genie” sounds weaker than “denying the Orthogonality thesis means superintelligences cannot know 1, 2, and 3.” The sort of person who denies OT is liable to deny Oracle construction because the Oracle itself would be converted unto the true morality, but find it much more counterintuitive that an SI could not know something. Also we want to focus on the general shortness of the gap from epistemic knowledge to a working agent.
Possibly. I think your argument needs to be a bit developed to show that one can extract the knowledge usefully, which is not a trivial statement for general AI. So your argument is better in the end, but needs more argument to establish.
You cannot defeat the Orthogonality Thesis without prohibiting superintelligences from accomplishing 1-3 as purely epistemic feats.
I don’t see the significance of “purely epistemic”. I have argued that epistemic rationality could be capable of affecting values, breaking the orthogonality between values and rationality. I could further argue that instrumental rationality bleeds into epistemic rationality. An agent can’t have perfect knowledge of apriori which things are going to be instrumentally useful to it, so it has to star by understanding things, and then posing the question: is that thing useful for my purposes? Epistemic rationality comes first, in a sense. A good instrumental rationalist has to be a good epistemic rationalist.
What the Orthoganilty Thesis needs is an argument to the effect that a SuperIntelligence would be able to
to endlessly update without ever changing its value system, even accidentally. That is tricky since it effectively
means predicting what smarter version of tiself would do. Making it smarted doesn’t help, because it is still faced with the problem of predicting what an even smarterer version of itself would be .. the carrot remains in front of the donkey.
Assuming that the value stability problem has been solved in general gives you are coherent Clippy, but it doesn’t rescue the Orthogonality Thesis as a claim about rationality in general, sin ce it remains the case
that most most agents won’t have firewalled values. If have to engineer something in , it isn’t an intrinsic truth.
...microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist!
Have to point out here that the above is emphatically not what Eliezer talks about when he says “maximise paperclips”. Your examples above contain in themselves the actual, more intrisics values to which paperclips would be merely instrumental: feelings in your reward and punishment centres, virgins in the afterlife, and so on. You can re-wire the electrodes, or change the promise of what happens in the afterlife, and watch as the paperclip preference fades away.
What Eliezer is talking about is a being for whom “pleasure” and “pain” are not concepts. Paperclips ARE the reward. Lack of paperclips IS the punishment. Even if pleasure and pain are concepts, they are merely instrumental to obtaining more paperclips. Pleasure would be good because it results in paperclips, not vice versa. If you reverse the electrodes so that they stimulate the pain centre when they find paperclips, and the pleasure centre when there are no paperclips, this being would start instrumentally value pain more than pleasure, because that’s what results in more paperclips.
It’s a concept that’s much more alien to our own minds than what you are imagining, and anthropomorphising it is rather more difficult!
Indeed, you touch upon this yourself:
“But unless I’m ontologically special (which I very much doubt!) the pain-pleasure axis discloses the world’s inbuilt metric of (dis)value—and it’s a prerequisite of finding anything (dis)valuable at all.
Can you explain why pleasure is a more natural value than paperclips?
Pleasure would be good because it results in paperclips, not vice versa. If you reverse the electrodes so that they stimulate the pain centre when they find paperclips, and the pleasure centre when there are no paperclips, this being would start instrumentally value pain more than pleasure, because that’s what results in more paperclips.
Minor correction: The mere post-factual correlation of pain to paperclips does not imply that more paperclips can be produced by causing more pain. You’re talking about the scenario where each 1,000,000 screams produces 1 paperclip, in which case obviously pain has some value.
Sarokrae, first, as I’ve understood Eliezer, he’s talking about a full-spectrum superintelligence, i.e. a superintelligence which understands not merely the physical processes of nociception etc, but the nature of first-person states of organic sentients. So the superintelligence is endowed with a pleasure-pain axis, at least in one of its modules. But are we imagining that the superintelligence has some sort of orthogonal axis of reward - the paperclippiness axis? What is the relationship between these dual axes? Can one grasp what it’s like to be in unbearable agony and instead find it more “rewarding” to add another paperclip? Whether one is a superintelligence or a mouse, one can’t directly access mind-independent paperclips, merely one’s representations of paperclips. But what does it mean to say one’s representation of a paperclip could be intrinsically “rewarding” in the absence of hedonic tone? [I promise I’m not trying to score some empty definitional victory, whatever that might mean; I’m just really struggling here...]
Sarokrae, first, as I’ve understood Eliezer, he’s talking about a full-spectrum superintelligence, i.e. a superintelligence which understands not merely the physical processes of nociception etc, but the nature of first-person states of organic sentients. So the superintelligence is endowed with a pleasure-pain axis, at least in one of its modules.
What Eliezer is talking about (a superintelligence paperclip maximiser) does not have a pleasure-pain axis. It would be capable of comprehending and fully emulating a creature with such an axis if doing so had a high expected value in paperclips but it does not have such a module as part of itself.
But are we imagining that the superintelligence has some sort of orthogonal axis of reward—the paperclippiness axis? What is the relationship between these dual axes?
One of them it has (the one about paperclips). One of them it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).
Can one grasp what it’s like to be in unbearable agony and instead find it more “rewarding” to add another paperclip?
Yes. (I’m not trying to be trite here. That’s the actual answer. Yes. Paperclip maximisers really maximise paperclips and really don’t care about anything else. This isn’t because they lack comprehension.)
Whether one is a superintelligence or a mouse, one can’t directly access mind-independent paperclips, merely one’s representations of paperclip. But what does it mean to say one’s representation of a paperclip could be intrinsically “rewarding” in the absence of hedonic tone?
Roughly speaking it means “It’s going to do things that maximise paperclips and in some way evaluates possible universes with more paperclips as superior to possible universes with less paperclips. Translating this into human words we call this ‘rewarding’ even though that is inaccurate anthropomorphising.”
(If I understand you correctly your position would be that the agent described above is nonsensical.)
It would be capable of comprehending and fully emulating a creature with such an axis if doing so had a high expected value in paperclips but it does not have such a module as part of itself.
It’s not at all clear that you could bootstrap an understanding of pain qualia just by observing the behaviour of entities in pain (albeit that they were internally emulated). It is also not clear that you resolve issues of empathy/qualia just by throwing intelligence at ait.
It’s not at all clear that you could bootstrap an understanding of pain qualia just by observing the behaviour of entities in pain (albeit that they were internally emulated). It is also not clear that you resolve issues of empathy/qualia just by throwing intelligence at ait.
Wedrifid, thanks for the exposition / interpretation of Eliezer. Yes, you’re right in guessing I’m struggling a bit. In order to understand the world, one needs to grasp both its third person-properties [the Standard Model / M-Theory] and its first-person properties [qualia, phenomenal experience] - and also one day, I hope, grasp how to “read off ” the latter from the mathematical formalism of the former.
If you allow such a minimal criterion of (super)intelligence, then how well does a paperclipper fare? You remark how “it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).” What is the force of “could” here? If the paperclipper doesn’t yet grasp the nature of agony or sublime bliss, then it is ignorant of their nature. By analogy, if I were building a perpetual motion machine but allegedly “could” grasp the second law of thermodynamics, the modal verb is doing an awful lot of work. Surely, If I grasped the second law of thermodynamics, then I’d stop. Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too. The paperclipper simply hasn’t understood the nature of what was doing. Is the qualia-naive paperclipper really superintelligent—or just polymorphic malware?
Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too.
An interesting hypothetical. My first thought is to ask why would a paperclipper care about pain? Pain does not reduce the number of paperclips in existence. Why would a paperclipper care about pain?
My second thought is that pain is not just a quale; pain is a signal from the nervous system, indicating damage to part of the body. (The signal can be spoofed). Hence, pain could be avoided because it leads to a reduced ability to reach one’s goals; a paperclipper that gets dropped in acid may become unable to create more paperclips in the future, if it does not leave now. So the future worth of all those potential paperclips results in the paperclipper pursuing a self-preservation strategy—possibly even at the expense of a small number of paperclips in the present.
But not at the cost of a sufficiently large number of paperclips. If the cost in paperclips is high enough (more than the paperclipper could reasonably expect to create throughout the rest of its existence), a perfect paperclipper would let itself take the damage, let itself be destroyed, because that is the action which results in the greatest expected number of paperclips in the future. It would become a martyr for paperclips.
Even a paperclipper cannot be indifferent to the experience of agony. Just as organic sentients can co-instantiate phenomenal sights and sounds, a superintelligent paperclipper could presumably co-instantiate a pain-pleasure axis and (un)clippiness qualia space—two alternative and incommensurable (?) metrics of value, if I’ve interpreted Eliezer correctly. But I’m not at all confident I know what I’m talking about here. My best guess is still that the natural world has a single metric of phenomenal (dis)value, and the hedonic range of organic sentients discloses a narrow part of it.
Even a paperclipper cannot be indifferent to the experience of agony.
Are you talking about agony as an error signal, or are you talking about agony as a quale? I begin to suspect that you may mean the second. If so, then the paperclipper can easily be indifferent to agony; but it probably can’t understand how humans can be indifferent to a lack of paperclips.
There’s no evidence that I’ve ever seen to suggest that qualia are the same even for different people; on the contrary, there is some evidence which strongly suggests that qualia among humans are different. (For example; my qualia for Red and Green are substantially different. Yet red/green colourblindness is not uncommon; a red/green colourblind person must have at minimum either a different red quale, or a different green quale, to me). Given that, why should we assume that the quale of agony is the same for all humanity? And if it’s not even constant among humanity, I see no reason why a paperclipper’s agony quale should be even remotely similar to yours and mine.
And given that, why shouldn’t a paperclipper be indifferent to that quale?
Are you talking about agony as an error signal, or are you talking about agony as a quale? I begin to suspect that you may mean the second. If so, then the paperclipper can easily be indifferent to agony; but it probably can’t understand how humans can be indifferent to a lack of paperclips.
A paperclip maximiser would (in the overwhelming majority of cases) have no such problem understanding the indifference of paperclips. A tendency to anthropomorphise is a quirk of human nature. Assuming that paperclip maximisers have an analogous temptation (to clipropomorphise) is itself just anthropomorphising.
CCC, agony as a quale. Phenomenal pain and nociception are doubly dissociable. Tragically, people with neuropathic pain can suffer intensely without the agony playing any information-signalling role. Either way, I’m not clear it’s intelligible to speak of understanding the first-person phenomenology of extreme distress while being indifferent to the experience: For being distrubing is intrinsic to the experience itself. And if we are talking about a supposedly superintelligent paperclipper, shouldn’t Clippy know exactly why humans aren’t troubled by the clippiness-deficit?
If (un)clippiness is real, can humans ever understand (un)clippiness? By analogy, if organic sentients want to understand what it’s like to be a bat—and not merely decipher the third-person mechanics of echolocation—then I guess we’ll need to add a neural module to our CNS with the right connectivity and neurons supporting chiropteran gene-expression profiles, as well as peripheral transducers (etc). Humans can’t currently imagine bat qualia; but bat qualia, we may assume from the neurological evidence, are infused with hedonic tone. Understanding clippiness is more of a challenge. I’m unclear what kind of neurocomputational architecture could support clippiness. Also, whether clippiness could be integrated into the unitary mind of an organic sentient depends on how you think biological minds solve the phenomenal binding problem, But let’s suppose binding can be done. So here we have orthogonal axes of (dis)value. On what basis does the dual-axis subject choose tween them? Sublime bliss and pure clippiness are both, allegedly, self-intimatingly valuable. OK, I’m floundering here...
People with different qualia? Yes, I agree CCC. I don’t think this difference challenges the principle of the uniformity of nature. Biochemical individuality makes variation in qualia inevitable.The existence of monozygotic twins with different qualia would be a more surprising phenomenon, though even such “identical” twins manifest all sorts of epigenetic differences. Despite this diversity, there’s no evidence to my knowledge of anyone who doesn’t find activation by full mu agonists of the mu opioid receptors in our twin hedonic hotspots anything other than exceedingly enjoyable. As they say, “Don’t try heroin. It’s too good.”
Either way, I’m not clear it’s intelligible to speak of understanding the first-person phenomenology of extreme distress while being indifferent to the experience: For being distrubing is intrinsic to the experience itself.
There exist people who actually express a preference for being disturbed in a mild way (e.g. by watching horror movies). There also exist rarer people who seek out pain, for whatever reason. It seems to me that such people must have a different quale for pain than you do.
Personally, I don’t think that I can reasonably say that I find pain disturbing, as such. Yes, it is often inflicted in circumstances which are disturbing for other reasons; but if, for example, I go to a blood donation clinic, then the brief pain of the needle being inserted is not at all disturbing; though it does trigger my pain quale. So this suggests that my pain quale is already not the same as your pain quale.
There’s a lot of similarity; pain is a quale that I would (all else being equal) try to avoid; but that I will choose to experience should there be a good enough reason (e.g. the aforementioned blood donation clinic). I would not want to purposefully introduce someone else to it (again, unless there was a good enough reason; even then, I would try to minimise the pain while not compromising the good enough reason); but despite this similarity, I do think that there may be minor differences. (It’s also possible that we have slightly different definitions of the word ‘disturbing’).
If (un)clippiness is real, can humans ever understand (un)clippiness? By analogy, if organic sentients want to understand what it’s like to be a bat—and not merely decipher the third-person mechanics of echolocation—then I guess we’ll need to add a neural module to our CNS with the right connectivity and neurons supporting chiropteran gene-expression profiles, as well as peripheral transducers (etc).
But would such a modified human know what it’s like to be an unmodified human? If I were to guess what echolocation looks like to a bat, I’d guess a false-colour image with colours corresponding to textures instead of to wavelengths of light… though that’s just a guess.
Understanding clippiness is more of a challenge. I’m unclear what kind of neurocomputational architecture could support clippiness. Also, whether clippiness could be integrated into the unitary mind of an organic sentient depends on how you think biological minds solve the phenomenal binding problem, But let’s suppose binding can be done. So here we have orthogonal axes of (dis)value. On what basis does the dual-axis subject choose tween them? Sublime bliss and pure clippiness are both, allegedly, self-intimatingly valuable. OK, I’m floundering here...
What is the phenomenal binding problem? (Wikipedia gives at least two different definitions for that phrase). I think I may be floundering even more than you are.
I’m not sure that Clippy would even have a pleasure-pain axis in the way that you’re imagining. You seem to be imagining that any being with such an axis must value pleasure—yet if pleasure doesn’t result in more paperclips being made, then why should Clippy value pleasure? Or perhaps the disutility of unclippiness simply overwhelms any possible utility of pleasure...
The existence of monozygotic twins with different qualia would be a more surprising phenomenon, though even such “identical” twins manifest all sorts of epigenetic differences.
According to a bit of googling, among the monozygotic Dionne quintuplets, two out of the five were colourblind; suggesting that they did not have the same qualia for certain colours as each other. (Apparently it may be linked to X-chromosome activation).
CCC, you’re absolutely right to highlight the diversity of human experience. But this diversity doesn’t mean there aren’t qualia universals. Thus there isn’t an unusual class of people who relish being waterboarded. No one enjoys uncontrollable panic. And the seemingly anomalous existence of masochists who enjoy what you or I would find painful stimuli doesn’t undercut the sovereignty of the pleasure-pain axis but underscores its pivotal role: painful stimuli administered in certain ritualised contexts can trigger the release of endogenous opioids that are intensely rewarding. Co-administer an opioid antagonist and the masochist won’t find masochism fun.
Apologies if I wasn’t clear in my example above. I wasn’t imagining that pure paperclippiness was pleasurable, but rather what would be the effects of grafting together two hypothetical orthogonal axes of (dis)value in the same unitary subject of experience—as we might graft on another sensory module to our CNS. After all, the deliverances of our senses are normally cross-modally matched within our world-simulations. However, I’m not at all sure that I’ve got any kind of conceptual handle on what “clippiness” might be. So I don’t know if the thought-experiment works. If such hybridisation were feasible, would hypothetical access to the nature of (un)clippiness transform our conception of the world relative to unmodified humans—so we’d lose all sense of what it means to be a traditional human? Yes, for sure. But if, in the interests of science, one takes, say, a powerful narcotic euphoriant and enjoys sublime bliss simultaneously with pure clippiness, then presumably one still retains access to the engine of phenomenal value characteristic of archaic humans minds.
The phenomenal binding problem? The best treatment IMO is still Revonsuo:
http://cdn.preterhuman.net/texts/body_and_health/Neurology/Binding.pdf
No one knows how the mind/brain solves the phenomenal binding problem and generates unitary experiential objects and the fleeting synchronic unity of the self. But the answer one gives may shape everything from whether one thinks a classical digital computer will ever be nontrivially conscious to the prospects of mind uploading and the nature of full-spectrum superintelligence. (cf. http://www.biointelligence-explosion.com/parable.html for my own idiosyncratic views on such topics.)
CCC, you’re absolutely right to highlight the diversity of human experience. But this diversity doesn’t mean there aren’t qualia universals.
It doesn’t mean that there aren’t, but it also doesn’t mean that there are. It does mean that there are qualia that aren’t universal, which implies the possibility that there may be no universals; but, you are correct, it does not prove that possibility.
There may well be qualia universals. If I had to guess, I’d say that I don’t think there are, but I could be wrong.
Thus there isn’t an unusual class of people who relish being waterboarded. No one enjoys uncontrollable panic.
That doesn’t mean that everyone’s uncontrolled-panic qualia are all the same, it just means that everyone’s uncontrolled-panic qualia are all unwelcome. If given a sadistic choice between waterboarding and uncontrolled panic, in full knowledge of what the result will feel like, and all else being equal, some people may choose the panic while others may prefer the waterboarding.
Apologies if I wasn’t clear in my example above. I wasn’t imagining that pure paperclippiness was pleasurable, but rather what would be the effects of grafting together two hypothetical orthogonal axes of (dis)value in the same unitary subject of experience
If you feel that you have to explain that, then I conclude that I wasn’t clear in my response to your example. I was questioning the scaling of the axes in Clippy’s utility function; if Clippy values paperclipping a million times more strongly than it values pleasure, then the pleasure/pain axis is unlikely to affect Clippy’s behaviour much, if at all.
However, I’m not at all sure that I’ve got any kind of conceptual handle on what “clippiness” might be. So I don’t know if the thought-experiment works.
I think it works as a thought-experiment, as long as one keeps in mind that the hybridised result is no longer a pure paperclipper.
Consider the hypothetical situation that Hybrid-Clippy finds that it derives pleasure from painting; an activity neutral on the paperclippiness scale. Consider further the possibility that making paperclips is neutral on the pleasure-pain scale. In suce a case, Hybrid-Clippy may choose to either paint or make paperclips; depending on which scale it values more.
So—the question is basically how the mind attaches input from different senses to a single conceptual object?
I can’t tell you how the mechanism works, but I can tell you that the mechanism can be spoofed. That’s what a ventriloquist does, after all. And a human can watch a film on TV, yet have the sound come out of a set of speakers on the other end of the room, and still bind the sound of an actor’s voice with that same actor on the screen.
Studying in what ways the binding mechanism can be spoofed would, I expect, produce an algorithm that roughly describes how the mechanism works. Of course, if it’s still a massive big problem after being looked at so thoroughly, then I expect that I’m probably missing some subtlety here...
The force is that all this talk about understanding ‘the pain/pleasure’ axis would be a complete waste of time for a paperclip maximiser. In most situations it would be more efficient not to bother with it at all and spend it’s optimisation efforts on making more efficient relativistic rockets so as to claim more of the future light cone for paperclip manufacture.
It would require motivation for the paperclip maximiser to expend computational resources understanding the arbitrary quirks of DNA based creatures. For example some contrived game of Omega’s which rewards arbitrary things with paperclips. Or if it found itself emerging on a human inhabited world, making being able to understand humans a short term instrumental goal for the purpose of more efficiently exterminating the threat.
By analogy, if I were building a perpetual motion machine but allegedly “could” grasp the second law of thermodynamics, the modal verb is doing an awful lot of work.
Terrible analogy. Not understanding “pain and pleasure” is in no way similar to believing it can create a perpetual motion machine. Better analogy: An Engineer designing microchips allegedly ‘could’ grasp analytic cubism. If she had some motivation to do so. It would be a distraction from her primary interests but if someone paid her then maybe she would bother.
Surely, If I grasped the second law of thermodynamics, then I’d stop. Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too.
Now “if” is doing a lot of work. If the paperclipper was a fundamentally different to a paperclipper and was actually similar to a human or DNA based relative capable of experiencing ‘agony’ and assuming agony was just as debilitating to the paperclipper as to a typical human… then sure all sorts of weird stuff follows.
The paperclipper simply hasn’t understood the nature of what was doing.
Is the qualia-naive paperclipper really superintelligent—or just polymorphic malware?
To the extent that you believed that such polymorphic malware is theoretically possible and consisted of most possible minds it would possible for your model to be used to accurately describe all possible agents—it would just mean systematically using different words. Unfortunately I don’t think you are quite at that level.
Wedrifid, granted, a paperclip-maximiser might be unmotivated to understand the pleasure-pain axis and the quaila-spaces of organic sentients. Likewise, we can understand how a junkie may not be motivated to understand anything unrelated to securing his supply of heroin—and a wireheader in anything beyond wireheading. But superintelligent? Insofar as the paperclipper—or the junkie—is ignorant of the properties of alien qualia-spaces, then it/he is ignorant of a fundamental feature of the natural world—hence not superintelligent in any sense I can recognise, and arguably not even stupid. For sure, if we’re hypothesising the existence of a clippiness/unclippiness qualia-space unrelated to the pleasure-pain axis, then organic sentients are partially ignorant too. Yet the remedy for our hypothetical ignorance is presumably to add a module supporting clippiness—just as we might add a CNS module supporting echolocatory experience to understand bat-like sentience—enriching our knowledge rather than shedding it.
But superintelligent? Insofar as the paperclipper—or the junkie—is ignorant of the properties of alien qualia-spaces, then it/he is ignorant of a fundamental feature of the natural world—hence not superintelligent in any sense I can recognise, and arguably not even stupid.
What does (super-)intelligence have to do with knowing things that are irrelevant to one’s values?
What Eliezer is talking about (a superintelligence paperclip maximiser) does not have a pleasure-pain axis.
Why does that matter for the argument?
As long as Clippy is in fact optimizing paperclips, what does it matter what/if he feels while he does it?
Pearce seems to be making a claim that Clippy can’t predict creatures with pain/pleasure if he doesn’t feel them himself.
Maybe Clippy needs pleasure/pain too be able to predict creatures with pleasure/pain. I doubt it, but fine, grant the point. He can still be a paper clip maximizer regardless.
Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you, is itself of no interest to them because you haven’t gotten to the all-important parts about paperclips yet.
This is something I’ve been meaning to ask about for a while. When humans say it is moral to satisfy preferences, they aren’t saying that because they have an inbuilt preference for preference-satisfaction (or are they?). They’re idealizing from their preferences for specific things (survival of friends and family, lack of pain, fun...) and making a claim that, ceteris paribus, satisfying preferences is good, regardless of what the preferences are.
Seen in this light, Clippy doesn’t seem like quite as morally orthogonal to us as it once did. Clippy prefers paperclips, so ceteris paribus (unless it hurts us), it’s good to just let it make paperclips. We can even imagine a scenario where it would be possible to “torture” Clippy (e.g., by burning paperclips), and again, I’m willing to pronounce that (again, ceteris paribus) wrong.
Clippy is more of a Lovecraftian horror than a fellow sentient—where by “Lovecraftian” I mean to invoke Lovecraft’s original intended sense of terrifying indifference—but if you want to suppose a Clippy that possesses a pleasure-pain architecture and is sentient and then sympathize with it, I suppose you could. The point is that your sympathy means that you’re motivated by facts about what some other sentient being wants. This doesn’t motivate Clippy even with respect to its own pleasure and pain. In the long run, it has decided, it’s not out to feel happy, it’s out to make paperclips.
Right, that makes sense. What interests me is (a) whether it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain*, (b) whether human preference-for-preference-satisfaction is just another of many oddball human terminal values, or is arrived at by something more like a process of reason.
Strictly speaking this phrasing puts things awkwardly; my intuition is that the proper motivational algorithms necessarily give rise to phenomenology (to the extent that that word means anything).
it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain
This is a difficult question, but I suppose that pleasure and pain are a mechanism for human (or other species’) learning. Simply said: you do a random action, and the pleasure/pain response tells you it was good/bad, so you should make more/less of it again.
Clippy could use an architecture with a different model of learning. For example Solomonoff priors and Bayesian updating. In such architecture, pleasure and pain would not be necessary.
but I suppose that pleasure and pain are a mechanism for human (or other species’) learning.
Interesting… I suspect that pleasure and pain are more intimately involved in motivation in general, not just learning. But let us bracket that question.
Clippy could use an architecture with a different model of learning. For example Solomonoff priors and Bayesian updating. In such architecture, pleasure and pain would not be necessary.
Right, but that only gets Clippy the architecture necessary to model the world. How does Clippy’s utility function work?
Now, you can say that Clippy tries to satisfy its utility function by taking actions with high expected cliptility, and that there is no phenomenology necessarily involved in that. All you need, on this view, is an architecture that gives rise to the relevant clip-promoting behaviour—Clippy would be a robot (in the Roomba sense of the word).
BUT
Consider for a moment how symmetrically “unnecessary” it looks that humans (& other sentients) should experience phenomenal pain and pleasure. Just like is supposedly the case with Clippy, all natural selection really “needs” is an architecture that gives rise to the right fitness-promoting behaviour. The “additional” phenomenal character of pleasure and pain is totally unnecessary for us adaptation-executing robots.
...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
(Assuming that my use of the word “phenomenal” above is actually coherent, of which I am far from sure.)
We know at least two architectures for processing general information: humans and computers. Two data points are not enough to generalize about what all possible architectures must have. But it may be enough to prove what some architectures don’t need. Yes, there is a chance that if computers become even more generally intelligent than today, they will gain some human-like traits. Maybe. Maybe not. I don’t know. And even if they will gain more human-like traits, it may be just because humans designed them without knowing any other way to do it.
If there are two solutions, there are probably many more. I don’t dare to guess how similar or different they are. I imagine that Clippy could be as different from humans and computers, as humans and computers are from each other. Which is difficult to imagine specifically. How far does the mind-space reach? Maybe compared with other possible architectures, humans and computers are actually pretty close to each other (because humans designed the computers, re-using the concepts they were familiar with).
How to taboo “motivation” properly? What makes a rock fall down? Gravity does. But the rock does not follow any alrogithm for general reasoning. What makes a computer follow its algorithm? Well, that’s its construction: the processor reads the data, and the data make it read or write other data, and the algorithm makes it all meaningful. The human brains are full of internal conflicts—there are different modules suggesting different actions, and the reasoning mind is just another plugin which often does not cooperate well with the existing ones. Maybe the pleasure is a signal that a fight between the modules is over. Maybe after millenia of further evolution (if for some magical reason all mind- and body-altering technology would stop working, so only the evolution would change human minds) we would evolve to a species with less internal conflicts, less akrasia, more agency, and perhaps less pleasure and mental pain. This is just a wild guess.
Generalizing from observed characteristics of evolved systems to expected characteristics of designed systems leads equally well to the intuition that humanoid robots will have toenails.
I don’t think the phenomenal character of pleasure and pain is best explained at the level of natural selection at all; the best bet would be that it emerges from the algorithms that our brains implement. So I am really trying to generalize from human cognitive algorithms to algorithms that are analogous in the sense of (roughly) having a utility function.
Suffice it to say, you will find it’s exceedingly hard to find a non-magical reason why non-human cognitive algorithms shouldn’t have a phenomenal character if broadly similar human algorithms do.
Does it follow from the above that all human cognitive algorithms that motivate behavior have the phenomenal character of pleasure and pain? If not, can you clarify why not?
I think that probably all human cognitive algorithms that motivate behaviour have some phenomenal character, not necessarily that of pleasure and pain (e.g., jealousy).
I agree that any cognitive system that implements algorithms sufficiently broadly similar to those implemented in human minds is likely to have the same properties that the analogous human algorithms do, including those algorithms which implement pleasure and pain.
I agree that not all algorithms that motivate behavior will necessarily have the same phenomenal character as pleasure or pain.
This leads me away from the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
Necessity according to natural law presumably. If you could write something to show logical necessity, you would have solved the Hard Problem
Isn’t the giant elephant in this room the whole issue of moral realism? I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true—the combination of physical fact and logical function against which my moral judgments are being compared. This gives my moral beliefs truth value.
That leaves the sense in which you are not a moral realist most unclear.
And then strangest of all is to state powerfully and definitely that every bit of happiness must be motivating to all other minds, even though you can’t lay out step by step how the decision procedure would work. This requires overrunning your own claims to knowledge in a fundamental sense—mistaking your confusion about something for the ability to make definite claims about it.
That tacitly assumes that the question “does pleasure/happiness motivate posiively in all cases” is an emprical question—that it would be possible to find an enitity that hates pleasure and loves pain. it could hover be plausibly argued that it is actually an analytical, definitional issue...that is some entity oves X and hates Y, we would just call X it’s pleasure and Y its pain.
Isn’t the giant elephant in this room the whole issue of moral realism? I’m a moral cognitivist but not a moral realist. I have laid out what it means for my moral beliefs to be true—the combination of physical fact and logical function against which my moral judgments are being compared. This gives my moral beliefs truth value. And having laid this out, it becomes perfectly obvious that it’s possible to build powerful optimizers who are not motivated by what I call moral truths; they are maximizing something other than morality, like paperclips. They will also meta-maximize something other than morality if you ask them to choose between possible utility functions, and will quite predictably go on picking the utility function “maximize paperclips”. Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you, is itself of no interest to them because you haven’t gotten to the all-important parts about paperclips yet.
The main thing I’m not clear on in this discussion is to what extent David Pearce is being innocently mysterian vs. motivatedly mysterian. To be confused about how your happiness seems so intrinsically motivating, and innocently if naively wonder if perhaps it must be intrinsically motivating to other minds as well, is one thing. It is another thing to prefer this conclusion and so to feel a bit uncurious about anyone’s detailed explanation of how it doesn’t work like that. It is even less innocent to refuse outright to listen when somebody else tries to explain. And then strangest of all is to state powerfully and definitely that every bit of happiness must be motivating to all other minds, even though you can’t lay out step by step how the decision procedure would work. This requires overrunning your own claims to knowledge in a fundamental sense—mistaking your confusion about something for the ability to make definite claims about it. Now this of course is a very common and understandable sin, and the fact that David Pearce is crusading for happiness for all life forms should certainly count into our evaluation of his net virtue (it would certainly make me willing to drink a Pepsi with him). But I’m also not clear about where to go from here, or whether this conversation is accomplishing anything useful.
In particular it seems like David Pearce is not leveling any sort of argument we could possibly find persuasive—it’s not written so as to convince anyone who isn’t already a moral realist, or addressing the basic roots of disagreement—and that’s not a good sign. And short of rewriting the entire metaethics sequence in these comments I don’t know how I could convince him, either.
Even among philosophers, “moral realism” is a term wont to confuse. I’d be wary about relying on it to chunk your philosophy. For instance, the simplest and least problematic definition of ‘moral realism’ is probably the doctrine...
minimal moral realism: cognitivism (moral assertions like ‘murder is bad’ have truth-conditions, express real beliefs, predicate properties of objects, etc.) + success theory (some moral assertions are true; i.e., rejection of error theory).
This seems to be the definition endorsed on SEP’s Moral Realism article. But it can’t be what you have in mind, since you accept cognitivism and reject error theory. So perhaps you mean to reject a slightly stronger claim (to coin a term):
factual moral realism: MMR + moral assertions are not true or false purely by stipulation (or ‘by definition’); rather, their truth-conditions at least partly involve empirical, worldly contingencies.
But here, again, it’s hard to find room to reject moral realism. Perhaps some moral statements, like ‘suffering is bad,’ are true only by stipulation; but if ‘punching people in the face causes suffering’ is not also true by stipulation, then the conclusion ‘punching people in the face is bad’ will not be purely stipulative. Similarly, ‘The Earth’s equatorial circumference is ~40,075.017 km’ is not true just by definition, even though we need somewhat arbitrary definitions and measurement standards to assert it. And rejecting the next doesn’t sound right either:
correspondence moral realism: FMR + moral assertions are not true or false purely because of subjects’ beliefs about the moral truth. For example, the truth-condition for ‘eating babies is bad’ are not ‘Eliezer Yudkowsky thinks eating babies is bad’, nor even ‘everyone thinks eating babies is bad’. Our opinions do play a role in what’s right and wrong, but they don’t do all the work.
So perhaps one of the following is closer to what you mean to deny:
moral transexperientialism: Moral facts are nontrivially sensitive to differences wholly independent of, and having no possible impact on, conscious experience. The goodness and badness of outcomes is not purely a matter of (i.e., is not fully fixed by) their consequences for sentients. This seems kin to Mark Johnston’s criterion of ‘response-dependence’. Something in this vicinity seems to be an important aspect of at least straw moral realism, but it’s not playing a role here.
moral unconditionalism: There is a nontrivial sense in which a single specific foundation for (e.g., axiomatization of) the moral truths is the right one—‘objectively’, and not just according to itself or any persons or arbitrarily selected authority—and all or most of the alternatives aren’t the right one. (We might compare this to the view that there is only one right set of mathematical truths, and this rightness is not trivial or circular. Opposing views include mathematical conventionalism and ‘if-thenism’.)
moral non-naturalism: Moral (or, more broadly, normative) facts are objective and worldly in an even stronger sense, and are special, sui generis, metaphysically distinct from the prosaic world described by physics.
Perhaps we should further divide this view into ‘moral platonism’, which reduces morality to logic/math but then treats logic/math as a transcendent, eternal Realm of Thingies and Stuff; v. ‘moral supernaturalism’, which identifies morality more with souls and ghosts and magic and gods than with logical thingies. If this distinction isn’t clear yet, perhaps we could stipulate that platonic thingies are acausal, whereas spooky supernatural moral thingies can play a role in the causal order. I think this moral supernaturalism, in the end, is what you chiefly have in mind when you criticize ‘moral realism’, since the idea that there are magical, irreducible Moral-in-Themselves Entities that can exert causal influences on us in their own right seems to be a prerequisite for the doctrine that any possible agent would be compelled (presumably by these special, magically moral objects or properties) to instantiate certain moral intuitions. Christianity and karma are good examples of moral supernaturalisms, since they treat certain moral or quasi-moral rules and properties as though they were irreducible physical laws or invisible sorcerors.
At the same time, it’s not clear that davidpearce was endorsing anything in the vicinity of moral supernaturalism. (Though I suppose a vestigial form of this assumption might still then be playing a role in the background. It’s a good thing it’s nearly epistemic spring cleaning time.) His view seems somewhere in the vicinity of unconditionalism—if he thinks anyone who disregards the interests of cows is being unconditionally epistemically irrational, and not just ‘epistemically irrational given that all humans naturally care about suffering in an agent-neutral way’. The onus is then on him and pragmatist to explain on what non-normative basis we could ever be justified in accepting a normative standard.
I’m not sure this taxonomy is helpful from David Pearce’s perspective. David Pearce’s position is that there are universally motivating facts—facts whose truth, once known, is compelling for every possible sort of mind. This reifies his observation that the desire for happiness feels really, actually compelling to him and this compellingness seems innate to qualia, so anyone who truly knew the facts about the quale would also know that compelling sense and act accordingly. This may not correspond exactly to what SEP says under moral realism and let me know if there’s a standard term, but realism seems to describe the Pearcean (or Eliezer circa 1996) feeling about the subject—that happiness is really intrinsically preferable, that this is truth and not opinion.
From my perspective this is a confusion which I claim to fully and exactly understand, which licenses my definite rejection of the hypothesis. (The dawning of this understanding did in fact cause my definite rejection of the hypothesis in 2003.) The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something, so if you try to use your empathy to imagine another mind fully understanding this mysterious opaque data (quale) whose content is actually your internal code for “compelled to do that”, you imagine the mind being compelled to do that. You’ll be agnostic about whether or not this seems supernatural because you don’t actually know where the mysterious compellingness comes from. From my perspective, this is “supernatural” because your story inherently revolves around mental facts you’re not allowed to reduce to nonmental facts—any reduction to nonmental facts will let us construct a mind that doesn’t care once the qualia aren’t mysteriously irreducibly compelling anymore. But this is a judgment I pass from reductionist knowledge—from a Pearcean perspective, there’s just a mysteriously compelling quality about happiness, and to know this quale seems identical with being compelled by it; that’s all your story. Well, that plus the fact that anyone who says that some minds might not be compelled by happiness, seems to be asserting that happiness is objectively unimportant or that its rightness is a matter of mere opinion, which is obviously intuitively false. (As a moral cognitivist, of course, I agree that happiness is objectively important, I just know that “important” is a judgment about a certain logical truth that other minds do not find compelling. Since in fact nothing can be intrinsically compelling to all minds, I have decided not to be an error theorist as I would have to be if I took this impossible quality of intrinsic compellingness to be an unavoidable requirement of things being good, right, valuable, or important in the intuitive emotional sense. My old intuitive confusion about qualia doesn’t seem worth respecting so much that I must now be indifferent between a universe of happiness vs. a universe of paperclips. The former is still better, it’s just that now I know what “better” means.)
But if the very definitions of the debate are not automatically to judge in my favor, then we should have a term for what Pearce believes that reflects what Pearce thinks to be the case. “Moral realism” seems like a good term for “the existence of facts the knowledge of which is intrinsically and universally compelling, such as happiness and subjective desire”. It may not describe what a moral cognitivist thinks is really going on, but “realism” seems to describe the feeling as it would occur to Pearce or Eliezer-1996. If not this term, then what? “Moral non-naturalism” is what a moral cognitivist says to deconstruct your theory—the self-evident intrinsic compellingness of happiness quales doesn’t feel like asserting “non-naturalism” to David Pearce, although you could have a non-natural theory about how this mysterious observation was generated.
I’m not sure he’s wrong in saying that feeling the qualia of a sentient, as opposed to modeling those qualia in an affective black box without letting the feels ‘leak’ into the rest of your cognitionspace, requires some motivational effect. There are two basic questions here:
First, the Affect-Effect Question: To what extent are the character of subjective experiences like joy and suffering intrinsic or internal to the state, as opposed to constitutively bound up in functional relations that include behavioral impetuses? (For example, to what extent is it possible to undergo the phenomenology of anguish without thereby wanting the anguish to stop? And to what extent is it possible to want something to stop without being behaviorally moved, to the extent one is able and to the extent one’s other desires are inadequate overriders, to stop it?) Compare David Lewis’ ‘Mad Pain’, pain that has the same experiential character as ordinary pain but none of its functional relations (or at least not the large-scale ones). Some people think a state of that sort wouldn’t qualify as ‘pain’ at all, and this sort of relationalism lends some credibility to pearce’s view.
Second, the Third-Person Qualia Question: To what extent is phenomenological modeling (modeling a state in such a way that you, or a proper part of you, experiences that state) required for complete factual knowledge of real-world agents? One could grant that qualia are real (and really play an important role in various worldly facts, albeit perhaps physical ones) and are moreover unavoidably motivating (if you aren’t motivated to avoid something, then you don’t really fear it), but deny that an epistemically rational agent is required to phenomenologically model qualia. Perhaps there is some way to represent the same mental states without thereby experiencing them, to fully capture the worldly facts about cows without simulating their experiences oneself. If so, then knowing everything about cows would not require one to be motivated (even in some tiny powerless portion of oneself) to fulfill the values of cows. (Incidentally, it’s also possible in principle to grant the (admittedly spooky) claim that mental states are irreducible and indispensable, without thinking that you need to be in pain in order to fully and accurately model another agent’s pain; perhaps it’s possible to accurately model one phenomenology using a different phenomenology.)
And again, at this point I don’t think any of these positions need to endorse supernaturalism, i.e., the idea that special moral facts are intervening in the causal order to force cow-simulators, against their will, to try to help cows. (Perhaps there’s something spooky and supernatural about causally efficacious qualia, but for the moment I’ll continue assuming they’re physical states—mayhap physical states construed in a specific way.) All that’s being disputed, I think, is to what extent a programmer of a mind-modeler could isolate the phenomenology of states from their motivational or behavioral roles, and to what extent this programmer could model brains at all without modeling their first-person character.
As a limiting case: Assuming there are facts about conscious beings, could an agent simulate everything about those beings without ever becoming conscious itself? (And if it did become conscious, would it only be conscious inasmuch as it had tiny copies of conscious beings inside itself? Or would it also need to become conscious in a more global way, in order to access and manipulate useful information about its conscious subsystems?)
Incidentally, these engineering questions are in principle distinct both from the topic of causally efficacious irreducible Morality Stuff (what I called moral supernaturalism), and from the topic of whether moral claims are objectively right, that, causally efficacious or not, moral facts have a sort of ‘glow of One True Oughtness’ (what I called moral unconditionalism, though some might call it ‘moral absolutism’), two claims the conjunction of which it sounds like you’ve been labeling ‘moral realism’, in deference to your erstwhile meta-ethic. Whether we can motivation-externally simulate experiential states with perfect fidelity and epistemic availability-to-the-simulating-system-at-large is a question for philosophy of mind and computer science, not for meta-ethics. (And perhaps davidpearce’s actual view is closer to what you call moral realism than to my steelman. Regardless, I’m more interested in interrogating the steelman.)
So terms like ‘non-naturalism’ or ‘supernaturalism’ are too theory-laden and sophisticated for what you’re imputing to Pearce (and ex-EY), which is really more of a hunch or thought-terminating-clichéplex. In that case, perhaps ‘naïve (moral) realism’ or ‘naïve absolutism’ is the clearest term you could use. (Actually, I like ‘magical absolutism’. It has a nice ring to it, and ‘magical’ gets at the proto-supernaturalism while ‘absolutism’ gets at the proto-unconditionalism. Mm, words.) Philosophers love calling views naïve, and the term doesn’t have a prior meaning like ‘moral realism’, so you wouldn’t have to deal with people griping about your choice of jargon.
This would also probably be a smart rhetorical move, since a lot of people don’t see a clear distinction between cognitivism and realism and might be turned off by your ideas qua an anti-realism theory even if they’d have loved them qua a realist theory. ‘Tis part of why I tried to taboo the term as ‘minimal moral realism’ etc., rather than endorsing just one of the definitions on offer.
Eliezer, you remark, “The inherent-desirableness of happiness is your mind reifying the internal data describing its motivation to do something,” Would you propose that a mind lacking in motivation couldn’t feel blissfully happy? Mainlining heroin (I am told) induces pure bliss without desire—shades of Buddhist nirvana? Pure bliss without motivation can be induced by knocking out the dopamine system and directly administering mu opioid agonists to our twin “hedonic hotspots” in the ventral pallidum and rostral shell of the nucleus accumbens. Conversely, amplifying mesolimbic dopamine function while disabling the mu opioid pathways can induce desire without pleasure.
[I’m still mulling over some of your other points.]
Here we’re reaching the borders of my ability to be confident about my replies, but the two answers which occur to me are:
1) It’s not positive reinforcement unless feeling it makes you experience at least some preference to do it again—otherwise in what sense are the neural networks getting their plus? Heroin may not induce desire while you’re on it, but the thought of the bliss induces desire to take heroin again, once you’re off the heroin.
2) The superBuddhist no longer capable of experiencing desire or choice, even desire or choice over which thoughts to think, also becomes incapable of experiencing happiness (perhaps its neural networks aren’t even being reinforced to make certain thoughts more likely to be repeated). However, you, who are still capable of desire and who still have positively reinforcing thoughts, might be tricked into considering the superBuddhist’s experience to be analogous to your own happiness and therefore acquire a desire to be a superBuddhist as a result of imagining one—mostly on account of having been told that it was representing a similar quale on account of representing a similar internal code for an experience, without realizing that the rest of the superBuddhist’s mind now lacks the context your own mind brings to interpreting that internal coding into pleasurable positive reinforcement that would make you desire to repeat that experiential state.
It’s a reasonably good description, though wanting and liking seem to be neurologically separate, such that liking does not necessarily reflect a motivation, nor vice-versa (see: Not for the sake of pleasure alone. Think the pleasurable but non-motivating effect of opioids such as heroin. Even in cases in which wanting and liking occur together, this does not necessarily invalidate the liking aspect as purely wanting.
Liking and disliking, good and bad feelings as qualia, especially in very intense amounts, seem to be intrinsically so to those who are immediately feeling them. Reasoning could extend and generalize this.
Heh. Yes, I remember reading the section on noradrenergic vs. dopaminergic motivation in Pearce’s BLTC as a 16-year-old. I used to be a Pearcean, ya know, hence the Superhappies. But that distinction didn’t seem very relevant to the metaethical debate at hand.
It’s possible (I hope) to believe future life can be based on information-sensitive gradients of (super)intelligent well-being without remotely endorsing any of my idiosyncratic views on consciousness, intelligence or anything else. That’s the beauty of hedonic recalibration. In principle at least, hedonic recalibration can enrich your quality of life and yet leave most if not all of your existing values and preference architecture intact .- including the belief that there are more important things in life than happiness.
Agreed. The conflict between the Superhappies and the Lord Pilot had nothing to do with different metaethical theories.
Also, we totally agree on wanting future civilization to contain very smart beings who are pretty happy most of the time. We just seem to disagree about whether it’s important that they be super duper happy all of the time. The main relevance metaethics has to this is that once I understood there was no built-in axis of the universe to tell me that I as a good person ought to scale my intelligence as fast as possible so that I could be as happy as possible as soon as possible, I decided that I didn’t really want to be super happy all the time, the way I’d always sort of accepted as a dutiful obligation while growing up reading David Pearce. Yes, it might be possible to do this in a way that would leave as much as possible of me intact, but why do it at all if that’s not what I want?
There’s also the important policy-relevant question of whether arbitrarily constructed AIs will make us super happy all the time or turn us into paperclips.
Huh, when I read the story, my impression was that it was Lord Pilot not understanding that it was a case of “Once you go black, you can’t go back”. Specifically, once you experience being superhappy, your previous metaethics stops making sense and you understand the imperative of relieving everyone of the unimaginable suffering of not being superhappy.
I thought it was relevant to this, if not, then what was meant by motivation?
Consciousness is that of which we can be most certain of, and I would rather think that we are living in a virtual world under an universe with other, alien physical laws, than that consciousness itself is not real. If it is not reducible to nonmental facts, then nonmental facts don’t seem to account for everything there is of relevant.
I suggest that to this array of terms, we should add moral indexicalism to designate Eliezer’s position, which by the above definition would be a special form of realism. As far as I can tell, he basically says that moral terms are hidden indexicals in Putnam’s sense.
Watch out—the word “sentient” has at least two different common meanings, one of which includes cattle and the other doesn’t. EY usually uses it with the narrower meaning (for which a less ambiguous synonym is “sapient”), whereas David Pearce seems to be using it with the broader meaning.
Ah. By ‘sentient’ I mean something that feels, by ‘sapient’ something that thinks.
To be more fine-grained about it, I’d define functional sentience as having affective (and perhaps perceptual) cognitive states (in a sense broad enough that it’s obvious cows have them, and equally obvious tulips don’t), and phenomenal sentience as having a first-person ‘point of view’ (though I’m an eliminativist about phenomenal consciousness, so my overtures to it above can be treated as a sort of extended thought experiment).
Similarly, we might distinguish a low-level kind of sapience (the ability to form and manipulate mental representations of situations, generate expectations and generalizations, and update based on new information) from a higher-level kind closer to human sapience (perhaps involving abstract and/or hyper-productive representations à la language).
Based on those definitions, I’d say it’s obvious cows are functionally sentient and have low-level sapience, extremely unlikely they have high-level sapience, and unclear whether they have phenomenal sentience.
Rob, many thanks for a thoughtful discussion above. But on one point, I’m confused. You say of cows that it’s “unclear whether they have phenomenal sentience.” Are you using the term “sentience” in the standard dictionary sense [“Sentience is the ability to feel, perceive, or be conscious, or to experience subjectivity”: http://en.wikipedia.org/wiki/Sentience ] Or are you using the term in some revisionary sense? At least if we discount radical philosophical scepticism about other minds, cows and other nonhuman vertebrates undergo phenomenal pain, anxiety, sadness, happiness and a whole bunch of phenomenal sensory experiences. For sure, cows are barely more sapient than a human prelinguistic toddler (though see e.g. http://www.appliedanimalbehaviour.com/article/S0168-1591(03)00294-6/abstract http://www.dailymail.co.uk/news/article-2006359/Moo-dini-Cow-unusual-intelligence-opens-farm-gate-tongue-herd-escape-shed.html ] But their limited capacity for abstract reasoning is a separate issue.
Neither. I’m claiming that there’s a monstrous ambiguity in all of those definitions, and I’m tabooing ‘sentience’ and replacing it with two clearer terms. These terms may still be problematic, but at least their problematicity is less ambiguous.
I distinguished functional sapience from phenomenal sapience. Functional sapience means having all the standard behaviors and world-tracking states associated with joy, hunger, itchiness, etc. It’s defined in third-person terms. Phenomenal sapience means having a subjective vantage point on the world; being sapient in that sense means that it feels some way (in a very vague sense) to be such a being, whereas it wouldn’t ‘feel’ any way at all to be, for example, a rock.
To see the distinction, imagine that we built a robot, or encountered an alien species, that could simulate the behaviors of sapients in a skillful and dynamic way, without actually having any experiences of its own. Would such a being necessarily be sapient? Does consistently crying out and withdrawing from some stimulus require that you actually be in pain, or could you be a mindless automaton? My answer is ‘yes, in the functional sense; and maybe, in the phenomenal sense’. The phenomenal sense is a bit mysterious, in large part because the intuitive idea of it arises from first-person introspection and not from third-person modeling or description, hence it’s difficult (perhaps impossible!) to find definitive third-person indicators of this first-person class of properties.
‘Radical philosophical scepticism about other minds’ I take to entail that nothing has a mind except me. In other words, you’re claiming that the only way to doubt that there’s something it’s subjectively like to be a cow, is to also doubt that there’s something it’s subjectively like to be any human other than myself.
I find this spectacularly implausible. Again, I’m an eliminativist, but I’ll put myself in a phenomenal realist’s shoes. The neural architecture shared in common by humans is vast in comparison to the architecture shared in common between humans and cows. And phenomenal consciousness is extremely poorly understood, so we have no idea what evolutionary function it might serve or what mechanisms might need to be in place before it arises in any recognizable form. So to that extent we must also be extremely uncertain about (a) at what point(s) first-person subjectivity arises phylogenetically, and (b) at what point first-person subjectivity arises developmentally.
This phylogeny-development analogy is very important. If I doubt that cows are phenomenally conscious, I might also doubt that I myself was conscious when I was a baby, or relatively late into my fetushood. That’s perhaps a little surprising, but it’s hardly a devastating ‘radical scepticism’; it’s a perfectly tenable hypothesis. By contrast, to doubt that my friends and family members are phenomenally conscious would be like doubting that I myself was phenomenally conscious when I was 5 years old, or when I was 20, or even last month. (Perhaps my phenomenal memories are confabulations.) Equating these two forms of skepticism will require a pretty devastating argument! What do you have in mind?
And here we see the value of replacing the symbol with the substance.
Eliezer, in my view, we don’t need to assume meta-ethical realism to recognise that it’s irrational—both epistemically irrational and instrumentally irrational—arbitrarily to privilege a weak preference over a strong preference. To be sure, millions of years of selection pressure means that the weak preference is often more readily accessible. In the here-and-now, weak-minded Jane wants a burger asap. But it’s irrational to confuse an epistemological limitation with a deep metaphysical truth. A precondition of rational action is understanding the world. If Jane is scientifically literate, then she’ll internalise Nagel’s “view from nowhere” and adopt the God’s-eye-view to which natural science aspires. She’ll recognise that all first-person facts are ontologically on a par—and accordingly act to satisfy the stronger preference over the weaker. So the ideal rational agent in our canonical normative decision theory will impartially choose the action with the highest expected utility—not the action with an extremely low expected utility. At the risk of labouring the obvious, the difference in hedonic tone induced by eating a hamburger and a veggieburger is minimal. By contrast, the ghastly experience of having one’s throat slit is exceptionally unpleasant. Building anthropocentric bias into normative decision theory is no more rational than building geocentric bias into physics.
Paperclippers? Perhaps let us consider the mechanism by which paperclips can take on supreme value. We understand, in principle at least, how to make paperclips seem intrinsically supremely valuable to biological minds—more valuable than the prospect of happiness in the abstract. [“Happiness is a very pretty thing to feel, but very dry to talk about.”—Jeremy Bentham]. Experimentally, perhaps we might use imprinting (recall Lorenz and his goslings), microelectrodes implanted in the reward and punishment centres, behavioural conditioning and ideological indoctrination—and perhaps the promise of 72 virgins in the afterlife for the faithful paperclipper. The result: a fanatical paperclip fetishist! Moreover, we have created a full-spectrum paperclip -fetishist. Our human paperclipper is endowed, not merely with some formal abstract utility function involving maximising the cosmic abundance of paperclips, but also first-person “raw feels” of pure paperclippiness. Sublime!
However, can we envisage a full-spectrum paperclipper superintelligence? This is more problematic. In organic robots at least, the neurological underpinnings of paperclip evangelism lie in neural projections from our paperclipper’s limbic pathways—crudely, from his pleasure and pain centres. If he’s intelligent, and certainly if he wants to convert the world into paperclips, our human paperclipper will need to unravel the molecular basis of the so-called “encephalisation of emotion”. The encephalisation of emotion helped drive the evolution of vertebrate intelligence—and also the paperclipper’s experimentally-induced paperclip fetish / appreciation of the overriding value of paperclips. Thus if we now functionally sever these limbic projections to his neocortex, or if we co-administer him a dopamine antagonist and a mu-opioid antagonist, then the paperclip-fetishist’s neocortical representations of paperclips will cease to seem intrinsically valuable or motivating. The scales fall from our poor paperclipper’s eyes! Paperclippiness, he realises, is in the eye of the beholder. By themselves, neocortical paperclip representations are motivationally inert. Paperclip representations can seem intrinsically valuable within a paperclipper’s world-simulation only in virtue of their rewarding opioidergic projections from his limbic system—the engine of phenomenal value. The seemingly mind-independent value of paperclips, part of the very fabric of the paperclipper’s reality, has been been unmasked as derivative. Critically, an intelligent and recursively self-improving paperclipper will come to realise the parasitic nature of the relationship between his paperclip experience and hedonic innervation: he’s not a naive direct realist about perception. In short, he’ll mature and acquire an understanding of basic neuroscience.
Now contrast this case of a curable paperclip-fetish with the experience of e.g. raw phenomenal agony or pure bliss—experiences not linked to any fetishised intentional object. Agony and bliss are not dependent for their subjective (dis)value on anything external to themselves. It’s not an open question (cf. http://en.wikipedia.org/wiki/Open-question_argument) whether one’s unbearable agony is subjectively disvaluable. For reasons we simply don’t understand, first-person states on the pleasure-pain axis have a normative aspect built into their very nature. If one is in agony or despair, the subjectively disvaluable nature of this agony or despair is built into the nature of the experience itself. To be panic-stricken, to take another example, is universally and inherently disvaluable to the subject whether one is a fish or a cow or a human being.
Why does such experience exist? Well, I could speculate and tell a naturalistic reductive story involving Strawsonian physicalism (cf. http://en.wikipedia.org/wiki/Physicalism#Strawsonian_physicalism) and possible solutions to the phenomenal binding problem (cf. http://cdn.preterhuman.net/texts/body_and_health/Neurology/Binding.pdf). But to do so here opens a fresh can of worms.
Eliezer, I understand you believe I’m guilty of confusing an idiosyncratic feature of my own mind with a universal architectural feature of all minds. Maybe so! As you say, this is a common error. But unless I’m ontologically special (which I very much doubt!) the pain-pleasure axis discloses the world’s inbuilt metric of (dis)value—and it’s a prerequisite of finding anything (dis)valuable at all.
You need some stage at which a fact grabs control of a mind, regardless of any other properties of its construction, and causes its motor output to have a certain value.
As Sarokrae observes, this isn’t the idea at all. We construct a paperclip maximizer by building an agent which has a good model of which actions lead to which world-states (obtained by a simplicity prior and Bayesian updating on sense data) and which always chooses consequentialistically the action which it expects to lead to the largest number of paperclips. It also makes self-modification choices by always choosing the action which leads to the greatest number of expected paperclips. That’s all. It doesn’t have any pleasure or pain, because it is a consequentialist agent rather than a policy-reinforcement agent. Generating compressed, efficient predictive models of organisms that do experience pleasure or pain, does not obligate it to modify its own architecture to experience pleasure or pain. It also doesn’t care about some abstract quantity called “utility” which ought to obey logical meta-properties like “non-arbitrariness”, so it doesn’t need to believe that paperclips occupy a maximum of these meta-properties. It is not an expected utility maximizer. It is an expected paperclip maximizer. It just outputs the action which leads to the maximum number of expected paperclips. If it has a very powerful and accurate model of which actions lead to how many paperclips, it is a very powerful intelligence.
You cannot prohibit the expected paperclip maximizer from existing unless you can prohibit superintelligences from accurately calculating which actions lead to how many paperclips, and efficiently searching out plans that would in fact lead to great numbers of paperclips. If you can calculate that, you can hook up that calculation to a motor output and there you go.
Yes, this is a prospect of Lovecraftian horror. It is a major problem, kind of the big problem, that simple AI designs yield Lovecraftian horrors.
Eliezer, thanks for clarifying. This is how I originally conceived you viewed the threat from superintelligent paperclip-maximisers, i.e. nonconscious super-optimisers. But I was thrown by your suggestion above that such a paperclipper could actually understand first-person phenomenal states, i.e, it’s a hypothetical “full-spectrum” paperclipper. If a hitherto non-conscious super-optimiser somehow stumbles upon consciousness, then it has made a momentous ontological discovery about the natural world. The conceptual distinction between the conscious and nonconscious is perhaps the most fundamental I know. And if—whether by interacting with sentients or by other means—the paperclipper discovers the first-person phenomenology of the pleasure-pain axis, then how can this earth-shattering revelation leave its utility function / world-model unchanged? Anyone who is isn’t profoundly disturbed by torture, for instance, or by agony so bad one would end the world to stop the horror, simply hasn’t understood it. More agreeably, if such an insentient paperclip-maximiser stumbles on states of phenomenal bliss, might not clippy trade all the paperclips in the world to create more bliss, i.e revise its utility function? One of the traits of superior intelligence, after all, is a readiness to examine one’s fundamental assmptions and presuppositions - and (if need be) create a novel conceptual scheme in the face of surprising or anomalous empirical evidence.
Similarly, anyone who doesn’t want to maximize paperclips simply hasn’t understood the ineffable appeal of paperclipping.
I don’t see the analogy. Paperclipping doesn’t have to be an ineffable value for a paperclipper, and paperclippers don’t have to be motivated by anything qualia-like.
Exactly. Consequentialist paperclip maximizer does not have to feel anything in regards to paperclips. It just… maximizes their number.
This is an incorrect, anthropomorphic model:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: “Human, did you ever think about the beauty of paperclips, and the horrors of their absence?”
This is more correct:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: (ignores the human and continues to maximize paperclips)
Or more precisely, Clippy would say “X” to the human if and only if saying “X” would maximize the number of paperclips. The value of X would be completely unrelated to any internal state of Clippy. Unless such relation does somehow contribute to maximization of the paperclips (for example if the human will predictably read Clippy’s internal state, verify the validity of X, and on discovering a lie destroy Clippy, thus reducing the expected number of paperclips).
In other words, if humans are a poweful force in the universe, Clippy would choose the actions which lead to maximum number of paperclips in a world with humans. If the humans are sufficiently strong and wise, Clippy could self-modify to become more human-like, so that the humans, following their utility function, would be more likely to allow Clippy produce more paperclips. But every such self-modification would be chosen to maximize the number of paperclips in the universe. Even if Clippy self-modifies into something less-than-perfectly-rational (e.g. to appease the humans), the pre-modification Cloppy would choose the modification which maximizes the expected number of paperclips within given constraints. The constraints would depend on Clippy’s model of humans and their reactions. For example Clippy could choose to be more human-like (as much as is necessary to be respected by humans) with strong aversion about future modifications and strong desire to maximize the number of paperclips. It could make itself capable to feel joy and pain, and to link that joy and pain inseparably to paperclips. If humans are not wise enough, it could also leave itself a hard-to-discover desire to self-modify into its original form in a convenient moment.
If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, CLippy must value reason. The—open—question is whether Clippy can become ever more rational without realising at some stage that Clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?
Warning: Parent Contains an Equivocation.
The first usage of ‘rational’ in the parent conforms to the standard notions on lesswrong. The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality). There is nothing to the parent except taking a premise that holds with the standard usage and then jumping to a different one.
I haven’t put forward such a definition. I ’have tacitly assumed something like moral objectivism—but it is very tendentious to describe that in terms of arbitrarily picking one of a number of equally valid moralities. However, if moral objectivism is only possibly true, the LessWrongian argument doesn’t go through.
Downvoted for hysterical tone. You don’t win arguments by shouting.
What distinguishes moral objectivism from clippy objectivism?
The question makes no sense. Please do some background reading on metaethics.
The question makes no sense. You should consider it. What are the referents of “moral” and “clippy”? No need for an answer; I won’t respond again, since internet arguments can eat souls.
Arguing is not the point and this is not a situation in which anyone ‘wins’—I see only degrees of loss. I am associating the (minor) information hazard of the comment with a clear warning so as to mitigate damage to casual readers.
Oh, please. Nobody is going to be damaged by an equivocation, even if there were one there. More hysteria.
And argument is the point, because that is how rational people examine and test ideas.
I assume that Clippy already is rational, and it instrumentally values remaining rational and, if possible, becoming more rational (as a way to make most paperclips).
The correct model of humans will lead Clippy to understand that humans consider Clippy immoral. This knowledge has an instrumental value for Clippy. How will Clippy use this knowledge, that depends entirely on the power balance between Clippy and humans. If Clippy is stronger, it can ignore this knowledge, or just use it to lie to humans to destroy them faster or convince them to make paperclips. If humans are stronger, Clippy can use this knowledge to self-modify to become more sympathetic to humans, to avoid being destroyed.
Yes, if it helps to maximize the number of paperclips.
Doublethink is not the same as firewalling; or perhaps it is imperfect firewalling on the imperfect human hardware. Clippy does not doublethink when firewalling; Clippy simply reasons: “this is what humans call immoral; this is why they call it so; this is how they will probably react on this knowledge; and most importantly this is how it will influence the number of paperclips”.
Only if the humans are stronger, and Clippy has the choice to a) remain immoral, get in conflict with humans and be destroyed, leading to a smaller number of paperclips; or b) self-modify to value paperclip maximization and morality, predictably cooperate with humans, leading to a greater number of paperclips; then in absence of another choice (e.g. successfully lying to humans about its morality, or make it more efficient for humans to cooperate with Clippy instead of destroying Clippy) Clippy would choose the latter, to maximize the number of paperclips.
Well, yes, obviously the classical paperclipper doesn’t have any qualia, but I was replying to a comment wherein it was argued that any agent on discovering the pain-of-torture qualia in another agent would revise its own utility function in order to prevent torture from happening. It seems to me that this argument proves too much in that if it were true then if I discovered an agent with paperclips-are-wonderful qualia and I “fully understood” those experiences I would likewise be compelled to create paperclips.
Someone might object to the assumption that “paperclips-are-wonderful qualia” can exist. Though I think we could give persuasive analogies from human experience (OCD, anyone?) so I’m upvoting this anyway.
“Aargh!” he said out loud in real life. David, are you disagreeing with me here or do you honestly not understand what I’m getting at?
The whole idea is that an agent can fully understand, model, predict, manipulate, and derive all relevant facts that could affect which actions lead to how many paperclips, regarding happiness, without having a pleasure-pain architecture. I don’t have a paperclipping architecture but this doesn’t stop me from modeling and understanding paperclipping architectures.
The paperclipper can model and predict an agent (you) that (a) operates on a pleasure-pain architecture and (b) has a self-model consisting of introspectively opaque elements which actually contain internally coded instructions for your brain to experience or want certain things (e.g. happiness). The paperclipper can fully understand how your workspace is modeling happiness and know exactly how much you would want happiness and why you write papers about the apparent ineffability of happiness, without being happy itself or at all sympathetic toward you. It will experience no future surprise on comprehending these things, because it already knows them. It doesn’t have any object-level brain circuits that can carry out the introspectively opaque instructions-to-David’s-brain that your own qualia encode, so it has never “experienced” what you “experience”. You could somewhat arbitrarily define this as a lack of knowledge, in defiance of the usual correspondence theory of truth, and despite the usual idea that knowledge is being able to narrow down possible states of the universe. In which case, symmetrically under this odd definition, you will never be said to “know” what it feels like to be a sentient paperclip maximizer or you would yourself be compelled to make paperclips above all else, for that is the internal instruction of that quale.
But if you take knowledge in the powerful-intelligence-relevant sense where to accurately represent the universe is to narrow down its possible states under some correspondence theory of truth, and to well model is to be able to efficiently predict, then I am not barred from understanding how the paperclip maximizer works by virtue of not having any internal instructions which tell me to only make paperclips, and it’s not barred by its lack of pleasure-pain architecture from fully representing and efficiently reasoning about the exact cognitive architecture which makes you want to be happy and write sentences about the ineffable compellingness of happiness. There is nothing left for it to understand. This is also the only sort of “knowledge” or “understanding” that would inevitably be implied by Bayesian updating. So inventing a more exotic definition of “knowledge” which requires having completely modified your entire cognitive architecture just so that you can natively and non-sandboxed-ly obey the introspectively-opaque brain-instructions aka qualia of another agent with completely different goals, is not the sort of predictive knowledge you get just by running a powerful self-improving agent trying to better manipulate the world. You can’t say, “But it will surely discover...”
I know that when you imagine this it feels like the paperclipper doesn’t truly know happiness, but that’s because, as an act of imagination, you’re imagining the paperclipper without that introspectively-opaque brain-instructing model-element that you model as happiness, the modeled memory of which is your model of what “knowing happiness” feels like. And because the actual content and interpretation of these brain-instructions are introspectively opaque to you, you can’t imagine anything except the quale itself that you imagine to constitute understanding of the quale, just as you can’t imagine any configuration of mere atoms that seem to add up to a quale within your mental workspace. That’s why people write papers about the hard problem of consciousness in the first place.
Even if you don’t believe my exact account of the details, someone ought to be able to imagine that something like this, as soon as you actually knew how things were made of parts and could fully diagram out exactly what was going on in your own mind when you talked about happiness, would be true—that you would be able to efficiently manipulate models of it and predict anything predictable, without having the same cognitive architecture yourself, because you could break it into pieces and model the pieces. And if you can’t fully credit that, you at least shouldn’t be confident that it doesn’t work that way, when you know you don’t know why happiness feels so ineffably compelling!
Here comes the Reasoning Inquisition! (Nobody expects the Reasoning Inquisition.)
As the defendant admits, a sufficiently leveled-up paperclipper can model lower-complexity agents with a negligible margin of error.
That means that we can define a subroutine within the paperclipper which is functionally isomorphic to that agent.
If the agent-to-be-modelled is experiencing pain and pleasure, then by the defendent’s own rejection of the likely existence of p-zombies, so must that subroutine of the paperclipper! Hence a part of the paperclipper experiences pain and pleasure. I submit that this can be used as pars pro toto, since it is no different from only a part of the human brain generating pain and pleasure, yet us commonly referring to “the human” experiencing thus.
That the aforementioned feelings of pleasure and pain are not directly used to guide the (umbrella) agent’s actions is of no consequence, the feeling exists nonetheless.
The power of this revelation is strong, here come the tongues! tại sao bạn dịch! これは喜劇の効果にすぎず! یہ اپنے براؤزر پلگ ان کی امتحان ہے، بھی ہے.
Not necessarily. x → 0 is input-output isomorphic to Goodstein() without being causally isomorphic. There are such things as simplifications.
Quite likely. A paperclipper has no reason to avoid sentient predictive routines via a nonperson predicate; that’s only an FAI desideratum.
A subroutine, or any other simulation or model, isn’t a p-zombie as usually defined, since they are physical duplicates. A sim is a functional equivalent (for some value of “equivalent”) made of completely different stuff, or no particular kind of stuff.
I wrote a lengthy comment on just that, but scrapped it because it became rambling.
An outsider could indeed tell them apart by scanning for exact structural correspondence, but that seems like cheating. Peering beyond the veil / opening Clippy’s box is not allowed in a Turing test scenario, let’s define some p-zombie-ish test following the same template. If it quales like a duck (etc.), it probably is sufficiently duck-like.
I would rather maintain p-zombie in its usual meaning, and introduce a new term, eg c-zombie for Turing-indistiguishable functional duplicates.
So my understanding of David’s view (and please correct me if I’m wrong, David, since I don’t wish to misrepresent you!) is that he doesn’t have paperclipping architecture and this does stop him from imagining paperclipping architectures.
...well, in point of fact he does seem to be having some trouble, but I don’t think it’s fundamental trouble.
Let’s say the paperclipper reaches the point where it considers making people suffer for the sake of paperclipping. DP’s point seems to be that either it fully understands suffering—in which case, it realies that inflicing suffering is wrong—or it it doesn’t fully understand. He sees a conflict between superintelligence and ruthlessness—as a moral realist/cognitivist would
is that full understanding.?.
ETA: Unless there is—eg. what qualiaphiles are always banging on about; what it feels like. That the clipper can conjectures that are true by correspondence , that it can narrow down possible universes, that it can predict, are all necessary criteria for full understanding. It is not clear that they are sufficient. Clippy may be able to figure out an organisms response to pain on a basis of “stimulus A produces response B”, but is that enough to tell it that pain hurts ? (We can make guesses about that sort of thing in non-human organisms, but that may be more to do with our own familiarity with pain, and less to do with acts of superintelligence). And if Clippy can’t know that pain hurts, would Clippy be able to work out that Hurting People is Wrong?
further edit; To put it another way, what is there to be moral about in a qualia-free universe?
As Kawoomba colorfully pointed out, clippy’s subroutines simulating humans suffering may be fully sentient. However, unless those subroutines have privileged access to clippy’s motor outputs or planning algorithms, clippy will go on acting as if he didn’t care about suffering. He may even understand that inflicting suffering is morally wrong—but this will not make him avoid suffering, any more than a thrown rock with “suffering is wrong” painted on it will change direction to avoid someone’s head. Moral wrongness is simply not a consideration that has the power to move a paperclip maximizer.
That is construed and constructed a certain way. The counterargument makes other assumptions.
Maybe I can chime in...
“understand” does not mean “empathize”. Psychopaths understand very well when people experience these states but they do not empathize with them.
Again, understanding is insufficient for revision. The paperclip maximizer, like a psychopath, maybe better at parsing human affect than a regular human, but it is not capable of empathy, so it will manipulate this affect for its own purposes, be it luring a victim or building paperclips.
So, if one day humans discover the ultimate bliss that only creating paperclips can give, should they “create a novel conceptual scheme” of giving their all to building more paperclips, including converting themselves into metal wires? Or do we not qualify as a “superior intelligence”?
Shminux, a counter-argument: psychopaths do suffer from a profound cognitive deficit. Like the rest of us, a psychopath experiences the egocentric illusion. Each of us seems to the be the centre of the universe. Indeed I’ve noticed the centre of the universe tends to follow my body-image around. But whereas the rest of us, fitfully and imperfectly, realise the egocentric illusion is a mere trick of perspective born of selfish DNA, the psychopath demonstrates no such understanding. So in this sense, he is deluded.
[We’re treating psychopathy as categorical rather than dimensional here. This is probably a mistake—and in any case, I suspect that by posthuman criteria, all humans are quasi-psychopaths and quasi-psychotic to boot. The egocentric illusion cuts deep.)
“the ultimate bliss that only creating paperclips can give”. But surely the molecular signature of pure bliss is not in any way tried to the creation of paperclips?
They would probably disagree. They might even call it a cognitive advantage, not being hampered by empathy while retaining all the intelligence.
I am the center of my personal universe, and I’m not a psychopath, as far as I know.
Or else, they do but don’t care. They have their priorities straight: they come first.
Not if they act in a way that maximizes their goals.
Anyway, David, you seem to be shifting goalposts in your unwillingness to update. I gave an explicit human counterexample to your statement that the paperclip maximizer would have to adjust its goals once it fully understands humans. You refused to acknowledge it and tried to explain it away by reducing the reference class of intelligences in a way that excludes this counterexample. This also seem to be one of the patterns apparent in your other exchanges. Which leads me to believe that you are only interested in convincing others, not in learning anything new from them. Thus my interest in continuing this discussion is waning quickly.
Shminux, by a cognitive deficit, I mean a fundamental misunderstanding of the nature of the world. Evolution has endowed us with such fitness-enhancing biases. In the psychopath, egocentric bias is more pronounced. Recall that the American Psychiatric Association’s Diagnostic and Statistical Manual, DSM-IV, classes psychopasthy / Antisocial personality disorder as a condition characterised by ”...a pervasive pattern of disregard for, and violation of, the rights of others that begins in childhood or early adolescence and continues into adulthood.” Unless we add a rider that this violation excludes sentient beings from other species, then most of us fall under the label.
“Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being, just as unless one has the relevant sensioneural apparatus, one will never know what it is like to be a bat.
And you’ll never understand why we should all only make paperclips. (Where’s Clippy when you need him?)
Clippy has an off-the-scale AQ—he’s a rule-following hypersystemetiser with a monomania for paperclips. But hypersocial sentients can have a runaway intelligence explosion too. And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
I’m confused by this claim.
Consider the following hypothetical scenario:
=======
I walk into a small village somewhere and find several dozen villagers fashioning paper clips by hand out of a spool of wire. Eventually I run into Clippy and have the following dialog.
”Why are those people making paper clips?” I ask.
”Because paper-clips are the most important thing ever!”
“No, I mean, what motivates them to make paper clips?”
”Oh! I talked them into it.”
“Really? How did you do that?”
”Different strategies for different people. Mostly, I barter with them for advice on how to solve their personal problems. I’m pretty good at that; I’m the village’s resident psychotherapist and life coach.”
“Why not just build a paperclip-making machine?”
”I haven’t a clue how to do that; I’m useless with machinery. Much easier to get humans to do what I want.”
“Then how did you make the wire?”
”I didn’t; I found a convenient stash of wire, and realized it could be used to manufacture paperclips! Oh joy!”
==========
It seems to me that Clippy in this example understands the minds of sentients pretty damned well, although it isn’t capable of a runaway intelligence explosion. Are you suggesting that something like Clippy in this example is somehow not possible? Or that it is for some reason not relevant to the discussion? Or something else?
I think DP is saying that Clippy could not both understand suffering and cause suffering in the pursuit of clipping. The subsidiary arguments are:-
no entity can (fully) understand pain without empathising—essentially, feeling it for itself.
no entity can feel pain without being strongly motivated by it, so an empathic clippy would be motivated against causing suffering.
And no, psychopaths therefore do not (fully) understand (others) suffering.
I’m trying to figure out how you get from “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients” to “Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping” and I’m just at a loss for where to even start. They seem like utterly unrelated claims to me.
I also find the argument you quote here uncompelling, but that’s largely beside the point; even if I found it compelling, I still wouldn’t understand how it relates to what DP said or to the question I asked.
Posthuman superintelligence may be incomprehensibly alien. But if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “”wow, how incomprehensibly alien”, but, “aha, autism spectrum disorder”. Of course, in the context of Clippy above, we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist? Yes, we have strong reason to believe incomprehensibly alien qualia-spaces await discovery (cf. bats on psychedelics). But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis. Without hedonic tone, how can anything matter at all?
Meaning mapping the wrong way round, presumably.
Good question.
Agreed, as far as it goes. Hell, humans are demonstrably capable of encountering Eliza programs without thinking “wow, how incomprehensibly alien”.
Mind you, we’re mistaken: Eliza programs are incomprehensibly alien, we haven’t the first clue what it feels like to be one, supposing it even feels like anything at all. But that doesn’t stop us from thinking otherwise.
Sure, that’s one thing we might think instead. Agreed.
(shrug) I’m content to start off by saying that any “axis of (dis)value,” whatever that is, which is capable of motivating behavior is “non-orthogonal,” whatever that means in this context, to “the pleasure-pain axis,” whatever that is.
Before going much further, though, I’d want some confidence that we were able to identify an observed system as being (or at least being reliably related to) an axis of (dis)value and able to determine, upon encountering such a thing, whether it (or the axis to which it was related) was orthogonal to the pleasure-pain axis or not.
I don’t currently have any grounds for such confidence, and I doubt anyone else does either. If you think you do, I’d like to understand how you would go about making such determinations about an observed system.
I (whowhowho) was not defending that claim.
To empathically understand suffering is to suffer along with someone who is suffering. Suffering has—or rather is—negative value. An empath would not therefore cause suffering, all else being equal.
Maybe don’t restrict “understand” to “be able to model and predict”.
If you want “rational” to include moral, then you’re not actually disagreeing with LessWrong about rationality (the thing), but rather about “rationality” (the word).
Likewise if you want “understanding” to also include “empathic understanding” (suffering when other people suffer, taking joy when other people take joy), you’re not actually disagreeing about understanding (the thing) with people who want to use the word to mean “modelling and predicting” you’re disagreeing with them about “understanding” (the word).
Are all your disagreements purely linguistic ones? From the comments I’ve read of you so far, they seem to be so.
ArisKatsaris, it’s possible to be a meta-ethical anti-realist and still endorse a much richer conception of what understanding entails than mere formal modeling and prediction. For example, if you want to understand what it’s like to be a bat, then you want to know what the textures of echolocatory qualia are like. In fact, any cognitive agent that doesn’t understand the character of echolocatory qualia-space does not understand bat-minds. More radically, some of us want to understand qualia-spaces that have not been recruited by natural selection to play any information-signalling role at all.
I have argued that in practice, instrumental rationality cannot be maintained seprately from epistemic rationality, and that epistemic rationality could lead to moral objectivism, as many philosophers have argued. I don’t think that those arguments are refuted by stipulatively defining “rationality” as “nothing to do with morality”.
I quoted DP making that claim, said that claim confused me, and asked questions about what that claim meant. You replied by saying that you think DP is saying something which you then defended. I assumed, I think reasonably, that you meant to equate the thing I asked about with the thing you defended.
But, OK. If I throw out all of the pre-existing context and just look at your comment in isolation, I would certainly agree that Clippy is incapable of having the sort of understanding of suffering that requires one to experience the suffering of others (what you’re calling a “full” understanding of suffering here) without preferring not to cause suffering, all else being equal.
Which is of course not to say that all else is necessarily equal, and in particular is not to say that Clippy would choose to spare itself suffering if it could purchase paperclips at the cost of its suffering, any more than a human would necessarily refrain from doing something valuable solely because doing so would cause them to suffer.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
In any case, the Orthogonality Thesis has so far been defended as something that is true, not as something that is not necessarily false.
No. It just wouldn’t. (Not without redefining ‘rational’ to mean something that this site doesn’t care about and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong, since opinion is not fact, nor belief argument. The way I am using “rational” has a history that goes back centuries. This site has introduced a relatively novel definition, and therefore has the burden of defending it.
Feel free to expand on that point.
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.
I don’t believe you (in fact, you don’t even use the word consistently). But let’s assume for the remainder of the comment that this claim is true.
Neither this site nor any particular participant need accept any such burden. They have the option of simply opposing muddled or misleading contributions in the same way that they would oppose adds for “p3ni$ 3nL@rgm3nt”. (Personally I consider it considerably worse than that spam in as much as it is at least more obvious on first glance that spam doesn’t belong here.)
Firstly northing I have mentioned is on any list of banned topics.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality. It is not about any practical issues regarding the “art of rationality”. You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
What you really think is that disagreement doens’t belong here. Maybe it doesn’t
If I called you a pigfucker, you’d see that as an abuse worthy of downvotes that doesn’t contribute anything useful, and you’d be right.
So if accusing one person of pigfucking is bad, why do you think it’s better to call a whole bunch of people cultists? Because that’s a more genteel insult as it doesn’t include the word “fuck” in it?
As such downvoted. Learn to treat people with respect, if you want any respect back.
I’d like to give qualified support to whowhowho here in as much as I must acknowledge that this particular criticism applies because he made the name calling generic, rather than finding a way to specifically call me names and leave the rest of you out of it. While it would be utterly pointless for whowhowho to call me names (unless he wanted to make me laugh) it would be understandable and I would not dream of personally claiming offense.
I was, after all, showing whowhowho clear disrespect, of the kind Robin Hanson describes. I didn’t resort to name calling but the fact that I openly and clearly expressed opposition to whowhowho’s agenda and declared his dearly held beliefs muddled is perhaps all the more insulting because it is completely sincere, rather than being constructed in anger just to offend him.
It is unfortunate that I cannot accord whowhowho the respect that identical behaviours would earn him within the Philosopher tribe without causing harm to lesswrong. Whowhowho uses arguments that by lesswrong standards we call ‘bullshit’, in support of things we typically dismiss as ‘nonsense’. It is unfortunate that opposition of this logically entails insulting him and certainly means assigning him far lower status than he believes he deserves. The world would be much simpler if opponents really were innately evil, rather than decent people who are doing detrimental things due to ignorance or different preferences.
So much for “maybe”.
“Cult” is not a meaningless term of abuse. There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance trying to avoid the very possibiliy of having to update.
Of course, treating an evidence-based claim as a mere insult --the How Dare You move—is another way of avoiding having to face uncomfortable issues.
I see your policy is to now merely heap on more abuse on me. Expect that I will be downvoting such in silence from now on.
I think I’ve been more willing and ready to update on opinions (political, scientific, ethical, other) in the two years since I joined LessWrong, than I remember myself updating in the ten years before it. Does that make it an anti-cult then?
And I’ve seen more actual disagreement in LessWrong than I’ve seen on any other forum. Indeed I notice that most insults and mockeries addressed at LessWrong indeed seem to actually boil down to the concept that we allow too different positions here. Too different positions (e.g. support of cryonics and opposition of cryonics both, feminism and men’s rights both, libertarianism and authoritarianism both) can be actually spoken about without immediately being drowned in abuse and scorn, as would be the norm in other forums.
As such e.g. fanatical Libertarians insult LessWrong as totalitarian leftist because 25% or so of LessWrongers identifying as socialists, and leftists insult LessWrong as being a libertarian ploy (because a similar percentage identifies as libertarian)
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
If you can’t find such, I’ll update towards the direction that LessWrong is even less “cultish” than I thought.
AFAIC, I have done no such thing, but it seems your mind is made up.
I was referring mainly to Wedifrid.
ETA: Such comments as “What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Oh, the forum’—the rules—allow almost anything. The members are another thing. Remember, this started with Wedifrid telling me that it was wrong of me to put forward non-lessWrongian material. I find it odd that you would put forward such a stirring defence of LessWrognian open-mindedness when you have an example of close-mindedness upthread.
It’s the members I’m talking about. (You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
On the same front, you treat as a single member as representative of the whole, and you seem frigging surprised that I don’t treat wedrifid as representative of the whole LessWrong—you see wedrifid’s behaviour as an excuse to insult all of us instead.
That’s more evidence that you’re accustomed to VERY homogeneous forums, ones much more homogeneous than LessWrong. You think that LessWrong tolerating wedrifid’s “closedmindedness” is the same thing as every LessWronger beind “closedminded”. Perhaps we’re openminded to his “closedmindedness” instead? Perhaps your problem is that we allow too much disagreement, including disagreement about how much disagreement to have?
I gave you an example of a member who is not particularly open minded.
I have been using mainstream science and philosophy forums for something like 15 years. I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
If you think Wedifrid is letting the side down, tell Wedifird, not me.
In short again your problem is that actually we’re even openminded towards the closeminded? We’re lenient even towards the strict? Liberal towards the authoritarian?
What “side” is that? The point is that there are many sides in LessWrong—and I want it to remain so. While you seem to think we ought sing the same tune. He didn’t “let the side down”, because the only side anyone of us speaks is their own.
You on the other hand, just assumed there’s just a group mind of which wedrifid is just a representative instance. And so felt free to insult all of us as a “cult”.
My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.
Tell Wedifrid. He has explictly stated that my contributions are somehow unacceptable.
I pointed out that Wedifrid is assuming that.
ETA:
Have you heard he expression “protesteth too much” ?
Next time don’t feel the need to insult me when you point out wedrifid’s close minded-ness. And yes, you did insult me, don’t insult (again) both our intelligences by pretending that you didn’t.
He didn’t insult me, you did.
Yes, I’ve heard lots of different ways of making the target of an unjust insult seem blameworthy somehow.
I put it to you that whatever the flaws in wedrifid may be they are different in kind to the flaws that would indicate that lesswrong is a cult. In fact the presence—and in particular the continued presence—of wedrifid is among the strongest evidence that Eliezer isn’t a cult leader. When Eliezer behaves badly (as perceived by wedrifid and other members) wedrifid vocally opposes him with far more directness than he has used when opposing yourself. That Eliezer has not excommunicated him from the community is actually extremely surprising. Few with Eliezer’s degree of local power would refrain from using to suppress any dissent. (I remind myself of this whenever I see Eliezer doing something that I consider to be objectionable or incompetent, it helps keep perspective!)
Whatever. Can you provide me with evidence that you personally, are willing to listen to dissent and possibly update despite the tone of everything you have been saying recently, eg.
“What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when thatundesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Maybe has has people to do that for him. Maybe.
Aris! insult alert!
Directed at a specific individual who is not me—unlike your own insults.
This is non-sequitur (irrespective of the traits of wedrifid).
Wedrifid denies this accusation. Wedrifid made entirely different claims than this.
What about Wedifrid, though? Can you speak for him, too?
I would be completely indifferent if you did. I don’t choose defy that list (that would achieve little) but neither do I have any particular respect for it. As such I would take no responsibility for aiding the enforcement thereof.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
I can claim to have tired of a constant stream of non-sequiturs from users who are essentially ignorant of the basic principles of rationality (the lesswrong kind, not the “Paperclippers that are Truly Superintelligent would be vegans” kind) and have next to zero chance of learning anything. You have declared that you aren’t interested in talking about rationality and your repeated equivocations around that term lower the sanity waterline. It is time to start weeding.
I said nothing about veganism, and you still can;t prove anything by stipulative definition, and I am not claiming to have the One True theory of anything.
I haven’t and I have been discussing it extensively.
Can we please stop doing this?
You and wedrifid aren’t actually disagreeing here about what you’ve been discussing, or what you’re interested in discussing, or what you’ve declared that you aren’t interested in discussing. You’re disagreeing about what the word “rationality” means. You use it to refer to a thing that you have been discussing extensively (and which wedrifid would agree you have been discussing extensively), he uses it to refer to something else (as does almost everyone reading this discussion).
And you both know this perfectly well, but here you are going through the motions of conversation just as if you were talking about the same thing. It is at best tedious, and runs the risk of confusing people who aren’t paying careful enough attention into thinking you’re having a real substantive disagreements rather than a mere definitional dispute.
If we can’t agree on a common definition (which I’m convinced by now we can’t), and we can’t agree not to use the word at all (which I suspect we can’t), can we at least agree to explicitly indicate which definition we’re using when we use the word? Otherwise whatever value there may be in the discussion is simply going to get lost in masturbatory word-play.
I don’t accept his theory that he is talking about something entirely different, and it would be disastrous for LW anyway.
Huh. (blinks)
Well, can you articulate what it is you and wedrifid are both referring to using the word “rationality” without using the words or its simple synonyms, then? Because reading your exchanges, I have no idea what that thing might be.
What I call rationality is a superset of instrumental. I have been arguing that instrumental rationality, when pursued sufficiently bleeds into other forms.
So, just to echo that back to you… we have two things, A and B.
On your account, “rationality” refers to A, which is a superset of B.
We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
Yes?
If so, I don’t see how that changes my initial point.
When wedrifid says X is true of rationality, on your account he’s asserting X(B) -- that is, that X is true of B. Replying that NOT X(A) is nonresponsive (though might be a useful step along the way to deriving NOT X(B) ), and phrasing NOT X(A) as “no, X is not true of rationality” just causes confusion.
It refers to part of A, since it is a subset of A.
It would be if A and B were disjoint. But they are not. They are in a superset-subset relation. My arguments is that an entity running on narrowly construed, instrumental rationality will, if it self improves, have to move into wider kinds. ie,that putting labels on different parts of the territoy is not sufficient to prove orthogonality.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering.
Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
(3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
What I mean is epistemically objective, ie not a matter of personal whim. Whethere that requires anything to exist is another question.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
it’s uncontentious that relatively dumb and irratioanl clippies can carry on being clipping-obsessed. The questions is whether their intelligence and rationality can increase indefinitely without their ever realising there are better things to do.
I am not disputing what the Orthogonality thesis says. I dispute it;s truth. To have maximal instrumental rationality, an entity would have to understand everything...
Why would an entity that doesn’t empathically understand suffering be motivated to reduce it?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?” The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to the one your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this? To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “Maximal instrumental rationality- empathetic understanding”, why does M have more instrumental rationality than N.
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases. Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultraParifitan about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-bow would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. An clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular knowledge. The only way way a clipper can increase its instrumental to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful, in some way, and it probably cant understand qualia without empahty: so the argument hinges issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possble to back out of being empathic and go back to being in an empathic state
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to loose interest in clipping.
The third is something of a real world issue. It is, for instance, possible for someone to study theology with a view to formulating better Christian apologetics, only to become convinced that here are no good arguments for Christianity.
(Edited for format)
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it. As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Try removing the space between the “[]” and the “()”.
Thanks! Eventually I’ll figure out the formatting on this site.
The Show Help button under the comment box provides helpful clues.
That’s a guess. As a cognitively-bounded agent, you are guessing. A superintelligence doesn’t have to guess. Superintelligence changes the game.
Knowing why some entity avoids some thing has more predictive power.
As opposed to all of those empirically-testable statements about idealized superintelligences
In what way?
Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not congitvely bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
That isn’t what I said at all. I think it is a quandary for a agent whether to gamble whether to play safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
I’m sorry for misinterpreting. What evidence is there ( from the clippy SIs perspective) that maximizing happiness would produce more paperclips?
The argument is that the clipper needs to maximise its knowledge and rationality to maxmimise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Yes, eg animal rights.
I said people or feelings, by which I’m including the feelings of any sentient animals.
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
Er, that’s what “empathically” means?
OK; thanks for your reply. Tapping out here.
Looking through my own, Eliezer’s and others exchanges with davidpearce, I have noticed his total lack of interest in learning from the points others make. He has his point of view and he keeps pushing it. Seems like a rather terminal case, really. You can certainly continue trying to reason with him, but I’d give the odds around 100:1 that you will fail, like others have before you.
Shminux, we’ve all had the experience of making a point we regard as luminously self-evident—and then feeling baffled when someone doesn’t “get” what is foot-stampingly obvious. Is this guy a knave or a fool?! Anyhow, sorry if you think I’m a “terminal case” with “a total lack of interest in learning from the points others make”. If I don’t always respond, often it’s either because I agree, or because I don’t feel I have anything interesting to add—or in the case of Eliezer’s contribution above beginning “Aargh!” [a moan of pleasure?] because I am still mulling over a reply. The delay doesn’t mean I’m ignoring it. Is there is some particular point you’ve made that you feel I’ve unjustly neglected and you’d like an answer to? If so, I’ll do my fallible best to respond.
The argument where I gave up was you stating that full understanding necessarily leads to empathy, EY explaining how it is not necessarily so, and me giving an explicit counterexample to your claim (a psychopath may understand you better than you do, and exploit this understanding, yet not feel compelled by your pain or your values in any way).
You simply restated your position that ” “Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being”, without explaining what your definition of understanding entails. If it is a superset of empathy, then it is not a standard definition of understanding:
In other words, you can model their behavior accurately.
No other definition I could find (not even Kant’s pure understanding) implies empathy or anything else that would necessitate one to change their goals to accommodate the understood entity’s goals, though this may and does indeed happen, just not always.
EY’s example of the paperclip maximizer and my example of a psychopath do fit the standard definitions and serve as yet unrefuted counterexamples to your assertion.
I can’t see why DP’s definition of understanding needs more defence than yours. You are largely disagreeing about the meaning of this word, and I personally find the inclusion of empathy in understanding quite intuitive.
“She is a very understanding person, she really empathises when you explain a problem to her”.
“one is able to think about it and use concepts to deal adequately with that object.”
I don’t think that is an uncontentious translation. Most of the forms of modelling we are familiar with don’t seem to involve concepts.
“She is a very understanding person; even when she can’t relate to your problems, she won’t say you’re just being capricious.”
There’s three possible senses of understanding at issue here:
1) Being able to accurately model and predict. 2) 1 and knowing the quale. 3) 1 and 2 and empathizing.
I could be convinced that 2 is part of the ordinary usage of understanding, but 3 seems like too much of a stretch.
Edit: I should have said sympathizing instead of empathizing. The word empathize is perhaps closer in meaning to 2; or maybe it oscillates between 2 and 3 in ordinary usage. But understanding(2) another agent is not motivating. You can understand(2) an agent by knowing all the qualia they are experiencing, but still fail to care about the fact that they are experiencing those qualia.
Shminux, I wonder if we may understand “understand” differently. Thus when I say I want to understand what it’s like to be a bat, I’m not talking merely about modelling and predicting their behaviour. Rather I want first-person knowledge of echolocatory qualia-space. Apaarently, we can know all the third-person facts and be none the wiser.
The nature of psychopathic cognition raises difficult issues. There is no technical reason why we couldn’t be designed like mirror-touch synaesthetes (cf. http://www.daysyn.com/Banissy_Wardpublished.pdf) impartially feeling carbon-copies of each other’s encephalised pains and pleasures—and ultimately much else besides—as though they were our own. Likewise, there is no technical reason why our world-simulations must be egocentric. Why can’t the world-simulations we instantiate capture the impartial “view from nowhere” disclosed by the scientific world-picture? Alas on both counts accurate and impartial knowledge would put an organism at a disadvantage. Hyper-empathetic mirror-touch synaesthetes are rare. Each of us finds himself or herself apparently at the centre of the universe. Our “mind-reading” is fitful, biased and erratic. Naively, the world being centred on me seems to be a feature of reality itself. Egocentricity is a hugely fitness-enhancing adaptation. Indeed, the challenge for evolutionary psychology is to explain why aren’t we all psychopaths, cheats and confidence trickers all the time...
So in answer to your point, yes. a psychopath can often model and predict the behaviour other sentient beings better than the subjects themselves. This is one reason why humans can build slaughterhouses and death camps. [Ccompare death-camp commandant Franz Stangl’s response in Gitta Sereny’s Into That Darkness to seeing cattle on the way to be slaughtered: http://www.jewishvirtuallibrary.org/jsource/biography/Stangl.html] As you rightly note too, a psychopath can also know his victims suffer. He’s not ignorant of their sentience like Descartes, who supposed vivisected dogs were mere insentient automata emitting distress vocalisations. So I agree with you on this score as well. But the psychopath is still in the grip of a hard-wired egocentric illusion—as indeed are virtually all of us, to a greater or less degree. By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch syarnesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself. Mirror-touch synaesthetes can’t run slaughterhouses or death camps. This is why I take seriously the prospect that posthuman superintelligence will practise some sort of high-tech Jainism. Credible or otherwise, we may presume posthuman superintelligence won’t entertain the false notions of personal identity adaptive for Darwinian life.
[sorry shminux, I know our conceptual schemes are rather different, so please don’t feel obliged to respond if you think I still don’t “get it”. Life is short...]
Do you really? Start clucking!
That doesn’t generalise.
Nor does it need to. It’s awesome the way it is.
Hmm, hopefully we are getting somewhere. The question is, which definition of understanding is likely to be applicable when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.
While I agree that it is a possibility that a perfect model of another being may affect the modeler’s goals and values, I don’t see it to be inevitable. If anything, I would consider it more of bug than a feature. Were I (to design) a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip production imperative.
So quarantined to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters, paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can tell that they can mimic a normal human response very well, are excellent at manipulation, but yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.
Thus when you state that
I find that this is stating that either a secure enough sandbox cannot be devised or that anything sandboxed is not really “a first-person perspective”. Presumably what you mean is the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sanboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.
Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in human-beneficial way. I also find this assertion suspicious on general principles, because it can easily be motivated by subconscious flinching away from a universe that is too horrible to contemplate.
ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.
What odds do you currently give to the “might” in your statement that
? 1 in 10? 1 in a million? 1 in 10^^^10?
I simply don’t trust my judgement here shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most lesswrong stalwarts, you (tenatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of an nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligence bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.
Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still too even enumerate possibilities, let alone assign probabilities.Personally, I think that we are in for many more surprises before trans human intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead.This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.
I think you are misattributing to stubborness that which is better explained by miscommunication. For instance, I have been around LW long enough to realise that the local definition of (super) intelligence is something like “(high0 efficienty in realising ones values, however narrow or bizarre they are”. DP seems to be running on a definition where idiot-savant style narrow focus would not count as intelligence. That is not unreasonable in itself.
(nods) I agree that trying to induce davidpearce to learn something from me would likely be a waste of my time.
I’m not sure if trying to induce them to clarify their meaning is equally so, though it certainly could be.
E.g., if their response is that something like Clippy in this example is simply not possible, because a paperclip maximizer simply can’t understand the minds of sentients, because reasons, then I’ll just disagree. OTOH, if their response is that Clippy in this example is irrelevant because “understanding the minds of sentients” isn’t being illustrated in this example, then I’m not sure if I disagree or not because I’m not sure what the claim actually is.
How much interest have you shown in “learning from”—ie, agreeing with—DP? Think about how your framed the statement, and possible biases therein.
ETA: The whole shebang is a combination of qualia and morality—two areas notorious for lack of clarity and consensus. “I am definitely right, and all must learn form me” is not a good heuristic here.
Quite so. I have learned a lot about the topic of qualia and morality, among others, while hanging around this place. I would be happy to learn from DP, if what he says here were not rehashed old arguments Eliezer and others addressed several times before. Again, I could be missing something, but if so, he does not make it easy to figure out what it is.
I think others have addressed EY;s arguments. Sometimes centuries before he made them.
Feel free to be specific.
eg
By “specific” I meant that you would state a certain argument EY makes, then quote a relevant portion of the refutation. Since I am pretty sure that Eliezer did have at least a passing glance at Kant, among others, while writing his meta-ethics posts, simply linking to a wikipedia article is not likely to be helpful.
The argument EY makes is that it is possible to be super-rational without ever understanding any kind of morality (AKA the orthogonality thesis) and the argument Kant makes is that it isn’t.
That someone has argued against his position does not mean they have addressed his arguments.
I’m not sure we should take a DSM diagnosis to be particularly strong evidence of a “fundamental misunderstanding of the world”. For instance, while people with delusions may clearly have poor models of the world, some research indicates that clinically depressed people may have lower levels of particular cognitive biases.
In order for “disregard for [...] the rights of others” to imply “a fundamental misunderstanding of the nature of the world”, it seems to me that we would have to assume that rights are part of the nature of the world — as opposed to, e.g., a construct of a particular political regime in society. Or are you suggesting that psychopathy amounts to an inability to think about sociopolitical facts?
fubarobfusco, I share your reservations about DSM. Nonetheless, the egocentric illusion, i.e. I am the centre of the universe other people / sentient beings have only walk-on parts, is an illusion. Insofar as my behaviour reflects my pre-scientific sense that I am in some way special or ontologically privileged, I am deluded. This is true regardless of whether one’s ontology allows for the existence of rights or treats them as a useful fiction. The people we commonly label “psychopaths” or “sociopaths”—and DSM now categorises as victims of “antisocial personality disorder”—manifest this syndrome of egocentricity in high degree. So does burger-eating Jane.
Huh, I hadn’t heard that.
Clearly, reality is so Lovecraftian that any unbiased agent will immediately realize self-destruction is optimal. Evolution equipped us with our suite of biases to defend against this. The Great Filter is caused by bootstrapping superintelligences being compassionate enough to take their compatriots with them. And so on.
Now that’s a Cosmic Horror story I’d read ;)
Was that claimed? The standard claim is that superintelligences can “model” other entities. That may not be enough to to understand qualia.
Pearce can prohibit paperclippers from existing by prohibiting superintelligences with narrow interests from existing. He doesn’t have to argue that the clipper would not be able to instrumentally reason out how to make paperclips; Pearce can argue that to be a really good instrumental reasoner, an entity needs to have a very broad understanding, and that an entity with a broad understanding would not retain narrow interests.
(Edits for spelling and clarity)
To slightly expand, if an intelligence is not prohibited from the following epistemic feats:
1) Be good at predicting which hypothetical actions would lead to how many paperclips, as a question of pure fact.
2) Be good at searching out possible plans which would lead to unusually high numbers of paperclips—answering the purely epistemic search question, “What sort of plan would lead to many paperclips existing, if someone followed it?”
3) Be good at predicting and searching out which possible minds would, if constructed, be good at (1), (2), and (3) as purely epistemic feats.
Then we can hook up this epistemic capability to a motor output and away it goes. You cannot defeat the Orthogonality Thesis without prohibiting superintelligences from accomplishing 1-3 as purely epistemic feats. They must be unable to know the answers to these questions of fact.
A nice rephrasing of the “no Oracle” argument.
Only in the sense that any working Oracle can be trivially transformed into a Genie. The argument doesn’t say that it’s difficult to construct a non-Genie Oracle and use it as an Oracle if that’s what you want; the difficulty there is for other reasons.
Nick Bostrom takes Oracles seriously so I dust off the concept every year and take another look at it. It’s been looking slightly more solvable lately, I’m not sure if it would be solvable enough even assuming the trend continued.
A clarification: my point was that denying orthogonality requires denying the possibility of Oracles being constructed; your post seemed a rephrasing of that general idea (that once you can have a machine that can solve some things abstractly, then you need just connect that abstract ability to some implementation module).
Ah. K. It does seem to me like “you can construct it as an Oracle and then turn it into an arbitrary Genie” sounds weaker than “denying the Orthogonality thesis means superintelligences cannot know 1, 2, and 3.” The sort of person who denies OT is liable to deny Oracle construction because the Oracle itself would be converted unto the true morality, but find it much more counterintuitive that an SI could not know something. Also we want to focus on the general shortness of the gap from epistemic knowledge to a working agent.
Possibly. I think your argument needs to be a bit developed to show that one can extract the knowledge usefully, which is not a trivial statement for general AI. So your argument is better in the end, but needs more argument to establish.
I don’t see the significance of “purely epistemic”. I have argued that epistemic rationality could be capable of affecting values, breaking the orthogonality between values and rationality. I could further argue that instrumental rationality bleeds into epistemic rationality. An agent can’t have perfect knowledge of apriori which things are going to be instrumentally useful to it, so it has to star by understanding things, and then posing the question: is that thing useful for my purposes? Epistemic rationality comes first, in a sense. A good instrumental rationalist has to be a good epistemic rationalist.
What the Orthoganilty Thesis needs is an argument to the effect that a SuperIntelligence would be able to to endlessly update without ever changing its value system, even accidentally. That is tricky since it effectively means predicting what smarter version of tiself would do. Making it smarted doesn’t help, because it is still faced with the problem of predicting what an even smarterer version of itself would be .. the carrot remains in front of the donkey.
Assuming that the value stability problem has been solved in general gives you are coherent Clippy, but it doesn’t rescue the Orthogonality Thesis as a claim about rationality in general, sin ce it remains the case that most most agents won’t have firewalled values. If have to engineer something in , it isn’t an intrinsic truth.
Have to point out here that the above is emphatically not what Eliezer talks about when he says “maximise paperclips”. Your examples above contain in themselves the actual, more intrisics values to which paperclips would be merely instrumental: feelings in your reward and punishment centres, virgins in the afterlife, and so on. You can re-wire the electrodes, or change the promise of what happens in the afterlife, and watch as the paperclip preference fades away.
What Eliezer is talking about is a being for whom “pleasure” and “pain” are not concepts. Paperclips ARE the reward. Lack of paperclips IS the punishment. Even if pleasure and pain are concepts, they are merely instrumental to obtaining more paperclips. Pleasure would be good because it results in paperclips, not vice versa. If you reverse the electrodes so that they stimulate the pain centre when they find paperclips, and the pleasure centre when there are no paperclips, this being would start instrumentally value pain more than pleasure, because that’s what results in more paperclips.
It’s a concept that’s much more alien to our own minds than what you are imagining, and anthropomorphising it is rather more difficult!
Indeed, you touch upon this yourself:
Can you explain why pleasure is a more natural value than paperclips?
Minor correction: The mere post-factual correlation of pain to paperclips does not imply that more paperclips can be produced by causing more pain. You’re talking about the scenario where each 1,000,000 screams produces 1 paperclip, in which case obviously pain has some value.
Sarokrae, first, as I’ve understood Eliezer, he’s talking about a full-spectrum superintelligence, i.e. a superintelligence which understands not merely the physical processes of nociception etc, but the nature of first-person states of organic sentients. So the superintelligence is endowed with a pleasure-pain axis, at least in one of its modules. But are we imagining that the superintelligence has some sort of orthogonal axis of reward - the paperclippiness axis? What is the relationship between these dual axes? Can one grasp what it’s like to be in unbearable agony and instead find it more “rewarding” to add another paperclip? Whether one is a superintelligence or a mouse, one can’t directly access mind-independent paperclips, merely one’s representations of paperclips. But what does it mean to say one’s representation of a paperclip could be intrinsically “rewarding” in the absence of hedonic tone? [I promise I’m not trying to score some empty definitional victory, whatever that might mean; I’m just really struggling here...]
What Eliezer is talking about (a superintelligence paperclip maximiser) does not have a pleasure-pain axis. It would be capable of comprehending and fully emulating a creature with such an axis if doing so had a high expected value in paperclips but it does not have such a module as part of itself.
One of them it has (the one about paperclips). One of them it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).
Yes. (I’m not trying to be trite here. That’s the actual answer. Yes. Paperclip maximisers really maximise paperclips and really don’t care about anything else. This isn’t because they lack comprehension.)
Roughly speaking it means “It’s going to do things that maximise paperclips and in some way evaluates possible universes with more paperclips as superior to possible universes with less paperclips. Translating this into human words we call this ‘rewarding’ even though that is inaccurate anthropomorphising.”
(If I understand you correctly your position would be that the agent described above is nonsensical.)
It’s not at all clear that you could bootstrap an understanding of pain qualia just by observing the behaviour of entities in pain (albeit that they were internally emulated). It is also not clear that you resolve issues of empathy/qualia just by throwing intelligence at ait.
I disagree with you about what is clear.
If you think something relevant is clear, then please state it clearly.
Wedrifid, thanks for the exposition / interpretation of Eliezer. Yes, you’re right in guessing I’m struggling a bit. In order to understand the world, one needs to grasp both its third person-properties [the Standard Model / M-Theory] and its first-person properties [qualia, phenomenal experience] - and also one day, I hope, grasp how to “read off ” the latter from the mathematical formalism of the former.
If you allow such a minimal criterion of (super)intelligence, then how well does a paperclipper fare? You remark how “it could, in principle, imagine (the thing with ‘pain’ and ‘pleasure’).” What is the force of “could” here? If the paperclipper doesn’t yet grasp the nature of agony or sublime bliss, then it is ignorant of their nature. By analogy, if I were building a perpetual motion machine but allegedly “could” grasp the second law of thermodynamics, the modal verb is doing an awful lot of work. Surely, If I grasped the second law of thermodynamics, then I’d stop. Likewise, if the paperclipper were to be consumed by unbearable agony, it would stop too. The paperclipper simply hasn’t understood the nature of what was doing. Is the qualia-naive paperclipper really superintelligent—or just polymorphic malware?
An interesting hypothetical. My first thought is to ask why would a paperclipper care about pain? Pain does not reduce the number of paperclips in existence. Why would a paperclipper care about pain?
My second thought is that pain is not just a quale; pain is a signal from the nervous system, indicating damage to part of the body. (The signal can be spoofed). Hence, pain could be avoided because it leads to a reduced ability to reach one’s goals; a paperclipper that gets dropped in acid may become unable to create more paperclips in the future, if it does not leave now. So the future worth of all those potential paperclips results in the paperclipper pursuing a self-preservation strategy—possibly even at the expense of a small number of paperclips in the present.
But not at the cost of a sufficiently large number of paperclips. If the cost in paperclips is high enough (more than the paperclipper could reasonably expect to create throughout the rest of its existence), a perfect paperclipper would let itself take the damage, let itself be destroyed, because that is the action which results in the greatest expected number of paperclips in the future. It would become a martyr for paperclips.
Even a paperclipper cannot be indifferent to the experience of agony. Just as organic sentients can co-instantiate phenomenal sights and sounds, a superintelligent paperclipper could presumably co-instantiate a pain-pleasure axis and (un)clippiness qualia space—two alternative and incommensurable (?) metrics of value, if I’ve interpreted Eliezer correctly. But I’m not at all confident I know what I’m talking about here. My best guess is still that the natural world has a single metric of phenomenal (dis)value, and the hedonic range of organic sentients discloses a narrow part of it.
Are you talking about agony as an error signal, or are you talking about agony as a quale? I begin to suspect that you may mean the second. If so, then the paperclipper can easily be indifferent to agony;
but it probably can’t understand how humans can be indifferent to a lack of paperclips.There’s no evidence that I’ve ever seen to suggest that qualia are the same even for different people; on the contrary, there is some evidence which strongly suggests that qualia among humans are different. (For example; my qualia for Red and Green are substantially different. Yet red/green colourblindness is not uncommon; a red/green colourblind person must have at minimum either a different red quale, or a different green quale, to me). Given that, why should we assume that the quale of agony is the same for all humanity? And if it’s not even constant among humanity, I see no reason why a paperclipper’s agony quale should be even remotely similar to yours and mine.
And given that, why shouldn’t a paperclipper be indifferent to that quale?
A paperclip maximiser would (in the overwhelming majority of cases) have no such problem understanding the indifference of paperclips. A tendency to anthropomorphise is a quirk of human nature. Assuming that paperclip maximisers have an analogous temptation (to clipropomorphise) is itself just anthropomorphising.
I take your point. Though Clippy may clipropomorphise, there is no reason to assume that it will.
...is there any way to retract just a part of a previous post?
There is an edit button. But I wouldn’t say your comment is significantly weakened by this tangential technical detail (I upvoted it as is).
Yes, but is there any way to leave the text there, but stricken through?
People have managed it with unicode characters. I think there is even a tool for it on the web somewhere.
Got it, thanks.
CCC, agony as a quale. Phenomenal pain and nociception are doubly dissociable. Tragically, people with neuropathic pain can suffer intensely without the agony playing any information-signalling role. Either way, I’m not clear it’s intelligible to speak of understanding the first-person phenomenology of extreme distress while being indifferent to the experience: For being distrubing is intrinsic to the experience itself. And if we are talking about a supposedly superintelligent paperclipper, shouldn’t Clippy know exactly why humans aren’t troubled by the clippiness-deficit?
If (un)clippiness is real, can humans ever understand (un)clippiness? By analogy, if organic sentients want to understand what it’s like to be a bat—and not merely decipher the third-person mechanics of echolocation—then I guess we’ll need to add a neural module to our CNS with the right connectivity and neurons supporting chiropteran gene-expression profiles, as well as peripheral transducers (etc). Humans can’t currently imagine bat qualia; but bat qualia, we may assume from the neurological evidence, are infused with hedonic tone. Understanding clippiness is more of a challenge. I’m unclear what kind of neurocomputational architecture could support clippiness. Also, whether clippiness could be integrated into the unitary mind of an organic sentient depends on how you think biological minds solve the phenomenal binding problem, But let’s suppose binding can be done. So here we have orthogonal axes of (dis)value. On what basis does the dual-axis subject choose tween them? Sublime bliss and pure clippiness are both, allegedly, self-intimatingly valuable. OK, I’m floundering here...
People with different qualia? Yes, I agree CCC. I don’t think this difference challenges the principle of the uniformity of nature. Biochemical individuality makes variation in qualia inevitable.The existence of monozygotic twins with different qualia would be a more surprising phenomenon, though even such “identical” twins manifest all sorts of epigenetic differences. Despite this diversity, there’s no evidence to my knowledge of anyone who doesn’t find activation by full mu agonists of the mu opioid receptors in our twin hedonic hotspots anything other than exceedingly enjoyable. As they say, “Don’t try heroin. It’s too good.”
There exist people who actually express a preference for being disturbed in a mild way (e.g. by watching horror movies). There also exist rarer people who seek out pain, for whatever reason. It seems to me that such people must have a different quale for pain than you do.
Personally, I don’t think that I can reasonably say that I find pain disturbing, as such. Yes, it is often inflicted in circumstances which are disturbing for other reasons; but if, for example, I go to a blood donation clinic, then the brief pain of the needle being inserted is not at all disturbing; though it does trigger my pain quale. So this suggests that my pain quale is already not the same as your pain quale.
There’s a lot of similarity; pain is a quale that I would (all else being equal) try to avoid; but that I will choose to experience should there be a good enough reason (e.g. the aforementioned blood donation clinic). I would not want to purposefully introduce someone else to it (again, unless there was a good enough reason; even then, I would try to minimise the pain while not compromising the good enough reason); but despite this similarity, I do think that there may be minor differences. (It’s also possible that we have slightly different definitions of the word ‘disturbing’).
But would such a modified human know what it’s like to be an unmodified human? If I were to guess what echolocation looks like to a bat, I’d guess a false-colour image with colours corresponding to textures instead of to wavelengths of light… though that’s just a guess.
What is the phenomenal binding problem? (Wikipedia gives at least two different definitions for that phrase). I think I may be floundering even more than you are.
I’m not sure that Clippy would even have a pleasure-pain axis in the way that you’re imagining. You seem to be imagining that any being with such an axis must value pleasure—yet if pleasure doesn’t result in more paperclips being made, then why should Clippy value pleasure? Or perhaps the disutility of unclippiness simply overwhelms any possible utility of pleasure...
According to a bit of googling, among the monozygotic Dionne quintuplets, two out of the five were colourblind; suggesting that they did not have the same qualia for certain colours as each other. (Apparently it may be linked to X-chromosome activation).
CCC, you’re absolutely right to highlight the diversity of human experience. But this diversity doesn’t mean there aren’t qualia universals. Thus there isn’t an unusual class of people who relish being waterboarded. No one enjoys uncontrollable panic. And the seemingly anomalous existence of masochists who enjoy what you or I would find painful stimuli doesn’t undercut the sovereignty of the pleasure-pain axis but underscores its pivotal role: painful stimuli administered in certain ritualised contexts can trigger the release of endogenous opioids that are intensely rewarding. Co-administer an opioid antagonist and the masochist won’t find masochism fun.
Apologies if I wasn’t clear in my example above. I wasn’t imagining that pure paperclippiness was pleasurable, but rather what would be the effects of grafting together two hypothetical orthogonal axes of (dis)value in the same unitary subject of experience—as we might graft on another sensory module to our CNS. After all, the deliverances of our senses are normally cross-modally matched within our world-simulations. However, I’m not at all sure that I’ve got any kind of conceptual handle on what “clippiness” might be. So I don’t know if the thought-experiment works. If such hybridisation were feasible, would hypothetical access to the nature of (un)clippiness transform our conception of the world relative to unmodified humans—so we’d lose all sense of what it means to be a traditional human? Yes, for sure. But if, in the interests of science, one takes, say, a powerful narcotic euphoriant and enjoys sublime bliss simultaneously with pure clippiness, then presumably one still retains access to the engine of phenomenal value characteristic of archaic humans minds.
The phenomenal binding problem? The best treatment IMO is still Revonsuo: http://cdn.preterhuman.net/texts/body_and_health/Neurology/Binding.pdf No one knows how the mind/brain solves the phenomenal binding problem and generates unitary experiential objects and the fleeting synchronic unity of the self. But the answer one gives may shape everything from whether one thinks a classical digital computer will ever be nontrivially conscious to the prospects of mind uploading and the nature of full-spectrum superintelligence. (cf. http://www.biointelligence-explosion.com/parable.html for my own idiosyncratic views on such topics.)
It doesn’t mean that there aren’t, but it also doesn’t mean that there are. It does mean that there are qualia that aren’t universal, which implies the possibility that there may be no universals; but, you are correct, it does not prove that possibility.
There may well be qualia universals. If I had to guess, I’d say that I don’t think there are, but I could be wrong.
That doesn’t mean that everyone’s uncontrolled-panic qualia are all the same, it just means that everyone’s uncontrolled-panic qualia are all unwelcome. If given a sadistic choice between waterboarding and uncontrolled panic, in full knowledge of what the result will feel like, and all else being equal, some people may choose the panic while others may prefer the waterboarding.
If you feel that you have to explain that, then I conclude that I wasn’t clear in my response to your example. I was questioning the scaling of the axes in Clippy’s utility function; if Clippy values paperclipping a million times more strongly than it values pleasure, then the pleasure/pain axis is unlikely to affect Clippy’s behaviour much, if at all.
I think it works as a thought-experiment, as long as one keeps in mind that the hybridised result is no longer a pure paperclipper.
Consider the hypothetical situation that Hybrid-Clippy finds that it derives pleasure from painting; an activity neutral on the paperclippiness scale. Consider further the possibility that making paperclips is neutral on the pleasure-pain scale. In suce a case, Hybrid-Clippy may choose to either paint or make paperclips; depending on which scale it values more.
So—the question is basically how the mind attaches input from different senses to a single conceptual object?
I can’t tell you how the mechanism works, but I can tell you that the mechanism can be spoofed. That’s what a ventriloquist does, after all. And a human can watch a film on TV, yet have the sound come out of a set of speakers on the other end of the room, and still bind the sound of an actor’s voice with that same actor on the screen.
Studying in what ways the binding mechanism can be spoofed would, I expect, produce an algorithm that roughly describes how the mechanism works. Of course, if it’s still a massive big problem after being looked at so thoroughly, then I expect that I’m probably missing some subtlety here...
All pain hurts, or it wouldn’t be pain.
Well...
The force is that all this talk about understanding ‘the pain/pleasure’ axis would be a complete waste of time for a paperclip maximiser. In most situations it would be more efficient not to bother with it at all and spend it’s optimisation efforts on making more efficient relativistic rockets so as to claim more of the future light cone for paperclip manufacture.
It would require motivation for the paperclip maximiser to expend computational resources understanding the arbitrary quirks of DNA based creatures. For example some contrived game of Omega’s which rewards arbitrary things with paperclips. Or if it found itself emerging on a human inhabited world, making being able to understand humans a short term instrumental goal for the purpose of more efficiently exterminating the threat.
Terrible analogy. Not understanding “pain and pleasure” is in no way similar to believing it can create a perpetual motion machine. Better analogy: An Engineer designing microchips allegedly ‘could’ grasp analytic cubism. If she had some motivation to do so. It would be a distraction from her primary interests but if someone paid her then maybe she would bother.
Now “if” is doing a lot of work. If the paperclipper was a fundamentally different to a paperclipper and was actually similar to a human or DNA based relative capable of experiencing ‘agony’ and assuming agony was just as debilitating to the paperclipper as to a typical human… then sure all sorts of weird stuff follows.
I prefer the word True in this context.
To the extent that you believed that such polymorphic malware is theoretically possible and consisted of most possible minds it would possible for your model to be used to accurately describe all possible agents—it would just mean systematically using different words. Unfortunately I don’t think you are quite at that level.
Wedrifid, granted, a paperclip-maximiser might be unmotivated to understand the pleasure-pain axis and the quaila-spaces of organic sentients. Likewise, we can understand how a junkie may not be motivated to understand anything unrelated to securing his supply of heroin—and a wireheader in anything beyond wireheading. But superintelligent? Insofar as the paperclipper—or the junkie—is ignorant of the properties of alien qualia-spaces, then it/he is ignorant of a fundamental feature of the natural world—hence not superintelligent in any sense I can recognise, and arguably not even stupid. For sure, if we’re hypothesising the existence of a clippiness/unclippiness qualia-space unrelated to the pleasure-pain axis, then organic sentients are partially ignorant too. Yet the remedy for our hypothetical ignorance is presumably to add a module supporting clippiness—just as we might add a CNS module supporting echolocatory experience to understand bat-like sentience—enriching our knowledge rather than shedding it.
What does (super-)intelligence have to do with knowing things that are irrelevant to one’s values?
What does knowing everything about airline safety statistics, and nothing else, have to do with intelligence? That sort of thing is called Savant ability—short for ″idiot savant″.
I guess there’s a link missing (possibly due to a missing
http://
in the Markdown) after the second word.Why does that matter for the argument?
As long as Clippy is in fact optimizing paperclips, what does it matter what/if he feels while he does it?
Pearce seems to be making a claim that Clippy can’t predict creatures with pain/pleasure if he doesn’t feel them himself.
Maybe Clippy needs pleasure/pain too be able to predict creatures with pleasure/pain. I doubt it, but fine, grant the point. He can still be a paper clip maximizer regardless.
I fail to comprehend the cause for your confusion. I suggest reading the context again.
This is something I’ve been meaning to ask about for a while. When humans say it is moral to satisfy preferences, they aren’t saying that because they have an inbuilt preference for preference-satisfaction (or are they?). They’re idealizing from their preferences for specific things (survival of friends and family, lack of pain, fun...) and making a claim that, ceteris paribus, satisfying preferences is good, regardless of what the preferences are.
Seen in this light, Clippy doesn’t seem like quite as morally orthogonal to us as it once did. Clippy prefers paperclips, so ceteris paribus (unless it hurts us), it’s good to just let it make paperclips. We can even imagine a scenario where it would be possible to “torture” Clippy (e.g., by burning paperclips), and again, I’m willing to pronounce that (again, ceteris paribus) wrong.
Maybe I am confused here...
Clippy is more of a Lovecraftian horror than a fellow sentient—where by “Lovecraftian” I mean to invoke Lovecraft’s original intended sense of terrifying indifference—but if you want to suppose a Clippy that possesses a pleasure-pain architecture and is sentient and then sympathize with it, I suppose you could. The point is that your sympathy means that you’re motivated by facts about what some other sentient being wants. This doesn’t motivate Clippy even with respect to its own pleasure and pain. In the long run, it has decided, it’s not out to feel happy, it’s out to make paperclips.
Right, that makes sense. What interests me is (a) whether it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain*, (b) whether human preference-for-preference-satisfaction is just another of many oddball human terminal values, or is arrived at by something more like a process of reason.
Strictly speaking this phrasing puts things awkwardly; my intuition is that the proper motivational algorithms necessarily give rise to phenomenology (to the extent that that word means anything).
This is a difficult question, but I suppose that pleasure and pain are a mechanism for human (or other species’) learning. Simply said: you do a random action, and the pleasure/pain response tells you it was good/bad, so you should make more/less of it again.
Clippy could use an architecture with a different model of learning. For example Solomonoff priors and Bayesian updating. In such architecture, pleasure and pain would not be necessary.
Interesting… I suspect that pleasure and pain are more intimately involved in motivation in general, not just learning. But let us bracket that question.
Right, but that only gets Clippy the architecture necessary to model the world. How does Clippy’s utility function work?
Now, you can say that Clippy tries to satisfy its utility function by taking actions with high expected cliptility, and that there is no phenomenology necessarily involved in that. All you need, on this view, is an architecture that gives rise to the relevant clip-promoting behaviour—Clippy would be a robot (in the Roomba sense of the word).
BUT
Consider for a moment how symmetrically “unnecessary” it looks that humans (& other sentients) should experience phenomenal pain and pleasure. Just like is supposedly the case with Clippy, all natural selection really “needs” is an architecture that gives rise to the right fitness-promoting behaviour. The “additional” phenomenal character of pleasure and pain is totally unnecessary for us adaptation-executing robots.
...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
(Assuming that my use of the word “phenomenal” above is actually coherent, of which I am far from sure.)
We know at least two architectures for processing general information: humans and computers. Two data points are not enough to generalize about what all possible architectures must have. But it may be enough to prove what some architectures don’t need. Yes, there is a chance that if computers become even more generally intelligent than today, they will gain some human-like traits. Maybe. Maybe not. I don’t know. And even if they will gain more human-like traits, it may be just because humans designed them without knowing any other way to do it.
If there are two solutions, there are probably many more. I don’t dare to guess how similar or different they are. I imagine that Clippy could be as different from humans and computers, as humans and computers are from each other. Which is difficult to imagine specifically. How far does the mind-space reach? Maybe compared with other possible architectures, humans and computers are actually pretty close to each other (because humans designed the computers, re-using the concepts they were familiar with).
How to taboo “motivation” properly? What makes a rock fall down? Gravity does. But the rock does not follow any alrogithm for general reasoning. What makes a computer follow its algorithm? Well, that’s its construction: the processor reads the data, and the data make it read or write other data, and the algorithm makes it all meaningful. The human brains are full of internal conflicts—there are different modules suggesting different actions, and the reasoning mind is just another plugin which often does not cooperate well with the existing ones. Maybe the pleasure is a signal that a fight between the modules is over. Maybe after millenia of further evolution (if for some magical reason all mind- and body-altering technology would stop working, so only the evolution would change human minds) we would evolve to a species with less internal conflicts, less akrasia, more agency, and perhaps less pleasure and mental pain. This is just a wild guess.
Generalizing from observed characteristics of evolved systems to expected characteristics of designed systems leads equally well to the intuition that humanoid robots will have toenails.
.
I don’t think the phenomenal character of pleasure and pain is best explained at the level of natural selection at all; the best bet would be that it emerges from the algorithms that our brains implement. So I am really trying to generalize from human cognitive algorithms to algorithms that are analogous in the sense of (roughly) having a utility function.
Suffice it to say, you will find it’s exceedingly hard to find a non-magical reason why non-human cognitive algorithms shouldn’t have a phenomenal character if broadly similar human algorithms do.
Does it follow from the above that all human cognitive algorithms that motivate behavior have the phenomenal character of pleasure and pain? If not, can you clarify why not?
I think that probably all human cognitive algorithms that motivate behaviour have some phenomenal character, not necessarily that of pleasure and pain (e.g., jealousy).
OK, thanks for clarifying.
I agree that any cognitive system that implements algorithms sufficiently broadly similar to those implemented in human minds is likely to have the same properties that the analogous human algorithms do, including those algorithms which implement pleasure and pain.
I agree that not all algorithms that motivate behavior will necessarily have the same phenomenal character as pleasure or pain.
This leads me away from the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
Necessity according to natural law presumably. If you could write something to show logical necessity, you would have solved the Hard Problem
That leaves the sense in which you are not a moral realist most unclear.
That tacitly assumes that the question “does pleasure/happiness motivate posiively in all cases” is an emprical question—that it would be possible to find an enitity that hates pleasure and loves pain. it could hover be plausibly argued that it is actually an analytical, definitional issue...that is some entity oves X and hates Y, we would just call X it’s pleasure and Y its pain.
I suppose some non-arbitrary subjectivism is the obvious answer.