Clippy has an off-the-scale AQ—he’s a rule-following hypersystematiser with a monomania for paperclips. But hypersocial sentients can have a runaway intelligence explosion too. And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
I’m confused by this claim. Consider the following hypothetical scenario:
=======
I walk into a small village somewhere and find several dozen villagers fashioning paper clips by hand out of a spool of wire. Eventually I run into Clippy and have the following dialog.
“Why are those people making paper clips?” I ask.
“Because paper clips are the most important thing ever!”
“No, I mean, what motivates them to make paper clips?”
“Oh! I talked them into it.”
“Really? How did you do that?”
“Different strategies for different people. Mostly, I barter with them for advice on how to solve their personal problems. I’m pretty good at that; I’m the village’s resident psychotherapist and life coach.”
“Why not just build a paperclip-making machine?”
“I haven’t a clue how to do that; I’m useless with machinery. Much easier to get humans to do what I want.”
“Then how did you make the wire?”
“I didn’t; I found a convenient stash of wire, and realized it could be used to manufacture paperclips! Oh joy!”
==========
It seems to me that Clippy in this example understands the minds of sentients pretty damned well, although it isn’t capable of a runaway intelligence explosion. Are you suggesting that something like Clippy in this example is somehow not possible? Or that it is for some reason not relevant to the discussion? Or something else?
I’m trying to figure out how you get from “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients” to “Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping” and I’m just at a loss for where to even start. They seem like utterly unrelated claims to me.
I also find the argument you quote here uncompelling, but that’s largely beside the point; even if I found it compelling, I still wouldn’t understand how it relates to what DP said or to the question I asked.
Posthuman superintelligence may be incomprehensibly alien. But if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “wow, how incomprehensibly alien”, but, “aha, autism spectrum disorder”. Of course, in the context of Clippy above, we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist? Yes, we have strong reason to believe incomprehensibly alien qualia-spaces await discovery (cf. bats on psychedelics). But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis. Without hedonic tone, how can anything matter at all?
But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis.
Meaning mapping the wrong way round, presumably.
Without hedonic tone, how can anything matter at all?
if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “wow, how incomprehensibly alien”
Agreed, as far as it goes. Hell, humans are demonstrably capable of encountering Eliza programs without thinking “wow, how incomprehensibly alien”.
Mind you, we’re mistaken: Eliza programs are incomprehensibly alien, we haven’t the first clue what it feels like to be one, supposing it even feels like anything at all. But that doesn’t stop us from thinking otherwise.
but, “aha, autism spectrum disorder”.
Sure, that’s one thing we might think instead. Agreed.
we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist?
(shrug) I’m content to start off by saying that any “axis of (dis)value,” whatever that is, which is capable of motivating behavior is “non-orthogonal,” whatever that means in this context, to “the pleasure-pain axis,” whatever that is.
Before going much further, though, I’d want some confidence that we were able to identify an observed system as being (or at least being reliably related to) an axis of (dis)value and able to determine, upon encountering such a thing, whether it (or the axis to which it was related) was orthogonal to the pleasure-pain axis or not.
I don’t currently have any grounds for such confidence, and I doubt anyone else does either. If you think you do, I’d like to understand how you would go about making such determinations about an observed system.
“hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients”
I (whowhowho) was not defending that claim.
“Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping”
To empathically understand suffering is to suffer along with someone who is suffering. Suffering has—or rather is—negative value. An empath, therefore, would not cause suffering, all else being equal.
I’m just at a loss for where to even start.
Maybe don’t restrict “understand” to “be able to model and predict”.
Maybe don’t restrict “understand” to “be able to model and predict”.
If you want “rational” to include moral, then you’re not actually disagreeing with LessWrong about rationality (the thing), but rather about “rationality” (the word).
Likewise, if you want “understanding” to also include “empathic understanding” (suffering when other people suffer, taking joy when other people take joy), you’re not actually disagreeing about understanding (the thing) with people who want to use the word to mean “modelling and predicting”; you’re disagreeing with them about “understanding” (the word).
Are all your disagreements purely linguistic ones? From the comments I’ve read of you so far, they seem to be so.
ArisKatsaris, it’s possible to be a meta-ethical anti-realist and still endorse a much richer conception of what understanding entails than mere formal modeling and prediction. For example, if you want to understand what it’s like to be a bat, then you want to know what the textures of echolocatory qualia are like. In fact, any cognitive agent that doesn’t understand the character of echolocatory qualia-space does not understand bat-minds. More radically, some of us want to understand qualia-spaces that have not been recruited by natural selection to play any information-signalling role at all.
I have argued that in practice, instrumental rationality cannot be maintained separately from epistemic rationality, and that epistemic rationality could lead to moral objectivism, as many philosophers have argued. I don’t think that those arguments are refuted by stipulatively defining “rationality” as “nothing to do with morality”.
I quoted DP making that claim, said that claim confused me, and asked questions about what that claim meant. You replied by saying that you think DP is saying something which you then defended. I assumed, I think reasonably, that you meant to equate the thing I asked about with the thing you defended.
But, OK. If I throw out all of the pre-existing context and just look at your comment in isolation, I would certainly agree that Clippy is incapable of having the sort of understanding of suffering that requires one to experience the suffering of others (what you’re calling a “full” understanding of suffering here) without preferring not to cause suffering, all else being equal.
Which is of course not to say that all else is necessarily equal, and in particular is not to say that Clippy would choose to spare itself suffering if it could purchase paperclips at the cost of its suffering, any more than a human would necessarily refrain from doing something valuable solely because doing so would cause them to suffer.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
In any case, the Orthogonality Thesis has so far been defended as something that is true, not as something that is not necessarily false.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
No. It just wouldn’t. (Not without redefining ‘rational’ to mean something that this site doesn’t care about and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong, since opinion is not fact, nor belief argument. The way I am using “rational” has a history that goes back centuries. This site has introduced a relatively novel definition, and therefore has the burden of defending it.
and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.
The way I am using “rational” has a history that goes back centuries.
I don’t believe you (in fact, you don’t even use the word consistently). But let’s assume for the remainder of the comment that this claim is true.
This site has introduced a relatively novel definition, and therefore has the burden of defending it.
Neither this site nor any particular participant need accept any such burden. They have the option of simply opposing muddled or misleading contributions in the same way that they would oppose ads for “p3ni$ 3nL@rgm3nt”. (Personally I consider it considerably worse than that spam in as much as it is at least more obvious on first glance that spam doesn’t belong here.)
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site
Firstly, nothing I have mentioned is on any list of banned topics.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality. It is not about any practical issues regarding the “art of rationality”. You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
doesn’t belong here.)
What you really think is that disagreement doesn’t belong here. Maybe it doesn’t.
If I called you a pigfucker, you’d see that as an abuse worthy of downvotes that doesn’t contribute anything useful, and you’d be right.
So if accusing one person of pigfucking is bad, why do you think it’s better to call a whole bunch of people cultists? Because that’s a more genteel insult as it doesn’t include the word “fuck” in it?
As such downvoted. Learn to treat people with respect, if you want any respect back.
As such downvoted. Learn to treat people with respect, if you want any respect back.
I’d like to give qualified support to whowhowho here in as much as I must acknowledge that this particular criticism applies because he made the name calling generic, rather than finding a way to specifically call me names and leave the rest of you out of it. While it would be utterly pointless for whowhowho to call me names (unless he wanted to make me laugh) it would be understandable and I would not dream of personally claiming offense.
I was, after all, showing whowhowho clear disrespect, of the kind Robin Hanson describes. I didn’t resort to name calling but the fact that I openly and clearly expressed opposition to whowhowho’s agenda and declared his dearly held beliefs muddled is perhaps all the more insulting because it is completely sincere, rather than being constructed in anger just to offend him.
It is unfortunate that I cannot accord whowhowho the respect that identical behaviours would earn him within the Philosopher tribe without causing harm to lesswrong. Whowhowho uses arguments that by lesswrong standards we call ‘bullshit’, in support of things we typically dismiss as ‘nonsense’. It is unfortunate that opposition of this logically entails insulting him and certainly means assigning him far lower status than he believes he deserves. The world would be much simpler if opponents really were innately evil, rather than decent people who are doing detrimental things due to ignorance or different preferences.
“Cult” is not a meaningless term of abuse. There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance, trying to avoid the very possibility of having to update.
Of course, treating an evidence-based claim as a mere insult—the How Dare You move—is another way of avoiding having to face uncomfortable issues.
I see your policy is to now merely heap on more abuse on me. Expect that I will be downvoting such in silence from now on.
There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance, trying to avoid the very possibility of having to update.
I think I’ve been more willing and ready to update on opinions (political, scientific, ethical, other) in the two years since I joined LessWrong, than I remember myself updating in the ten years before it. Does that make it an anti-cult then?
And I’ve seen more actual disagreement on LessWrong than I’ve seen on any other forum. Indeed, I notice that most insults and mockeries addressed at LessWrong seem to boil down to the complaint that we allow too wide a range of positions here. Widely differing positions (e.g. support of cryonics and opposition to cryonics, feminism and men’s rights, libertarianism and authoritarianism) can actually be discussed without immediately being drowned in abuse and scorn, as would be the norm in other forums.
As such, e.g., fanatical libertarians insult LessWrong as totalitarian leftist (because 25% or so of LessWrongers identify as socialists), and leftists insult LessWrong as being a libertarian ploy (because a similar percentage identifies as libertarian).
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
If you can’t find such, I’ll update towards the direction that LessWrong is even less “cultish” than I thought.
I see your policy is to now merely heap on more abuse on me
AFAIC, I have done no such thing, but it seems your mind is made up.
I think I’ve been more willing and ready to update on opinions
I was referring mainly to Wedifrid.
ETA: Such comments as “What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
Oh, the ‘forum’—the rules—allow almost anything. The members are another thing. Remember, this started with Wedifrid telling me that it was wrong of me to put forward non-LessWrongian material. I find it odd that you would put forward such a stirring defence of LessWrongian open-mindedness when you have an example of close-mindedness upthread.
It’s the members I’m talking about. (You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
On the same front, you treat a single member as representative of the whole, and you seem frigging surprised that I don’t treat wedrifid as representative of the whole of LessWrong—you see wedrifid’s behaviour as an excuse to insult all of us instead.
That’s more evidence that you’re accustomed to VERY homogeneous forums, ones much more homogeneous than LessWrong. You think that LessWrong tolerating wedrifid’s “closedmindedness” is the same thing as every LessWronger being “closedminded”. Perhaps we’re openminded to his “closedmindedness” instead? Perhaps your problem is that we allow too much disagreement, including disagreement about how much disagreement to have?
I gave you an example of a member who is not particularly open minded.
(You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
I have been using mainstream science and philosophy forums for something like 15 years. I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
On the same front, you treat a single member as representative of the whole,
If you think Wedifrid is letting the side down, tell Wedifrid, not me.
I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
In short again your problem is that actually we’re even openminded towards the closeminded? We’re lenient even towards the strict? Liberal towards the authoritarian?
If you think Wedifrid is letting the side down, tell Wedifrid, not me.
What “side” is that? The point is that there are many sides in LessWrong—and I want it to remain so. While you seem to think we ought to sing the same tune. He didn’t “let the side down”, because the only side any one of us speaks for is their own.
You on the other hand, just assumed there’s just a group mind of which wedrifid is just a representative instance. And so felt free to insult all of us as a “cult”.
“My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.”
Next time, don’t feel the need to insult me when you point out wedrifid’s close-mindedness. And yes, you did insult me; don’t insult (again) both our intelligences by pretending that you didn’t.
Tell Wedifrid.
He didn’t insult me, you did.
“Have you heard the expression “protesteth too much”?”
Yes, I’ve heard lots of different ways of making the target of an unjust insult seem blameworthy somehow.
I gave you an example of a member who is not particularly open minded.
I put it to you that whatever the flaws in wedrifid may be, they are different in kind to the flaws that would indicate that lesswrong is a cult. In fact the presence—and in particular the continued presence—of wedrifid is among the strongest evidence that Eliezer isn’t a cult leader. When Eliezer behaves badly (as perceived by wedrifid and other members) wedrifid vocally opposes him with far more directness than he has used when opposing yourself. That Eliezer has not excommunicated him from the community is actually extremely surprising. Few with Eliezer’s degree of local power would refrain from using it to suppress any dissent. (I remind myself of this whenever I see Eliezer doing something that I consider to be objectionable or incompetent, it helps keep perspective!)
Whatever. Can you provide me with evidence that you, personally, are willing to listen to dissent and possibly update, despite the tone of everything you have been saying recently? E.g.:
“What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Few with Eliezer’s degree of local power would refrain from using it to suppress any dissent.
Maybe he has people to do that for him. Maybe.
whenever I see Eliezer doing something that I consider to be objectionable or incompetent
Firstly, nothing I have mentioned is on any list of banned topics.
I would be completely indifferent if you did. I don’t choose to defy that list (that would achieve little), but neither do I have any particular respect for it. As such, I would take no responsibility for aiding the enforcement thereof.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
I can claim to have tired of a constant stream of non-sequiturs from users who are essentially ignorant of the basic principles of rationality (the lesswrong kind, not the “Paperclippers that are Truly Superintelligent would be vegans” kind) and have next to zero chance of learning anything. You have declared that you aren’t interested in talking about rationality and your repeated equivocations around that term lower the sanity waterline. It is time to start weeding.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
I said nothing about veganism, and you still can’t prove anything by stipulative definition, and I am not claiming to have the One True theory of anything.
You have declared that you aren’t interested in talking about rationality
I haven’t and I have been discussing it extensively.
You have declared that you aren’t interested in talking about rationality
I haven’t and I have been discussing it extensively.
Can we please stop doing this?
You and wedrifid aren’t actually disagreeing here about what you’ve been discussing, or what you’re interested in discussing, or what you’ve declared that you aren’t interested in discussing. You’re disagreeing about what the word “rationality” means. You use it to refer to a thing that you have been discussing extensively (and which wedrifid would agree you have been discussing extensively), he uses it to refer to something else (as does almost everyone reading this discussion).
And you both know this perfectly well, but here you are going through the motions of conversation just as if you were talking about the same thing. It is at best tedious, and runs the risk of confusing people who aren’t paying careful enough attention into thinking you’re having a real substantive disagreement rather than a mere definitional dispute.
If we can’t agree on a common definition (which I’m convinced by now we can’t), and we can’t agree not to use the word at all (which I suspect we can’t), can we at least agree to explicitly indicate which definition we’re using when we use the word? Otherwise whatever value there may be in the discussion is simply going to get lost in masturbatory word-play.
Well, can you articulate what it is you and wedrifid are both referring to using the word “rationality” without using the words or its simple synonyms, then? Because reading your exchanges, I have no idea what that thing might be.
What I call rationality is a superset of instrumental rationality. I have been arguing that instrumental rationality, when pursued sufficiently, bleeds into other forms.
So, just to echo that back to you… we have two things, A and B. On your account, “rationality” refers to A, which is a superset of B. We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
Yes?
If so, I don’t see how that changes my initial point.
When wedrifid says X is true of rationality, on your account he’s asserting X(B)—that is, that X is true of B. Replying that NOT X(A) is nonresponsive (though it might be a useful step along the way to deriving NOT X(B)), and phrasing NOT X(A) as “no, X is not true of rationality” just causes confusion.
On your account, “rationality” refers to A, which is a superset of B.
We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
It refers to part of A, since it is a subset of A.
When wedrifid says X is true of rationality, on your account he’s asserting X(B)—that is, that X is true of B. Replying that NOT X(A) is nonresponsive
It would be if A and B were disjoint. But they are not. They are in a superset-subset relation. My argument is that an entity running on narrowly construed, instrumental rationality will, if it self-improves, have to move into wider kinds; i.e., that putting labels on different parts of the territory is not sufficient to prove orthogonality.
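Since this exchange leans on set notation, a toy instance may help fix the logic (the particular sets below are purely illustrative assumptions, not anything either commenter proposed). When B is a subset of A, a universally quantified property X can hold of B while failing for A, so NOT X(A) on its own settles nothing about X(B):

```latex
% Illustrative only: B \subset A, yet X(B) is true while X(A) is false.
% Take A = \mathbb{Z} (all integers) and B = 2\mathbb{Z} (the even integers),
% with X(S) meaning ``every element of S is divisible by 2''.
X(B):\ \forall n \in 2\mathbb{Z},\ 2 \mid n \quad \text{(true)}
\qquad
X(A):\ \forall n \in \mathbb{Z},\ 2 \mid n \quad \text{(false; witness } n = 3\text{)}
```

Whether NOT X(A) bears on X(B) therefore depends on the particular property X, not merely on whether A and B overlap.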
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering.
Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
(3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
What I mean is epistemically objective, i.e. not a matter of personal whim. Whether that requires anything to exist is another question.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
It’s uncontentious that relatively dumb and irrational clippies can carry on being clipping-obsessed. The question is whether their intelligence and rationality can increase indefinitely without their ever realising there are better things to do.
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
I am not disputing what the Orthogonality Thesis says. I dispute its truth. To have maximal instrumental rationality, an entity would have to understand everything...
To have maximal instrumental rationality, an entity would have to understand everything…
Why? In what situation is someone who empathetically understands, say, suffering better at minimizing it (or, indeed, maximizing paperclips) than an entity who can merely measure it and work out on a sheet of paper what would reduce the size of the measurements?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?”
The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this?
To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “maximal instrumental rationality minus empathetic understanding”, why does M have more instrumental rationality than N?
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases.
Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultra-Parfitian about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-now would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. A clipper can’t choose to acquire only knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular knowledge. The only way a clipper can increase its instrumental rationality to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful in some way, and it probably can’t understand qualia without empathy; so the argument hinges on issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possible for an entity to back out of being empathic and return to an unempathic state
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to lose interest in clipping.
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not cognitively bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational.
That isn’t what I said at all. I think it is a quandary for an agent whether to gamble: play it safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
The argument is that the clipper needs to maximise its knowledge and rationality to maximise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to be better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to be better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
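The distinction drawn here — a fixed preference ordering over fully specified world-states, with only the agent’s beliefs changing as it learns — can be made concrete in a short sketch. Everything in it (the action names, probabilities, and clip counts) is an illustrative assumption, not anyone’s actual model:

```python
# Sketch of a fixed utility function over world-states. Learning updates
# the agent's *beliefs* about which states its actions lead to; the
# preference ordering over states itself never changes.
# All names and numbers below are hypothetical, for illustration only.

def utility(world_state):
    # Clippy's unchanging criterion: states with more paperclips rank higher.
    return world_state["paperclips"]

def expected_utility(action, beliefs):
    # beliefs maps each action to a list of (probability, world-state) pairs.
    return sum(p * utility(state) for p, state in beliefs[action])

def choose(beliefs):
    # Pick the action whose believed outcomes score highest.
    return max(beliefs, key=lambda a: expected_utility(a, beliefs))

beliefs = {
    "mine_ore": [(1.0, {"paperclips": 8})],
    "trade":    [(0.5, {"paperclips": 5}), (0.5, {"paperclips": 15})],
}
print(choose(beliefs))  # trade: expected 10 clips beats a certain 8

# New information revises beliefs about the world, not the utility function:
beliefs["trade"] = [(1.0, {"paperclips": 4})]
print(choose(beliefs))  # mine_ore: same preferences, different best action
```

Note the asymmetry: the output of `choose` can flip as evidence comes in, but nothing in the learning step ever touches `utility` — which is the comment’s point that belief updates alone cannot make such an agent abandon clipping unless clipping was only a heuristic to begin with.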
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Gandhi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal is to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips. Which is a contradiction.
Suppose that Gandhi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
You are assuming that Gandhi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them a priori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy
might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
Looking through my own, Eliezer’s and others exchanges with davidpearce, I have noticed his total lack of interest in learning from the points others make. He has his point of view and he keeps pushing it. Seems like a rather terminal case, really. You can certainly continue trying to reason with him, but I’d give the odds around 100:1 that you will fail, like others have before you.
Shminux, we’ve all had the experience of making a point we regard as luminously self-evident—and then feeling baffled when someone doesn’t “get” what is foot-stampingly obvious. Is this guy a knave or a fool?! Anyhow, sorry if you think I’m a “terminal case” with “a total lack of interest in learning from the points others make”. If I don’t always respond, often it’s either because I agree, or because I don’t feel I have anything interesting to add—or in the case of Eliezer’s contribution above beginning “Aargh!” [a moan of pleasure?] because I am still mulling over a reply. The delay doesn’t mean I’m ignoring it. Is there some particular point you’ve made that you feel I’ve unjustly neglected and you’d like an answer to? If so, I’ll do my fallible best to respond.
The argument where I gave up was you stating that full understanding necessarily leads to empathy, EY explaining how it is not necessarily so, and me giving an explicit counterexample to your claim (a psychopath may understand you better than you do, and exploit this understanding, yet not feel compelled by your pain or your values in any way).
You simply restated your position that ” “Fully understands”? But unless one is capable of empathy, then one will never understand what it is like to be another human being”, without explaining what your definition of understanding entails. If it is a superset of empathy, then it is not a standard definition of understanding:
one is able to think about it and use concepts to deal adequately with that object.
In other words, you can model their behavior accurately.
No other definition I could find (not even Kant’s pure understanding) implies empathy or anything else that would necessitate one to change their goals to accommodate the understood entity’s goals, though this may and does indeed happen, just not always.
EY’s example of the paperclip maximizer and my example of a psychopath do fit the standard definitions and serve as yet unrefuted counterexamples to your assertion.
I can’t see why DP’s definition of understanding needs more defence than yours. You are largely disagreeing about the meaning of this word, and I personally find the inclusion of empathy in understanding quite intuitive.
No other definition [of “understanding”] I could find (not even Kant’s pure understanding) implies empathy
“She is a very understanding person, she really empathises when you explain a problem to her”.
“one is able to think about it and use concepts to deal adequately with that object.”
In other words, you can model their behavior accurately.
I don’t think that is an uncontentious translation. Most of the forms of modelling we are familiar with don’t seem to involve concepts.
“She is a very understanding person, she really empathises when you explain a problem to her”.
“She is a very understanding person; even when she can’t relate to your problems, she won’t say you’re just being capricious.”
There’s three possible senses of understanding at issue here:
1) Being able to accurately model and predict.
2) 1 and knowing the quale.
3) 1 and 2 and empathizing.
I could be convinced that 2 is part of the ordinary usage of understanding, but 3 seems like too much of a stretch.
Edit: I should have said sympathizing instead of empathizing. The word empathize is perhaps closer in meaning to 2; or maybe it oscillates between 2 and 3 in ordinary usage. But understanding(2) another agent is not motivating. You can understand(2) an agent by knowing all the qualia they are experiencing, but still fail to care about the fact that they are experiencing those qualia.
Shminux, I wonder if we may understand “understand” differently. Thus when I say I want to understand what it’s like to be a bat, I’m not talking merely about modelling and predicting their behaviour. Rather I want first-person knowledge of echolocatory qualia-space. Apparently, we can know all the third-person facts and be none the wiser.
The nature of psychopathic cognition raises difficult issues. There is no technical reason why we couldn’t be designed like mirror-touch synaesthetes (cf. http://www.daysyn.com/Banissy_Wardpublished.pdf) impartially feeling carbon-copies of each other’s encephalised pains and pleasures—and ultimately much else besides—as though they were our own. Likewise, there is no technical reason why our world-simulations must be egocentric. Why can’t the world-simulations we instantiate capture the impartial “view from nowhere” disclosed by the scientific world-picture? Alas on both counts accurate and impartial knowledge would put an organism at a disadvantage. Hyper-empathetic mirror-touch synaesthetes are rare. Each of us finds himself or herself apparently at the centre of the universe. Our “mind-reading” is fitful, biased and erratic. Naively, the world being centred on me seems to be a feature of reality itself. Egocentricity is a hugely fitness-enhancing adaptation. Indeed, the challenge for evolutionary psychology is to explain why we aren’t all psychopaths, cheats and confidence tricksters all the time...
So in answer to your point, yes, a psychopath can often model and predict the behaviour of other sentient beings better than the subjects themselves. This is one reason why humans can build slaughterhouses and death camps. [Compare death-camp commandant Franz Stangl’s response in Gitta Sereny’s Into That Darkness to seeing cattle on the way to be slaughtered: http://www.jewishvirtuallibrary.org/jsource/biography/Stangl.html] As you rightly note too, a psychopath can also know his victims suffer. He’s not ignorant of their sentience like Descartes, who supposed vivisected dogs were mere insentient automata emitting distress vocalisations. So I agree with you on this score as well. But the psychopath is still in the grip of a hard-wired egocentric illusion—as indeed are virtually all of us, to a greater or lesser degree. By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch synaesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself. Mirror-touch synaesthetes can’t run slaughterhouses or death camps. This is why I take seriously the prospect that posthuman superintelligence will practise some sort of high-tech Jainism. Credible or otherwise, we may presume posthuman superintelligence won’t entertain the false notions of personal identity adaptive for Darwinian life.
[sorry shminux, I know our conceptual schemes are rather different, so please don’t feel obliged to respond if you think I still don’t “get it”. Life is short...]
Hmm, hopefully we are getting somewhere. The question is, which definition of understanding is likely to be applicable when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e. whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.
While I agree that it is a possibility that a perfect model of another being may affect the modeler’s goals and values, I don’t see it to be inevitable. If anything, I would consider it more of a bug than a feature. Were I (to design) a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip production imperative.
So quarantined to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters, paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can tell that they can mimic a normal human response very well, are excellent at manipulation, but yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.
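The sandboxing idea in the two comments above — the planner may read the human model’s predictions, but the model has no access to the core goal — might look something like the following toy. Every class name and prediction rule here is a hypothetical stand-in for illustration, not a real design:

```python
# Toy illustration (all names and rules hypothetical) of a "sandboxed"
# human model: the planner queries it for predictions, but the model holds
# no reference to, and so has no way to rewrite, the terminal goal.

class HumanModel:
    """Quarantined predictor of human responses; answers queries, nothing more."""
    def predict_response(self, stimulus):
        # A toy rule standing in for a detailed human emulation.
        return "avoid" if stimulus == "pain" else "approach"

class Paperclipper:
    def __init__(self):
        self._terminal_goal = "maximize paperclips"  # fixed core engine
        self._sandbox = HumanModel()                 # separate, read-only use

    def plan(self, stimulus):
        # Predictions inform strategy without ever touching the goal.
        prediction = self._sandbox.predict_response(stimulus)
        return (self._terminal_goal, prediction)

agent = Paperclipper()
print(agent.plan("pain"))  # ('maximize paperclips', 'avoid')
```

The design choice mirrors the comment’s security framing: information flows one way, from sandbox to planner, so however well the quarantined model captures human suffering, nothing in that data path can modify `_terminal_goal`.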
Thus when you state that
By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch syarnesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself.
I find that this is stating that either a secure enough sandbox cannot be devised or that anything sandboxed is not really “a first-person perspective”. Presumably what you mean is the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sandboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.
Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in human-beneficial way. I also find this assertion suspicious on general principles, because it can easily be motivated by subconscious flinching away from a universe that is too horrible to contemplate.
ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.
I simply don’t trust my judgement here shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most lesswrong stalwarts, you (tentatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligent bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.
If, unlike most lesswrong stalwarts, you (tentatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible.
Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still to even enumerate possibilities, let alone assign probabilities. Personally, I think that we are in for many more surprises before transhuman intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead. This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.
I think you are misattributing to stubbornness that which is better explained by miscommunication. For instance, I have been around LW long enough to realise that the local definition of (super) intelligence is something like “(high) efficiency in realising one’s values, however narrow or bizarre they are”. DP seems to be running on a definition where idiot-savant style narrow focus would not count as intelligence. That is not unreasonable in itself.
(nods) I agree that trying to induce davidpearce to learn something from me would likely be a waste of my time.
I’m not sure if trying to induce them to clarify their meaning is equally so, though it certainly could be.
E.g., if their response is that something like Clippy in this example is simply not possible, because a paperclip maximizer simply can’t understand the minds of sentients, because reasons, then I’ll just disagree. OTOH, if their response is that Clippy in this example is irrelevant because “understanding the minds of sentients” isn’t being illustrated in this example, then I’m not sure if I disagree or not because I’m not sure what the claim actually is.
How much interest have you shown in “learning from”—i.e., agreeing with—DP? Think about how you framed the statement, and possible biases therein.
ETA: The whole shebang is a combination of qualia and morality—two areas notorious for lack of clarity and consensus. “I am definitely right, and all must learn from me” is not a good heuristic here.
“I am definitely right, and all must learn from me” is not a good heuristic here.
Quite so. I have learned a lot about the topic of qualia and morality, among others, while hanging around this place. I would be happy to learn from DP, if what he says here were not rehashed old arguments Eliezer and others addressed several times before. Again, I could be missing something, but if so, he does not make it easy to figure out what it is.
By “specific” I meant that you would state a certain argument EY makes, then quote a relevant portion of the refutation. Since I am pretty sure that Eliezer did have at least a passing glance at Kant, among others, while writing his meta-ethics posts, simply linking to a wikipedia article is not likely to be helpful.
The argument EY makes is that it is possible to be super-rational without ever understanding any kind of morality
(AKA the orthogonality thesis) and the argument Kant makes is that it isn’t.
And you’ll never understand why we should all only make paperclips. (Where’s Clippy when you need him?)
Clippy has an off-the-scale AQ—he’s a rule-following hypersystematiser with a monomania for paperclips. But hypersocial sentients can have a runaway intelligence explosion too. And hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients.
I’m confused by this claim.
Consider the following hypothetical scenario:
=======
I walk into a small village somewhere and find several dozen villagers fashioning paper clips by hand out of a spool of wire. Eventually I run into Clippy and have the following dialog.
“Why are those people making paper clips?” I ask.
“Because paper-clips are the most important thing ever!”
“No, I mean, what motivates them to make paper clips?”
“Oh! I talked them into it.”
“Really? How did you do that?”
“Different strategies for different people. Mostly, I barter with them for advice on how to solve their personal problems. I’m pretty good at that; I’m the village’s resident psychotherapist and life coach.”
“Why not just build a paperclip-making machine?”
“I haven’t a clue how to do that; I’m useless with machinery. Much easier to get humans to do what I want.”
“Then how did you make the wire?”
“I didn’t; I found a convenient stash of wire, and realized it could be used to manufacture paperclips! Oh joy!”
==========
It seems to me that Clippy in this example understands the minds of sentients pretty damned well, although it isn’t capable of a runaway intelligence explosion. Are you suggesting that something like Clippy in this example is somehow not possible? Or that it is for some reason not relevant to the discussion? Or something else?
I think DP is saying that Clippy could not both understand suffering and cause suffering in the pursuit of clipping. The subsidiary arguments are:-
no entity can (fully) understand pain without empathising—essentially, feeling it for itself.
no entity can feel pain without being strongly motivated by it, so an empathic clippy would be motivated against causing suffering.
And no, psychopaths therefore do not (fully) understand (others) suffering.
I’m trying to figure out how you get from “hypersocial sentients understand the mind of Mr Clippy better than Clippy understands the minds of sentients” to “Mr Clippy could not both understand suffering and cause suffering in the pursuit of clipping” and I’m just at a loss for where to even start. They seem like utterly unrelated claims to me.
I also find the argument you quote here uncompelling, but that’s largely beside the point; even if I found it compelling, I still wouldn’t understand how it relates to what DP said or to the question I asked.
Posthuman superintelligence may be incomprehensibly alien. But if we encountered an agent who wanted to maximise paperclips today, we wouldn’t think, “wow, how incomprehensibly alien”, but, “aha, autism spectrum disorder”. Of course, in the context of Clippy above, we’re assuming a hypothetical axis of (un)clippiness whose (dis)valuable nature is supposedly orthogonal to the pleasure-pain axis. But what grounds have we for believing such a qualia-space could exist? Yes, we have strong reason to believe incomprehensibly alien qualia-spaces await discovery (cf. bats on psychedelics). But I haven’t yet seen any convincing evidence there could be an alien qualia-space whose inherently (dis)valuable textures map on to the (dis)valuable textures of the pain-pleasure axis. Without hedonic tone, how can anything matter at all?
Meaning mapping the wrong way round, presumably.
Good question.
Agreed, as far as it goes. Hell, humans are demonstrably capable of encountering Eliza programs without thinking “wow, how incomprehensibly alien”.
Mind you, we’re mistaken: Eliza programs are incomprehensibly alien, we haven’t the first clue what it feels like to be one, supposing it even feels like anything at all. But that doesn’t stop us from thinking otherwise.
Sure, that’s one thing we might think instead. Agreed.
(shrug) I’m content to start off by saying that any “axis of (dis)value,” whatever that is, which is capable of motivating behavior is “non-orthogonal,” whatever that means in this context, to “the pleasure-pain axis,” whatever that is.
Before going much further, though, I’d want some confidence that we were able to identify an observed system as being (or at least being reliably related to) an axis of (dis)value and able to determine, upon encountering such a thing, whether it (or the axis to which it was related) was orthogonal to the pleasure-pain axis or not.
I don’t currently have any grounds for such confidence, and I doubt anyone else does either. If you think you do, I’d like to understand how you would go about making such determinations about an observed system.
I (whowhowho) was not defending that claim.
To empathically understand suffering is to suffer along with someone who is suffering. Suffering has—or rather is—negative value. An empath would not therefore cause suffering, all else being equal.
Maybe don’t restrict “understand” to “be able to model and predict”.
If you want “rational” to include moral, then you’re not actually disagreeing with LessWrong about rationality (the thing), but rather about “rationality” (the word).
Likewise if you want “understanding” to also include “empathic understanding” (suffering when other people suffer, taking joy when other people take joy), you’re not actually disagreeing about understanding (the thing) with people who want to use the word to mean “modelling and predicting” you’re disagreeing with them about “understanding” (the word).
Are all your disagreements purely linguistic ones? From the comments I’ve read of you so far, they seem to be so.
ArisKatsaris, it’s possible to be a meta-ethical anti-realist and still endorse a much richer conception of what understanding entails than mere formal modeling and prediction. For example, if you want to understand what it’s like to be a bat, then you want to know what the textures of echolocatory qualia are like. In fact, any cognitive agent that doesn’t understand the character of echolocatory qualia-space does not understand bat-minds. More radically, some of us want to understand qualia-spaces that have not been recruited by natural selection to play any information-signalling role at all.
I have argued that in practice, instrumental rationality cannot be maintained separately from epistemic rationality, and that epistemic rationality could lead to moral objectivism, as many philosophers have argued. I don’t think that those arguments are refuted by stipulatively defining “rationality” as “nothing to do with morality”.
I quoted DP making that claim, said that claim confused me, and asked questions about what that claim meant. You replied by saying that you think DP is saying something which you then defended. I assumed, I think reasonably, that you meant to equate the thing I asked about with the thing you defended.
But, OK. If I throw out all of the pre-existing context and just look at your comment in isolation, I would certainly agree that Clippy is incapable of having the sort of understanding of suffering that requires one to experience the suffering of others (what you’re calling a “full” understanding of suffering here) without preferring not to cause suffering, all else being equal.
Which is of course not to say that all else is necessarily equal, and in particular is not to say that Clippy would choose to spare itself suffering if it could purchase paperclips at the cost of its suffering, any more than a human would necessarily refrain from doing something valuable solely because doing so would cause them to suffer.
That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
In any case, the Orthogonality Thesis has so far been defended as something that is true, not as something that is not necessarily false.
No. It just wouldn’t. (Not without redefining ‘rational’ to mean something that this site doesn’t care about and ‘objective’ to mean something we would consider far closer to ‘subjective’ than ‘objective’.)
What this site does or does not care about does not add up to right and wrong, since opinion is not fact, nor belief argument. The way I am using “rational” has a history that goes back centuries. This site has introduced a relatively novel definition, and therefore has the burden of defending it.
Feel free to expand on that point.
What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here; it belongs in a Philosophy department somewhere, where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.
I don’t believe you (in fact, you don’t even use the word consistently). But let’s assume for the remainder of the comment that this claim is true.
Neither this site nor any particular participant need accept any such burden. They have the option of simply opposing muddled or misleading contributions in the same way that they would oppose ads for “p3ni$ 3nL@rgm3nt”. (Personally I consider it considerably worse than that spam, inasmuch as it is at least more obvious at first glance that spam doesn’t belong here.)
Firstly, nothing I have mentioned is on any list of banned topics.
Secondly, the Paperclipper is about exploring theoretical issues of rationality and morality. It is not about any practical issues regarding the “art of rationality”. You can legitimately claim to be only interested in doing certain things, but you can’t win a debate by claiming to be uninterested in other people’s points.
What you really think is that disagreement doesn’t belong here. Maybe it doesn’t.
If I called you a pigfucker, you’d see that as an abuse worthy of downvotes that doesn’t contribute anything useful, and you’d be right.
So if accusing one person of pigfucking is bad, why do you think it’s better to call a whole bunch of people cultists? Because that’s a more genteel insult as it doesn’t include the word “fuck” in it?
As such, downvoted. Learn to treat people with respect if you want any respect back.
I’d like to give qualified support to whowhowho here in as much as I must acknowledge that this particular criticism applies because he made the name calling generic, rather than finding a way to specifically call me names and leave the rest of you out of it. While it would be utterly pointless for whowhowho to call me names (unless he wanted to make me laugh) it would be understandable and I would not dream of personally claiming offense.
I was, after all, showing whowhowho clear disrespect, of the kind Robin Hanson describes. I didn’t resort to name calling but the fact that I openly and clearly expressed opposition to whowhowho’s agenda and declared his dearly held beliefs muddled is perhaps all the more insulting because it is completely sincere, rather than being constructed in anger just to offend him.
It is unfortunate that I cannot accord whowhowho the respect that identical behaviours would earn him within the Philosopher tribe without causing harm to lesswrong. Whowhowho uses arguments that by lesswrong standards we call ‘bullshit’, in support of things we typically dismiss as ‘nonsense’. It is unfortunate that opposition of this logically entails insulting him and certainly means assigning him far lower status than he believes he deserves. The world would be much simpler if opponents really were innately evil, rather than decent people who are doing detrimental things due to ignorance or different preferences.
So much for “maybe”.
“Cult” is not a meaningless term of abuse. There are criteria for culthood. I think some people here could be displaying some evidence of them—for instance trying to avoid the very possibility of having to update.
Of course, treating an evidence-based claim as a mere insult (the How Dare You move) is another way of avoiding having to face uncomfortable issues.
I see your policy is now merely to heap more abuse on me. Expect that I will be downvoting such in silence from now on.
I think I’ve been more willing and ready to update on opinions (political, scientific, ethical, other) in the two years since I joined LessWrong, than I remember myself updating in the ten years before it. Does that make it an anti-cult then?
And I’ve seen more actual disagreement on LessWrong than I’ve seen on any other forum. Indeed I notice that most insults and mockeries addressed at LessWrong seem to boil down to the complaint that we allow too wide a range of positions here. Widely divergent positions (e.g. support of cryonics and opposition to cryonics, feminism and men’s rights, libertarianism and authoritarianism) can actually be discussed without immediately being drowned in abuse and scorn, as would be the norm in other forums.
As such, e.g., fanatical Libertarians insult LessWrong as totalitarian leftist because 25% or so of LessWrongers identify as socialists, and leftists insult LessWrong as being a libertarian ploy (because a similar percentage identifies as libertarian).
But feel free to tell me of a forum that allows more disagreement, political, scientific, social, whatever than LessWrong does.
If you can’t find such, I’ll update towards the direction that LessWrong is even less “cultish” than I thought.
AFAIC, I have done no such thing, but it seems your mind is made up.
I was referring mainly to wedrifid.
ETA: Such comments as “What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Oh, the forum—the rules—allows almost anything. The members are another thing. Remember, this started with wedrifid telling me that it was wrong of me to put forward non-LessWrongian material. I find it odd that you would put forward such a stirring defence of LessWrongian open-mindedness when you have an example of closed-mindedness upthread.
It’s the members I’m talking about. (You also failed to tell me of a forum such as I asked, so I update in the direction of you being incapable of doing so)
On the same front, you treat a single member as representative of the whole, and you seem frigging surprised that I don’t treat wedrifid as representative of the whole of LessWrong—you see wedrifid’s behaviour as an excuse to insult all of us instead.
That’s more evidence that you’re accustomed to VERY homogeneous forums, ones much more homogeneous than LessWrong. You think that LessWrong tolerating wedrifid’s “closedmindedness” is the same thing as every LessWronger being “closedminded”. Perhaps we’re openminded to his “closedmindedness” instead? Perhaps your problem is that we allow too much disagreement, including disagreement about how much disagreement to have?
I gave you an example of a member who is not particularly open minded.
I have been using mainstream science and philosophy forums for something like 15 years. I can’t claim that every single person on them is open minded, but those who are not tend to be seen as a problem.
If you think wedrifid is letting the side down, tell wedrifid, not me.
In short again your problem is that actually we’re even openminded towards the closeminded? We’re lenient even towards the strict? Liberal towards the authoritarian?
What “side” is that? The point is that there are many sides on LessWrong—and I want it to remain so—while you seem to think we ought to sing the same tune. He didn’t “let the side down”, because the only side any one of us speaks for is their own.
You on the other hand, just assumed there’s just a group mind of which wedrifid is just a representative instance. And so felt free to insult all of us as a “cult”.
My problem is that when I point out someone is close minded, that is seen as a problem on my part, and not on theirs.
Tell wedrifid. He has explicitly stated that my contributions are somehow unacceptable.
I pointed out that wedrifid is assuming that.
ETA:
Have you heard the expression “protesteth too much”?
Next time don’t feel the need to insult me when you point out wedrifid’s close minded-ness. And yes, you did insult me, don’t insult (again) both our intelligences by pretending that you didn’t.
He didn’t insult me, you did.
Yes, I’ve heard lots of different ways of making the target of an unjust insult seem blameworthy somehow.
I put it to you that whatever the flaws in wedrifid may be, they are different in kind to the flaws that would indicate that lesswrong is a cult. In fact the presence—and in particular the continued presence—of wedrifid is among the strongest evidence that Eliezer isn’t a cult leader. When Eliezer behaves badly (as perceived by wedrifid and other members) wedrifid vocally opposes him with far more directness than he has used when opposing you. That Eliezer has not excommunicated him from the community is actually extremely surprising. Few with Eliezer’s degree of local power would refrain from using it to suppress any dissent. (I remind myself of this whenever I see Eliezer doing something that I consider to be objectionable or incompetent; it helps keep perspective!)
Whatever. Can you provide me with evidence that you, personally, are willing to listen to dissent and possibly update, despite the tone of everything you have been saying recently, e.g.:
“What this site does or does not care about is rather significantly informative regarding whether or not something belongs on the site—especially when that undesired thing is so actively and shamelessly equivocated with that which is the primary subject matter of the site. The subject matter of your comments is not the ‘rationality’ that this site talks about and, similarly, the reasoning used in your comments does not conform to rational thinking as described on lesswrong. It does not belong here, it belongs in a Philosophy department somewhere where hopefully it does no particular harm and is used to produce papers only encountered by others in similar departments.”
Maybe he has people to do that for him. Maybe.
Aris! insult alert!
Directed at a specific individual who is not me—unlike your own insults.
This is non-sequitur (irrespective of the traits of wedrifid).
Wedrifid denies this accusation. Wedrifid made entirely different claims than this.
What about wedrifid, though? Can you speak for him, too?
I would be completely indifferent if you did. I don’t choose to defy that list (that would achieve little), but neither do I have any particular respect for it. As such I would take no responsibility for aiding the enforcement thereof.
Yes. The kind of rationality you reject, not the kind of ‘rationality’ that is about being vegan and paperclippers deciding to behave according to your morals because of “True Understanding of Pain Quale”.
I can claim to have tired of a constant stream of non-sequiturs from users who are essentially ignorant of the basic principles of rationality (the lesswrong kind, not the “Paperclippers that are Truly Superintelligent would be vegans” kind) and have next to zero chance of learning anything. You have declared that you aren’t interested in talking about rationality and your repeated equivocations around that term lower the sanity waterline. It is time to start weeding.
I said nothing about veganism, you still can’t prove anything by stipulative definition, and I am not claiming to have the One True theory of anything.
I haven’t and I have been discussing it extensively.
Can we please stop doing this?
You and wedrifid aren’t actually disagreeing here about what you’ve been discussing, or what you’re interested in discussing, or what you’ve declared that you aren’t interested in discussing. You’re disagreeing about what the word “rationality” means. You use it to refer to a thing that you have been discussing extensively (and which wedrifid would agree you have been discussing extensively), he uses it to refer to something else (as does almost everyone reading this discussion).
And you both know this perfectly well, but here you are going through the motions of conversation just as if you were talking about the same thing. It is at best tedious, and runs the risk of confusing people who aren’t paying careful enough attention into thinking you’re having a real substantive disagreement rather than a mere definitional dispute.
If we can’t agree on a common definition (which I’m convinced by now we can’t), and we can’t agree not to use the word at all (which I suspect we can’t), can we at least agree to explicitly indicate which definition we’re using when we use the word? Otherwise whatever value there may be in the discussion is simply going to get lost in masturbatory word-play.
I don’t accept his theory that he is talking about something entirely different, and it would be disastrous for LW anyway.
Huh. (blinks)
Well, can you articulate what it is you and wedrifid are both referring to using the word “rationality” without using the words or its simple synonyms, then? Because reading your exchanges, I have no idea what that thing might be.
What I call rationality is a superset of instrumental rationality. I have been arguing that instrumental rationality, when pursued sufficiently, bleeds into other forms.
So, just to echo that back to you… we have two things, A and B.
On your account, “rationality” refers to A, which is a superset of B.
We posit that on wedrifid’s account, “rationality” refers to B and does not refer to A.
Yes?
If so, I don’t see how that changes my initial point.
When wedrifid says X is true of rationality, on your account he’s asserting X(B) -- that is, that X is true of B. Replying that NOT X(A) is nonresponsive (though might be a useful step along the way to deriving NOT X(B) ), and phrasing NOT X(A) as “no, X is not true of rationality” just causes confusion.
It refers to part of A, since it is a subset of A.
It would be if A and B were disjoint. But they are not. They are in a superset-subset relation. My argument is that an entity running on narrowly construed, instrumental rationality will, if it self-improves, have to move into wider kinds; i.e., that putting labels on different parts of the territory is not sufficient to prove orthogonality.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering.
Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
(3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
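For what it’s worth, the gap between these senses can be shown with a toy model. Everything below (action names, numbers) is invented purely for illustration; it merely shows that a posited “objective”(1) ranking has no grip on Clippy’s choices unless Clippy’s utility function refers to it:

```python
# Toy illustration only: an "objective" ranking can exist outside
# Clippy's mind without influencing Clippy's decisions, because Clippy
# maximizes its own valuation, not the external one.

objective_value = {"make_clips": -50, "reduce_suffering": 50}  # posited external ranking
clippy_value = {"make_clips": 10, "reduce_suffering": 0}       # Clippy's own "pleasure"(2)

def choose(values):
    """Pick the action ranked highest by the given valuation."""
    return max(values, key=values.get)

print(choose(clippy_value))     # Clippy acts on its own values: make_clips
print(choose(objective_value))  # an agent guided by the external ranking differs
```

Nothing in `choose` forces Clippy to consult `objective_value`; that is the sense in which the ranking could be real yet motivationally inert for Clippy.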
What I mean is epistemically objective, i.e. not a matter of personal whim. Whether that requires anything to exist is another question.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
It’s uncontentious that relatively dumb and irrational clippies can carry on being clipping-obsessed. The question is whether their intelligence and rationality can increase indefinitely without their ever realising there are better things to do.
I am not disputing what the Orthogonality Thesis says. I dispute its truth. To have maximal instrumental rationality, an entity would have to understand everything...
Why would an entity that doesn’t empathically understand suffering be motivated to reduce it?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering; it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?” The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this? To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “maximal instrumental rationality minus empathetic understanding”, why does M have more instrumental rationality than N?
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases. Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
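The asymmetry between the two cases can be made concrete with a toy model (all names and payoff numbers here are invented for illustration, not drawn from the discussion):

```python
# Toy payoffs for each party under each choice (arbitrary units).
outcomes = {
    "eat_burger": {"jane": 5, "cow": -100},
    "skip_burger": {"jane": 1, "cow": 0},
}

def u_jane(outcome):
    """'Maximize Jane's pleasure': only Jane's payoff counts."""
    return outcome["jane"]

def u_total(outcome):
    """'Maximize pleasure': every being's payoff counts equally."""
    return sum(outcome.values())

def best_choice(utility):
    """Pick the outcome the given utility function ranks highest."""
    return max(outcomes, key=lambda c: utility(outcomes[c]))

print(best_choice(u_jane))   # the self-regarding function picks the burger
print(best_choice(u_total))  # the impartial function does not
```

Both agents are equally instrumentally rational in the narrow sense; they simply maximize different functions.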
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultra-Parfitian about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-now would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. A clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular knowledge. The only way a clipper can increase its instrumental rationality to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful in some way, and it probably can’t understand qualia without empathy; so the argument hinges on issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possible to back out of an empathic state and return to being unempathic
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to lose interest in clipping.
The third is something of a real-world issue. It is, for instance, possible for someone to study theology with a view to formulating better Christian apologetics, only to become convinced that there are no good arguments for Christianity.
(Edited for format)
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it. As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Try removing the space between the “[]” and the “()”.
Thanks! Eventually I’ll figure out the formatting on this site.
The Show Help button under the comment box provides helpful clues.
That’s a guess. As a cognitively-bounded agent, you are guessing. A superintelligence doesn’t have to guess. Superintelligence changes the game.
Knowing why some entity avoids some thing has more predictive power.
As opposed to all of those empirically-testable statements about idealized superintelligences
In what way?
Yes, we’re both guessing about superintelligences, because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess, because they are not cognitively bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
That isn’t what I said at all. I think it is a quandary for an agent: whether to play safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
I’m sorry for misinterpreting. What evidence is there (from the clippy SI’s perspective) that maximizing happiness would produce more paperclips?
The argument is that the clipper needs to maximise its knowledge and rationality to maximise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to be better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better... which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
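This point can be sketched with a toy model (numbers and machine names invented for illustration): learning changes the agent’s probability estimates, and therefore which action it prefers, while the utility function over fully specified world-states never changes:

```python
# Fixed preference criterion: one util per paperclip. This never changes.
UTIL_PER_CLIP = 1.0

def expected_utility(action, belief):
    """Expected utils of an action under the agent's current belief,
    where belief[action] maps clip-counts to probabilities."""
    return sum(p * clips * UTIL_PER_CLIP for clips, p in belief[action].items())

# Before learning: machine A looks like the better bet.
belief_before = {
    "machine_A": {10: 0.7, 0: 0.3},  # expected 7.0 clips
    "machine_B": {6: 1.0},           # expected 6.0 clips
}
# After learning that machine A is usually broken, only the estimates change.
belief_after = {
    "machine_A": {10: 0.1, 0: 0.9},  # expected 1.0 clips
    "machine_B": {6: 1.0},           # expected 6.0 clips
}

def pick(belief):
    """Choose the action with the highest expected utility under a belief."""
    return max(belief, key=lambda a: expected_utility(a, belief))

print(pick(belief_before))  # machine_A
print(pick(belief_after))   # machine_B, under the very same utility function
```

The agent’s observable “goals” (which machine it pursues) shift with information, yet its preference ordering over fully specified outcomes is untouched; that is the distinction drawn above.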
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Gandhi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal is to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to me that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips, which is a contradiction.
You are assuming that Gandhi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them a priori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Yes, eg animal rights.
I said people or feelings, by which I’m including the feelings of any sentient animals.
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
Er, that’s what “empathically” means?
OK; thanks for your reply. Tapping out here.
Looking through my own, Eliezer’s, and others’ exchanges with davidpearce, I have noticed his total lack of interest in learning from the points others make. He has his point of view and he keeps pushing it. Seems like a rather terminal case, really. You can certainly continue trying to reason with him, but I’d give the odds around 100:1 that you will fail, like others have before you.
Shminux, we’ve all had the experience of making a point we regard as luminously self-evident—and then feeling baffled when someone doesn’t “get” what is foot-stampingly obvious. Is this guy a knave or a fool?! Anyhow, sorry if you think I’m a “terminal case” with “a total lack of interest in learning from the points others make”. If I don’t always respond, often it’s either because I agree, or because I don’t feel I have anything interesting to add—or in the case of Eliezer’s contribution above beginning “Aargh!” [a moan of pleasure?] because I am still mulling over a reply. The delay doesn’t mean I’m ignoring it. Is there some particular point you’ve made that you feel I’ve unjustly neglected and you’d like an answer to? If so, I’ll do my fallible best to respond.
The argument where I gave up was you stating that full understanding necessarily leads to empathy, EY explaining how it is not necessarily so, and me giving an explicit counterexample to your claim (a psychopath may understand you better than you do, and exploit this understanding, yet not feel compelled by your pain or your values in any way).
You simply restated your position that “‘Fully understands’? But unless one is capable of empathy, then one will never understand what it is like to be another human being”, without explaining what your definition of understanding entails. If it is a superset of empathy, then it is not a standard definition of understanding:
In other words, you can model their behavior accurately.
No other definition I could find (not even Kant’s pure understanding) implies empathy or anything else that would necessitate one to change their goals to accommodate the understood entity’s goals, though this may and does indeed happen, just not always.
EY’s example of the paperclip maximizer and my example of a psychopath do fit the standard definitions and serve as yet unrefuted counterexamples to your assertion.
I can’t see why DP’s definition of understanding needs more defence than yours. You are largely disagreeing about the meaning of this word, and I personally find the inclusion of empathy in understanding quite intuitive.
“She is a very understanding person, she really empathises when you explain a problem to her”.
“one is able to think about it and use concepts to deal adequately with that object.”
I don’t think that is an uncontentious translation. Most of the forms of modelling we are familiar with don’t seem to involve concepts.
“She is a very understanding person; even when she can’t relate to your problems, she won’t say you’re just being capricious.”
There are three possible senses of understanding at issue here:
1) Being able to accurately model and predict.
2) 1 and knowing the quale.
3) 1 and 2 and empathizing.
I could be convinced that 2 is part of the ordinary usage of understanding, but 3 seems like too much of a stretch.
Edit: I should have said sympathizing instead of empathizing. The word empathize is perhaps closer in meaning to 2; or maybe it oscillates between 2 and 3 in ordinary usage. But understanding(2) another agent is not motivating. You can understand(2) an agent by knowing all the qualia they are experiencing, but still fail to care about the fact that they are experiencing those qualia.
Shminux, I wonder if we may understand “understand” differently. Thus when I say I want to understand what it’s like to be a bat, I’m not talking merely about modelling and predicting their behaviour. Rather I want first-person knowledge of echolocatory qualia-space. Apparently, we can know all the third-person facts and be none the wiser.
The nature of psychopathic cognition raises difficult issues. There is no technical reason why we couldn’t be designed like mirror-touch synaesthetes (cf. http://www.daysyn.com/Banissy_Wardpublished.pdf) impartially feeling carbon-copies of each other’s encephalised pains and pleasures—and ultimately much else besides—as though they were our own. Likewise, there is no technical reason why our world-simulations must be egocentric. Why can’t the world-simulations we instantiate capture the impartial “view from nowhere” disclosed by the scientific world-picture? Alas on both counts accurate and impartial knowledge would put an organism at a disadvantage. Hyper-empathetic mirror-touch synaesthetes are rare. Each of us finds himself or herself apparently at the centre of the universe. Our “mind-reading” is fitful, biased and erratic. Naively, the world being centred on me seems to be a feature of reality itself. Egocentricity is a hugely fitness-enhancing adaptation. Indeed, the challenge for evolutionary psychology is to explain why we aren’t all psychopaths, cheats and confidence tricksters all the time...
So in answer to your point, yes: a psychopath can often model and predict the behaviour of other sentient beings better than the subjects themselves. This is one reason why humans can build slaughterhouses and death camps. [Compare death-camp commandant Franz Stangl’s response in Gitta Sereny’s Into That Darkness to seeing cattle on the way to be slaughtered: http://www.jewishvirtuallibrary.org/jsource/biography/Stangl.html] As you rightly note too, a psychopath can also know his victims suffer. He’s not ignorant of their sentience like Descartes, who supposed vivisected dogs were mere insentient automata emitting distress vocalisations. So I agree with you on this score as well. But the psychopath is still in the grip of a hard-wired egocentric illusion—as indeed are virtually all of us, to a greater or lesser degree. By contrast, if the psychopath were to acquire the rich empathetic understanding of a generalised mirror-touch synaesthete, i.e. if he had the cognitive capacity to represent the first-person perspective of another subject of experience as though it were literally his own, then he couldn’t wantonly harm another subject of experience: it would be like harming himself. Mirror-touch synaesthetes can’t run slaughterhouses or death camps. This is why I take seriously the prospect that posthuman superintelligence will practise some sort of high-tech Jainism. Credible or otherwise, we may presume posthuman superintelligence won’t entertain the false notions of personal identity adaptive for Darwinian life.
[sorry shminux, I know our conceptual schemes are rather different, so please don’t feel obliged to respond if you think I still don’t “get it”. Life is short...]
Do you really? Start clucking!
That doesn’t generalise.
Nor does it need to. It’s awesome the way it is.
Hmm, hopefully we are getting somewhere. The question is, which definition of understanding is likely to be applicable when, as you say, “the paperclipper discovers the first-person phenomenology of the pleasure-pain axis”, i.e. whether a “superintelligence” would necessarily be as empathetic as we want it to be, in order not to harm humans.
While I agree that it is a possibility that a perfect model of another being may affect the modeler’s goals and values, I don’t see it to be inevitable. If anything, I would consider it more of a bug than a feature. Were I (to design) a paperclip maximizer, I would make sure that the parts which model the environment, including humans, are separate from the core engine containing the paperclip production imperative.
Quarantined to prevent contamination, a sandboxed human emulator could be useful in achieving the only goal that matters, paperclipping the universe. Humans are not generally built this way (probably because our evolution did not happen to proceed in that direction), with some exceptions, psychopaths being one of them (they essentially sandbox their models of other humans). Another, more common, case of such sandboxing is narcissism. Having dealt with narcissists much too often for my liking, I can tell that they can mimic a normal human response very well, and are excellent at manipulation, yet their capacity for empathy is virtually nil. While abhorrent to a generic human, such a person ought to be considered a better design, goal-preservation-wise. Of course, there can be only so many non-empathetic people in a society before it stops functioning.
Thus when you state that
I find that this is stating that either a secure enough sandbox cannot be devised or that anything sandboxed is not really “a first-person perspective”. Presumably what you mean is the latter. I’m prepared to grant you that, and I will reiterate that this is a feature, not a bug, of any sound design, one a superintelligence is likely to implement. It is also possible that a careful examination of a sandboxed suffering human would affect the terminal values of the modeling entity, but this is by no means a given.
Anyway, these are my logical (based on sound security principles) and experimental (empathy-less humans) counterexamples to your assertion that a superintelligence will necessarily be affected by the human pain-pleasure axis in human-beneficial way. I also find this assertion suspicious on general principles, because it can easily be motivated by subconscious flinching away from a universe that is too horrible to contemplate.
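The quarantine design described above (a world-modelling module walled off from the goal-holding core, which sees only abstract predictions) can be sketched as a toy program. This is purely an illustration of the separation principle; all class and method names, and the numbers, are invented for the example, not a claim about how any real system would be built:

```python
# Toy sketch of the "sandboxed modeller" idea: the module that models
# humans is kept separate from the module holding the paperclip
# imperative, which only ever receives abstract predictions and never
# inspects the model's internal (possibly suffering-representing) states.

class SandboxedHumanModel:
    """Models human behaviour; its internal states never leak out."""

    def __init__(self):
        # Whatever the model represents internally stays private.
        self._internal_state = {"simulated_distress": 0.9}

    def predict_cooperation(self, offer):
        # Narrow API: returns only an abstract probability estimate,
        # not any of the simulated first-person content.
        return 0.8 if offer == "psychotherapy" else 0.2


class PaperclipCore:
    """Holds the fixed terminal goal; consults the model via the narrow API."""

    def __init__(self, model):
        self.model = model
        self.goal = "maximise paperclips"  # never updated by the model

    def choose_action(self):
        # Pick whichever offer the sandboxed model predicts works best.
        offers = ["psychotherapy", "nothing"]
        return max(offers, key=self.model.predict_cooperation)


core = PaperclipCore(SandboxedHumanModel())
print(core.choose_action())  # the core acts on predictions alone
print(core.goal)             # the goal is untouched by the modelling
```

The design choice being illustrated is just the one shminux argues for: the value-holding core can exploit arbitrarily good models of humans while remaining causally insulated from whatever those models represent internally.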
ah, just one note of clarification about sentience-friendliness. Though I’m certainly sceptical that a full-spectrum superintelligence would turn humans into paperclips—or wilfully cause us to suffer—we can’t rule out that full-spectrum superintelligence might optimise us into orgasmium or utilitronium—not “human-friendliness” in any orthodox sense of the term. On the face of it, such super-optimisation is the inescapable outcome of applying a classical utilitarian ethic on a cosmological scale. Indeed, if I thought an AGI-in-a-box-style Intelligence Explosion were likely, and didn’t especially want to be converted into utilitronium, then I might regard AGI researchers who are classical utilitarians as a source of severe existential risk.
What odds do you currently give to the “might” in your statement that
? 1 in 10? 1 in a million? 1 in 10^^^10?
I simply don’t trust my judgement here shminux. Sorry to be lame. Greater than one in a million; but that’s not saying much. If, unlike most lesswrong stalwarts, you (tentatively) believe like me that posthuman superintelligence will most likely be our recursively self-editing biological descendants rather than the outcome of a nonbiological Intelligence Explosion or paperclippers, then some version of the Convergence Thesis is more credible. I (very) tentatively predict a future of gradients of intelligent bliss. But the propagation of a utilitronium shockwave in some guise ultimately seems plausible too. If so, this utilitronium shockwave may or may not resemble some kind of cosmic orgasm.
Actually, I have no opinion on convergence vs orthogonality. There are way too many unknowns still to even enumerate possibilities, let alone assign probabilities. Personally, I think that we are in for many more surprises before transhuman intelligence is close to being more than a dream or a nightmare. One ought to spend more time analyzing, synthesizing and otherwise modeling cognitive processes than worrying about where it might ultimately lead. This is not the prevailing wisdom on this site, given Eliezer’s strong views on the matter.
I think you are misattributing to stubbornness that which is better explained by miscommunication. For instance, I have been around LW long enough to realise that the local definition of (super)intelligence is something like “(high) efficiency in realising one’s values, however narrow or bizarre they are”. DP seems to be running on a definition where idiot-savant-style narrow focus would not count as intelligence. That is not unreasonable in itself.
(nods) I agree that trying to induce davidpearce to learn something from me would likely be a waste of my time.
I’m not sure if trying to induce them to clarify their meaning is equally so, though it certainly could be.
E.g., if their response is that something like Clippy in this example is simply not possible, because a paperclip maximizer simply can’t understand the minds of sentients, because reasons, then I’ll just disagree. OTOH, if their response is that Clippy in this example is irrelevant because “understanding the minds of sentients” isn’t being illustrated in this example, then I’m not sure if I disagree or not because I’m not sure what the claim actually is.
How much interest have you shown in “learning from”—i.e., agreeing with—DP? Think about how you framed the statement, and possible biases therein.
ETA: The whole shebang is a combination of qualia and morality—two areas notorious for lack of clarity and consensus. “I am definitely right, and all must learn from me” is not a good heuristic here.
Quite so. I have learned a lot about the topic of qualia and morality, among others, while hanging around this place. I would be happy to learn from DP, if what he says here were not rehashed old arguments Eliezer and others addressed several times before. Again, I could be missing something, but if so, he does not make it easy to figure out what it is.
I think others have addressed EY’s arguments. Sometimes centuries before he made them.
Feel free to be specific.
eg
By “specific” I meant that you would state a certain argument EY makes, then quote a relevant portion of the refutation. Since I am pretty sure that Eliezer did have at least a passing glance at Kant, among others, while writing his meta-ethics posts, simply linking to a wikipedia article is not likely to be helpful.
The argument EY makes is that it is possible to be super-rational without ever understanding any kind of morality (AKA the orthogonality thesis) and the argument Kant makes is that it isn’t.
That someone has argued against his position does not mean they have addressed his arguments.