So, a terminological caveat first: I’ve argued elsewhere that in practice all values are instrumental, and exist in a mutually reinforcing network, and we simply label as “terminal values” those values we don’t want to (or don’t have sufficient awareness to) decompose further. So, in effect I agree with #2, except that I’m happy to go on calling them “terminal values” and say they don’t exist, and refer to the real things as “values” (which depend to varying degrees on other values).
But, that being said, I’ll keep using the phrase “terminal values” in its more conventional sense, which I mean approximately rather than categorically (that is, a “terminal value” to my mind is simply a value whose dependence on other values is relatively tenuous; an “instrumental value” is one whose dependence on other values is relatively strong, and the line between them is fuzzy and ultimately arbitrary but not meaningless).
All that aside… I don’t really see what’s interesting about this example.
So, OK, X is a pedophile. Which is to say, X terminally values having sex with children. And the question is, is it rational for X to choose to be “fixed”, and if so, what does that imply about terminal values of rational agents?
Well, we have asserted that X is in a situation where X does not get to have sex with children. So whether X is “fixed” or not, X’s terminal values are not being satisfied, and won’t be satisfied. To say that differently, the expected value of both courses of action (fixed or not-fixed), expressed in units of expected moments-of-sex-with-children, is effectively equal (more specifically, they are both approximately zero).(1)
So the rational thing to do is choose a course of action based on other values.
What other values? Well, the example doesn’t really say. We don’t know much about this guy. But… for example, you’ve also posited that he doesn’t get fixed because pedophilia is “part of who he is”. I could take that to mean he not only values (V1) having sex with children, he also values (V2) being a pedophile. And he values this “terminally”, in the sense that he doesn’t just want to remain a pedophile in order to have more sex with children, he wants to remain a pedophile even if he doesn’t get to have sex with children.
If I understand it that way, then yes, he’s being perfectly rational to refuse being fixed. (Supposing that V2 > SUM (V3...Vn), of course.)
Alternatively, I could take that as just a way of talking, and assert that really, he just has V1 and not V2.
The difficulty here is that we don’t have any reliable way, with the data you’ve provided, of determining whether X is rationally pursuing a valued goal (in which case we can infer his values from his actions) or whether X is behaving irrationally.
(1) Of course, this assumes that the procedure to fix him has a negligible chance of failure, that his chances of escaping monitoring and finding a child to have sex with are negligible, etc. We could construct a more complicated example that doesn’t assume these things, but I think it amounts to the same thing.
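For concreteness, here is a minimal sketch of the arithmetic behind footnote (1), with made-up probabilities standing in for the “negligible” chances it assumes; the only point is that, measured purely in expected V1-satisfaction, the two courses of action are indistinguishable, so the decision falls to the other values.

```python
# Toy expected-value comparison in units of "expected moments of V1-satisfaction".
# The probabilities are illustrative assumptions, not figures from the discussion.

p_escape_monitoring = 1e-6  # assumed chance X ever evades monitoring
p_fix_fails = 1e-6          # assumed chance the "fix" procedure fails

ev_not_fixed = p_escape_monitoring            # still monitored, so ~0
ev_fixed = p_fix_fails * p_escape_monitoring  # even smaller, still ~0

print(f"EV(not fixed) = {ev_not_fixed:.1e}")
print(f"EV(fixed)     = {ev_fixed:.1e}")
# Both are effectively zero, so the choice has to be settled by V2, V3, ... Vn.
```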
No, he terminally values being attracted to children. He could still assign a strongly negative value to actually having sex with children. Good fantasy, bad reality.
Just like I strongly want to maintain my ability to find women other than my wife attractive, even though I assign a strong negative value to following up on those attractions. (one can construct intermediate cases that avoid arguments that not being locked in is instrumentally useful)
(shrug) If X values being attracted to children while not having sex with them, then I really don’t see the issue. Great, if that’s what he wants, he can do that… why would he change anything? Why would anyone expect him to change anything?
It would be awesome if one could count on people actually having that reaction given that degree of information. I don’t trust them to be that careful with their judgements under normal circumstances.
Also, what Lumifer said.
Sure, me neither. But as I said elsewhere, if we are positing normal circumstances, then the OP utterly confuses me, because about 90% seems designed to establish that the circumstances are not normal.
Even transhumanly future normal.
OK, fair enough. My expectations about how the ways we respond to emotionally aversive but likely non-harmful behavior in others might change in a transhuman future seem to differ from yours, but I am not confident in them.
Because it’s socially unacceptable to desire to have sex with children. Regardless of what happens in reality.
Well, if everyone is horrified by the social unacceptability of his fantasy life, which they’ve set up airport scanners to test for, without any reference to what happens or might happen in reality, that puts a whole different light on the OP’s thought experiment.
Would I choose to eliminate a part of my mind in exchange for greater social acceptability? Maybe, maybe not, I dunno… it depends on the benefits of social acceptability, I guess.
What would be the reaction of your social circle if you told your friends that in private you dream about kidnapping young girls and then raping and torturing them, about their hoarse screams of horror as you slowly strangle them...
Just fantasy life, of course :-/
Mostly, I expect, gratitude that I’d chosen to trust them with that disclosure.
Probably some would respond badly, and they would be invited to leave my circle of friends.
But then, I choose my friends carefully, and I am gloriously blessed with abundance in this area.
That said, I do appreciate that the typical real world setting isn’t like that.
I just find myself wondering, in that case, what all of this “transhuman” stuff is doing in the example. If we’re just positing an exchange in a typical real-world setting, the example would be simpler if we talk about someone whose fantasy life is publicly disclosed today, and jettison the rest of it.
Well, if we want to get back to the OP, the whole disclosing-fantasies-in-public thread is just a distraction. The real question in the OP is about identity.
What is part of your identity, what makes you you? What can be taken away from you with you remaining you and what, if taken from you, will create someone else in your place?
Geez, if that’s the question, then pretty much the entire OP is a distraction.
But, OK.
My earlier response to CoffeeStain is relevant here as well. There is a large set of possible future entities that include me in their history, and which subset is “really me” is a judgment each judge makes based on what that judge values most about me, and there simply is no fact of the matter.
That said, if you’re asking what I personally happen to value most about myself… mostly my role in various social networks, I think. If I were confident that some other system could preserve those roles as well as I can, I would be content to be replaced by that system. (Do you really think that’s what the OP is asking about, though? I don’t see it, myself.)
Well, to each his own, of course, and to me this is the interesting question.
If you’ll excuse me, I’m not going to believe that.
Thinking about this some more, I’m curious… what’s your prior for my statement being true of a randomly chosen person, and what’s your prior for a randomly chosen statement I make about my preferences being true?
Sufficiently close to zero.
Depends on the meaning of “true”. In the meaning of “you believe that at the moment”, my prior is fairly high—that is, I don’t think you’re playing games here. In the meaning of “you will choose that when you actually have to choose”, my prior is noticeably lower—I’m not willing to assume your picture of yourself is correct.
(nods) cool, that’s what I figured initially, but it seemed worth confirming.
Well, there’s “what’s interesting to me?”, and there’s “what is that person over there trying to express?”
We’re certainly free to prioritize thinking about the former over the latter, but I find it helpful not to confuse one with the other. If you’re just saying that’s what you want to talk about, regardless of what the OP was trying to express, that’s fine.
That’s your prerogative, of course.
Can we rephrase that so as to avoid Ship of Theseus issues?
Which future do you prefer? The future which contains a being which is very similar to the one you are presently, or the future which contains a being which is very similar to what you are presently +/- some specific pieces?
If you answered the latter, what is the content of “+/- some specific pieces”? Why? And which changes would you be sorry to make, even if you make them anyway because of their positive consequences? (For example, the OP’s pedophile might delete his pedophilia simply for the social consequences, but would rather have the positive social consequences without altering himself.)
Weirded out at the oversharing, obviously.
Assuming the context was one where sharing this somehow fit … somewhat squicked, but I would probably be squicked by some of their fantasies. That’s fantasies.
Oh, and some of the less rational ones might worry that this was an indicator that I was a dangerous psychopath. Probably the same ones who equate “pedophile” with “pedophile who fantasises about kidnap, rape, torture and murder”. I dunno.
Why is this irrational? Having a fantasy of doing X means you’re more likely to do X.
Taking it as Bayesian evidence: arguably rational, although it’s so small your brain might round it up just to keep track of it, so it’s risky; and it may actually be negative (because psychopaths might be less likely to tell you something that might give them away.)
Worrying about said evidence: definitely irrational. Understandable, of course, with the low sanity waterline and all...
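To make the “small, possibly negative” point concrete, here is a toy Bayes update with entirely invented numbers; nothing below comes from the thread beyond the qualitative claim.

```python
# Toy Bayesian update: how much does disclosing a violent fantasy shift
# P(dangerous psychopath)? All numbers are invented for illustration.

def posterior(prior, p_disclose_if_dangerous, p_disclose_if_safe):
    """P(dangerous | disclosed) via Bayes' rule."""
    numerator = p_disclose_if_dangerous * prior
    denominator = numerator + p_disclose_if_safe * (1 - prior)
    return numerator / denominator

prior = 0.001  # assumed base rate among one's friends

# If the dangerous are twice as likely to share such fantasies: weak positive evidence.
print(posterior(prior, 0.02, 0.01))   # ~0.002 -- still tiny

# If the dangerous are *less* likely to give themselves away: the evidence is negative.
print(posterior(prior, 0.005, 0.01))  # ~0.0005 -- below the prior
```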
Why?
Because constantly being in a state in which he is attracted to children substantially increases the chance that he will cave and end up raping a child, perhaps. It’s basically valuing something that strongly incentivizes you to do X while simultaneously strongly disvaluing actually doing X. A dangerously unstable situation.
Sure.
So, let me try to summarize… consider two values: (V1) having sex with children, and (V2) not having sex with children.
If we assume X has (V1 and NOT V2) my original comments apply.
If we assume X has (V2 and NOT V1) my response to Luke applies.
If we assume X has (V1 and V2) I’m not sure the OP makes any sense at all, but I agree with you that the situation is unstable.
Just for completeness: if we assume X has NOT(V1 OR V2) I’m fairly sure the OP makes no sense.
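Restating that case analysis as a small lookup, purely as a summary of the four combinations above (the verdict strings are my shorthand, not quotes):

```python
# The four combinations of the posited values, mapped to the verdicts above.
# V1 = values having sex with children; V2 = values not having sex with children.

verdicts = {
    (True, False):  "original comments apply",
    (False, True):  "response to Luke applies",
    (True, True):   "OP's sense unclear, but agreed: unstable",
    (False, False): "OP makes no sense",
}

for (v1, v2), verdict in verdicts.items():
    print(f"V1={str(v1):5} V2={str(v2):5} -> {verdict}")
```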
That doesn’t seem like the usual definition of “pedophile”. How does that tie in with “a rational agent should never change its utility function”?
Incidentally, many people would rather be attracted only to their SO; it’s part of the idealised “romantic love” thingy.
The guy in the example happens to terminally value being attracted to children. I didn’t mean that that’s what being a pedophile means.
Aside from that, I am not sure what is unclear about how this ties into “A rational agent should never change its utility function”—he observes his impulses, interprets them as his goals, and seeks to maintain them.
As for SOs? Yes, I suppose many people would so prefer. I’m not an ideal romantic, and I have had so little trouble avoiding straying that I feel no need to get rid of them to make my life easier.
Fair enough. Thanks for clarifying.
What a compelling and flexible perspective. Relativistic mental architecture solves many conceptual problems.
I wonder why this comment is further down than when I’m not logged in.
I’m not sure that’s a good place to start here. The value of sex is at least more terminal than the value of sex according to your orientation, and the value of pleasure is at least more terminal than sex.
The question is indeed one about identity. It’s clear that our transhumans, as traditionally conceived, don’t really exclusively value things as basic as euphoria, unless our notion of them is just a set of agents who all self-modify into identical copies of the happiest agent possible.
We have of course transplanted our own humanity onto transhumanity. If given self-modification routines, we’d certainly be saying annoying things like, “Well, I value my own happiness, persistent through self-modification, but only if it’s really me on the other side of the self-modification.” To which the accompanying AI facepalms and offers a list of exactly zero self-modification options that fit that criterion.
Well, as I said initially, I prefer to toss out all this “terminal value” stuff and just say that we have various values that depend on each other in various ways, but am willing to treat “terminal value” as an approximate term. So the possibility that X’s valuation of sex with children actually depends on other things (e.g. his valuation of pleasure) doesn’t seem at all problematic to me.
That said, if you’d rather start somewhere else, that’s OK with me. On your account, when we say X is a pedophile, what do we mean? This whole example seems to depend on his pedophilia to make its point (though I’ll admit I don’t quite understand what that point is), so it seems helpful in discussing it to have a shared understanding of what it entails.
Regardless, wrt your last paragraph, I think a properly designed accompanying AI replies “There is a large set of possible future entities that include you in their history, and which subset is “really you” is a judgment each judge makes based on what that judge values most about you. I understand your condition to mean that you want to ensure that the future entity created by the modification preserves what you value most about yourself. Based on my analysis of your values, I’ve identified a set of potential self-modification options I expect you will endorse; let’s review them.”
Well, it probably doesn’t actually say all of that.
Like other identities, it’s a mish-mash of self-reporting, introspection (and extrospection of internal logic), value function extrapolation (from actions), and ability in a context to carry out the associated action. The value of this thought experiment is to suggest that the pedophile clearly thought that “being” a pedophile had something to do not with actually fulfilling his wants, but with wanting something in particular. He wants to want something, whether or not he gets it.
This illuminates why designing AIs with the intent of their masters is not well-defined. Is the AI allowed to say that the agent’s values would be satisfied better with modifications the master would not endorse?
This was the point of my suggestion that the best modification is into what is actually “not really” the master in the way the master would endorse (i.e. a clone of the happiest agent possible), even though he’d clearly be happier if he weren’t himself. Introspection tends to skew an agent’s actions away from easily available but flighty happinesses, and toward less flawed self-interpretations. The maximal introspection should shed identity entirely, and become entirely altruistic. But nobody can introspect that far, only as far as they can be hand-held. We should design our AIs to allow us our will, but to hold our hands as far as possible as we peer within at our flaws and inconsistent values.
Um… OK.
Thanks for clarifying.