Well, as I said initially, I prefer to toss out all this “terminal value” stuff and just say that we have various values that depend on each other in various ways, but am willing to treat “terminal value” as an approximate term. So the possibility that X’s valuation of sex with children actually depends on other things (e.g. his valuation of pleasure) doesn’t seem at all problematic to me.
That said, if you’d rather start somewhere else, that’s OK with me. On your account, when we say X is a pedophile, what do we mean? This whole example seems to depend on his pedophilia to make its point (though I’ll admit I don’t quite understand what that point is), so it seems helpful in discussing it to have a shared understanding of what it entails.
Regardless, wrt your last paragraph, I think a properly designed accompanying AI replies: “There is a large set of possible future entities that include you in their history, and which subset is ‘really you’ is a judgment each judge makes based on what that judge values most about you. I understand your condition to mean that you want to ensure that the future entity created by the modification preserves what you value most about yourself. Based on my analysis of your values, I’ve identified a set of potential self-modification options I expect you will endorse; let’s review them.”
Well, it probably doesn’t actually say all of that.
On your account, when we say X is a pedophile, what do we mean?
Like other identities, it’s a mish-mash of self-reporting, introspection (and extrospection of internal logic), value-function extrapolation (from actions), and the ability, in a given context, to carry out the associated action. The value of this thought experiment is to suggest that the pedophile clearly thought that “being” a pedophile had to do not with actually fulfilling his wants, but with wanting something in particular. He wants to want something, whether or not he gets it.
This illuminates why designing AIs to carry out the intent of their masters is not well-defined. Is the AI allowed to say that the agent’s values would be better satisfied by modifications the master would not endorse?
This was the point of my suggestion that the best modification is into something that is actually “not really” the master in the way the master would endorse (i.e. a clone of the happiest agent possible), even though he’d clearly be happier if he weren’t himself. Introspection tends to skew an agent’s actions away from easily available but flighty happinesses, and toward less flawed self-interpretations. Maximal introspection would shed identity entirely and become entirely altruistic. But nobody can introspect that far, only as far as they can be hand-held. We should design our AIs to allow us our will, but to hold our hands as far as possible as we peer within at our flaws and inconsistent values.
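To make that distinction concrete, here is a minimal toy sketch (all names and numbers are hypothetical, not anyone’s proposed design). It treats “how well a modification serves the master’s values” and “would the master endorse the result as really him” as separate predicates, and reduces the contested question (whether the AI even mentions a better-scoring but unendorsed option) to an explicit flag:

```python
# Toy sketch only: hypothetical names, invented numbers, no real AI design implied.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Modification:
    description: str
    predicted_value_satisfaction: float  # how well the master's values would be served
    predicted_endorsement: bool          # would the master judge the result "really me"?

def advise(candidates: list[Modification]) -> dict:
    """Offer the best endorsed option, and flag whether a better unendorsed one exists."""
    endorsed = [m for m in candidates if m.predicted_endorsement]
    best_endorsed: Optional[Modification] = max(
        endorsed, key=lambda m: m.predicted_value_satisfaction, default=None)
    best_overall: Optional[Modification] = max(
        candidates, key=lambda m: m.predicted_value_satisfaction, default=None)
    return {
        "offer": best_endorsed,
        # The contested part: is the AI allowed to say this out loud?
        "better_but_unendorsed_exists": (
            best_overall is not None
            and (best_endorsed is None
                 or best_overall.predicted_value_satisfaction
                    > best_endorsed.predicted_value_satisfaction)
        ),
    }

# Example: the "clone of the happiest agent possible" scores highest but is not endorsed.
options = [
    Modification("modest self-improvement", 0.6, True),
    Modification("clone of the happiest agent possible", 0.95, False),
]
print(advise(options))
```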
Um… OK.
Thanks for clarifying.