(I have no idea whether the following is of any interest to anyone on LW. I wrote it mostly to clarify my own confusions, then polished it a bit out of habit. If at least a few folks think it’s potentially interesting, I’ll finish cleaning it up and post it for real.)
I’ve been thinking for a while about the distinction between instrumental and terminal values, because the places it comes up in the Sequences (1) are places where I’ve bogged down in reading them. And I am concluding that it may be a misleading distinction.
EY presents a toy example here, and I certainly agree that failing to distinguish between (V1) “wanting chocolate” and (V2) “wanting to drive to the store” is a fallacy, and a common one, and an important one to dissolve. And the approach he takes to dissolving it is sound, as far as it goes: consider the utility attached to each outcome, consider the probability of each outcome given possible actions, then choose the actions that maximize expected utility.
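To make that procedure concrete, here is a minimal sketch in Python; the outcomes, probabilities, and utility numbers are invented purely for illustration:

```python
# Toy expected-utility calculation for the chocolate example.
# All probabilities and utilities below are invented for illustration.
utility = {"have chocolate": 10, "no chocolate": 0}

# P(outcome | action), made up for the sketch.
prob = {
    "drive to store": {"have chocolate": 0.9, "no chocolate": 0.1},
    "stay home":      {"have chocolate": 0.1, "no chocolate": 0.9},
}

def expected_utility(action):
    """Sum of P(outcome | action) * U(outcome) over outcomes."""
    return sum(p * utility[outcome] for outcome, p in prob[action].items())

best_action = max(prob, key=expected_utility)
print(best_action, expected_utility(best_action))  # -> drive to store 9.0
```

The point of the sketch is just that “wanting to drive to the store” never appears as a utility of its own; it only matters through its effect on the probability of getting chocolate.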
But in that example, V1 and V2 aren’t just different values, they are hierarchically arranged values… V2 depends on V1, such that if their causal link is severed (e.g., driving to the store stops being a way to get chocolate) then it stops being sensible to consider V2 a goal at all. In other words, the utility of V2 is zero within this toy example, and we just take the action with the highest probability of V1 (which may incidentally involve satisfying V2, but that’s just a path, not a goal).
Of course, we know wanting chocolate isn’t a real terminal value outside of that toy example; it depends on other things. But by showing V1 as the stable root of a toy network, we suggest that in principle there are real terminal values, and a concerted philosophical effort by smart enough minds will identify them. Which dovetails with the recurring(1) idea that FAI depends on this effort because uncovering humanity’s terminal values is a necessary step along the way to implementing them, as per Fun Theory.
But just because values exist in a mutually referential network doesn’t mean they exist in a hierarchy with certain values at the root. Maybe I have (V3) wanting to marry my boyfriend and (V4) wanting to make my boyfriend happy. Here, too, these are different values, and failing to distinguish between them is a problem, and there’s a causal link that matters. But it’s not strictly hierarchical: if the causal link is severed (e.g., marrying my boyfriend isn’t a way to make him happy) I still have both goals. Worse, if the causal link is reversed (e.g., marrying my boyfriend makes him less happy, because he has V5: don’t get married), I still have both goals. Now what?
Well, one answer is to treat V3 and V4 (and V5, if present) as instrumental goals of some shared (as yet undiscovered) terminal goal (V6). But failing that, all that’s left is to work out a mutually acceptable utility distribution that is suboptimal along one or more of (V3-V5) and implement the associated actions. You can’t always get what you want. (2)
Well and good; nobody has claimed otherwise.
But, again, the Metaethics and Fun Sequences seem to depend(1) on a shared as-yet-undiscovered terminal goal that screens off the contradictions in our instrumental goals. If instead the network is instrumental links all the way through, and what seem like terminal goals are merely the instrumental goals at the edge of whatever subset of the network we’re representing at the moment, and nothing prevents even our post-Singularity descendants from having mutually inhibitory goals… well, then maybe humanity’s values simply aren’t coherent; maybe some of our post-Singularity descendants will be varelse to one another.
So, OK… suppose we discover that, and the various tribes of humanity consequently separate. After we’re done throwing up on the sand (as in the flawed-utopia story), what do we do then?
Perhaps we and our AIs need a pluralist metaethic(3), one that allows us to treat other beings who don’t share our values—including, perhaps, the Babykillers and the SHFP and the Pebblesorters, as well as the other tribes of post-Singularity humans—as beings whose preferences have moral weight?
=============
(1) The whole meta-ethics Sequence is shot through with the idea that compromise on instrumental values is possible given shared terminal values, even if it doesn’t seem that way at first; so humans can coexist, and extracting a “coherent volition” of humanity is possible. Entities with different terminal values, though, are varelse: there’s just no point of compatibility.
The recurring message is that any notion of compromise on terminal values is just wrongheaded, which is why the SHFP’s solution to the Babykiller problem is presented as flawed, as is viewing the Pebblesorters as having a notion of right and wrong deserving of moral consideration. Implementing our instrumental values can leave us tragically happy, on this view, because our terminal values are the ones that really matter.
More generally, LW’s formulation of post-Singularity ethics (aka Fun) seems to depend on this distinction. The idea of a reflectively stable shared value system that can survive a radical alteration of our environment (e.g., the ability to create arbitrary numbers of systems with the same moral weight that I have, or even mere immortality) is pretty fundamental, not just for the specific Fun Theory proposed, but for any fixed notion of what humans would find valuable after such a transition. If I don’t have a stable value system in the first place, or if my stable values are fundamentally incompatible with yours, then the whole enterprise is a non-starter… and clearly our instrumental values are neither stable nor shared. So the hope that our terminal values are stable and shared is important.
This distinction also may underlie the warning against messing with emotions… the idea seems to be that messing with emotions, unlike messing with everything else, risks affecting my terminal values. (I may be pounding that screw with my hammer, though; I’m still not confident I understand why EY thinks messing with everything else is so much safer than messing with emotions.)
(2) I feel I should clarify here that my husband and I are happily married; this is entirely a hypothetical example. Also, my officemate recently brought me chocolate without my even having to leave my cube, let alone drive anywhere. Truly, I live a blessed life.
(3) Mind you, I don’t have one handy. But the longest journey begins, not with a single step, but with the formation of the desire to get somewhere.
I came here from the pedophile discussion. This comment interests me more, so I’m replying to it.
To preface, here is what I currently think: Preferences are in a hierarchy. You make a list of possible universes (branching out as a result of your actions) and choose the one you prefer the most—so I’m basically coming from VNM. The terminal value lies in which universe you choose. The instrumental stuff lies in which actions you take to get there.
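Roughly, here is a toy sketch of that picture, with plans, universes, and scores all invented for illustration; the scoring of universes is the “terminal” part, and the plan chosen to reach the top-scoring reachable universe is the “instrumental” part:

```python
# Toy illustration only: each plan leads (deterministically, in this sketch)
# to one universe, and an invented score ranks the universes.
outcome_of_plan = {
    "plan A": "chocolate, but skipped the gym",
    "plan B": "gym, but no chocolate",
    "plan C": "chocolate and the gym",
}

preference_score = {
    "chocolate and the gym":          3,
    "gym, but no chocolate":          2,
    "chocolate, but skipped the gym": 1,
}

# Terminal: the ranking over universes. Instrumental: the plan that reaches
# the best-ranked universe actually on offer.
best_plan = max(outcome_of_plan, key=lambda p: preference_score[outcome_of_plan[p]])
print(best_plan)  # -> plan C
```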
So I’m reading your line of thought...
But just because values exist in a mutually referential network doesn’t mean they exist in a hierarchy with certain values at the root. Maybe I have (V3) wanting to marry my boyfriend and (V4) wanting to make my boyfriend happy. Here, too, these are different values, and failing to distinguish between them is a problem, and there’s a causal link that matters. But it’s not strictly hierarchical: if the causal link is severed (e.g., marrying my boyfriend isn’t a way to make him happy) I still have both goals. Worse, if the causal link is reversed (e.g., marrying my boyfriend makes him less happy, because he has V5: don’t get married), I still have both goals. Now what?
I’m not sure how this line of thought suggests that terminal values do not exist. It simply suggests that some values are terminal, while others are instrumental. To simplify, you can compress all these terminal goals into a single goal called “Fulfill my preferences”, and do utilitarian game theory from there. This need not involve arranging the preferences in any hierarchy—it only involves balancing them against each other. Speaking of multiple terminal values just decomposes whatever function you use to pick your favorite universe into multiple functions.
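To make the “compress and balance” move concrete, here is a toy sketch using the V3/V4 example from the post; the component value functions, the weights, and the weighted sum itself are all invented for illustration, and a weighted sum is only one of many ways to do the balancing:

```python
# Invented component values and weights, purely for illustration.
def v3_married(universe):
    return 1.0 if universe["married"] else 0.0

def v4_partner_happy(universe):
    return universe["partner_happiness"]  # assumed to be in [0, 1]

WEIGHTS = {"v3": 0.4, "v4": 0.6}

def fulfill_my_preferences(universe):
    """The single compressed goal: a weighted balance of the components."""
    return (WEIGHTS["v3"] * v3_married(universe)
            + WEIGHTS["v4"] * v4_partner_happy(universe))

candidate_universes = [
    {"married": True,  "partner_happiness": 0.2},  # V3 satisfied, V4 mostly not
    {"married": False, "partner_happiness": 0.9},  # V4 satisfied, V3 not
]
print(max(candidate_universes, key=fulfill_my_preferences))
# -> the unmarried universe wins, even though V3 goes unsatisfied
```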
maybe humanity’s values simply aren’t coherent; maybe some of our post-Singularity descendants will be varelse to one another.
This seems unrelated to the surrounding points. Of course two agents can diverge—no one said that humans intrinsically shared the same preferences.
(Of course, platonic agents don’t exist, living things don’t actually have VNM preferences, etc., etc.)
You might enjoy Arrow’s impossibility theorem, though—it seems to relate to your concerns. (It’s relevant for questions like: Can we compromise between multiple agents? What happens if we conceptualize one human as multiple agents?)
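(For what it’s worth, here is the classic toy case behind that kind of worry. The three agents and their rankings below are invented, and pairwise majority voting is just one simple aggregation rule, but it already produces a preference cycle, which is the flavor of problem Arrow’s theorem generalizes.)

```python
from itertools import combinations

# Invented rankings: each agent orders options A, B, C from best to worst.
agents = {
    "agent1": ["A", "B", "C"],
    "agent2": ["B", "C", "A"],
    "agent3": ["C", "A", "B"],
}

def majority_prefers(x, y):
    """True if a strict majority of agents rank x above y."""
    votes = sum(1 for ranking in agents.values() if ranking.index(x) < ranking.index(y))
    return votes > len(agents) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# -> A over B, C over A, B over C: a cycle, so no coherent group favorite.
```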
I’m on board with:
...treating preferences as identifying a sort order for universes.
...treating “values” and “preferences” and “goals” as more or less interchangeable terms.
...aggregating multiple goals into a single complex “fulfill my preferences (insofar as they are not mutually exclusive)” goal, at least in principle. (To the extent that we can actually do this, the fact that preferences might have hierarchical dependencies where satisfying preference A also partially satisfies preference B becomes irrelevant; all of that is factored into the complex goal. Of course, actually doing this might prove too complicated for any given computationally bounded mind, so such dependencies might still be important in practice.)
...balancing preferences against one another to create some kind of weighted aggregate in cases where they are mutually exclusive, in principle. (As above, that’s not to say that all minds can actually do this in practice; different strategies may be appropriate for less capable minds.)
...drawing a distinction between which universe(s) I choose, on the one hand, and what steps I take to get there, on the other. (And if we want to refer to steps as “instrumental values” and universes as “terminal values”, that’s OK with me. That said, what I see people doing a lot is misidentifying steps as universes, simply because we haven’t thought enough about the internal structure and intended results of those steps, so I am skeptical of claims about “terminal values.” In practice, I treat the term as referring to instrumental values I haven’t yet thought enough about to understand in detail.)
no one said that humans intrinsically shared the same preferences.
I’m not sure that’s true. IIRC, a lot of the Fun Theory Sequence and the stuff around CEV sounded an awful lot like precisely this claim. That said, it’s been three years, and I don’t remember details. In any case, if we agree that humans don’t necessarily share the same preferences, that’s cool with me, regardless of what someone else might or might not have said.

And, yes, AIT (Arrow’s impossibility theorem) is relevant.