But which of these more accurately represents his “actual preferences”, to the extent that such a thing even exists?
Not only is “actual preferences” ill-defined, but so is “accurately represent.” So let me try to operationalize this a bit.
We have someone with a set of preferences that turn out to be mutually exclusive in the world they live in.
We can in principle create a procedure for sorting their preferences into categories such that each preference falls into at least one category and all the preferences in a category can (at least in principle) be realized in that world at the same time.
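For concreteness, here is a rough Python sketch of what such a sorting procedure might look like. Everything in it is a stand-in: the preference objects and the jointly_realizable predicate (which would have to answer whether a given set of preferences could all be satisfied in the agent’s world at once) are assumptions for illustration, not something we actually know how to build.

    def sort_into_categories(preferences, jointly_realizable):
        """Greedy sketch: place each preference into the first category it is
        jointly realizable with, opening a new category when none will take it.
        Every preference lands in at least one category, and every category
        stays (in principle) realizable all at once."""
        categories = []
        for pref in preferences:
            for category in categories:
                if jointly_realizable(category | {pref}):
                    category.add(pref)
                    break
            else:
                # no existing category can absorb this preference
                categories.append({pref})
        return categories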
So suppose we’ve done this, and it turns out they have two categories A and B, where A includes those preferences Cato describes as “a fit of melancholy.”
I would say that their “actual” preferences = (A + B). It’s not realizable in the world, but it’s nevertheless their preference. So your question can be restated: does A or B more accurately represent (A + B)?
There doesn’t seem to be any nonarbitrary way to measure the extent of A, B, and (A+B) to determine this directly. I mean, what would you measure? The amount of brain matter devoted to representing all three? The number of lines of code required to represent them in some suitably powerful language?
One common approach is to look at their revealed preferences as demonstrated by the choices they make. Given an A-satisfying and a B-satisfying choice that are otherwise equivalent (constructing such a choice is left as an exercise for the class), which do they choose? This is tricky in this case, since the whole premise here is that their revealed preferences are inconsistent over time, but you could in principle measure their revealed preferences at multiple different times and weight the results accordingly (assuming for simplicity that all preference-moments carry equal weight).
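To make that weighting step concrete, here is a minimal sketch; the (choice, weight) observations, the “A”/“B” labels, and the uniform weights are invented purely for illustration.

    def tally_revealed_preferences(observations):
        """Given (choice, weight) pairs, where choice is 'A' or 'B' and weight
        is how much that preference-moment counts, report which side wins."""
        totals = {"A": 0.0, "B": 0.0}
        for choice, weight in observations:
            totals[choice] += weight
        if totals["A"] > totals["B"]:
            return "A > B"
        if totals["B"] > totals["A"]:
            return "B > A"
        return "A = B"

    # Example: four observed preference-moments, each weighted 1.0 for simplicity.
    tally_revealed_preferences([("B", 1.0), ("A", 1.0), ("B", 1.0), ("B", 1.0)])
    # -> "B > A"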
When you were done doing all of that, you’d know whether A > B, B > A, or A = B.
It’s not in the least clear to me what good knowing that would do you. I suspect that this sort of analysis is not actually what you had in mind.
A more common approach is to decide which of A and B I endorse, and to assert that the one I endorse is his actual preference. E.g., if I endorse choosing to live over choosing to die, then I endorse B, and I therefore assert that B is his actual preference. But this is not emotionally satisfying when I say it baldly like that. Fortunately, there are all kinds of ways to conceal the question-begging nature of this approach, even from oneself.
I would instead ask “What preferences would this agent have, in a counterfactual universe in which they were fully informed and rational, but otherwise identical?”
Quoting a forum post from a couple years ago...
“The problem with trying to extrapolate what a person would want with perfect information is, perfect information is a lot of fucking information. The human brain can’t handle that much information, so if you want your extrapolatory homunculus to do anything but scream and die like someone put into the Total Perspective Vortex, you need to enhance its information processing capabilities. And once you’ve reached that point, why not improve its general intelligence too, so it can make better decisions? Maybe teach it a little bit about heuristics and biases, to help it make more rational choices. And you know it wouldn’t really hate blacks except for those pesky emotions that get in the way, so lets throw those out the window. You know what, let’s just replace it with a copy of me, I want all the cool things anyway.
Truly, the path of a utilitarian is a thorny one. That’s why I prefer a whimsicalist moral philosophy. Whimsicalism is a humanism!”
The sophisticated reader presented with a slippery slope argument like that one checks two things: first, whether there really is a force driving us in a particular direction, which is what would make the metaphorical terrain a slippery slope rather than just a slippery field; and second, whether there are any defensible points of cleavage in that terrain where a fence could be built to stop the slide.
The slippery slope argument you are quoting, when uprooted and placed in this context, seems to me to fail both tests. There’s no reason at all to descend progressively into the problems described, and even if there were, you could draw a line and say “we’re just going to inform our mental model of any relevant facts we know that it doesn’t, and fix any mental processes our construct has that are clearly highly irrational.”
You haven’t given us a link, but going by the principle of charity I imagine that what you’ve done here is take a genuine problem with building a weakly God-like friendly AI and try to transplant the argument into the context of intervening in a suicide attempt, where it doesn’t belong.