Actually, this would be a strong argument against CEV. If individual humans commonly have incoherent values (which they do), there is no concrete reason to expect an automated extrapolation process to magically make them coherent. I’ve noticed that CEV proponents tend to argue that the “thought longer, understood more” part of the process will somehow fix all objections of this sort, but given the complete lack of detail about how this process is supposed to work, you might as well claim that the morality fairy is going to descend from the heavens and fix everything with a wave of her magic wand.
If you honestly think you can make an AI running CEV produce a coherent result that most people will approve of, it’s up to you to lay out concrete details of the algorithm that will make this happen. If you can’t do that, you’ve just conceded that you don’t actually have an answer to this problem. The burden of proof here is on the party proposing to gamble humanity’s future on a single act of software engineering, and the standard of evidence should be at least as high as for any other piece of safety-critical engineering.
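To make “incoherent” concrete, here is a toy sketch. It is purely my own illustration, not anything from the CEV write-up, and the options and preferences are made up: an agent whose pairwise preferences contain a cycle admits no ranking of the options at all, so any extrapolation procedure has to quietly drop or override something, and the proposal never says which, or how.

```python
# Toy illustration (not from the CEV proposal): cyclic pairwise preferences
# admit no total ordering, so "extrapolating" them into a coherent ranking
# necessarily discards or overrides something.
from itertools import permutations

# Hypothetical agent: prefers A over B, B over C, and C over A.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}
options = {"A", "B", "C"}

def consistent_orderings(options, prefers):
    """Return every best-to-worst ordering of `options` that agrees with `prefers`."""
    orderings = []
    for order in permutations(options):
        rank = {x: i for i, x in enumerate(order)}  # lower index = more preferred
        if all(rank[a] < rank[b] for (a, b) in prefers):
            orderings.append(order)
    return orderings

print(consistent_orderings(options, prefers))  # prints [] -- no ordering is consistent
```

Every way of repairing that (dropping a preference, weighting them, deferring to the “extrapolated” self) is exactly the kind of detail the proposal leaves unspecified.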
If you honestly think you can make an AI running CEV produce a coherent result that most people will approve of,
Can you point me to some serious CEV proponents who argue that most people will approve of the results? I agree with you that this seems implausible, but it has never been clear to me that anyone serious actually asserts it.
FWIW, it has seemed to me from the beginning that the result of the CEV strategy would likely include at least something that makes me go “Um… really? I’m not entirely comfortable with that.” More generally, it seems unlikely to me that the system which best implements my values would feel comfortable or even acceptable to me, any more than the diet that best addresses my nutritional needs will necessarily conform to my aesthetic preferences about food.
More generally, it seems unlikely to me that the system which best implements my values would feel comfortable or even acceptable to me, any more than the diet that best addresses my nutritional needs will necessarily conform to my aesthetic preferences about food.
At first I thought this comparison was absolutely perfect, but I’m not really sure about that anymore. With a diet, you have other values to fall back on that can still make adopting an aesthetically displeasing regimen the right thing to do. With CEV, it’s not entirely clear to me why I would want to prefer CEV values over my own current ones, so there’s no underlying reason for me to accept that I should accept CEV as the best implementation of my values.
That got a little complicated, and I’m not sure it’s exactly what I meant to say. Basically, I’m trying to say that while you may not be entirely comfortable with a better diet, you would still implement it for yourself since it’s the rational thing to do, whereas if you aren’t comfortable with implementing your own CEV, there’s no rational reason compelling you to do so.
there’s no underlying reason for me to accept that I should accept CEV as the best implementation of my values
Sure.
And even if I did accept CEV(humanity) as the best implementation of my values in principle, it’s also worth asking what grounds I would have for believing that any particular formally specified value system generated as output by some seed AI actually was CEV(humanity).
Then again, there’s no underlying reason for me to accept that I should accept my current collection of habits and surface-level judgments and so forth as the best implementation of my values, either.
So, OK, at some point I’ve got a superhuman value-independent optimizer all rarin’ to go, and the only question is what formal specification of a set of values I ought to provide it with. So, what do I pick, and why do I pick it?
Then again, there’s no underlying reason for me to accept that I should accept my current collection of habits and surface-level judgments and so forth as the best implementation of my values, either.
Isn’t this begging the question? By ‘my values’ I’m pretty sure I literally mean ‘my current collection of habits and surface-level judgments and so forth’.
Could I have terminal values of which I am completely unaware in any way, shape, or form? How would I even recognize such things, and what reason would I have to prefer them over ‘my values’?
Did I just go in a circle?
Well, you tell me: if I went out right now and magically altered the world to reflect your current collection of habits and surface-level judgments, do you think you would endorse the result?
I’m pretty sure I wouldn’t, if the positions were reversed.
I would want you to change the world so that what I want is actualized, yes. If you wouldn’t endorse an alteration of the world towards your current values, in what sense do you really ‘value’ said values?
I’m going to need to taboo ‘value’, aren’t I?
I don’t know if you need to taboo it or not, but I’ll point out that I asked you a question that didn’t use that word, and you answered a question that did.
So perhaps a place to start is by answering the question I asked in the terms that I asked it?