More generally, it seems unlikely to me that the system which best implements my values would feel comfortable or even acceptable to me, any more than the diet that best addresses my nutritional needs will necessarily conform to my aesthetic preferences about food.
At first I thought this comparison was absolutely perfect, but I’m not really sure about that anymore. With a diet, you have other values to fall back on which might make adopting an aesthetically displeasing regimen still be something you should do. With CEV, it’s not entirely clear to me why I would want to prefer CEV values over my own current ones, so there’s no underlying reason for me to accept that I should accept CEV as the best implementation of my values.
That got a little complicated, and I’m not sure it’s exactly what I meant to say. Basically, I’m trying to say that while you may not be entirely comfortable with a better diet, you would still implement it for yourself since that’s the rational thing to do, whereas if you aren’t comfortable with implementing your own CEV, there’s no rational reason compelling you to do so.
there’s no underlying reason for me to accept that I should accept CEV as the best implementation of my values
Sure.
And even if I did accept CEV(humanity) as the best implementation of my values in principle, it’s also worth asking what grounds I would have to believe that any particular formally specified value system generated as output by some seed AI actually was CEV(humanity).
Then again, there’s no underlying reason for me to accept that I should accept my current collection of habits and surface-level judgments and so forth as the best implementation of my values, either.
So, OK, at some point I’ve got a superhuman value-independent optimizer all rarin’ to go, and the only question is what formal specification of a set of values I ought to provide it with. So, what do I pick, and why do I pick it?
Then again, there’s no underlying reason for me to accept that I should accept my current collection of habits and surface-level judgments and so forth as the best implementation of my values, either.
Isn’t this begging the question? By ‘my values’ I’m pretty sure I literally mean ‘my current collection of habits and surface-level judgments and so forth’.
Could I have terminal values of which I am completely unaware in any way, shape, or form? How would I even recognize such things, and what reason do I have to prefer them over ‘my values’?
Did I just go in a circle?
Well, you tell me: if I went out right now and magically altered the world to reflect your current collection of habits and surface-level judgments, do you think you would endorse the result?
I’m pretty sure I wouldn’t, if the positions were reversed.
I would want you to change the world so that what I want is actualized, yes. If you wouldn’t endorse an alteration of the world towards your current values, in what sense do you really ‘value’ said values?
I’m going to need to taboo ‘value’, aren’t I?
I don’t know if you need to taboo it or not, but I’ll point out that I asked you a question that didn’t use that word, and you answered a question that did.
So perhaps a place to start is by answering the question I asked in the terms that I asked it?