But more generally, if you think that a different set of life experiences means that you are a different person with different values, then that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed. Not just ambitious value learning, _any_ framework that involves an AI optimizing some expected utility would not work.
This statement feels pretty strong, especially given that I find it trivially true that I’d be a different person under many plausible alternative histories. This makes me think I’m probably misinterpreting something. :)
At first I read your paragraph as the strong claim that if it’s true that individual human values are underdetermined at birth, then ambitious value learning looks doomed. And I’d take it as proof that “individual human values are underdetermined at birth” if, replaying history, I’d now have different values (or a different probability distribution over values) if I had encountered Yudkowsky’s writings before Singer’s, rather than vice versa. Or if I would be less single-minded about altruism had I encountered EA a couple of years later in life, after already taking on another self-identity.
But these points (especially the second example) seem so trivially true that I’m probably talking about a different thing. In addition, they’re addressed by the solution you propose in your first paragraph, namely taking current-you as the starting point.
Another concern could be that “there is almost never a stable core of an individual human’s values”, i.e., that “even going forward from today, the values of Lukas or Rohin or Wei are going to be heavily underdetermined”. Is that the concern? This seems like it could be possible for most people, but definitely not for all people. And underdetermined values are not necessarily that bad (though I find it mildly disconcerting, personally). [Edit: Wei’s comment and your reply to it sound like this might indeed be the concern. :) Good discussion there!]
The fact that I have a hard time understanding the framework behind your statement is probably because I’m thinking in terms of a different part of my brain when I talk about “my values”. I identify very much with my reflective life goals, to a point that seems unusual. I don’t identify much with “what Lukas’s behavior, if you were to put him in different environments and then watch, would indirectly but consistently tell you about the things he appears to want – e.g., ‘values’ like being held in high esteem by others, having a comfortable life, romance, having either some kind of overarching purpose or enough distractions to not feel bothered by the lack of purpose, etc.”. There is definitely a sense in which the code that runs me cares about all these implicit goals. But that’s not how I most want to see it. I also know that in any environment that offers the option to self-modify into a more efficient pursuer of my explicitly held personal ideals, I would make substantial use of that option. And that seems relevant for the same reason that we wouldn’t want to count cognitive biases as people’s values.
(I should probably continue reading the sequence and then come back to this later if I still feel unclear about it.)
Another concern could be that “there is almost never a stable core of an individual human’s values”, i.e., that “even going forward from today, the values of Lukas or Rohin or Wei are going to be heavily underdetermined”. Is that the concern?
Yeah. Also I suspect some people are worried about taking current-you as a starting point—that seems somewhat arbitrary. But if you’re fine with that, then the major concern is that values are still underdetermined going forward.
The fact that I have a hard time understanding the framework behind your statement is probably because I’m thinking in terms of a different part of my brain when I talk about “my values”. I identify very much with my reflective life goals to a point that seems unusual.
I interpreted Wei’s comment as saying that even your reflective life goals would be underdetermined—presumably even now if you hear convincing moral argument A but not B, then you’d have different reflective life goals than if you hear B but not A. This seems broadly correct to me.
I interpreted Wei’s comment as saying that even your reflective life goals would be underdetermined—presumably even now if you hear convincing moral argument A but not B, then you’d have different reflective life goals than if you hear B but not A.
Okay yeah, that also seems broadly correct to me.
I am hoping, though, that as long as I’m not subjected to outside optimization pressures that weren’t crafted to be helpful, it’s very rare that something I currently consider very important could end up either staying important or becoming completely unimportant merely based on the order in which I encounter new arguments. And similarly, I’m hoping that my value endpoints would still cluster decisively around the things I currently consider most important – though that’s where it becomes tricky to trade off goal preservation against openness to philosophical progress.