Figuring out why you have a value, or what the value is attached to, is usually a helpful exercise when it apparently conflicts with other things.
I think that, though you have given good approaches to making the tradeoff, the conflict between values in this example is real. The point is that you make the best tradeoff you can in the context, but you don’t modify your values just because the internal conflict makes them hard to achieve.
Point taken—you certainly don’t want to routinely solve problems by changing your values instead of changing your environment.
However, I think you tend to think about deep values, what I sometimes call latent values, while I often talk about surface values, of the type that show up in English sentences and in logical representations of them. People do change their surface values: they become vegetarian, quit smoking, go on a diet, realize they don’t enjoy Pokemon anymore, and so on. I think that this surface-value-changing is well-modelled by energy minimization.
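To make that claim concrete, here is a minimal toy sketch of one way to read “energy minimization” here, as a Hopfield-style network whose nodes are surface values and whose weights encode compatibility or conflict between them; asynchronous updates settle into a lower-energy, more internally consistent configuration, so some values flip rather than persist in conflict. The node interpretation and weights are made up for illustration, not anything specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surface-value nodes, e.g.
# [enjoys_meat, cares_about_animals, wants_healthy_diet, ...], encoded as +/-1.
n = 5
state = rng.choice([-1.0, 1.0], size=n)

# Symmetric weights: positive = the two values reinforce each other,
# negative = they conflict.
W = rng.normal(0, 1, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

def energy(s, W):
    """Hopfield energy: low when strongly linked values agree."""
    return -0.5 * s @ W @ s

# Asynchronous updates: each value flips toward whatever its neighbours favour,
# which never increases the energy.
for _ in range(20):
    for i in rng.permutation(n):
        state[i] = 1.0 if W[i] @ state >= 0 else -1.0

print("settled state:", state, "energy:", energy(state, W))
```

On this toy reading, “becoming vegetarian” is one value node flipping because keeping it would leave the whole network in a higher-energy, more conflicted state.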
Whether there is a set of “deepest values” that never change is an open question. These are the things EY is talking about when he says an agent would never want to change its goals, and that you’re talking about when you say an agent doesn’t change its utility function. The EY-FAI model assumes such values exist, or should exist, or at least could exist. This needs to be thought about more. I think my comments in “Only humans can have human values” on “network concepts” are relevant. It’s not obvious that a human’s goal structure has top-level goals; if it does, it would be a possibly unique exception among complex network systems.