Say I’m shopping for a loaf of bread. I have two values. I prefer larger loaves over smaller loaves, and I prefer cheaper loaves over more expensive loaves.
Unfortunately, these values are negatively correlated with each other (larger loaves tend to cost more). Clearly, my values are an arbitrary rule system that gives contradictory, hard-to-interpret results, resulting in schizophrenic behavior that appears insane to observers from almost any other value system.
So how should I resolve this? Should I switch to preferring smaller loaves of bread, or should I switch to preferring more expensive loaves of bread?
That depends on why you prefer larger loaves of bread.
If you’re maximizing calories or just want to feel that you’re getting a good deal, go for the highest calorie-to-dollar ratio, noting sales (a quick sketch of this rule appears after these examples).
If you need more surface area for your sandwiches, choose bread that is shaped in a sandwich-optimal configuration with little hard-to-sandwich heel volume. Cut thin slices so you can make more sandwiches, and get an amount of bread that will last just about until you go to the store again, or until you expect diminishing marginal utility from bread-eating due to staleness.
If you want large loaves to maximize the amount of time between grocery trips, buy 6 loaves of the cheapest kind and put 5 of them in the freezer, to take out as you finish room-temperature bread.
If you just think large loaves of bread are aesthetically pleasing, pick a kind of bread that is priced by dough weight and has lots of big air pockets to puff it up.
etc. etc.
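To make the first case concrete, here is a minimal sketch of the calories-per-dollar rule; the loaves and all of their numbers are invented for illustration.

```python
# Minimal sketch of the "highest calorie-to-dollar ratio" rule.
# The loaves and all numbers are invented for illustration.
loaves = [
    {"name": "small white", "calories": 1200, "price": 2.00},
    {"name": "large white", "calories": 2400, "price": 3.50},
    {"name": "large wheat", "calories": 2200, "price": 4.00, "sale_price": 3.00},
]

def calories_per_dollar(loaf):
    # Use the sale price when there is one.
    price = loaf.get("sale_price", loaf["price"])
    return loaf["calories"] / price

best = max(loaves, key=calories_per_dollar)
print(best["name"], round(calories_per_dollar(best), 1), "calories per dollar")
```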
Figuring out why you have a value, or what the value is attached to, is usually a helpful exercise when it apparently conflicts with other things.
I think that, though you have given good approaches to making the tradeoff, the conflict between values in this example is real. The point is that you make the best tradeoff you can in the context, but you don’t modify your values just because the internal conflict makes them hard to achieve.
Point taken—you certainly don’t want to routinely solve problems by changing your values instead of changing your environment.
However, I think you tend to think about deep values, what I sometimes call latent values, while I often talk about surface values, of the type that show up in English sentences and in logical representations of them. People do change their surface values: they become vegetarian, quit smoking, go on a diet, realize they don’t enjoy Pokemon anymore, and so on. I think that this surface-value-changing is well-modelled by energy minimization.
Whether there is a set of “deepest values” that never change is an open question. These are the things EY is talking about when he says an agent would never want to change its goals, and that you’re talking about when you say an agent doesn’t change its utility function. The EY-FAI model assumes that such values exist, or should exist, or could exist. This needs to be thought about more. I think my comments in “Only humans can have human values” on “network concepts” are relevant. It’s not obvious that a human’s goal structure has top-level goals; if it does, it would be a possibly unique exception among complex network systems.
I see your point. I wasn’t thinking of models where you have one preference per object feature. I was thinking of more abstract examples, like trying to be a cheek-turning enemy-loving Christian and a soldier at the same time.
I don’t think of choosing an object whose feature vector has the maximum dot product with your preference vector as conflict resolution; I think of it (and related numerical constraint problems) as simplex optimization. When you want to sum a set of preferences that are continuous functions of continuous features, you can generally take all the preferences and solve directly (or numerically) to find the optimum.
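A minimal sketch of that framing, with invented numbers: each loaf is a feature vector, the shopper is a weight vector, and the discrete choice is just a maximum dot product. The continuous version hands the same summed preference to a linear-programming solver (here scipy’s linprog; the feature values, weights, and budget are all assumptions for illustration).

```python
import numpy as np
from scipy.optimize import linprog  # LP solver; the "highs" methods include a simplex variant

# Each loaf is a feature vector: [grams of bread, price in dollars]. Invented numbers.
loaves = np.array([
    [400, 2.00],
    [700, 3.50],
    [650, 2.75],
])

# Preference vector: +0.01 utility per gram, -1 utility per dollar. Also invented.
prefs = np.array([0.01, -1.0])

# Discrete choice: the two preferences are summed into one score per loaf,
# and picking a loaf is just taking the maximum dot product.
scores = loaves @ prefs
print("best loaf index:", int(np.argmax(scores)), "score:", float(scores.max()))

# Continuous version: buy nonnegative quantities x of each loaf to maximize the
# same summed preference, subject to a $10 budget. linprog minimizes, so negate.
res = linprog(
    c=-(loaves @ prefs),              # per-loaf utility, negated
    A_ub=[loaves[:, 1]],              # total spend...
    b_ub=[10.0],                      # ...must stay within the budget
    bounds=[(0, None)] * len(loaves),
    method="highs",
)
print("quantities to buy:", res.x, "total utility:", -res.fun)
```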
In the “moral values” domain, you’re more likely to have discontinuous rules (e.g., “X is always bad”, or “XN is not”), and be performing logical inference over them. This results in situations that you can’t solve directly, and it can result in circular or indeterminate chains of reasoning, and multiple possible solutions.
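A toy sketch of the contrast, entirely my own illustration: when values are hard, discontinuous rules, there is nothing to sum and solve; you can only check each option against each rule, and the admissible set can easily come out non-unique or empty.

```python
# Toy illustration (mine, not from the thread): hard, discontinuous, rule-style
# values over discrete options. There is no score to maximize; you can only
# check which options every rule permits, and the answer may be non-unique or empty.

options = ["report a friend's crime", "cover for the friend", "say nothing"]

# Each rule maps an option to True (permitted) or False (forbidden). Invented examples.
rules = {
    "honesty: never deceive": lambda o: o != "cover for the friend",
}

def admissible(options, rules):
    return [o for o in options if all(rule(o) for rule in rules.values())]

print(admissible(options, rules))   # two options survive: the rules don't pick a winner

rules["loyalty: never harm a friend"] = lambda o: o != "report a friend's crime"
rules["duty: never stay silent about a crime"] = lambda o: o != "say nothing"
print(admissible(options, rules))   # now nothing survives: every option violates some rule
```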
My claim is that having more conflicts is worse, not that conflicts can or should be eliminated. But I admit that aspect of my model could use more justification.
Is there a way to distinguish moral values from other kinds of values? Coming up with a theory of values that explains both the process of choosing who to vote for, and threading a needle, as value-optimization, is going to be difficult.
This line of thinking about discontinuous moral rules and logical inference is setting off my rationalization detectors. It sounds like you’re saying, “OK, I’ll admit that my claim seems wrong in some simple cases. But it’s still correct in all of the cases that are so complicated that nobody understands them.”
I don’t know how to distinguish moral values from other kinds of values, but it seems to me that this isn’t exactly the distinction that would be most useful for you to figure out. My suggestion would be to figure out why you think high IC is bad, and see if there’s some nice way to characterize the value systems that match that intuition.
I disagree with this.
I think a natural intuition about a moral values domain suggests that things are likely to be non-linear and discontinuous.
I don’t think it’s so much saying the claim is wrong in simple cases but still correct in cases no one understands.
It’s more saying that the alternative claims being proposed are a long way from handling any real-world example, and I’m disinclined to believe that a sufficiently complicated system will satisfy continuity and linearity.
Also, we should distinguish between “why do I expect that existing value systems are energy-minimized” and “why should we prefer value systems that are energy-minimized”.
The former is easier to answer, and I gave a bit of an answer in “Only humans can have human values”.
The latter I could justify within EY-FAI by then claiming that being energy-minimized is a property of human values.
Your suggestion to figure out why I think high IC is bad is a good idea. My “final reason” may be that high-IC systems are a pain in the ass when you’re building intelligent agents. They have a lot of interdependencies among their behaviors, get stuck waffling between different behaviors, and are hard to debug. But we (as designers and as intelligent agents) have mechanisms to deal with these problems; e.g., producing hysteresis by using nonlinear functions to sum activation from different goals.
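Here is a toy sketch of that hysteresis mechanism (my own illustration, with invented activation dynamics): two goals whose activations hover near each other make a pure argmax agent waffle, while giving the currently active behavior a nonlinear stickiness bonus makes switches much rarer.

```python
# Toy sketch (mine): hysteresis from a nonlinear "stickiness" bonus when summing
# goal activations. Without it, an agent whose goals have similar activations
# waffles between behaviors; with it, switches become much rarer.
import math
import random

def pick(active, activations, stickiness):
    # Boost whichever behavior is already running, then take the maximum.
    boosted = {g: a + (stickiness if g == active else 0.0) for g, a in activations.items()}
    return max(boosted, key=boosted.get)

def count_switches(stickiness, steps=200, seed=0):
    rng = random.Random(seed)
    active, switches = "eat", 0
    for t in range(steps):
        # Two goals with slowly oscillating, noisy activations (invented dynamics).
        activations = {
            "eat":   1.0 + 0.3 * math.sin(t / 5.0) + rng.gauss(0, 0.2),
            "sleep": 1.0 + 0.3 * math.cos(t / 5.0) + rng.gauss(0, 0.2),
        }
        choice = pick(active, activations, stickiness)
        if choice != active:
            switches += 1
            active = choice
    return switches

print("behavior switches, linear summation:     ", count_switches(stickiness=0.0))
print("behavior switches, with hysteresis bonus:", count_switches(stickiness=0.5))
```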
My other final reason is that I consciously try to energy-minimize my own values, and I think other thoughtful people who aren’t nihilists do too. Probably nihilists do too, if only for their own convenience.
My other other final reason is that energy-minimization is what dynamic network concepts do. It’s how they develop, as e.g. for spin-glasses, economies, or ecologies.
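For concreteness, here is a toy spin-glass-style relaxation (my own illustration, not anything from the post): treat each value as a ±1 spin, treat pairwise compatibilities as couplings, and flip any spin whose flip lowers the energy. The dynamics can only decrease the energy, so the system settles into a locally consistent configuration, which is the energy-minimization picture in miniature.

```python
# Toy spin-glass relaxation (my illustration of the energy-minimization picture).
# Each "value" is a +/-1 spin; J[i, j] says how compatible values i and j are.
import numpy as np

rng = np.random.default_rng(0)
n = 12
J = rng.normal(size=(n, n))
J = (J + J.T) / 2            # symmetric couplings
np.fill_diagonal(J, 0.0)     # no self-coupling

s = rng.choice([-1, 1], size=n)   # a random initial value configuration

def energy(s):
    # E = -1/2 * s^T J s: low when coupled values agree with their couplings.
    return -0.5 * s @ J @ s

changed = True
while changed:
    changed = False
    for i in range(n):
        # Aligning spin i with its local field J[i] @ s lowers the energy,
        # so flip it whenever it is misaligned. The energy never increases.
        if s[i] * (J[i] @ s) < 0:
            s[i] *= -1
            changed = True

print("settled configuration:", s)
print("local-minimum energy: ", float(energy(s)))
```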