I agree that friendly AI is probably doomed if the goal is to maximize human values. But what if we, as AI designers (I’m not an AI designer, but bear with me), don’t care about what you’ve defined as “human values”? As I’ve alluded to before, what matters isn’t the entire system that is Phil Goetz, but a specific subsystem. This is an important metaethical point. There are algorithms running in your brain that you have no control over, yet they do things that you simply don’t want them to be doing. For example, I don’t want to have to purchase fuzzies, but I do so that I may better purchase utilons.
This is one of the things Eliezer implicitly addresses when he says this:
I would say, by the way, that the huge blob of a computation is not just my present terminal values (which I don’t really have—I am not a consistent expected utility maximizer); the huge blob of a computation includes the specification of those moral arguments, those justifications, that would sway me if I heard them. So that I can regard my present values, as an approximation to the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent.
Values must be determined by reflection. Our actions reflect the values of the entire system that is our body (or the values of the various subsystems battling for control of our body), not just that important subsystem we care about (determining the subsystem that we care about is an important open question, but considering all possible moral arguments seems to be a way of avoiding this problem entirely). So while actions are very useful for setting up a value system to predict behavior, they are much less useful in determining the terminal values that we want to/should act on.
With all that said, there still may be a problem with coherence of values among all of humanity—I don’t necessarily disagree with you there. I just think you are shortchanging the metaethics.
I think Eliezer is drawing the opposite conclusion in that passage. You seem to be saying that we should (or at least could) ask what a subsystem of Phil Goetz values; Eliezer seems to be saying that we shouldn’t.
The complete post referred to shows that Eliezer doesn’t have the simple view of terminal values that I formerly attributed to him. I don’t know that it’s compatible with FAI, though, which IIRC is about preserving top-level goals. I could say that CEV = FAI + non-belief in terminal values.
ADDED: Oops; his post on terminal values shows that he does have the view of terminal values that I attributed to him.
Ok, ignore the talk about subsystems—you may be right, but it’s not really the basis of my criticism.
The problem is using actions to infer terminal values. In order to determine your terminal values, you have to think about them; reflect on them. Probably a lot. So in order for the actions of a person to be a reliable indicator of her terminal values, she must have done some reflecting on what she actually values. For most people, this hasn’t happened.
It’s true that this sort of reflection is probably more common among the wealthy—which buttresses your argument, but it’s still probably not very common in absolute terms. Consider how many people—even among the rich—are religious. Or how many people hold on to a “written into the fabric of reality” view of morality.
So I don’t think you can draw very meaningful conclusions about the differences in terminal values among the sexes just by looking at their actions. I’ll grant that it’s evidence, just not as strong as your post suggests—and certainly not strong enough to settle the question.
The problem is using actions to infer terminal values. In order to determine your terminal values, you have to think about them; reflect on them. Probably a lot. So in order for the actions of a person to be a reliable indicator of her terminal values, she must have done some reflecting on what she actually values. For most people, this hasn’t happened.
I disagree. People who believe they have thought about their terminal values are often the most confused about what they actually value. Human values as judged by observing how people act rather than by what they claim to think are more self-consistent and more universal than the values professed by people who think they have discovered their own terminal values through reflection. Your conscious beliefs are but a distorted echo of the real values embodied in your brain.
Fair enough—a bit of reflecting might worsen the approximation. However, do our actions allow us to infer what our values would be after we take into account all possible moral arguments? This is what our terminal values are, and my main point is that actions don’t tell us much about them.
This is what our terminal values are, and my main point is that actions don’t tell us much about them.
Putting aside for a moment my issues with the whole idea of terminal values in the sense you seem to be imagining, I would suggest that if our actions don’t tell us much about them, then our thoughts and words tell us even less.

On a day to day basis, sure. I accept that possibility. We don’t get to consider all possible moral arguments, well, ever.
Matt Simpson was talking about people who have in fact reflected on their values a lot. Why did you switch to talking about people who think they have reflected a lot?
What “someone actually values” or what their “terminal values” are seems to be ambiguous in this discussion. On one reading, it just means what motivates someone the most. In that case, your claims are pretty plausible.
On the other reading, which seems more relevant in this thread and the original comment, it means the terminal values someone should act on, which we might approximate as what they would value at the end of reflection. Switching back to people who have reflected a lot (not merely think they have), it doesn’t seem all that plausible to suppose that people who have reflected a lot about their “terminal values” are often the most confused about them.
For the record, I’m perfectly happy to concede that in general, speaking of what someone “actually values” or what their present “terminal values” are should be reserved for what in fact most motivates people. I think it is tempting to use that kind of talk to refer to what people should value because it allows us to point to existing mental structures that play a clear causal role in influencing actions, but I think it is ultimately only confusing because those are the wrong mental structures to point to when analyzing rightness or shouldness.
The problem is using actions to infer terminal values. In order to determine your terminal values, you have to think about them; reflect on them. Probably a lot. So in order for the actions of a person to be a reliable indicator of her terminal values, she must have done some reflecting on what she actually values. For most people, this hasn’t happened.
I recently wrote a long post arguing that your actions specify what your terminal values are, in a way that your thoughts can’t. And you commented on it, so I won’t repeat the relevant parts here.
click Ok, I think I see now. Forgive me for asking instead of trying to figure this out on my own—finals are looming tomorrow. Morning. And the next morning.
One of two things seems to be going on here. I’ll quote Eliezer’s metaethical position again, for reference (answering what the meaning of “right” is):
For a human this is a much huger blob of a computation that looks like, “Did everyone survive? How many people are happy? Are people in control of their own lives? …” Humans have complex emotions, have many values—the thousand shards of desire, the godshatter of natural selection. I would say, by the way, that the huge blob of a computation is not just my present terminal values (which I don’t really have—I am not a consistent expected utility maximizer); the huge blob of a computation includes the specification of those moral arguments, those justifications, that would sway me if I heard them. So that I can regard my present values, as an approximation to the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent.
The part of the quote about present values approximating the ideal morality is most relevant to my question. Are you agreeing with Eliezer’s argument, and just arguing that present terminal values can only be inferred from action? Or are you disagreeing and arguing that present terminal values, which can only be inferred from action, are the terminal values, i.e. the meaning of “right”/”should”? Or is it something else and I’m still confused?
So—first, I don’t really believe in terminal values. When I use that term, I’m working within a hypothetical frame (if X is true, then Y); or using it as an approximation.
I said that the only real values an organism has are the values it implements. If you want a science of values, and for instance want to be able to predict what values organisms will have, you want to work with the behaviors produced, which should follow certain rules. The values an organism believes it has differ from the values it implements, due to accidents of evolution, and working with those professed values will make things less predictable and a science of values more difficult.
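To make that concrete, here is a minimal sketch of what working with the behaviors produced could look like: infer a preference ordering purely from observed choices, ignoring anything the agent professes about itself. The option names and the observations are hypothetical, and nothing here is meant as more than an illustration.

```python
# Toy illustration: recover the values an agent implements from what it does,
# not from what it says. Options and observations below are hypothetical.
from collections import defaultdict

def infer_ranking(observed_choices):
    """Given (chosen, rejected) pairs observed in behavior, rank options by
    how often behavior favored them."""
    wins = defaultdict(int)
    for chosen, rejected in observed_choices:
        wins[chosen] += 1
        wins[rejected] += 0  # make sure never-chosen options still appear
    return sorted(wins, key=wins.get, reverse=True)

# What the agent does, regardless of what it professes to value:
observed = [("status", "charity"), ("status", "leisure"), ("charity", "leisure")]
print(infer_ranking(observed))  # ['status', 'charity', 'leisure']
```

The professed values never enter the calculation; if they diverge from the inferred ranking, that divergence is exactly the gap between believed and implemented values being pointed at here.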
Eliezer lists only propositional content as factors to consider. So I think he’s not talking about the difficulty of dividing an organism into values and value-implementing infrastructure. He seems to be saying that currently-implemented values are a poor copy of a Platonic ideal which we can extrapolate.
I would be less likely than Eliezer to consider my present values very similar to the “right” values. I think he would either say there are no right values, or that the right values are those extrapolated from your current values in a way that fixes accidents of evolution and flawed cognition and ignorance. But he doesn’t have an independent set of values to set up in opposition to your current values. I would, by contrast, feel comfortable saying “existence is better than non-existence”, “consciousness has value”, and “complexity is good”; those statements override whatever terminal values I have.
I don’t really think those statements are in the same category as my terminal values. My terminal values largely concern my own well-being, not what the universe should be like. My preference for good things to happen to me can’t be true or false.
I’m not comfortable with using “preference” and “value” interchangeably, either. “Preference” connotes likes: chocolate, classical music, fast cars, social status. My preferences are just lists of adjectives. “Value” connotes something with more structure: justice, liberty. Language treats these things as adjectives, but they aren’t primitive adjectives. They are predicates of more than one argument; preferences seem to be predicates of just one argument.
Dude, we both need to stop this and go to sleep.