I think that, in almost full generality, we should taboo the term “values”. It’s usually ambiguous between a bunch of distinct meanings.
The ideals that, when someone contemplates them, evoke strong feelings (of awe, motivation, excitement, exultation, joy, etc.)
The incentives of an agent in a formalized game with quantified payoffs.
A utility function—one’s hypothetical ordering over worlds, world-trajectories, etc., that results from comparing each pair and evaluating which one is better. (A minimal sketch of this meaning follows this list.)
A person’s revealed preferences.
The experiences and activities that a person likes for their own sake.
A person’s vision of an ideal world. (Which, I claim, often reduces to “an imagined world that’s aesthetically appealing.”)
The goals that are at the root of a chain or tree of instrumental goals.
[This often comes with an implicit or explicit implication that most of human behavior has that chain/tree structure, as opposed to being, for instance, mostly hardcoded adaptations, or a chain/tree of goals that grounds out in a mess of hardcoded adaptations instead of anything goal-like.]
The goals/narratives that give meaning to someone’s life.
[It can be the case that almost all of one’s meaning comes through a particular meaning-making schema, but from a broader perspective, a person could have been ~indifferent between multiple schemas.
For instance, for some but not most EAs, EA is very central to their personal meaning-making, but they could easily have ended up as a social justice warrior, or a professional Libertarian, instead. And in those counterfactual worlds, the other ideology is similarly central to their happiness and meaning-making. I think in such cases, it’s at least somewhat confused to look at the EA and declare that “maximizing [aggregate/average] utility” is their “terminal value”. That’s papering over the psychological process that adopts one ideology or another, which is necessarily more fundamental than the specific chosen ideology/”terminal value”.
It’s kind of like being in love with someone. You might love your wife more than anything; she might be the most important person in your life. But if you admit that it’s possible that, if you had been in different communities in your 20s, you might have married someone else, then there’s some other goal/process that picks who to marry. So too with ideologies.]
Behaviors and attitudes that signal well-regarded qualities.
Core States.
The goals that are sacred to a person, for many possible meanings of sacred.
What a person “really wants” underneath their trauma responses. What they would want, if their trauma was fully healed.
The actions that make someone feel most alive and authentically themselves.
The equilibrium of moral philosophy, under arbitrary reflection.
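A minimal sketch of the utility-function meaning above, in Python, with made-up worlds and an invented is_better judgment standing in for a person’s hypothetical pairwise evaluations (nothing here is anyone’s actual preference model):

```python
from functools import cmp_to_key

# Invented example worlds and a stand-in pairwise judgment, purely illustrative.
PREFERENCE_ORDER = ["flourishing", "status quo", "dystopia"]

def is_better(world_a: str, world_b: str) -> bool:
    """Stand-in for a hypothetical 'which of these two is better?' judgment."""
    return PREFERENCE_ORDER.index(world_a) < PREFERENCE_ORDER.index(world_b)

def rank_worlds(worlds: list[str]) -> list[str]:
    """Recover a best-to-worst ordering from pairwise judgments alone.

    Hidden assumption: the judgments are transitive. If they aren't,
    no consistent ordering (and hence no utility function) exists.
    """
    def cmp(a: str, b: str) -> int:
        if is_better(a, b):
            return -1
        if is_better(b, a):
            return 1
        return 0

    return sorted(worlds, key=cmp_to_key(cmp))

print(rank_worlds(["status quo", "dystopia", "flourishing"]))
# -> ['flourishing', 'status quo', 'dystopia']
```

The transitivity caveat is the substantive bit: the ordering “that results from comparing each pair” only exists if the pairwise judgments cohere.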
Most of the time when I see the word “values” used on LessWrong, it’s ambiguous between these (and other) meanings.
A particular ambiguity: sometimes “values” seem to be referring to the first-person experiences that a person likes for their own sake (“spending time near beautiful women is a terminal value for me”), and other times it seems to be referring to a world that a person thinks is awesome, when viewing that world from a god’s eye view. Those are not the same thing, and they do not have remotely the same psychological functions! Among other differences, one is a near-mode evaluation, and the other is a far-mode evaluation.
Worse than that, I think there’s often a conflation of these meanings.
For instance, I often detect a hidden assumption that the root of someone’s tree of instrumental goals is the same thing as their ranking over possible worlds. I think that conflation is very rarely, if ever, correct: the deep motivations of a person’s actions are not the same thing as the hypothetical world that is evaluated as best in thought experiments, even if the latter is properly the person’s “utility function”. At least in the vast majority of cases, one’s hypothetical ideal world has almost no motivational power (as a matter of descriptive psychology, not of normative philosophy).
Also (though this is the weakest reason to change our terminology, I think), there’s additional ambiguity for people who are not already involved in the memeplex.
To the broader world, “values” usually connotes something high-minded or noble: if you do a corporate-training-style exercise to “reflect on your values”, you get things like “integrity” and “compassion”, not things like “sex” or “spite”. In contrast, LessWrongers would usually count sex and spite, not to mention boredom and pain, as part of “human values”, and many would also own them as part of their personal values.
I at least partly buy this, but I want to play devil’s advocate.
Let’s suppose there’s a single underlying thing which ~everyone is gesturing at when talking about (humans’) “values”. How could a common underlying notion of “values” be compatible with our observation that people talk about all the very distinct things you listed, when you start asking questions about their “values”?
An analogy: in political science, people talk about “power”. Right up top, Wikipedia defines “power” in the political science sense as:
In political science, power is the social production of an effect that determines the capacities, actions, beliefs, or conduct of actors.
A minute’s thought will probably convince you that this supposed definition does not match the way anybody actually uses the term; for starters, actual usage is narrower. That definition probably doesn’t even match the way the term is used by the person who came up with that definition.
That’s the thing I want to emphasize here: if you ask people to define a term, the definitions they give ~never match their own actual usage of the term, with the important exception of mathematics.
… but that doesn’t imply that there’s no single underlying thing which political scientists are gesturing at when they talk about “power”. It just implies that the political scientists themselves haven’t figured out the True Name of the thing their intuitions are pointed at.
Now back to “values”. It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”. But people usually have very poor understanding of their own values (a quick check confirms that this applies to arguably-all of the notions of “values” on your list), so it’s not surprising if people end up defining their values in many different incompatible ways which don’t match the underlying common usage very well.
(Example: consider the prototypical deep Christian. They’d probably tell us that their “values” are to follow whatever directives are in the Bible, or some such. But then when actual value-loaded questions come up, they typically find some post-hoc story about how the Bible justifies their preferred value-claim… implying that the source of their value-claims, i.e. “values”, is something other than the Bible. This is totally compatible with deep Christians intuitively meaning the same thing I do when they talk about “values”, it’s just that they don’t reflectively know their actual usage of the term.)
… and if that is the case, then tabooing “values” is exactly the wrong move. The word itself is pointed at the right thing, and it’s all the attempted-definitions which are wrong. Tabooing “values” and replacing it with the definitions people think they’re using would be a step toward less correctness.
I’m kinda confused by this example. Let’s say the person exhibits three behaviors:
(1): They make broad abstract “value claims” like “I follow Biblical values”.
(2): They make narrow specific “value claims” like “It’s wrong to allow immigrants to undermine our communities”.
(3): They do object-level things that can be taken to indicate “values”, like cheating on their spouse.
From my perspective, I feel like you’re taking a stand and saying that the real definition of “values” is (2), and is not (1). (Not sure what you think of (3).) But isn’t that adjacent to just declaring that some things on Eli’s list are the real “values” and others are not?
In particular, at some point you have to draw a distinction between values and desires, right? I feel like you’re using the word “value claims” to take that distinction for granted, or something.
(For the record, I have sometimes complained about alignment researchers using the word “values” when they’re actually talking about “desires”.)
tabooing “values” is exactly the wrong move
I agree that it’s possible to use the suite of disparate intuitions surrounding some word as a kind of anthropological evidence that informs an effort to formalize or understand something-or-other. And that, if you’re doing that, you can’t taboo that word. But that’s not what people are doing with words 99+% of the time. They’re using words to (try to) communicate substantive claims. And in that case you should totally beware of words like “values” that have unusually large clouds of conflicting associations, and liberally taboo or define them.
Relatedly, if a writer uses the word “values” without further specifying what they mean, they’re not just invoking lots of object-level situations that seem to somehow relate to “values”; they’re also invoking any or all of those conflicting definitions of the word “values”, i.e. the things on Eli’s list, the definitions that you’re saying are wrong or misleading.
It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”.
In the power example, the physics definition (energy over time) and the Alex Turner definition have something to do with each other, but I wouldn’t call them “the same underlying thing”—they can totally come apart, especially out of distribution.
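For reference, the physics sense is crisp enough to write down as the rate of energy transfer,

$$P = \frac{dE}{dt},$$

while the Alex Turner sense is, roughly, an agent’s ability to achieve a wide variety of goals; stating the first precisely makes it easier to see how the two can come apart.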
It’s worse than just a blegg/rube thing: I think words can develop into multiple clusters connected by analogies. Like, “leg” is a body part, but also “this story has legs” and “the first leg of the journey” and “the legs of the right triangle”. It seems likely to me that “values” has some amount of that.
I agree. Some interpretations of “values” you didn’t explicitly list, but I think are important:
What someone wants to be true (analogous to what someone believes to be true)
What someone would want to be true if they knew what it would be like if it were true
What someone believes would be good if it were true
These are distinct, because each could clearly differ from the others. So the term “value” is actually ambiguous, not just vague. Talking about “values” is usually unnecessarily unclear, similar to talking about “utilities” in utility theory.
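To illustrate the “utilities” comparison: in utility theory, a utility function is only determined up to positive affine transformation, so a bare utility number is ambiguous in much the way a bare “value” is. A minimal sketch with made-up outcomes and numbers:

```python
# Two utility functions over invented outcomes; u2 is a positive affine
# transform of u1 (multiply by 100, subtract 7), so it represents the
# exact same preferences.
u1 = {"apple": 1.0, "banana": 2.0, "cherry": 4.0}
u2 = {k: 100 * v - 7 for k, v in u1.items()}

def prefers(u, a, b):
    """Does utility function u rank outcome a strictly above outcome b?"""
    return u[a] > u[b]

# The two functions agree on every pairwise preference...
assert all(prefers(u1, a, b) == prefers(u2, a, b) for a in u1 for b in u1)

# ...so "cherry has utility 4" is meaningless without fixing a scale,
# much as bare talk of "values" underdetermines which notion is meant.
print(u1["cherry"], u2["cherry"])  # 4.0 vs. 393.0
```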
A few of the “distinct meanings” you list are very different from the others, but many of those are pretty similar. “Values” is a pretty broad term, including everything on the “ought” side of the is–ought divide, less “high-minded or noble” preferences, and one’s “ranking over possible worlds”, and that’s fine: it seems like a useful (and coherent!) concept to have a word for. You can be more specific with adjectives if context doesn’t adequately clarify what you mean.
Seeing through heaven’s eyes or not, I see no meaningful difference between the statements “I would like to sleep with that pretty girl” and “worlds in which I sleep with that pretty girl are better than the ones in which I don’t, ceteris paribus.” I agree this is the key difference: yes, I conflate these two meanings[1], and like the term “values” because it allows me to avoid awkward constructions like the latter when describing one’s motivations.
[1] I actually don’t see two different meanings, but for the sake of argument, let’s grant that they exist.
You can be more specific with adjectives if context doesn’t adequately clarify what you mean.
Well, they can. The problem is that people on LessWrong actually do use the term (in my opinion) pretty excessively, in contrast to, say, philosophers or psychologists. This is no problem in concrete cases like your example, but on LessWrong the discussion about “values” is usually abstract. The fact that people could be more specific hasn’t, so far, meant that they are.
My honest opinion is that this makes discussion worse, and that you can do better by distinguishing between values as objects that have value and the mechanism by which value gets assigned.