I’m kinda confused by this example. Let’s say the person exhibits three behaviors:
(1): They make broad abstract “value claims” like “I follow Biblical values”.
(2): They make narrow specific “value claims” like “It’s wrong to allow immigrants to undermine our communities”.
(3): They do object-level things that can be taken to indicate “values”, like cheating on their spouse.
From my perspective, I feel like you’re taking a stand and saying that the real definition of “values” is (2), and is not (1). (Not sure what you think of (3).) But isn’t that adjacent to just declaring that some things on Eli’s list are the real “values” and others are not?
In particular, at some point you have to draw a distinction between values and desires, right? I feel like you’re using the word “value claims” to take that distinction for granted, or something.
(For the record, I have sometimes complained about alignment researchers using the word “values” when they’re actually talking about “desires”.)
tabooing “values” is exactly the wrong move
I agree that it’s possible to use the suite of disparate intuitions surrounding some word as a kind of anthropological evidence that informs an effort to formalize or understand something-or-other. And that, if you’re doing that, you can’t taboo that word. But that’s not what people are doing with words 99+% of the time. They’re using words to (try to) communicate substantive claims. And in that case you should totally beware of words like “values” that have unusually large clouds of conflicting associations, and liberally taboo or define them.
Relatedly, if a writer uses the word “values” without further specifying what they mean, they’re not just invoking lots of object-level situations that seem to somehow relate to “values”; they’re also invoking any or all of those conflicting definitions of the word “values”, i.e. the things on Eli’s list, the definitions that you’re saying are wrong or misleading.
It seems pretty plausible to me that people are in fact generally gesturing at the same underlying thing, when they talk about “values”.
In the power example, the physics definition (energy over time) and the Alex Turner definition have something to do with each other, but I wouldn’t call them “the same underlying thing”—they can totally come apart, especially out of distribution.
It’s worse than just a blegg/rube thing: I think words can develop into multiple clusters connected by analogies. Like, “leg” is a body part, but also “this story has legs” and “the first leg of the journey” and “the legs of the right triangle”. It seems likely to me that “values” has some amount of that.