Edit: There are actually many ambiguities with the use of these words. This post is about one specific ambiguity that I think is often overlooked or forgotten.
The word “preference” is overloaded (and so are related words like “want”). It can refer to one of two things:
How you want the world to be, i.e. your terminal values, e.g. “I prefer worlds in which people don’t needlessly suffer.”
What makes you happy, e.g. “I prefer my ice cream in a waffle cone.”
I’m not sure how we should distinguish these. So far, my best idea is to call the former “global preferences” and the latter “local preferences”, but that clashes with the pre-existing notion of locality of preferences as the quality of terminally caring more about people/objects closer to you in spacetime. Does anyone have a better name for this distinction?
I think we definitely need to distinguish them, however, because they often disagree: most “values disagreements” between people are really just disagreements in local preferences, and so could be resolved by appealing to global preferences.
I may write a longpost at some point on the nuances of local/global preference aggregation.
Example: Two alignment researchers, Alice and Bob, both want access to a limited supply of compute. The rest of this example is left as an exercise.
I think you are missing an even more confusing meaning: preference as what you actually choose.
In the VNM axioms, “agent prefers A to B” literally means “agent chooses A over B”. It’s confusing because when we talk about human preferences we usually mean mental states, not their behavioral expressions.
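For concreteness, here is a rough statement of the theorem (a sketch from memory, so treat the exact conditions as such): the relation $\succeq$ is the primitive, read off from which option gets picked, and “utility” only appears as a representation of it. If $\succeq$ over lotteries satisfies completeness, transitivity, continuity, and independence, then there is a function $u$ with
$$A \succeq B \iff \mathbb{E}_{A}[u] \ge \mathbb{E}_{B}[u],$$
unique up to positive affine transformation. Nothing in that statement mentions a mental state.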
This is indeed a meaningful distinction! I’d phrase it as:
Values about what the entire cosmos should be like
Values about what kind of places one wants one’s (future) selves to inhabit (e.g., in an internet-like upload-utopia, “what servers does one want to hang out on”)
“Global” and “local” is not the worst nomenclature. Maybe “global” vs “personal” values? I dunno.
my best idea is to call the former “global preferences” and the latter “local preferences”, but that clashes with the pre-existing notion of locality of preferences as the quality of terminally caring more about people/objects closer to you in spacetime
I mean, it’s not unrelated! One can view a utility function with both kinds of values as a combination of two utility functions: the part that only cares about the state of the entire cosmos and the part that only cares about what’s around them (see also “locally-caring agents”).
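A minimal way to write that combination down (the notation $U_{\text{global}}$, $U_{\text{local}}$, and $\lambda$ is mine, just to make the claim concrete):
$$U(w) \;=\; U_{\text{global}}(w) \;+\; \lambda\, U_{\text{local}}(w_{\text{near}}),$$
where $w$ is a full world-history, $w_{\text{near}}$ is the part of it the agent directly inhabits or experiences, $U_{\text{global}}$ only looks at the whole cosmos, $U_{\text{local}}$ only looks at $w_{\text{near}}$, and $\lambda$ sets how the two trade off.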
(One might be tempted to say “consequentialist” vs “experiential”, but I don’t think that’s right — one can still value contact with reality in their personal/local values.)
There are lots of different dimensions on which these vary. I’d note that one is purely imaginary (no human has actually experienced anything like that) while the second is a prediction strongly grounded in past experience. One is far-mode (non-specific in experience, scope, or timeframe) and the other near-mode (specific, with well-understood steps to achieve it).
Does using the word “values” not sufficiently distinguish from “preferences” for you?
The second type of preference seems to apply to anticipated perceptions of the world by the agent—such as the anticipated perception of eating ice cream in a waffle cone. It doesn’t have to be so immediately direct, since it could also apply to instrumental goals such as doing something unpleasant now for expected improved experiences later.
The first seems to be more like a “principle” than a preference, in that the agent is judging outcomes by whether needless suffering exists in them, regardless of whether that suffering has any effect on the agent at all.
To distinguish them, we could imagine a thought experiment in which such a person could choose to accept or decline some ongoing benefit for themselves that causes needless suffering on some distant world, with their memory of the decision and any psychological consequences of it immediately erased regardless of which they choose.
It’s even worse than that. Maybe I would be happier with my ice cream in a waffle cone the next time I have ice cream, but actually this is just a specific expression of being happier eating a variety of tasty things over time and it’s just that I haven’t had ice cream in a waffle cone for a while. The time after that, I will likely “prefer” something else despite my underlying preferences not having changed. Or something even more complex and interrelated with various parts of history and internal state.
It may be better to distinguish instances of “preferences” that are specific to a given internal state and history from an agent’s general mapping over all internal states and histories.
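As a toy sketch of that distinction (everything here is illustrative; the names and the variety-seeking rule are assumptions, not anything from the thread): the stable object is the mapping from internal state and history to a ranking, while the thing we casually report as “a preference” is a single output of that mapping.

```python
from dataclasses import dataclass
from typing import Callable, List

Option = str  # e.g. "waffle cone", "cup"

@dataclass
class InternalState:
    recently_eaten: List[Option]  # history feeding into today's ranking

# The agent's *general* preference: a fixed mapping from (state, options) to a ranking.
GeneralPreference = Callable[[InternalState, List[Option]], List[Option]]

def variety_seeking(state: InternalState, options: List[Option]) -> List[Option]:
    # Rank options by how rarely they've come up recently; this mapping is stable
    # even though its outputs change as the history changes.
    return sorted(options, key=lambda o: state.recently_eaten.count(o))

# A *specific instance* of "preferring a waffle cone" is just one evaluation:
today = InternalState(recently_eaten=["cup", "cup", "sundae"])
print(variety_seeking(today, ["waffle cone", "cup"]))  # ['waffle cone', 'cup']
```

Next week, with a different history fed in, the same mapping might put the cup first, without any change to the “underlying preferences”.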