Defining the ways human values are messy
In many of my posts, I’ve been using phrases like “human values are contradictory, underdefined, changeable, and manipulable”. I also tend to slide between calling things preferences, values, morals, rewards, and utilities. This post will clarify some of this terminology.
I say that human values are contradictory, when humans have firm and strong opinions that are in conflict. For instance, a respect for human rights versus a desire to reduce harm, when those two come into conflict (more broadly, deontological versus utilitarian conflicts). Or enjoying food (or wanting to be someone who enjoys food) versus wanting to get thin (or wanting to be someone who gets thin). Or family loyalty versus more universal values.
I say that human values are underdefined, when humans don’t have a strong opinion on something, and their opinion can be very different depending on how that something is phrased. This includes how the issue is framed (saving versus dying), or how people interpret moral choices (such as abortion or international press freedom) depending on what category they put that choice in. New technologies often open up new areas where old values don’t apply, forcing people to define new values in that space (often by analogy to old values).
Notice that there is no clear distinction between contradictory and underdefined: as the values in conflict or potential conflict get firmer, this moves from underdefined to contradictory.
I say that human values are changeable, because of the way that values shift, often in predictable ways, depending on such things as social pressure, tribalism, changes in life-roles or positions, or new information (fictional as well as factual information). I suspect that most of these shifts are undetectable to the subject, just as most belief changes are.
I say that human values are manipulable, in that capable humans, and potentially advanced AIs, can use the vulnerabilities of human cognition to push values in a particular direction. This is a subset of changeable, but with a different emphasis.
Rewards/values/preferences...
At the object level, I see values, preferences, and morals as the same thing. All express the fact that a certain state of the world, or a certain course of action, is better than another.
At the meta level, humans tend to distinguish between them, seeing values and morals as fundamental, wrapped up with identity, and universalisable, and preferences as more personal and contingent. Since I’ll be dealing with preferences and meta-preferences, however, I don’t have a need to distinguish between the concepts, letting the meta-preferences do that automatically.
Reward functions and utility functions also rank outcomes in a similar way to preferences, so I’ll generally slip between the three unless the difference is relevant (reward functions and utility functions give a total order, preferences need not; rewards are generally defined over observations, utilities over world-states...).
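To make that parenthetical concrete, here is a minimal sketch (the states, numbers, and pairs are invented for illustration, not anything from the post): a utility function assigns a number to every state, so any two states are comparable, while a bare preference relation can leave some pairs unranked.

```python
# Sketch: utility functions give a total order; raw preferences need not.
# All states and numbers here are invented for illustration.

# A utility function: every state gets a number, so any two states compare.
utility = {"eat_cake": 2.0, "stay_thin": 3.0, "read_book": 1.0}

def utility_prefers(a, b):
    return utility[a] > utility[b]

# A bare preference relation: an explicit set of (better, worse) pairs.
# Pairs not listed are simply incomparable.
preferences = {("stay_thin", "read_book"), ("eat_cake", "read_book")}

def preference_comparable(a, b):
    return (a, b) in preferences or (b, a) in preferences

print(utility_prefers("eat_cake", "stay_thin"))        # False: comparable, and stay_thin wins
print(preference_comparable("eat_cake", "stay_thin"))  # False: no ranking either way
```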
Finally, there’s the issue of hedonism: human pleasure and enjoyment don’t match up perfectly with preferences. I’ll generally treat enjoyment the same as preferences (in that certain world-states have higher enjoyment than others), with meta-preferences distinguishing it from standard preferences and choosing the extent to which hedonism is endorsed.
Scott A. wrote an article about axiology, morality, and law that distinguishes between some of these concepts. For instance, after reading it I now see a qualitative distinction between morality and preference. (Which isn’t to say that no one ever uses those words in ambiguous ways; rather that I see two clearly distinct concepts that largely agree with the way the words are typically used.)
“Contradictory” could also be called “decomposable”. I.e. my utility for food can be decomposed into a balancing act between a sub-utility that likes to eat and a sub-utility that wants to be thin. I’d say this sort of decomposition is the opposite of messiness.
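To make that decomposition concrete, a minimal sketch (the functional forms and weights are invented for illustration): the overall food utility is just a weighted balance of an “enjoys eating” term and a “wants to be thin” term.

```python
# Sketch: a "contradictory" food value decomposed into two sub-utilities.
# The functional forms and weights are made up for illustration.

def u_eat(calories):
    """Sub-utility that likes eating: more food, more enjoyment."""
    return calories / 500.0

def u_thin(calories):
    """Sub-utility that wants to be thin: more food, more regret."""
    return -calories / 400.0

def u_food(calories, w_eat=1.0, w_thin=1.0):
    """Overall food utility as a weighted balance of the two sub-utilities."""
    return w_eat * u_eat(calories) + w_thin * u_thin(calories)

# The "conflict" is just the two terms pulling in opposite directions.
for c in (0, 500, 1000):
    print(c, round(u_food(c), 2))
```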
“Changeable” could be called “context-dependent”. I.e. when your situation changes, so do your preferences. This too doesn’t suggest any sort of “messiness” to me.
The issue of “underdefined” and “manipulable” is more complex, but I don’t believe that “messy values” is the right diagnosis. I propose that a big factor here is a human inability to predict and evaluate consequences. If I could move between the alternate reality where I chose A and the one where I chose B, and live in each for some time, my preference of A>B would become much more certain and less susceptible to suggestion and manipulation. The apparent messiness of the values comes from the usual messiness of real computations and the strategies for coping with our limited computational power, made worse by the habit of reporting deterministic choices instead of probability distributions.
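One way to read that last point, as a small sketch (the utility estimates and the temperature parameter are invented): instead of reporting only the single best-looking option, report a distribution that reflects how uncertain the comparison actually is.

```python
import math

# Sketch: report a probability distribution over choices rather than a
# deterministic pick. Estimated utilities and temperature are invented.

def choice_distribution(estimated_utilities, temperature=1.0):
    """Softmax over noisy utility estimates: close calls stay close."""
    exps = {k: math.exp(v / temperature) for k, v in estimated_utilities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# A is only slightly better than B under a noisy evaluation...
print(choice_distribution({"A": 1.1, "B": 1.0}))
# ...so reporting "A, full stop" overstates how settled the preference is.
```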
Changeable is more than context-dependent: it’s how people can change their values in response to lived experience or social pressure.
Changing values in response to experience is also reasonable. E.g. I used to value food more than my weight, but then I got fat, experienced health problems, and decided that I should value weight more. This is not some human nonsense, this is what a perfect Bayesian would do.
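A sketch of that “perfect Bayesian” reading (the probabilities and payoffs are invented for illustration): the terminal values stay fixed, but the expected utility of eating freely drops once evidence raises the probability that it causes health problems.

```python
# Sketch: the terminal values stay fixed; only the belief about consequences
# updates. All probabilities and payoffs here are invented for illustration.

ENJOYMENT_OF_FOOD = 2.0
COST_OF_HEALTH_PROBLEMS = -10.0

def expected_utility_of_eating_freely(p_health_problems):
    return ENJOYMENT_OF_FOOD + p_health_problems * COST_OF_HEALTH_PROBLEMS

# Before: low credence that overeating causes problems.
print(expected_utility_of_eating_freely(0.1))   #  1.0 -> eating freely looks fine

# After getting fat and experiencing problems, the credence goes up.
print(expected_utility_of_eating_freely(0.6))   # -4.0 -> "valuing weight more"
```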
Social pressure surely falls under “underdefined and manipulable”, so I don’t have anything to add to my previous comment.
Though values like pure freedom, play, spontaneity, improvisation, lack of inhibition, etc. can lead one to value “irrational” or “transrational” behaviors… doing things (whether art, social behaviors, etc.) for the fun, thrill, freedom of just doing it. Letting pure “BEing” be the final “cause” to your “because.”
Yep, there are the kind of values which I feel are lost in overly elegant systems. https://www.lesswrong.com/posts/Y2LhX3925RodndwpC/resolving-human-values-completely-and-adequately