Note that, in treating these sentiments as evidence that we don’t know our own values, we’re using stated values as a proxy measure for values. When we talk about a human’s “values”, we are notably not talking about:
The human’s stated preferences
The human’s revealed preferences
The human’s in-the-moment experience of bliss or dopamine or whatever
<whatever other readily-measurable notion of “values” springs to mind>
The thing we’re talking about, when we talk about a human’s “values”, is a thing internal to the human’s mind. It’s a high-level cognitive structure. (...) But clearly the reward signal is not itself our values. (...) reward is the evidence from which we learn about our values.
So we humans have a high-level cognitive structure to which we do not have direct access (values), but about which we can learn by observing and reflecting on the stimulus-reward mappings we experience, thus constructing an internal representation of such structure. This reward-based updating bridges the is-ought gap, since reward is a thing we experience and our values encode the way things ought to be.
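As a toy illustration of this reward-as-evidence picture, here is a minimal Bayesian sketch in which an agent maintains a posterior over candidate value functions and updates it from observed (stimulus, reward) pairs. The hypotheses, prior, and noise model are all illustrative assumptions, not anything from the original discussion:

```python
# Toy model: beliefs-about-values as a posterior over candidate value
# functions, updated from observed (stimulus, reward) pairs.
# All hypotheses, priors, and the noise model are illustrative assumptions.

candidate_values = {
    "values_sugar": lambda s: 1.0 if s == "candy" else 0.0,
    "values_social": lambda s: 1.0 if s == "praise" else 0.0,
}
posterior = {name: 0.5 for name in candidate_values}  # uniform prior

def likelihood(v, stimulus, reward, noise=0.1):
    # Probability of the observed reward given that hypothesis v is the
    # true value function (reward matches v(stimulus) up to noise).
    match = abs(v(stimulus) - reward) < 0.5
    return noise + (1 - 2 * noise) * (1.0 if match else 0.0)

def update(stimulus, reward):
    # Standard Bayesian update over the hypothesis set.
    for name, v in candidate_values.items():
        posterior[name] *= likelihood(v, stimulus, reward)
    total = sum(posterior.values())
    for name in posterior:
        posterior[name] /= total

# Observing high reward after praise shifts credence toward "values_social".
update("praise", 1.0)
```

Here the reward signal is never itself the values; it is only evidence that moves the agent's map of its values, matching the quoted framing above.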
Two questions:
How accurate is the summary I have presented above?
Where do values, as opposed to beliefs-about-values, come from?
How accurate is the summary I have presented above?
Basically accurate.
Where do values, as opposed to beliefs-about-values, come from?
That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to “learn about values”, but that still leaves the question of when/whether there are any “real values” which the map represents, and what kinds of things those “real values” are.
A few parts of an answer:
“human values” are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent “a real thing” or fail to represent “a real thing”.
we don’t yet understand what it means for part of our value-estimates to represent “a real thing”, but it probably works pretty similarly to epistemic representation more generally—e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.
Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of “real values” with some suspicion I can’t quite put my finger on. Regardless, an attempted definition follows.
Let a subject observation set be a complete specification of a subject and its past and current environment, from the subject’s own subjectively accessible perspective. The elements of a subject observation set are the observations/experiences observed/experienced by its subject.
Let O be the set of all subject observation sets.
Let a subject observation set class be a subset of O such that all its elements specify subjects that belong to an intuitive “kind of subject”: e.g. humans, cats, parasitoid wasps.
Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that any possible utility function of any possible subject can be defined as a subset of V (namely, its graph), and that V = O × ℝ.
Let “real human values” be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.[1]
… this above definition feels pretty underwhelming, and I suspect that I would endorse a pretty small subset of “real human values” as defined above as actually good.
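The definitions above can be sketched at the type level. This is only a finite, illustrative stand-in: real observation sets are not enumerable, and the membership predicate for the human class is a placeholder for the unformalized “intuitive kind of subject”:

```python
# Type-level sketch of the definitions above. The predicate for the human
# subject observation set class is an illustrative placeholder.
from typing import Callable, FrozenSet, Tuple

Observation = str
SubjectObservationSet = FrozenSet[Observation]   # an element of O
ValueTuple = Tuple[SubjectObservationSet, float]  # an element of V = O × ℝ

def is_human(obs_set: SubjectObservationSet) -> bool:
    # Membership test for the human subject observation set class.
    # "Intuitive kind of subject" is not formalized; this criterion is made up.
    return "human_proprioception" in obs_set

def in_real_human_values(t: ValueTuple) -> bool:
    # "Real human values" = the tuples in V whose observation set belongs
    # to the human subject observation set class.
    obs_set, _reward = t
    return is_human(obs_set)

# A utility function for a subject is a subset of V that is functional in
# its first coordinate, i.e. the graph of a map O → ℝ:
utility: Callable[[SubjectObservationSet], float] = lambda obs: float(len(obs))
```

Written out this way, the underwhelmingness is visible in the types: the definition constrains only whose observations appear in the tuples, not which reward assignments are any good.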
[1] The reader should feel free to make the political decision of restricting the subject observation set class that defines “real human values” to sane humans.