How accurate is the summary I have presented above?
Basically accurate.
Where do values, as opposed to beliefs-about-values, come from?
That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to “learn about values”, but still leaves the question of when/whether there’s any “real values” which the map represents, and what kind-of-things those “real values” are.
A few parts of an answer:
“human values” are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent “a real thing” or fail to represent “a real thing”.
we don’t yet understand what it means for part of our value-estimates to represent “a real thing”, but it probably works pretty similarly to epistemic representation more generally—e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.
Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of “real values” with some suspicion I can’t quite put my finger on. Regardless, an attempted definition follows.
Let a subject observation set be a complete specification of a subject and it’s past and current environment, from the subject’s own subjectively accessible perspective. The elements of a subject observation set are observations/experiences observed/experienced by its subject.
Let O be the set of all subject observation sets.
Let a subject observation set class be a subset of O such that all it’s elements specify subjects that belong to an intuitive “kind of subject”: e.g. humans, cats, parasitoid wasps.
Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that all possible utility functions of all possible subjects can be defined as subsets of V, and that V = O x ℝ.
Let “real human values” be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.[1]
… this above definition feels pretty underwhelming, and I suspect that I would endorse a pretty small subset of “real human values” as defined above as actually good.
Basically accurate.
That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to “learn about values”, but still leaves the question of when/whether there’s any “real values” which the map represents, and what kind-of-things those “real values” are.
A few parts of an answer:
“human values” are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent “a real thing” or fail to represent “a real thing”.
we don’t yet understand what it means for part of our value-estimates to represent “a real thing”, but it probably works pretty similarly to epistemic representation more generally—e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.
Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of “real values” with some suspicion I can’t quite put my finger on. Regardless, an attempted definition follows.
Let a subject observation set be a complete specification of a subject and it’s past and current environment, from the subject’s own subjectively accessible perspective. The elements of a subject observation set are observations/experiences observed/experienced by its subject.
Let O be the set of all subject observation sets.
Let a subject observation set class be a subset of O such that all it’s elements specify subjects that belong to an intuitive “kind of subject”: e.g. humans, cats, parasitoid wasps.
Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that all possible utility functions of all possible subjects can be defined as subsets of V, and that
V = O x ℝ.
Let “real human values” be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.[1]
… this above definition feels pretty underwhelming, and I suspect that I would endorse a pretty small subset of “real human values” as defined above as actually good.
Let the reader feel free take the political decision of restricting the subject observation set class that defines “real human values” to sane humans.