Note that, in treating these sentiments as evidence that we don’t know our own values, we’re using stated values as a proxy measure for values. When we talk about a human’s “values”, we are notably not talking about:
The human’s stated preferences
The human’s revealed preferences
The human’s in-the-moment experience of bliss or dopamine or whatever
<whatever other readily-measurable notion of “values” springs to mind>
The thing we’re talking about, when we talk about a human’s “values”, is a thing internal to the human’s mind. It’s a high-level cognitive structure. (...) But clearly the reward signal is not itself our values. (...) reward is the evidence from which we learn about our values.
So we humans have a high-level cognitive structure to which we do not have direct access (values), but about which we can learn by observing and reflecting on the stimulus-reward mappings we experience, thus constructing an internal representation of such structure. This reward-based updating bridges the is-ought gap, since reward is a thing we experience and our values encode the way things ought to be.
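As a toy illustration of this reward-as-evidence picture, here is a minimal Bayesian sketch in which an agent maintains a posterior over candidate value functions and updates it from observed (stimulus, reward) pairs. The hypotheses, prior, and noise model are all illustrative assumptions, not anything from the original discussion:

```python
# Toy model: beliefs-about-values as a posterior over candidate value
# functions, updated from observed (stimulus, reward) pairs.
# All hypotheses, priors, and the noise model are illustrative assumptions.

candidate_values = {
    "values_sugar": lambda s: 1.0 if s == "candy" else 0.0,
    "values_social": lambda s: 1.0 if s == "praise" else 0.0,
}
posterior = {name: 0.5 for name in candidate_values}  # uniform prior

def likelihood(v, stimulus, reward, noise=0.1):
    # Probability of the observed reward given that hypothesis v is the
    # true value function (reward matches v(stimulus) up to noise).
    match = abs(v(stimulus) - reward) < 0.5
    return noise + (1 - 2 * noise) * (1.0 if match else 0.0)

def update(stimulus, reward):
    # Standard Bayesian update over the hypothesis set.
    for name, v in candidate_values.items():
        posterior[name] *= likelihood(v, stimulus, reward)
    total = sum(posterior.values())
    for name in posterior:
        posterior[name] /= total

# Observing high reward after praise shifts credence toward "values_social".
update("praise", 1.0)
```

Here the reward signal is never itself the values; it is only evidence that moves the agent's map of its values, matching the quoted framing above.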
Two questions:
How accurate is the summary I have presented above?
Where do values, as opposed to beliefs-about-values, come from?
How accurate is the summary I have presented above?
Basically accurate.
Where do values, as opposed to beliefs-about-values, come from?
That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to “learn about values”, but that still leaves the question of when/whether there are any “real values” which the map represents, and what kinds of things those “real values” are.
A few parts of an answer:
“human values” are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent “a real thing” or fail to represent “a real thing”.
we don’t yet understand what it means for part of our value-estimates to represent “a real thing”, but it probably works pretty similarly to epistemic representation more generally—e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.
Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of “real values” with some suspicion I can’t quite put my finger on. Regardless, an attempted definition follows.
Let a subject observation set be a complete specification of a subject and its past and current environment, from the subject’s own subjectively accessible perspective. The elements of a subject observation set are the observations/experiences observed/experienced by its subject.
Let O be the set of all subject observation sets.
Let a subject observation set class be a subset of O such that all its elements specify subjects that belong to an intuitive “kind of subject”: e.g. humans, cats, parasitoid wasps.
Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that any possible utility function of any possible subject can be defined as a subset of V (namely, its graph), and that V = O × ℝ.
Let “real human values” be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.[1]
… this above definition feels pretty underwhelming, and I suspect that I would endorse a pretty small subset of “real human values” as defined above as actually good.
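The definitions above can be sketched at the type level. This is only a finite, illustrative stand-in: real observation sets are not enumerable, and the membership predicate for the human class is a placeholder for the unformalized “intuitive kind of subject”:

```python
# Type-level sketch of the definitions above. The predicate for the human
# subject observation set class is an illustrative placeholder.
from typing import Callable, FrozenSet, Tuple

Observation = str
SubjectObservationSet = FrozenSet[Observation]   # an element of O
ValueTuple = Tuple[SubjectObservationSet, float]  # an element of V = O × ℝ

def is_human(obs_set: SubjectObservationSet) -> bool:
    # Membership test for the human subject observation set class.
    # "Intuitive kind of subject" is not formalized; this criterion is made up.
    return "human_proprioception" in obs_set

def in_real_human_values(t: ValueTuple) -> bool:
    # "Real human values" = the tuples in V whose observation set belongs
    # to the human subject observation set class.
    obs_set, _reward = t
    return is_human(obs_set)

# A utility function for a subject is a subset of V that is functional in
# its first coordinate, i.e. the graph of a map O → ℝ:
utility: Callable[[SubjectObservationSet], float] = lambda obs: float(len(obs))
```

Written out this way, the underwhelmingness is visible in the types: the definition constrains only whose observations appear in the tuples, not which reward assignments are any good.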
[1] The reader should feel free to make the political decision of restricting the subject observation set class that defines “real human values” to sane humans.