johnswentworth comments on We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

johnswentworth 20 Sep 2024 0:52 UTC
5 points
0
How accurate is the summary I have presented above?
Basically accurate.
Where do values, as opposed to beliefs-about-values, come from?
That is the right next question to ask. Humans have a map of their values, and can update that map in response to rewards in order to “learn about values”, but still leaves the question of when/whether there’s any “real values” which the map represents, and what kind-of-things those “real values” are.
A few parts of an answer:
- “human values” are not one monolithic thing; we value lots of different stuff, and different parts of our value-estimates can separately represent “a real thing” or fail to represent “a real thing”.
- we don’t yet understand what it means for part of our value-estimates to represent “a real thing”, but it probably works pretty similarly to epistemic representation more generally—e.g. my belief about the position of the dog in my apartment represents a real thing (even if the position itself is wrong) exactly when there is in fact a dog in my apartment at all.
- Milan W 20 Sep 2024 3:26 UTC
  3 points
  0
  Parent
  Thank you for the answer. I notice I feel somewhat confused, and that I regard the notion of “real values” with some suspicion I can’t quite put my finger on. Regardless, an attempted definition follows.
  
  Let a subject observation set be a complete specification of a subject and it’s past and current environment, from the subject’s own subjectively accessible perspective. The elements of a subject observation set are observations/experiences observed/experienced by its subject.
  Let O be the set of all subject observation sets.
  Let a subject observation set class be a subset of O such that all it’s elements specify subjects that belong to an intuitive “kind of subject”: e.g. humans, cats, parasitoid wasps.
  
  Let V be the set of all (subject_observation_set, subject_reward_value) tuples. Note that all possible utility functions of all possible subjects can be defined as subsets of V, and that
  V = O x ℝ.
  
  Let “real human values” be the subset of V such that all subject_observation_set elements belong to the human subject observation set class.^[1]
  
  … this above definition feels pretty underwhelming, and I suspect that I would endorse a pretty small subset of “real human values” as defined above as actually good.
  1. ^
    Let the reader feel free take the political decision of restricting the subject observation set class that defines “real human values” to sane humans.