I agree that this is closely related to the predictive processing view of the brain. In the post I briefly distinguish between “low-level systematization” and “high-level systematization”; I’d call the thing you’re describing the former. Whereas the latter seems like it might be more complicated, and rely on whatever machinery brains have on top of the predictive coding (e.g. abstract reasoning, etc).
In particular, some humans are way more systematizing than others (even at comparable levels of intelligence). And so just saying “humans are constantly doing this” feels like it’s missing something important. Whatever the thing is that some humans are doing way more of than others, that’s what I’m calling high-level systematizing.
Re self-unalignment: that framing feels a bit too abstract for me; I don’t really know what it would mean, concretely, to be “self-aligned”. I do know what it would mean for a human to systematize their values—but as I argue above, it’s neither desirable to fully systematize them nor to fully conserve them. Identifying whether there’s a “correct” amount of systematization to do feels like it will require a theory of cognition and morality that we don’t yet have.
My impression is that you get a lot of “the latter” if you run “the former” on the domain of language and symbolic reasoning, and often the underlying model is still S1-type. E.g.
rights inherent & inalienable, among which are the preservation of life, & liberty, & the pursuit of happiness
does not sound to me like someone did a ton of abstract reasoning to systematize other abstract values, but more like someone succeeded in writing words that resonate with “the former”.
Also, I’m not sure why you think the latter is more important for the connection to AI. Current ML seems more similar to “the former”: informal, intuitive, fuzzy reasoning.
Re self-unalignment: that framing feels a bit too abstract for me; I don’t really know what it would mean, concretely, to be “self-aligned”. I do know what it would mean for a human to systematize their values—but as I argue above, it’s neither desirable to fully systematize them nor to fully conserve them.
That’s interesting. In contrast, I have a pretty clear intuitive sense of a spectrum here: some people have a lot of internal conflict, and as a result their actions are less coherent, while other people have much less of that.
In contrast, I think in the case of humans whom you would likely describe as ‘having systematized their values’ … I often doubt what’s going on. A lot of people who describe themselves as hardcore utilitarians seem to be … actually not that, but more resemble a system in which a somewhat confused verbal part fights with other parts, which are sometimes suppressed.
Identifying whether there’s a “correct” amount of systematization to do feels like it will require a theory of cognition and morality that we don’t yet have.
That’s where I think looking at what human brains are doing seems interesting. Even if you believe the low-level / “the former” story is not what’s going on with human theories of morality, the technical problem seems very similar, and the same math possibly applies.
I agree with Jan here, and with how Jan’s comment connects with Thane’s comment elsewhere in this post’s comments.
I think that if ‘you’, as in ‘your conscious, thinking mind’, chooses to write down that your values are X, where X is some simplified abstract rule system much easier to calculate than the underlying ground-level details, then ‘you’ are wrong. The abstract representation is a map, not the territory, of your values. The values are still there, unchanged, hiding. When in a situation where the map finds itself in conflict with the territory, ‘you’ might choose to obey the map. But then you’ll probably feel bad, because you’ll have acted against your true hidden values. Pretending that the map is the new truth of your values is just pretending.
There’s an even more fundamental problem in terms of ‘hard to pin down’ concepts, namely: what counts as a ‘human’ in the first place?