My impression is you get a lot of “the latter” if you run “the former” on the domain of language and symbolic reasoning, and often the underlying model is still S1-type. E.g.
rights inherent & inalienable, among which are the preservation of life, & liberty, & the pursuit of happiness
does not sound to me like someone did a ton of abstract reasoning to systematize other abstract values, but more like someone succeeded in writing words which resonate with “the former”.
Also, I’m not sure why you think the latter is more important for the connection to AI. Current ML seems more similar to “the former”: informal, intuitive, fuzzy reasoning.
Re self-unalignment: that framing feels a bit too abstract for me; I don’t really know what it would mean, concretely, to be “self-aligned”. I do know what it would mean for a human to systematize their values—but as I argue above, it’s neither desirable to fully systematize them nor to fully conserve them.
That’s interesting—in contrast, I have a pretty clear intuitive sense of a spectrum here: some people have a lot of internal conflict and as a result their actions are less coherent, and some people have less of that.
In contrast, in the case of humans whom you would likely describe as ‘having systematized their values’ … I often doubt what’s going on. A lot of people who describe themselves as hardcore utilitarians seem to be … actually not that, but more resemble a system where a somewhat confused verbal part fights with other parts, which are sometimes suppressed.
Identifying whether there’s a “correct” amount of systematization to do feels like it will require a theory of cognition and morality that we don’t yet have.
That’s where I think looking at what human brains are doing seems interesting. Even if you believe the low-level / “the former” process is not what’s going on with human theories of morality, the technical problem seems very similar, and the same math possibly applies.