Disclaimer: I didn’t read this post past the TL;DR, but I will try to say something that Beren may find valuable.
It seems important to distinguish between multiple things that are called values:
Objects implemented in the heads of concrete people, leveraging their intelligence makeup and “mind tools” (aka “psycho-technologies”), such as language. Beren seems to discuss these values in the post.
More abstract, evolutionarily and game-theoretically informed patterns of behaviour. These are discussed by Oliver Scott Curry and, to a degree, by Jon Haidt, and are present not only in humans but also in non-linguistic animals. These values (sometimes called “value bases”) are the basis for the more specific, linguistic objects in point 1.
Very abstract, “ideal” values, or values “as they should be”, as discussed by academic philosophers, ethicists, and axiologists. These values could be divorced from the “evolutionary, game-theoretic” values and the “linguistic” values; it is a meta-ethical question whether these abstract values should or should not coincide with, or be grounded in, the more “concrete” values that emerged in the particular systems we (or animals) find ourselves in, such as systems fundamentally defined by scarcity and/or fundamentally shaped by our (human) emotional and cognitive make-up.
When people talk about “aligning AI with human values”, the phrase seems misguided to me: when the systems we live in change (and they will probably change radically and rapidly precisely because of the advent of AI), the “linguistic” and “evolutionary” values may also change. So we should think directly about aligning AIs with the “abstract” values.
If I understand Beren’s model correctly, [2] falls within the innate, hardcoded reward circuits in the hypothalamus and the brainstem.
Beren touches on [3], discussing these as attempted generalizations/extrapolations of [1] into a coherent framework, such as consequences, utility, or adherence to some set of explicitly stated virtues.
I would distinguish between 1) the “actual encoding” and 2) the “morality that is implied by the rules of the game”, to which the “encoded” stuff would converge over some tens or hundreds of thousands of years of biological evolution, if the rules of the game hadn’t recently started changing on timescales that are orders of magnitude shorter :)
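To make what I mean by 2) concrete, here is a minimal toy sketch (my own illustration, not anything from Beren’s post): replicator dynamics in a standard Hawk–Dove game, where the population converges to the mixed equilibrium V/C that is implied purely by the payoff structure, i.e. by “the rules of the game”, regardless of the initial “encoding” of behaviour.

```python
# Toy sketch (my own illustration): replicator dynamics in a Hawk-Dove game.
# The Hawk share converges to the mixed equilibrium V/C implied purely by the
# payoff structure ("the rules of the game"), whatever the initial "encoding".

V, C = 2.0, 3.0  # value of the contested resource, cost of a fight (assume C > V)

def payoffs(p):
    """Expected payoffs to Hawk and Dove when a fraction p of the population plays Hawk."""
    hawk = p * (V - C) / 2 + (1 - p) * V   # vs Hawk: split value minus fight cost; vs Dove: take all
    dove = (1 - p) * V / 2                 # vs Hawk: back off, get nothing; vs Dove: split peacefully
    return hawk, dove

p = 0.01  # initial Hawk share: the arbitrary starting "encoding"
for _ in range(2000):
    hawk, dove = payoffs(p)
    average = p * hawk + (1 - p) * dove
    p += 0.01 * p * (hawk - average)       # discrete replicator update

print(f"final Hawk share ~ {p:.3f}; equilibrium implied by the payoffs: V/C = {V / C:.3f}")
```

The only point of the toy model is that the long-run attractor is a property of the game, not of the creatures’ initial wiring; real moral evolution is obviously vastly more complicated.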
OK, so what you mean by 2) is something like a reflective game-theoretic equilibrium for that particular environment (including the social environment)?
Yes