From the AI “engineering” perspective, values/valued states are “rewards” that the agent assigns to itself in order to train (in RL style) its reasoning/planning network (i.e., its generative model) to produce behaviours that are adaptive but that it also likes and finds interesting (aesthetics). This RL-style training happens during conscious reflection.
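To make this concrete, here is a minimal toy sketch of the loop I have in mind; everything in it (the plan names, the survival/novelty scoring, the update rule) is a hypothetical illustration, not a reference implementation of any particular RL or Active Inference scheme:

```python
# Toy sketch (illustrative only; the plan names, scoring and update rule are
# hypothetical, not taken from any particular RL or Active Inference system).
# It shows the loop described above: during "reflection" the agent imagines
# outcomes of candidate plans with its generative model, labels them with
# self-assigned "values" (rewards), and shifts its preferences toward
# higher-valued plans -- RL-style training on simulated runs, not real actions.

def generative_model(plan):
    """Stand-in for the agent's world model: predicts an imagined outcome
    for a plan (the numbers are made up for the demo)."""
    imagined = {
        "save food":     {"survival": 0.9, "novelty": 0.1},
        "explore cave":  {"survival": 0.4, "novelty": 0.9},
        "build shelter": {"survival": 0.7, "novelty": 0.3},
    }
    return imagined[plan]

def self_assigned_value(outcome, aesthetic_weight=0.5):
    """The 'value' the agent attaches to an imagined outcome:
    adaptivity (survival) plus an aesthetic/interestingness bonus."""
    return outcome["survival"] + aesthetic_weight * outcome["novelty"]

def reflect(plan_scores, lr=0.2):
    """One round of 'conscious reflection': score imagined outcomes and nudge
    the plan preferences toward them (a crude stand-in for a policy update)."""
    for plan in plan_scores:
        imagined = generative_model(plan)       # simulated run, no real action
        reward = self_assigned_value(imagined)  # value = self-assigned reward label
        plan_scores[plan] += lr * (reward - plan_scores[plan])

scores = {"save food": 0.0, "explore cave": 0.0, "build shelter": 0.0}
for _ in range(10):
    reflect(scores)
print("preferred behaviour after reflection:", max(scores, key=scores.get))
```

The point of the sketch is only that the “values” live entirely inside the scoring function the agent applies to its own simulated runs; what ultimately acts in the world is the updated plan preference, i.e., the generative model/policy.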
Under this perspective, but also more generally, you cannot cleanly distinguish between intrinsic and instrumental values: intrinsic values are instrumental to each other, and there is nothing “intrinsic” about self-assigned reward labels. In the end, what matters is a generative model that can produce highly adaptive (and, ideally, interesting/beautiful) behaviours across a certain range of circumstances.
I think your confusion about the ontological status of values is further corroborated by this phrase from the post: “people are mostly guided by forces other than their intrinsic values [habits, pleasure, cultural norms]”. Values are not forces, but rather inferences about some features of one’s own generative model (inferences that help to “train” this very model in “simulated runs”, i.e., in the conscious analysis of plans and in reflection). However, the generative model itself is effectively the product of environmental influences, development, culture, physiology (pleasure, pain), etc. Thus, ultimately, values are not somehow distinct from all these “forces”; they are indirectly (through the generative model) derived from them.
Under the perspective described above, Valuism appears to swap the ultimate objective (“good” behaviour) for the “optimisation of metrics” (values). Thus, there is a risk of Goodharting.
You may say that I am suggesting an infinite regress: how is “good behaviour” determined, if not through “values”? Well, as I explained above, it couldn’t be through “values”, because values are our own creation within our own ontological/semiotic “map”. Instead, there could be the following guides to “good behaviour”:
Good old adaptivity (survival) [this roughly corresponds to the so-called “intrinsic value” term in the expected free energy functional under Active Inference]
Natural ethics, if it exists (see the discussion here: https://www.lesswrong.com/posts/3BPuuNDavJ2drKvGK/scientism-vs-people#The_role_of_philosophy_in_human_activity). Even if a “truly” scale-free ethics cannot be derived from basic physics alone, there is still the evolutionary/game-theoretic/social/group level on which we can look for an “optimal” ethical arrangement of agents’ behaviour (and, therefore, for the values that should help to train these behaviours), whose “optimality”, in turn, is derived either from adaptivity or from aesthetics at the higher system level (i.e., the group level).
Aesthetics and interestingness: there are objective, information-theoretic ways to measure these; see Schmidhuber’s work. This also roughly corresponds to the “epistemic value” term in the expected free energy functional under Active Inference (both terms are sketched below, after this list).
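For concreteness, here is the decomposition of the expected free energy that the two bracketed remarks above allude to, in one common notation (the naming of the two terms varies across the Active Inference literature, so treat the labels as my gloss rather than as canonical):

$$
G(\pi) \;\approx\; \underbrace{-\,\mathbb{E}_{Q(o \mid \pi)}\!\big[\ln P(o \mid C)\big]}_{\text{pragmatic / intrinsic value: preferred (adaptive) outcomes}}
\;-\; \underbrace{\mathbb{E}_{Q(o \mid \pi)}\!\Big[D_{\mathrm{KL}}\big[Q(s \mid o, \pi)\,\big\|\,Q(s \mid \pi)\big]\Big]}_{\text{epistemic value: expected information gain}}
$$

Here $o$ are (future) observations, $s$ hidden states, $C$ encodes the agent’s preferred observations, and $\pi$ is a policy. Schmidhuber’s interestingness measure is in the same spirit but formalised differently: in his compression-progress framing, the intrinsic reward at time $t$ is roughly $C(h_{\le t}; \theta_{t-1}) - C(h_{\le t}; \theta_t)$, the number of bits the agent’s improving compressor/world-model just saved on its own history $h_{\le t}$ (again, notation mine).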
If the “ultimate” objective is the physical behaviour itself (happening in the real world), not abstract “values” (which appear only in the agent’s mind), I think Valuism could be cast as any philosophy that emphasises the creation of a “good life” and “right action”, such as Stoicism, plus some extra emphasis on reflection and meta-awareness, although I think Stoicism already puts significant emphasis on these.
The way you define values in your comment:
“From the AI “engineering” perspective, values/valued states are “rewards” that the agent assigns to itself in order to train (in RL style) its reasoning/planning network (i.e., its generative model) to produce behaviours that are adaptive but that it also likes and finds interesting (aesthetics). This RL-style training happens during conscious reflection.”
is just something different than what I’m talking about in my post when I use the phrase “intrinsic values.”
From what I can tell, you seem to be arguing:
[paraphrasing] “In this one line of work, we define values this way”, and then jumping from there to “therefore, you are misunderstanding values,” when actually I think you’re just using the phrase to mean something different than I’m using it to mean.