A quick glance at what happens when human values get “systematized” and then “optimized super hard for” isn’t immediately encouraging. Here’s Scott Alexander on the difference between the everyday cases (“Mediocristan”) on which our morality is trained, and the strange generalizations the resulting moral concepts can imply:
I can’t answer for Yudkowsky, but what I think most Utilitarians would say is:
Firstly, it’s not very surprising if humans who have never left Mediocristan have maps that are somewhat inaccurate outside it, so we should explore outside its borders only slowly and carefully. This is the generic solution to Goodhart’s law: apply Bayesianism to theories about what your utility function should be, be aware of the Knightian uncertainty in those theories, and pessimize over that uncertainty when optimizing: i.e. tread carefully when stepping out-of-distribution. The extrapolation in Coherent Extrapolated Volition includes learning from experience (which may sometimes be challenging to extrapolate).
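As a very rough illustration of that “pessimize over the uncertainty” step, here is a minimal Python sketch. Everything in it is made up for illustration (the toy utility hypotheses, the credences, the quantile used); it just contrasts optimizing the credence-weighted mean utility with optimizing a pessimistic low quantile over candidate utility functions.

```python
import numpy as np

# Toy setup: several candidate hypotheses about what the "right" utility
# function is, each with a Bayesian credence. They roughly agree near
# action = 0 (in-distribution, "Mediocristan") and diverge for large
# actions (out-of-distribution). All numbers here are invented.

def candidate_utilities(action: float) -> np.ndarray:
    return np.array([
        action - 0.1 * action**2,   # hypothesis A
        action - 0.5 * action**2,   # hypothesis B
        np.sin(action),             # hypothesis C
    ])

credences = np.array([0.5, 0.3, 0.2])     # weights on the three hypotheses
actions = np.linspace(-5.0, 5.0, 1001)    # candidate actions to search over

def mean_score(a: float) -> float:
    # Naive approach: optimize the credence-weighted mean utility.
    return float(credences @ candidate_utilities(a))

def pessimistic_score(a: float, q: float = 0.1) -> float:
    # Pessimizing: score an action by a low quantile over the hypotheses,
    # so it only looks good if it looks good under (nearly) all of them.
    return float(np.quantile(candidate_utilities(a), q))

best_by_mean = actions[np.argmax([mean_score(a) for a in actions])]
best_by_pessimism = actions[np.argmax([pessimistic_score(a) for a in actions])]

print(f"optimizing the mean picks action {best_by_mean:+.2f}")
print(f"pessimizing picks the more conservative action {best_by_pessimism:+.2f}")
```

In this toy example the pessimistic objective ends up preferring a noticeably smaller action than the mean objective: “tread carefully out-of-distribution” in miniature.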
Secondly, to the extent that humans have preferences over the map used, and thus the maps become elements in the territory, summing over the utility of all humans will apply a lot of averaging. Unsurprisingly, the average position of Richmond, Antioch, Dublin and Warm Springs is roughly at Mount Diablo, which is a fairly sensible-looking extrapolation of where the Bay Bridge leads.
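As a sanity check on the geographic version of that claim, here is a tiny arithmetic sketch; the coordinates are my own rough estimates for the four endpoints, not figures from the post:

```python
# Approximate latitude/longitude for the four endpoints named above
# (rough estimates, for illustration only).
endpoints = {
    "Richmond":     (37.94, -122.35),
    "Antioch":      (38.00, -121.78),
    "Dublin":       (37.70, -121.90),
    "Warm Springs": (37.50, -121.94),
}

lat = sum(p[0] for p in endpoints.values()) / len(endpoints)
lon = sum(p[1] for p in endpoints.values()) / len(endpoints)
print(f"centroid: ({lat:.2f}, {lon:.2f})")  # roughly (37.78, -121.99)

# Mount Diablo's summit is at roughly (37.88, -121.91), so the centroid
# lands several miles away in the Diablo foothills: close enough, at the
# scale of the Bay Area, to count as "roughly at Mount Diablo".
```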
We can imagine a view that answers “yes, most humans are paperclippers relative to each other.”
See my post Uploading for exactly that argument. Except I’d say “almost all (including myself)”.