My intuition is that PreDCA falls short on the “extrapolated” part of “coherent extrapolated volition”.
PreDCA would extract a utility function from the flawed algorithm implemented by a human brain.
This utility function would be coherent, but might not be extrapolated:
The extrapolated utility function (i.e. what humans would value if they were much smarter)
is probably more complicated to formulate than the un-extrapolated utility function.
For example, the policy implemented by an average human brain probably contributes more to total human happiness than most other policies.
Let’s say U1 is a utility function that values human happiness as measured by certain chemical states in the brain, and U2 is “extrapolated happiness” (where “putting all human brains in vats to make them feel happy” would not be good according to U2).
Then it is plausible that K(U1)<K(U2). But the policy implemented by an average human brain would do approximately equally well on both utility functions.
Thus, Pr[U1]>Pr[U2]: since both utility functions explain the observed policy about equally well, the simpler one receives more weight.
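To make the shape of this argument explicit (a rough sketch, not PreDCA’s actual inference rule; here $g(\pi \mid U)$ is a hypothetical score for how well the human policy $\pi$ performs according to $U$, and the complexity-weighted prior is my simplification):

$$\Pr[U] \;\propto\; 2^{-K(U)} \cdot g(\pi \mid U)$$

$$g(\pi \mid U_1) \approx g(\pi \mid U_2) \ \text{ and }\ K(U_1) < K(U_2) \ \Rightarrow\ \Pr[U_1] > \Pr[U_2].$$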
(Copied partially from here)