My intuition is that PreDCA falls short on the “extrapolated” part of “coherent extrapolated volition”.
PreDCA would extract a utility function from the flawed algorithm implemented by a human brain.
This utility function would be coherent, but might not be extrapolated:
The extrapolated utility function (i.e. what humans would value if they were much smarter)
is probably more complicated to formulate than the un-extrapolated utility function.
For example, the policy implemented by an average human brain probably contributes more to total human happiness than most other policies.
Let’s say U1 is a utility function that values human happiness as measured by certain chemical states in the brain, and U2 is “extrapolated happiness” (where “putting all human brains in vats to make them feel happy” would not be good according to U2).
Then it is plausible that K(U1)<K(U2). But the policy implemented by an average human brain would do approximately equally well on both utility functions.
Thus, Pr[U1]>Pr[U2]: since both utility functions explain the observed policy about equally well, the simpler one receives more weight.
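To make the shape of this argument explicit (a rough sketch, not PreDCA’s actual inference rule; here $g(\pi \mid U)$ is a hypothetical score for how well the human policy $\pi$ performs according to $U$, and the complexity-weighted prior is my simplification):

$$\Pr[U] \;\propto\; 2^{-K(U)} \cdot g(\pi \mid U)$$

$$g(\pi \mid U_1) \approx g(\pi \mid U_2) \ \text{ and }\ K(U_1) < K(U_2) \ \Rightarrow\ \Pr[U_1] > \Pr[U_2].$$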
(Copied partially from here)