“…human preferences/values/needs/desires/goals/etc. is a necessary but not sufficient condition for achieving alignment.”
I have to agree with you on this point, and with most of your others. My concern, however, is that Stuart’s communications give the impression that the preferences approach addresses the problem of an AI learning things we consider bad, when in fact it doesn’t.
The model of an AI learning our preferences by observing our behavior and then proceeding with uncertainty makes sense to me. However, just as Asimov’s robot characters eventually decide there is a fourth rule that overrides the other three, Stuart’s “Three Principles” model seems incomplete. Preferences do not appear to me, in themselves, to deal with the issue of evil.