At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system’s preferences as representing “my values,” and reject the outputs of the model-free and Pavlovian systems as the outputs of dumb systems that evolved for their computational simplicity and that can be seen as attempts to approximate the full power of the model-based system responsible for goal-directed behavior.
At a glance, I might be more comfortable embracing an extrapolation of the combination of the model-based system’s preferences and the Pavlovian system’s preferences.
Admittedly, a first step in extrapolating the Pavlovian system’s preferences might be to represent its various targets as goals in a model, thereby leaving the extrapolator with a single system to extrapolate. But given that 99% of the work takes place after this point, I’m not sure how much I care about that distinction. Much more important is not to lose track of the Pavlovian system’s targets accidentally.