I think that, for humans, most mentally accessible preferences are instrumental, and the right analogy for them is something like a ‘value function’ rather than ‘reward’ (in the RL sense).
I agree. As far as I can tell, people seem to be predicting their on-policy Q function when considering different choices. See also attainable utility theory and the gears of impact.
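To make the reward-vs-Q distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from this thread: the two-action setup, the “snack”/“exercise” labels, the payoff numbers, and the helper names (`rollout`, `on_policy_q`) are all made up. The point it demonstrates is just that an agent ranking options by immediate reward can choose differently from one ranking them by a Monte Carlo estimate of its on-policy Q-values.

```python
import random

GAMMA = 0.9  # discount factor

def rollout(action):
    """One sampled discounted return: take `action`, then follow the
    agent's current policy until the episode ends."""
    if action == "snack":
        return 1.0  # immediate reward, episode ends
    # "exercise": immediate cost, after which the current policy
    # reaches a delayed payoff of 5 about 90% of the time
    payoff = 5.0 if random.random() < 0.9 else 0.0
    return -1.0 + GAMMA * payoff

def on_policy_q(action, n=10_000):
    """Monte Carlo estimate of Q^pi(s, a): the expected discounted
    return of taking `action` and then acting on-policy."""
    return sum(rollout(action) for _ in range(n)) / n

immediate_reward = {"snack": 1.0, "exercise": -1.0}
q_values = {a: on_policy_q(a) for a in immediate_reward}

# The two rankings disagree: the felt "preference" in the comment's
# sense tracks the value estimate, not the raw reward signal.
print("ranked by reward:", max(immediate_reward, key=immediate_reward.get))  # snack
print("ranked by Q:     ", max(q_values, key=q_values.get))                  # exercise
```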