I think that, for humans, most mentally accessible preferences are instrumental, and the right analogy for them is something like a ‘value function’ rather than ‘reward’ (in the RL sense).
I agree. As far as I can tell, people seem to be predicting their on-policy Q function when considering different choices. See also attainable utility theory and the gears of impact.
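To make the reward-vs-Q distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from this thread: the two-action setup, the “snack”/“exercise” labels, the payoff numbers, and the helper names (`rollout`, `on_policy_q`) are all made up. The point it demonstrates is just that an agent ranking options by immediate reward can choose differently from one ranking them by a Monte Carlo estimate of its on-policy Q-values.

```python
import random

GAMMA = 0.9  # discount factor

def rollout(action):
    """One sampled discounted return: take `action`, then follow the
    agent's current policy until the episode ends."""
    if action == "snack":
        return 1.0  # immediate reward, episode ends
    # "exercise": immediate cost, after which the current policy
    # reaches a delayed payoff of 5 about 90% of the time
    payoff = 5.0 if random.random() < 0.9 else 0.0
    return -1.0 + GAMMA * payoff

def on_policy_q(action, n=10_000):
    """Monte Carlo estimate of Q^pi(s, a): the expected discounted
    return of taking `action` and then acting on-policy."""
    return sum(rollout(action) for _ in range(n)) / n

immediate_reward = {"snack": 1.0, "exercise": -1.0}
q_values = {a: on_policy_q(a) for a in immediate_reward}

# The two rankings disagree: the felt "preference" in the comment's
# sense tracks the value estimate, not the raw reward signal.
print("ranked by reward:", max(immediate_reward, key=immediate_reward.get))  # snack
print("ranked by Q:     ", max(q_values, key=q_values.get))                  # exercise
```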