Like seth said, I just mean reinforcement learning. Described in more typical language, people take their feelings of success from whether they’re winning at the player-vs-environment and player-vs-player contests they encounter in everyday life; opportunities to change which contests are possible are unfamiliar. I also think humans have decision theory issues[1]. And then, of course, people do in fact have different preferences and moral values. But even among people where neither issue is in play, I think people have pretty bad self-misalignment: they take what-feels-good-to-succeed-at feedback from circumstances that train them into habits that work well in the original context, and those habits typically fail badly to produce useful behavior in contexts like “you can massively change things for the better”. “Being prepared for unreasonable success” is a common phrase referring to this issue, I think.
[1] In case this is useful context: a decision theory is a small mathematical expression which roughly expresses “what part of past, present, and future do you see as you-which-decides-together”, or, stated slightly more technically, the expression that defines how you consider counterfactuals when evaluating possible actions you “could [have] take[n]”. I’m pretty sure humans have some native one, and it’s not exactly any of the ones that are typically discussed but rather something vaguely in the direction of active inference, though people vary in which of the typically discussed ones they approximate. The commonly discussed ones around these parts are stuff like EDT/CDT/LDTs { FDT, UDT, LIDT, … }
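To make “how you consider counterfactuals” a bit more concrete, here’s a minimal sketch of EDT vs CDT on Newcomb’s problem. Everything in it is my own illustrative assumption rather than anything canonical: the payoffs are the usual textbook numbers, and `edt_value`, `cdt_value`, `accuracy`, and `prior_full` are hypothetical names. It only covers the two simplest theories and says nothing about the LDTs or about whatever humans natively do.

```python
# Newcomb's problem with the usual textbook payoffs (an illustrative assumption).
BIG = 1_000_000    # opaque box, filled only if the predictor expected one-boxing
SMALL = 1_000      # transparent box, always present

ACTIONS = ("one-box", "two-box")


def payoff(action, box_full):
    """Money received for an action, given whether the opaque box is full."""
    total = BIG if box_full else 0
    if action == "two-box":
        total += SMALL
    return total


def edt_value(action, accuracy):
    """EDT: condition on the action, so the action is evidence about the prediction."""
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action, prior_full):
    """CDT: the box contents are already fixed, so the action cannot move p_full."""
    return prior_full * payoff(action, True) + (1 - prior_full) * payoff(action, False)


def best_action(value_fn, param):
    """Pick the action with the highest value under the given decision theory."""
    return max(ACTIONS, key=lambda a: value_fn(a, param))


if __name__ == "__main__":
    print("EDT picks:", best_action(edt_value, 0.99))
    print("CDT picks:", best_action(cdt_value, 0.5))
```

The only difference between the two functions is what probability of “box is full” gets plugged in when you imagine taking each action, which is the counterfactual-handling part. With a predictor that is right 99% of the time, the EDT version prefers one-boxing, while the CDT version prefers two-boxing for any fixed prior.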