Coding2077 comments on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Coding2077 28 Jun 2024 17:33 UTC
I found your reply really interesting, and I want to make sure I understand it: what does “RLed” mean in “Unfortunately it seems to me that humans are RLed pretty hard by doing a lot of playing of these games”? I’m not familiar with that term.
- the gears to ascension 29 Jun 2024 3:53 UTC
  Like Seth said, I just mean reinforcement learning. In more typical language: people take their feelings of success from whether they’re winning at the player-vs-environment and player-vs-player contests one encounters in everyday life, while opportunities to change which contests are possible at all are unfamiliar. I also think humans have decision theory issues.[1] And then, of course, people do in fact have different preferences and moral values. But even among people where neither issue is in play, I think people have pretty bad self-misalignment. It comes from taking what-feels-good-to-succeed-at feedback from circumstances that train them into habits that work well in the original context, and which typically fail badly to produce useful behavior in contexts like “you can massively change things for the better”. “Being prepared for unreasonable success” is a common phrase referring to this issue, I think.
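  To make “RLed” concrete, here is a toy sketch of the habit point, assuming a purely greedy reward learner with two options (the function name `train_greedy` and all payoffs are made up for illustration; this is not a model of humans). The learner is trained in one context, then the rewards change so that the untried option becomes far better, yet it never tries that option because nothing in its feedback loop rewards exploring.

```python
# Toy illustration (hypothetical names and payoffs): a greedy learner forms a
# habit in one context and keeps it after the context changes.

def train_greedy(values, payoffs, steps, lr=0.1):
    """Always pick the option with the highest current estimate (no
    exploration), then nudge that estimate toward the observed reward."""
    choices = []
    for _ in range(steps):
        # max() returns the first option among ties, so option 0 wins at the start
        arm = max(range(len(values)), key=lambda a: values[a])
        reward = payoffs[arm]
        values[arm] += lr * (reward - values[arm])
        choices.append(arm)
    return choices

values = [0.0, 0.0]
# Context A: option 0 pays 1, option 1 pays nothing.
train_greedy(values, payoffs=[1.0, 0.0], steps=50)
# Context B: option 1 now pays 10, but option 0's estimate stays above 0,
# so the greedy habit never samples option 1 and never learns it improved.
choices_b = train_greedy(values, payoffs=[0.5, 10.0], steps=50)
print(set(choices_b))  # {0}
print(values[1])       # 0.0
```

  The only thing missing is exploration: adding even occasional random choices would let the learner discover the new payoff, which is roughly the gap “being prepared for unreasonable success” points at.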
  [1] In case this is useful context: a decision theory is a small mathematical expression that roughly captures “what part of past, present, and future do you see as you-which-decides-together”. Stated slightly more technically, it’s the expression that defines how you consider counterfactuals when evaluating possible actions you “could [have] take[n]”. I’m pretty sure humans have some native one, and it’s not exactly any of the ones that are typically discussed but rather something vaguely in the direction of active inference, though people vary in which of the typically discussed ones they approximate. The commonly discussed ones around these parts are stuff like EDT, CDT, and LDTs (FDT, UDT, LIDT, …).
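  As a rough illustration of “how you consider counterfactuals”, here is a sketch scoring the same two actions under EDT-style and CDT-style evaluation. It uses Newcomb’s problem, a standard example that the comment itself doesn’t mention; the 0.99 predictor accuracy and 0.5 prior are assumed values, and LDTs are not modeled.

```python
# Newcomb's problem (assumed setup): an opaque box holds $1,000,000 iff a
# predictor foresaw one-boxing; a transparent box always holds $1,000.
ACCURACY = 0.99  # assumed predictor accuracy

def payoff(action, predicted_one_box):
    big = 1_000_000 if predicted_one_box else 0
    return big if action == "one-box" else big + 1_000

def edt_value(action):
    # EDT-style: treat your own action as evidence about what was predicted.
    p = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p * payoff(action, True) + (1 - p) * payoff(action, False)

def cdt_value(action, prior_one_box=0.5):
    # CDT-style: the prediction is already fixed and your action can't cause
    # it, so the same probability is used whichever action you consider.
    return (prior_one_box * payoff(action, True)
            + (1 - prior_one_box) * payoff(action, False))

actions = ["one-box", "two-box"]
print(max(actions, key=edt_value))  # one-box
print(max(actions, key=cdt_value))  # two-box
```

  The payoffs are identical in both cases; only the counterfactual (“what would have been in the box if I had acted differently?”) changes, and that alone flips the recommended action.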
- Seth Herd 29 Jun 2024 0:23 UTC
  Reinforcement learning.