Charlie Steiner comments on Alignment and Deep Learning

Charlie Steiner 17 Apr 2022 20:07 UTC
6 points
So, if I’m a smoker who wants to quit but finds it hard, I want the AI to learn that I want to quit. But if you didn’t bias the training data towards cases where agents have addictions they don’t want (as opposed to straightforwardly doing what they want, or even complaining about things that they do in fact want), the AI will learn that I want to keep smoking while complaining about it.

Similar things show up for a lot of things we’d call our biases (loss aversion, my-side bias, etc.). A nonhuman observer of our society probably needs to be able to read our books and articles and apply them to interpreting us. This whole “interpret us how we want to be interpreted” thing is one of the requirements for CEV, yeah.
- Richard_Kennaway 17 Apr 2022 20:29 UTC
  3 points
  
  the AI will learn that I want to keep smoking while complaining about it.
  
  A human psychologist might conclude the same thing. :)
  - Viliam 20 Apr 2022 20:41 UTC
    3 points
    An economist, definitely.
- Aiyen 17 Apr 2022 20:22 UTC
  3 points
  Sounds like there could be at least two approaches here. One would be CEV. The other would be to consider the smoker as wanting to smoke, or at least to avoid withdrawal cravings, and also to avoid the downsides of smoking. A sufficiently powerful agent operating on this model would try to suppress withdrawals, cure lung cancer, or otherwise act in the smoker’s interests. On the other hand, a less powerful agent with this model might simply try to keep the smoker smoking. There’s an interesting question here about the extent to which revealed preferences are a person’s true preferences, or whether addictions and the like should be considered an unwanted addition to one’s personality.