And here is my point: trees actually exist, and they are a natural abstraction. “Human values” is a concept created by psychologists in the middle of the 20th century as one way of describing the human mind. Human values don’t actually exist, but they are useful descriptive instruments for some tasks.
There are other ways to describe the human mind and human motivations: ethical norms, drives, memes, desires, Freudian models, family systems, etc. An AI may find other abstractions that are even better at compressing behaviour, but they will not be human values.
Humans have wanted things, and recognized other humans as wanting things, since long before 20th century psychologists came along and used the phrase “human values”. I don’t particularly care about aligning an AI to whatever some psychologist defines as “human values”, I care about aligning an AI to the things humans want. Those are the “human values” I care about. The very fact that I can talk about that, and other people generally seem to know what I’m talking about without me needing to give a formal definition, is evidence that it is a natural abstraction.
I would not say there are “other ways to model the human mind”, but rather there are other aspects of the human mind which one can model. (Also there are some models of the human mind which are just outright wrong, e.g. Freudian models.) If a model is to achieve strong general-purpose predictive power, then it needs to handle all of those different aspects, including human values. A model of the human mind may be lower-level than “human values”, e.g. a low-level physics model of the brain, but that will still have human values embedded in it somehow. If a model doesn’t have human values embedded in it somewhere, then it will have poor predictive performance on many problems in which human values are involved.
But human “wants” are not actually a good thing for an AI to follow. If I am fasting, I obviously want to eat, but my decision is not to eat today. And if I have a robot helping me, I would prefer that it care about my decisions, not my “wants”. This distinction between desires and decisions has been obvious for the last 2.5 thousand years, and “human values” is an obscure idea, not a natural one.
You are using the word “want” differently than I was. I’m pretty sure I’m trying to point to exactly the same thing you are pointing to. And the fact that we’re both trying to point to the same thing is exactly the evidence that the thing we’re trying to point to is a natural abstraction.
(The fact that the distinction between desires and decisions has been obvious for the last 2.5 thousand years is also evidence that both of these things are natural abstractions.)
And if I have a robot helping me, I would prefer that it care about my decisions, not my “wants”.
This is a bad idea. You should really, really want the robot to care about something besides your decisions, because the decisions are not enough to determine your values.