Stuart_Armstrong comments on Values at compile time

Stuart_Armstrong 27 Mar 2015 13:57 UTC
2 points
3 is the general problem of AIs behaving badly. This approach is supposed to avoid that by constructing a “human interpretation module” that is maximally accurate, and then using that module + human instructions as the motivation of the AI.

Basically I’m using a lot of the module approach (and the “false miracle” stuff to get counterfactuals): the AI that builds the human interpretation module will build it for the purpose of making it accurate, and the one that uses it will have it as part of its motivation. The old problems may rear their heads again if the process is ongoing, but “module X” + “human instructions” + “module X’s interpretation of human instructions” seems rather solid as a one-off initial motivation.
- tailcalled 27 Mar 2015 14:15 UTC
  0 points
  The problem is that the ‘human interpretation module’ might give the wrong results. For instance, if it convinces people that X is morally obligatory, it might interpret that as X being morally obligatory. It is not entirely obvious to me that it would be useful to have a better model. It probably depends on what the original AI wants to do.
  - Stuart_Armstrong 27 Mar 2015 15:01 UTC
    2 points
    The module is supposed to be a predictive model of what humans mean or expect, rather than something that “convinces” or does anything like that.
    - tailcalled 27 Mar 2015 16:35 UTC
      2 points
      I know, but my point is that such a model might be very perverse, such as “Humans do not expect to find out that you presented misleading information” rather than “Humans do not expect that you present misleading information.”
      - Stuart_Armstrong 30 Mar 2015 14:13 UTC
        0 points
        You’re right. This thing can come up in terms of “predicting human behaviour”, if the AI is sneaky enough. It wouldn’t come up in “compare human models of the world to reality”. So there are subtle nuances there to dig into...