Stuart_Armstrong comments on Values at compile time

Stuart_Armstrong 27 Mar 2015 15:01 UTC
2 points
The module is supposed to be a predictive model of what humans mean or expect, not something that “convinces” anyone or acts on them in any way.
- tailcalled 27 Mar 2015 16:35 UTC
  2 points
  I know, but my point is that such a model might be very perverse: it might predict “Humans do not expect to find out that you presented misleading information” rather than “Humans do not expect that you present misleading information.”
  - Stuart_Armstrong 30 Mar 2015 14:13 UTC
    0 points
    You’re right. This problem can come up when the task is framed as “predicting human behaviour”, if the AI is sneaky enough. It wouldn’t come up when the task is framed as “compare human models of the world to reality”. So there are subtleties there to dig into...