As with the previous paper, this argument is only really a problem when the agent’s belief about the reward function is wrong: if it is correct, then at the point where there is no more information to gain, the agent should already know that humans don’t like to be killed, do like to be happy, etc.
There’s also the scenario where the AI models the world in a way that has predictive power as good as or better than our intentional stance model, but this weird model assigns undesirable values to the AI’s co-player in the CIRL game. We can’t rely on the agent “already knowing that humans don’t like to be killed,” because the AI doesn’t have to be using the level of abstraction on which “human” or “killed” are natural categories.
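To make that concrete, here’s a toy sketch (everything in it is my own illustrative construction, not anything from the paper): two models that are relabelings of the same states and predict identically, yet a reward written in the second model’s vocabulary completely ignores whether the human is alive.

```python
# A deliberately tiny illustration (all names here are hypothetical).
import itertools

# "Intentional stance" ontology: a state is (human_alive, light_on).
INTENTIONAL_STATES = list(itertools.product([0, 1], repeat=2))

def observe(state):
    """The only thing either model has to predict: the XOR of the two bits."""
    human_alive, light_on = state
    return human_alive ^ light_on

# Model A predicts observations using the intentional-stance variables.
model_a = {s: observe(s) for s in INTENTIONAL_STATES}

# Model B relabels the same states with opaque "microphysical" codes 0..3,
# so it predicts exactly the same observations: no data favors A over B.
code_of = {s: i for i, s in enumerate(INTENTIONAL_STATES)}
model_b = {code_of[s]: observe(s) for s in INTENTIONAL_STATES}
assert all(model_a[s] == model_b[code_of[s]] for s in INTENTIONAL_STATES)

# A reward expressed in B's vocabulary ("prefer even codes") is well-defined,
# but it cuts straight across the human_alive category:
reward_b = {code: 1.0 if code % 2 == 0 else 0.0 for code in model_b}
for s in INTENTIONAL_STATES:
    print(s, "-> code", code_of[s], "reward", reward_b[code_of[s]])
# (0, 0) and (1, 0) both get reward 1.0: one has a dead human, one a live one.
```

The point isn’t that B is a silly model; it’s that nothing in the observations distinguishes it from A, so a reward that is perfectly coherent in B’s terms can still be indifferent to the thing we care about.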
I certainly would count an ontological failure in the reward function as an incorrect belief about the reward function.
I’m just a little leery of calling a model “wrong” when it makes the same predictions about observations as a “right” one. I don’t want people to think that we can avoid “wrong ontologies” by starting with some reasonable-sounding universal prior and then updating on lots of observational data, or that something “wrong” will be doing something systematically stupid, probably due to some mistake or limitation that of course the reader would never program into their AI.
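Here’s that worry as a couple of lines of Bayes (again a toy of my own, with made-up numbers): if two hypotheses assign identical likelihoods to every observation, the posterior ratio between them stays exactly where the prior put it, no matter how much data you feed in.

```python
# Toy sketch: updating on observations never separates observationally
# equivalent hypotheses (all numbers here are illustrative).
import random

random.seed(0)

def likelihood(obs):
    # Both the "sane" and the "weird" ontology predict obs == 1 with p = 0.7.
    return 0.7 if obs == 1 else 0.3

post_sane, post_weird = 0.9, 0.1  # the "reasonable-sounding" prior

for _ in range(10_000):
    obs = 1 if random.random() < 0.7 else 0
    post_sane *= likelihood(obs)   # identical likelihood terms...
    post_weird *= likelihood(obs)  # ...so the ratio cannot move
    total = post_sane + post_weird
    post_sane, post_weird = post_sane / total, post_weird / total

print(post_sane, post_weird)  # still 0.9 and 0.1 after 10,000 observations
```

So “lots of observational data” only rules out ontologies that were already making different predictions, which is exactly the case we weren’t worried about.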