Don’t worry, I’m going to be adding depth to the model. But note that the AI’s predictive accuracy is never in doubt. This is sort of a reverse “can’t derive an ought from an is”; here, you can’t derive a “wants” from a “did”. The learning agent will only get the correct human motivation (if such a thing exists) if it has the correct model of what counts as desires for a human, or some way of learning this model, which is what I’m looking at here (again, there’s a distinction between learning a model that gives correct predictions of human actions, and learning a model that gives what we would call a correct model of human motivation).
According to its model, the AI is not being manipulative here; it is simply doing what the human’s desires indicate it should.
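To make the prediction/motivation gap concrete, here is a minimal sketch (a toy construction of mine, not something from the argument above): two (planner, reward) pairs that attribute opposite desires to the human, yet predict exactly the same behaviour. The `boltzmann_planner` function, the `beta` rationality parameter, and the reward vectors are all illustrative assumptions.

```python
# Toy illustration of why perfect prediction doesn't pin down motivation.
# Everything here (the two-action setting, the softmax planner, the betas)
# is an assumption for the sketch, not part of the original argument.

import numpy as np

ACTIONS = ["press_button", "do_nothing"]

def boltzmann_planner(reward, beta):
    """Map a reward vector over actions to a choice distribution (softmax)."""
    z = np.exp(beta * reward)
    return z / z.sum()

# Pair A: the human genuinely wants to press the button, and is fairly rational.
reward_A = np.array([1.0, 0.0])
policy_A = boltzmann_planner(reward_A, beta=2.0)

# Pair B: the human wants *not* to press it, but the planner is "anti-rational"
# (negative beta), i.e. a model of someone systematically acting against
# their own desires.
reward_B = np.array([0.0, 1.0])
policy_B = boltzmann_planner(reward_B, beta=-2.0)

print(policy_A)  # [0.881 0.119]
print(policy_B)  # identical: [0.881 0.119]

# Both pairs predict the observed behaviour equally well, so behavioural data
# alone can't tell the agent which reward is the human's "real" motivation.
```

The point of the sketch is only that prediction under-determines motivation: without a prior model of what counts as a human desire, the data leaves the two interpretations tied.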