The idea that more information can make an AI’s inferences worse is surprising. But the assumption that humans have an unchanging, neatly hierarchical UF is known to be a bad one, so it is not so surprising that it leads to bad results. In short, this is still a bit clown-car-ish.
Would you tell an AI that Heroin is Bad, but not tell it that Manipulation is Bad?
Don’t worry, I’m going to be adding depth to the model. But note that the AI’s predictive accuracy is never in doubt. This is sort of a reverse “can’t derive an ought from an is”; here, you can’t derive a wants from a did. The learning agent will only get the correct human motivation (if such a thing exists) if it has the correct model of what counts as desires for a human. Or some way of learning this model, which is what I’m looking at (again, there’s a distinction between learning a model that gives correct predictions of human actions, and learning a model that gives what we would call a correct model of human motivation).
According to its model, the AI is not being manipulative here, simply doing what the human’s desires indicate it should.
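To make the prediction-versus-motivation distinction concrete, here is a minimal sketch of my own (the action names, the two planners, and the toy reward functions are all illustrative assumptions, not part of the formal model): two hypotheses pair different desires with different models of how the human acts on them, yet both predict the observed behaviour perfectly.

```python
# Illustrative sketch: identical predictions, opposite attributed desires.
# The observed data is just that the human takes the heroin.

OBSERVED_ACTION = "take_heroin"

def rational_planner(reward):
    """Assumes the human simply picks the action with the highest reward."""
    return max(["take_heroin", "refrain"], key=reward)

def compulsion_planner(reward):
    """Assumes addiction overrides the human's reward whenever heroin is available."""
    return "take_heroin"

hypotheses = {
    "wants heroin, acts rationally":
        (rational_planner, lambda a: 1.0 if a == "take_heroin" else 0.0),
    "wants to refrain, acts under compulsion":
        (compulsion_planner, lambda a: 1.0 if a == "refrain" else 0.0),
}

for label, (planner, reward) in hypotheses.items():
    predicted = planner(reward)
    print(f"{label}: predicts {predicted!r}, "
          f"matches observation: {predicted == OBSERVED_ACTION}")

# Both hypotheses predict the observed action, so predictive accuracy cannot
# tell them apart -- yet they ascribe opposite desires to the human. The extra
# structure (which planner counts as the "correct model of desires") has to
# come from somewhere other than the action data.
```

The point of the sketch is only that action data alone underdetermines the desire/planner split; whatever breaks the tie is exactly the “correct model of what counts as desires” discussed above.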