This isn’t really the full explanation of why I think the AI can’t just be given a human model and told to fill it in, though. For starters, there’s also the issue about whether the human model should “live” in the AI’s native ontology, or whether it should live in its own separate, “fictional” ontology.
I’ve become more convinced of the latter—that if you tell the AI to figure out “human values” in a model that interacts with whatever its best-predicting ontology is, it will come up with values that include things as strange as “Charlie wants to emit CO2” (though not necessarily in the same direction). Instead, for a predictive human model to actually contain what I’d consider to be my values, its model of my values might need to be described in a special ontology in which human-level concepts are simple but the AI’s overall predictions are worse.
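To gesture at that trade-off a bit more concretely, here is a toy sketch (everything in it—the candidate ontologies, the numbers, the scoring rule—is a made-up illustration of the idea, not a proposal): score each candidate ontology on raw predictive loss plus a penalty for how complicated human-level concepts are to express in it, and see which one wins as the penalty weight changes.

```python
# Toy, purely illustrative sketch: two hypothetical candidate ontologies,
# scored on predictive loss plus a penalty for how hard human-level
# concepts are to express in them. All numbers are made up.

candidates = {
    # Best-predicting ontology: great at prediction, but "human values"
    # come out in alien terms ("Charlie wants to emit CO2").
    "native_physics_ontology": {"prediction_loss": 1.0, "human_concept_complexity": 50.0},
    # Separate "fictional" ontology: worse at prediction, but human-level
    # concepts (wants, beliefs, people) are simple primitives.
    "fictional_human_ontology": {"prediction_loss": 3.0, "human_concept_complexity": 2.0},
}

def score(stats, lam):
    """Lower is better. lam = 0 just picks the best predictor; lam > 0
    trades away predictive accuracy to keep human concepts simple."""
    return stats["prediction_loss"] + lam * stats["human_concept_complexity"]

for lam in (0.0, 0.1):
    best = min(candidates, key=lambda name: score(candidates[name], lam))
    print(f"lam={lam}: prefer {best}")
# lam=0.0: prefer native_physics_ontology
# lam=0.1: prefer fictional_human_ontology
```

The only point of the sketch is that any nonzero weight on concept simplicity can flip the choice away from the best-predicting ontology—which is the kind of trade I have in mind above.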
This has prompted me to get off my butt and start publishing the more useful bits of what I’ve been thinking about. Long story short, I disagree with you while still almost entirely agreeing with you.