Attempted Summary:
The post is about the project of having an AI infer human goals in some representation (i.e., ambitious value learning). This differs from imitating human behavior: there the goal is “behave like the human,” whereas in ambitious value learning the goal is “figure out what the human wants and then do it better.”
The fundamental problem is the messiness of human values. The assumption of infinite data corresponds to the idea that we can place a human with an arbitrary memory in an arbitrary situation as often as we want and observe her actions (whatever representation of goals we use is allowed to be a function of the history). This is called the “easy goal inference problem,” and it is still hard. Primarily (this comes back to the difference between imitation and value learning), you need to model human mistakes, i.e., figure out whether a given action was a mistake or not.
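To make the setup concrete, here is a minimal sketch of what goal inference from observed actions can look like, assuming a Boltzmann-rational model of the human (a standard but substantive modeling choice, not something the post commits to). The candidate goals, the rationality parameter `beta`, and the observed actions are all made up for illustration.

```python
import numpy as np

# Hypothetical toy setup: a few candidate goals (reward vectors over 4 actions)
# and some observed human action choices. We assume a Boltzmann-rational human:
# P(action | goal) is proportional to exp(beta * reward), where beta encodes how
# noisily-rational the human is. All numbers here are illustrative assumptions.
candidate_goals = {
    "goal_A": np.array([1.0, 0.0, 0.0, 0.5]),
    "goal_B": np.array([0.0, 1.0, 0.5, 0.0]),
    "goal_C": np.array([0.2, 0.2, 1.0, 0.2]),
}
beta = 2.0                       # assumed rationality parameter
observed_actions = [0, 3, 0, 0]  # indices of actions we saw the human take

def action_distribution(reward, beta):
    """Boltzmann action distribution implied by a reward vector."""
    logits = beta * reward
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Bayesian goal inference: posterior over candidate goals given the observed
# actions, starting from a uniform prior.
posterior = {}
for name, reward in candidate_goals.items():
    probs = action_distribution(reward, beta)
    posterior[name] = np.prod([probs[a] for a in observed_actions])
total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}
print(posterior)  # goal_A should dominate, since action 0 was chosen most often
```

The point of the “model human mistakes” requirement is exactly that the likelihood `P(action | goal)` above has to come from somewhere: it encodes which deviations from optimal behavior we treat as noise versus as evidence about the goal.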
(I’m already familiar with the punchline that, for any observed action, there are infinitely many (rationality, goal) pairs that could have produced it, so the problem can’t be solved without making assumptions about rationality. But we also know it’s possible to make such assumptions and get reasonable performance, because humans can infer other humans’ goals better than chance.)
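A quick sketch of that unidentifiability, under the same Boltzmann model assumed above (my illustration, not the post’s): a rational agent with reward R and an anti-rational agent with reward −R produce exactly the same observable behavior, so data alone cannot separate them.

```python
import numpy as np

# Toy illustration of the (rationality, goal) ambiguity: a Boltzmann agent with
# reward R and rationality beta has the same action distribution as an
# "anti-rational" agent (beta < 0) with reward -R, because the logits
# beta * R and (-beta) * (-R) are identical.
def boltzmann_policy(reward, beta):
    logits = beta * reward
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

reward = np.array([1.0, 0.0, 0.5, 0.2])
rational = boltzmann_policy(reward, beta=2.0)         # wants reward, pursues it
anti_rational = boltzmann_policy(-reward, beta=-2.0)  # hates reward, acts perversely
assert np.allclose(rational, anti_rational)           # identical observable behavior
```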