So, to sum up (?):
We want the AI to take the “right” action. In the IRL framework, we think of getting there via a chain of roughly four steps: (observations of human behavior) → (inferred human decisions under some model) → (inferred human values) → (right action).
Going from step 1 to step 2 is hard, likewise from step 2 to step 3, and we’ll probably discover new reasons why step 3 to step 4 is hard once we try to do it more realistically. You mostly use model mis-specification to illustrate this: because very different step-2 models of human decision-making can predict similar step-1 observations, that inference is hard in a particular way (the observations underdetermine the model); and because very different step-3 values can predict similar step-2 decisions, that inference is hard in the same way.
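To make the underdetermination concrete, here is a toy sketch of my own (not from your post), assuming a Boltzmann-rational choice model over a handful of actions: scaling the rewards up while scaling the rationality parameter down leaves the predicted behavior unchanged, so observations alone cannot tell the two hypotheses about the human's values apart.

```python
import numpy as np

def boltzmann_policy(rewards, beta):
    """Predicted action distribution for a Boltzmann-rational agent:
    P(a) is proportional to exp(beta * R(a))."""
    logits = beta * np.asarray(rewards, dtype=float)
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Two quite different hypotheses about the human's values (step 3)
# and decision process (step 2)...
rewards_a, beta_a = [1.0, 0.5, 0.0], 2.0   # mild preferences, fairly rational
rewards_b, beta_b = [4.0, 2.0, 0.0], 0.5   # strong preferences, quite noisy

# ...predict exactly the same observable behavior (step 1),
# roughly [0.665, 0.245, 0.090] in both cases.
print(boltzmann_policy(rewards_a, beta_a))
print(boltzmann_policy(rewards_b, beta_b))
```

This is the easiest kind of degeneracy (a pure scale ambiguity within one correctly specified model); the cases you focus on, where the step-2 decision model itself is mis-specified, seem strictly worse than this.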