How I understand the main point:
The goal is to get superhuman performance aligned with the human values R_h. How might we achieve this? By learning the human values. Then we can use a perfect planner p⋆ to find the best actions to align the world with those values. This gives superhuman performance, because humans’ own planning algorithms are not perfect: they don’t always find the best actions to align the world with their values.
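To make the “perfect planner” idea concrete, here is a minimal sketch (my own toy example, not from the article): in a small known environment, exact value iteration turns a reward vector R into the policy that best pursues it. The environment, the reward vector R_h and the function names are illustrative assumptions.

```python
import numpy as np

# Toy 2-state, 2-action MDP (purely illustrative).
n_states, n_actions, gamma = 2, 2, 0.9

# T[s, a, s'] = probability of landing in s' after taking a in s.
T = np.zeros((n_states, n_actions, n_states))
T[0, 0] = [1.0, 0.0]   # in state 0, action 0 stays put
T[0, 1] = [0.0, 1.0]   # in state 0, action 1 moves to state 1
T[1, 0] = [1.0, 0.0]   # in state 1, action 0 moves back to state 0
T[1, 1] = [0.0, 1.0]   # in state 1, action 1 stays put

def perfect_planner(R, iters=500):
    """p*: map a state-reward vector R to the optimal deterministic policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R[:, None] + gamma * (T @ V)   # Q[s, a] = R(s) + gamma * E[V(s')]
        V = Q.max(axis=1)                  # value iteration update
    return Q.argmax(axis=1)                # policy: best action in each state

R_h = np.array([0.0, 1.0])                 # hypothetical values: state 1 is good
print(perfect_planner(R_h))                # -> [1 1]: always move towards state 1
```

A human would correspond to some imperfect planner p that does not always return this argmax.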
How do we learn the human values? By observing human behaviour, i.e. the actions a human takes in each circumstance. This is modelled as the human policy π(h).
Behaviour is the known outside view of a human; the values and the planner are the unknown inside view. We need to learn both the values R and the planner p such that p(R) = π(h).
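As a sketch of what “learn both such that p(R) = π(h)” means operationally (my illustration; the soft-max “Boltzmann” planner, the rationality parameter beta and the numbers are assumptions, not the article’s model): parametrise the planner and the values, then search for a pair whose predicted policy matches the observed one.

```python
import numpy as np

def boltzmann_planner(R, beta):
    """A noisily rational planner: soft-max over the reward vector R.
    beta is a rationality parameter (0 = random, large = nearly optimal)."""
    z = np.exp(beta * (R - R.max()))
    return z / z.sum()

# Observed human policy over three actions (hypothetical numbers).
pi_h = np.array([0.05, 0.80, 0.15])

def mismatch(R, beta):
    """How far the candidate (planner, values) pair is from the behaviour."""
    return np.abs(boltzmann_planner(R, beta) - pi_h).sum()

# Learning = searching for (R, beta) that drives this mismatch to zero.
print(mismatch(np.array([0.0, 1.0, 0.5]), beta=5.0))
```

Note that the planner’s rationality and the scale of the values trade off against each other here: the predicted policy depends only on the product beta * R, a first hint of the underdetermination discussed next.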
Unfortunately, this equation is underdetermined. We only know π(h), and many different (p, R) pairs are compatible with it: a change in the planner can be compensated by a change in the values.
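A minimal demonstration (illustrative names and numbers, not the article’s code): a fully rational planner paired with R, a fully anti-rational planner paired with −R, and an “indifferent” planner that ignores the reward it is given all reproduce exactly the same observed behaviour, so the behaviour alone cannot tell them apart.

```python
import numpy as np

actions = ["a0", "a1", "a2"]
R_true = np.array([0.0, 1.0, 0.5])             # the "true" values (hypothetical)

def rational(R):      return int(np.argmax(R))  # picks the best action under R
def anti_rational(R): return int(np.argmin(R))  # picks the worst action under R

observed = rational(R_true)                     # all we ever get to see: π(h)

def indifferent(R):   return observed           # ignores R, replays the behaviour

candidates = {
    "(rational, R_true)":       rational(R_true),
    "(anti-rational, -R_true)": anti_rational(-R_true),
    "(indifferent, anything)":  indifferent(np.zeros(3)),
}
for pair, choice in candidates.items():
    print(f"{pair:28s} -> {actions[choice]}")   # every line prints a1
```

The first pair credits the human with good values and good planning, the second with bad values and terrible planning, and the third says nothing about the values at all, yet they are observationally identical.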
Are there differences among the (p, R) candidates? One thing we could look at is their Kolmogorov complexity: maybe the true candidate has the lowest complexity. But according to the article, this is not the case.
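In my own notation (a paraphrase of the complexity argument, not a quote from the article): any pair compatible with the behaviour already encodes the policy, while the degenerate pair that simply memorises the policy costs barely more than the policy itself, so a simplicity prior cannot single out the true pair.

```latex
% K = Kolmogorov complexity; \pi^{(h)} is the human policy written π(h) above;
% c, c' are small constants. Any compatible pair determines the policy:
\[
  p(R) = \pi^{(h)} \;\Longrightarrow\; K(p, R) \;\ge\; K\!\bigl(\pi^{(h)}\bigr) - c,
\]
% while the degenerate, indifferent pair only needs to store the policy:
\[
  K\!\bigl(p_{\mathrm{indiff}}, R_0\bigr) \;\le\; K\!\bigl(\pi^{(h)}\bigr) + c'.
\]
% So the degenerate pair sits within a constant of the minimal complexity,
% and a simplicity prior cannot reliably prefer the true (p, R) over it.
```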
Yep, basically that. ^_^