That’s correct: it infers the policy and the utility function simultaneously. To avoid the underspecification problem, it uses a prior that favors more intelligent agents. This is similar to adopting assumptions 1 and 2a from http://proceedings.mlr.press/v97/shah19a/shah19a.pdf.
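Here is a toy sketch of the general idea, not the method under discussion. It rests on assumptions that are not in the comment. "Intelligence" is modeled as the inverse temperature `beta` of a Boltzmann-rational policy. The candidate utilities, the `beta` grid, and the prior weight `exp(strength * beta)` are all hypothetical choices. The sketch jointly infers (utility, `beta`) from observed actions. Without a preference for high `beta`, a near-random agent (`beta = 0`) can explain any behavior under any utility, and that is the underspecification the prior pushes against.

```python
import math
from itertools import product

# Hypothetical toy setup: a one-step choice among three actions (0, 1, 2).
CANDIDATE_UTILITIES = {
    "likes_a": (1.0, 0.0, 0.0),
    "likes_b": (0.0, 1.0, 0.0),
    "indifferent": (0.0, 0.0, 0.0),
}
# Assumption: "intelligence" = Boltzmann inverse temperature beta.
BETAS = (0.0, 1.0, 5.0)


def beta_prior(beta, strength):
    # Unnormalized prior that grows with beta, i.e. favors more capable agents.
    # strength=0 gives a uniform prior over beta.
    return math.exp(strength * beta)


def action_probs(utility, beta):
    # Boltzmann-rational policy: P(a) is proportional to exp(beta * U(a)).
    weights = [math.exp(beta * u) for u in utility]
    total = sum(weights)
    return [w / total for w in weights]


def posterior(observed_actions, strength=0.5):
    # Joint posterior over (utility, beta), with a uniform prior over utilities.
    scores = {}
    for (name, utility), beta in product(CANDIDATE_UTILITIES.items(), BETAS):
        probs = action_probs(utility, beta)
        likelihood = math.prod(probs[a] for a in observed_actions)
        scores[(name, beta)] = beta_prior(beta, strength) * likelihood
    z = sum(scores.values())
    return {key: s / z for key, s in scores.items()}


def utility_marginal(post):
    # Sum the joint posterior over beta to get P(utility | observed actions).
    marginal = {}
    for (name, _beta), p in post.items():
        marginal[name] = marginal.get(name, 0.0) + p
    return marginal


if __name__ == "__main__":
    data = [0, 0, 0, 0]  # the agent picked action 0 four times
    print(utility_marginal(posterior(data, strength=0.0)))
    print(utility_marginal(posterior(data, strength=0.5)))
```

With this data, turning on the intelligence-favoring prior (`strength=0.5`) moves posterior mass away from the "random agent with any utility" explanations and toward `likes_a` (roughly 0.95 to 0.98 here).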