Since even irrational agents can be modelled using a utility function, no “reforming” is needed.
How can they be modeled with a utility function?
As explained here:
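For concreteness, here is a minimal sketch of one standard construction behind that claim. It is my own illustration, under the assumption that the utility function only has to rationalize the agent's actual choices; it is not necessarily the argument in the linked explanation.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative construction (an assumption for this sketch, not necessarily
% the linked argument): $\pi$ is the agent's actual policy, i.e. the action it
% in fact takes after each interaction history.  Define a utility function
% over complete histories $h$ by
\[
  U(h) =
  \begin{cases}
    1 & \text{if at every step $t$ the action taken in $h$ equals $\pi(h_{<t})$,}\\
    0 & \text{otherwise.}
  \end{cases}
\]
% The agent's actual behaviour attains the maximum possible expected value of
% $U$, so even a wildly ``irrational'' agent is a $U$-maximizer; the catch is
% that $U$ can be as complicated as the behaviour it rationalizes.
\end{document}
```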
Thanks for the reference.
It seems, though, that the reward function might be extremely complicated in general (in fact, I suspect this paper can be used to show that the reward function may even be uncomputable).
The whole universe may well be computable, according to the Church–Turing–Deutsch principle. If it isn't, the above analysis may not apply.
I agree with jsteinhardt, thanks for the reference.
I agree that the reward functions will vary in complexity. If you do the usual thing in Solomonoff induction, where the plausibility of a reward function decreases exponentially with its size, then so far as I can tell you can infer reward functions from behavior whenever you can infer behavior.
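For what it's worth, here is a toy sketch of that kind of inference. Everything in it is an illustrative assumption on my part: a handful of hand-written candidate reward functions standing in for programs, a made-up "description length" for each, and a softmax (noisily reward-maximizing) model of behavior. The prior over candidates decays exponentially with size, and it gets updated on the observed choices.

```python
# Toy sketch: a prior over candidate reward functions that decays exponentially
# with their description size, combined with a noisily-rational behavior model,
# updated on observed choices.  All candidates, sizes, and the softmax model are
# illustrative assumptions, not anything from the original discussion.

import math
from collections import Counter

OPTIONS = ["A", "B", "C"]

# Candidate reward functions, each with a crude "description length" in bits.
# In real Solomonoff-style induction the candidates would be programs and the
# size would be program length; here they are hand-written dictionaries.
CANDIDATES = [
    ("likes A",        {"A": 1.0, "B": 0.0, "C": 0.0}, 4),
    ("likes B",        {"A": 0.0, "B": 1.0, "C": 0.0}, 4),
    ("likes C",        {"A": 0.0, "B": 0.0, "C": 1.0}, 4),
    ("prefers A to B", {"A": 1.0, "B": 0.8, "C": 0.0}, 9),
    ("indifferent",    {"A": 0.5, "B": 0.5, "C": 0.5}, 6),
]

def choice_probability(reward, choice, temperature=0.3):
    """Softmax ("noisily rational") behavior model: higher-reward options are
    chosen more often, but mistakes have nonzero probability."""
    weights = {o: math.exp(reward[o] / temperature) for o in OPTIONS}
    return weights[choice] / sum(weights.values())

def posterior(observed_choices):
    """Posterior over candidate reward functions given observed choices, with
    prior mass proportional to 2 ** (-description_length)."""
    scores = {}
    for name, reward, size in CANDIDATES:
        log_prior = -size * math.log(2.0)
        log_likelihood = sum(
            math.log(choice_probability(reward, c)) for c in observed_choices
        )
        scores[name] = log_prior + log_likelihood
    # Normalize in log space for numerical stability.
    max_score = max(scores.values())
    unnorm = {n: math.exp(s - max_score) for n, s in scores.items()}
    z = sum(unnorm.values())
    return {n: p / z for n, p in unnorm.items()}

if __name__ == "__main__":
    observed = ["A", "A", "B", "A", "A", "A", "B", "A"]
    print("observed:", Counter(observed))
    for name, prob in sorted(posterior(observed).items(), key=lambda kv: -kv[1]):
        print(f"{name:>15s}: {prob:.3f}")
```

The same trade-off between fit and description length is what keeps the inferred reward function from simply memorizing the observed behavior.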
We need to infer a utility function for somebody if we're going to help them get what they want, since a utility function is the only reasonable description of what an agent wants that I know of.