The problem can be ameliorated by constraining ourselves to instrumental reward functions. This gives us agents that are, in some sense, optimizing the state of the environment rather than an arbitrary function of their own behavior. I think this is a better model of what it means to be “goal-directed” than classical reward functions.
Another thing we can do is simply apply Occam’s razor, i.e. require the utility function (and prior) to have low description complexity. This can be interpreted as saying that taking the intentional stance towards a system is only useful if it results in compression.
Those seem to be roughly the same thing: knowing about an environment gives us a better understanding of, and a better ability to predict, the agents in that environment.
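To make the "intentional stance is only useful if it compresses" criterion a bit more concrete, here is a toy two-part-code sketch. It is an illustration under stated assumptions, not a method proposed in this thread. Everything in it is hypothetical: an agent on a 1-D line of `n_states` positions, actions in {-1, +1}, and a "goal-directed" hypothesis that names a goal state and predicts the greedy step toward it with probability `p_match`. The prior is ignored. Attributing a goal pays off only if naming the goal plus encoding the actions given the goal takes fewer bits than the raw encoding of one bit per action.

```python
import math

# Toy illustration (hypothetical setup): is describing an agent as
# "goal-directed" shorter than describing its raw behavior?

def raw_code_length(actions):
    # Baseline with no model of the agent: each action in {-1, +1} costs 1 bit.
    return float(len(actions))

def goal_code_length(start, actions, goal, n_states, p_match=0.9):
    # Two-part code. First, name the goal: log2(n_states) bits (uniform choice).
    bits = math.log2(n_states)
    pos = start
    for a in actions:
        if pos == goal:
            # The hypothesis makes no prediction once at the goal: 1 bit.
            bits += 1.0
        elif a == (1 if goal > pos else -1):
            # Action matches the greedy step toward the goal (cheap).
            bits += -math.log2(p_match)
        else:
            # Action deviates from the greedy step (expensive).
            bits += -math.log2(1 - p_match)
        pos += a
    return bits

def best_goal(start, actions, n_states):
    # Goal whose two-part code is shortest (first one wins on ties).
    return min(range(n_states),
               key=lambda g: goal_code_length(start, actions, g, n_states))

if __name__ == "__main__":
    walk_right = [1] * 10
    g = best_goal(0, walk_right, 16)
    print(g, goal_code_length(0, walk_right, g, 16), raw_code_length(walk_right))
```

In this toy, an agent that walks right for ten steps costs about 5.5 bits under the goal hypothesis versus 10 bits raw, so attributing a goal compresses. For a two-step trajectory such as `[1, -1]`, naming the goal alone costs 4 bits, which already exceeds the 2-bit raw encoding, so the intentional stance does not pay for itself there.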