our actions will change the amount of utility we expect to be available. This is not because of the class of potential utility functions alone, but because the combination of action space and utility-function class is such that actions and changes in the magnitude of available utility are always linked.
For a given agent to be constrained by being a ‘utility maximiser’, we need to be able to model it as having some member of a class of utility functions such that the actions available to it systematically alter the expected utility available to it, for every utility function in that class. This is a necessary condition for utility functions to restrict behaviour, not a sufficient one.
It turns out that available utility (canonically, attainable utility, or AU) tracks with other important questions of when and why we can constrain our beliefs about an agent’s actions. See, shifting from thinking about utility to ability to get utility lets us formally understand instrumental convergence (sequence upcoming, so no citation yet). E.g., using MDPs as the abstraction, the same structural properties of the environment that drive changes in AU also give rise to instrumental convergence. I think this holds for quite a few distributions over reward functions, including at least the uniform distribution. So, it feels like you’re onto something with the restriction you point at.
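To make that concrete for myself, here is a minimal toy sketch (my own illustration, not anything from the upcoming sequence): sample reward functions uniformly over a tiny deterministic MDP, take attainable utility to be the optimal discounted value from the start state, and compare it before and after an irreversible action that cuts off part of the state space. The MDP, the “burn” action, and the helper attainable_utility are all invented for this example.

```python
import numpy as np

# Toy deterministic MDP on states {0, 1, 2, 3}. adjacency[s] lists the states
# reachable from s in one step. The irreversible "burn" action taken at state 0
# is modelled as switching from FULL to BURNED, which cuts off states 2 and 3.
FULL = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
BURNED = {0: [1], 1: [0], 2: [], 3: []}

GAMMA = 0.9

def attainable_utility(adj, reward, start, horizon=50):
    """Optimal discounted value from `start`, by finite-horizon value iteration."""
    v = {s: 0.0 for s in adj}
    for _ in range(horizon):
        v = {s: reward[s] + GAMMA * max((v[t] for t in adj[s]), default=0.0)
             for s in adj}
    return v[start]

rng = np.random.default_rng(0)
n, drops = 1000, 0
for _ in range(n):
    # Uniform distribution over (state-based) reward functions.
    reward = dict(enumerate(rng.uniform(0.0, 1.0, size=4)))
    drops += attainable_utility(BURNED, reward, 0) < attainable_utility(FULL, reward, 0)

print(f"AU strictly decreased for {drops}/{n} sampled reward functions")
```

For most sampled reward functions the option-destroying action strictly lowers AU, which is the kind of structural link between AU change and instrumental convergence I read this as pointing at.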
Note that within almost any natural class there will be a degenerate utility function under which all outcomes yield equal utility, so that every action is permissible; this case must be deliberately excluded to make predictions.
The note seems unnecessary (if I read correctly), as the AU doesn’t change for those utility functions?
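As a quick sanity check on this reading, continuing the toy sketch above (again, just my own illustration): with a constant reward function the irreversible action leaves AU unchanged, so the degenerate member of the class places no constraint via AU.

```python
# Degenerate (constant) reward, continuing the sketch above: AU is identical
# before and after the irreversible action, so it constrains nothing via AU.
constant = {s: 1.0 for s in range(4)}
au_full = attainable_utility(FULL, constant, 0)
au_burned = attainable_utility(BURNED, constant, 0)
assert abs(au_full - au_burned) < 1e-9
```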
shifting from thinking about utility to ability to get utility lets us formally understand instrumental convergence (sequence upcoming, so no citation yet)
really looking forward to this! Strongly agree that it seems important.