I read a bit more about empowerment, and it’s unclear to me what it outputs, when the input is an agent without a clear utility function.
I realize this probably isn’t quite responding to what you meant, but empowerment doesn’t require or use a utility function, so you can estimate it for any ‘agent’ that has some conceptual output action channel. You could even compute it for output channels that aren’t really controlled by agents, and the result would still be the empowerment that channel would have if an agent controlled it. However, the more nuanced versions one would probably need for human-level agents likely have to account for the agent’s actual planning ability, for reasons mentioned in the cartesian objections section.
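To make the "no utility function needed" point concrete, here is a minimal sketch. It assumes the standard one-step formalization, in which empowerment is the channel capacity (maximum mutual information) between the agent's action and the resulting next state, and it assumes a small discrete channel that is fully known. The function name `empowerment_bits` and the toy transition tables are hypothetical, made up for illustration. Blahut-Arimoto is a standard capacity algorithm used here for convenience; it is not necessarily how anyone would estimate empowerment for a realistic agent.

```python
import math

def empowerment_bits(channel, iters=1000, tol=1e-12):
    """Capacity max over p(a) of I(A; S') in bits, via Blahut-Arimoto.

    channel[a][s] = p(next state s | action a); each row must sum to 1.
    Only the action-to-state channel is used: no reward, no utility.
    """
    n_a, n_s = len(channel), len(channel[0])
    p_a = [1.0 / n_a] * n_a  # start from a uniform action distribution

    def divergences(p):
        # KL(channel[a] || marginal over next states), in nats, per action
        q = [sum(p[a] * channel[a][s] for a in range(n_a)) for s in range(n_s)]
        return [
            sum(x * math.log(x / q[s]) for s, x in enumerate(row) if x > 0)
            for row in channel
        ]

    for _ in range(iters):
        d = divergences(p_a)
        w = [p * math.exp(x) for p, x in zip(p_a, d)]  # reweight actions
        total = sum(w)
        new_p = [x / total for x in w]
        converged = max(abs(x - y) for x, y in zip(new_p, p_a)) < tol
        p_a = new_p
        if converged:
            break

    # Mutual information at the final p(a), converted from nats to bits
    d = divergences(p_a)
    return sum(p * x for p, x in zip(p_a, d)) / math.log(2)

# Four actions that reach four distinct states: about 2 bits.
open_room = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Two actions that both lead to the same state: about 0 bits.
stuck = [[1, 0], [1, 0]]

print(empowerment_bits(open_room), empowerment_bits(stuck))
```

Nothing in the computation asks what the channel's owner wants, which is why it applies equally to an output channel that no agent controls. It also treats every action distribution as available, which is roughly the idealization the planning-ability caveat above is pointing at.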
An agent can have multiple goals which come to the fore under complicated conditions; an agent can even want contrary things at different times. We should look for thought experiments that test whether empowerment really does resolve such conflicts in an acceptable way.
Yeah, a human brain clearly consists of sub-modules which one could consider sub-agents to some degree. For example, the decision to splurge a few hundred dollars on an expensive meal is largely a tradeoff between immediate hedonic utility and long-term optionality, and it does seem to be implemented as two neural sub-populations competitively ‘bidding’ for the different decisions, with the basal ganglia arbitrating.
Empowerment always favors long-term optionality. So it’s clearly not a fully general, tight approximation of human values in practice, but it is a reasonable approximation of the long-term component, which seems to be most of the difficulty for value learning.
External empowerment is the first/only reasonably simple and theoretically computable utility function that seems not only to keep humans alive, but also to make it plausible that an agent optimizing it would step down and hand over control to posthumans (with the key caveat that it may want to change/influence posthuman designs in ways we would dislike).