A steelman of the claim that a human has a utility function is that agents that make coherent decisions have utility functions, so we may consider the utility function of a hypothetical AGI aligned with a human. That is, the assignment of utility functions to humans reduces to alignment: assign to a human the utility function of an AGI aligned with them.
The problem is, of course, that any possible set of behaviors can be construed as maximizing some utility function. The question is whether doing so actually simplifies the task of reasoning and making predictions about the agent in question, or whether mapping the agent’s actual motivational schema to a utility function only adds unwieldy complications.
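One crude way to cash out that construal (a sketch of my own, not anything claimed above): let the utility itself depend on the situation, assigning 1 to whatever the agent actually chose and 0 to everything else. The resulting "maximizer" reproduces any behavior exactly, including the intransitive pattern discussed later in the thread, but it compresses nothing and predicts nothing.

```python
# Hypothetical sketch (not from the thread): rationalizing an arbitrary choice
# pattern as "utility maximization" by letting the utility depend on the situation.

from typing import Callable, Hashable


def trivial_utility(observed_choice: Callable[[tuple], Hashable]):
    """Utility that assigns 1 to whatever was actually chosen in a situation, 0 otherwise."""
    def utility(situation: tuple, option: Hashable) -> float:
        return 1.0 if option == observed_choice(situation) else 0.0
    return utility


def maximize(utility, situation: tuple) -> Hashable:
    # "Maximizing" this utility merely replays the observed behavior.
    return max(situation, key=lambda option: utility(situation, option))


# Any behavior at all, including an intransitive one.
observed = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}
u = trivial_utility(observed.get)

for situation, choice in observed.items():
    assert maximize(u, situation) == choice
```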
In the case of humans, I would say it’s far more useful to model us as generating and pursuing arbitrary goal states/trajectories over time. These goals are continuously learned through interactions with the environment and its impact on pain and pleasure signals, deviations from homeostatic set points, and aesthetic and social instincts. You might be able to model this as a utility function with a recursive hidden state, but would that be helpful?
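For concreteness, here is a toy sketch (entirely my own, and deliberately simplistic) of that alternative picture: an agent whose current goal is a mutable hidden state nudged around by reward and homeostatic signals, rather than a fixed utility function over outcomes.

```python
import random


class GoalPursuingAgent:
    """Toy agent whose goal is a learned, drifting hidden state, not a fixed utility."""

    def __init__(self, set_point: float = 0.0):
        self.set_point = set_point  # homeostatic set point
        self.goal = 0.0             # current goal state; changes with experience

    def update_goal(self, pleasure: float, homeostatic_error: float) -> None:
        # Goals drift toward what recently felt good and away from what pushed
        # the agent off its set point (a stand-in for the learning described above).
        self.goal += 0.1 * pleasure - 0.05 * homeostatic_error

    def act(self, options: list) -> float:
        # Pursue whichever available option is closest to the current goal.
        return min(options, key=lambda x: abs(x - self.goal))


agent = GoalPursuingAgent()
for _ in range(5):
    choice = agent.act([random.uniform(-1.0, 1.0) for _ in range(3)])
    agent.update_goal(pleasure=random.random(),
                      homeostatic_error=choice - agent.set_point)

# A "utility function with a recursive hidden state" would fold `agent.goal` into
# the function's inputs; possible, but it is unclear what that buys.
```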
any possible set of behaviors can be construed as maximizing some utility function
(Edit: What do you mean? This calls to mind a basic introduction to what utility functions do, given below, but on second thought that’s probably not what the claim is about. I’ll leave the rest of the comment here, as it could be useful for someone.)
A utility function describes decisions between lotteries, which are mixtures of outcomes, or more generally events in a sample space. The setting assumes uncertainty: outcomes are only known to be within some event, not individually. So a situation where a decision can be made is a collection of events/lotteries, one of which gets to be chosen; the choice is the behavior assigned to this situation. This means situations reuse parts of each other, rather than being defined independently. As a result, it becomes possible to act incoherently, for example to pick A from (A, B), B from (B, C), and C from (A, C). Only when the collection of behaviors satisfies certain coherence properties does there exist a probability measure and a utility function such that the agent's choice in any situation coincides with picking the event that has the highest expected utility.
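A small sketch of why the pattern above cannot be read as expected-utility maximization with a single, situation-independent utility: only the ordering of u(A), u(B), u(C) matters for these pairwise choices, so it suffices to check all six strict orderings, and none of them reproduces the pattern.

```python
from itertools import permutations

# The incoherent pairwise choices from above: A over B, B over C, C over A.
pairwise_choices = {frozenset({"A", "B"}): "A",
                    frozenset({"B", "C"}): "B",
                    frozenset({"A", "C"}): "C"}


def rationalized_by(utility: dict) -> bool:
    """True if maximizing `utility` reproduces every pairwise choice."""
    return all(max(pair, key=utility.get) == choice
               for pair, choice in pairwise_choices.items())


# Every strict ranking of the three outcomes fails to reproduce the pattern.
strict_rankings = [dict(zip(perm, (3.0, 2.0, 1.0))) for perm in permutations("ABC")]
assert not any(rationalized_by(u) for u in strict_rankings)
```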
Put differently, the issue is that behavior described by a utility function is actually behavior in all possible and counterfactual situations, not in some specific situation. The existence of a utility function says something about which behaviors in different situations can coexist. Without a utility function, each situation could get an arbitrary response/behavior of its own, independently of the responses given in other situations. But requiring a utility function makes that impossible: some behaviors become incompatible with the others.
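To illustrate the coupling (a hypothetical example with made-up numbers): once a single utility function over outcomes and the lotteries' probabilities are fixed, the choice in every situation is determined at once, so responses to different situations cannot be set independently.

```python
# Hypothetical utilities over outcomes (made-up numbers).
utility = {"win": 1.0, "draw": 0.2, "lose": 0.0}

# Lotteries: probability distributions over outcomes.
lotteries = {
    "safe":   {"draw": 1.0},
    "gamble": {"win": 0.5, "lose": 0.5},
    "edge":   {"win": 0.3, "draw": 0.5, "lose": 0.2},
}


def expected_utility(lottery: dict) -> float:
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def choose(situation: list) -> str:
    """A situation is a collection of available lotteries; pick the one with highest EU."""
    return max(situation, key=lambda name: expected_utility(lotteries[name]))


# One utility function answers every situation at once.
print(choose(["safe", "gamble"]))           # "gamble" (EU 0.5 vs 0.2)
print(choose(["safe", "edge"]))             # "edge"   (EU 0.4 vs 0.2)
print(choose(["safe", "gamble", "edge"]))   # "gamble" (EU 0.5 is highest)
```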
In the grandparent comment, I’m treating utility functions more loosely, but their role in constraining collections of behaviors assigned to different situations is the same.