Jon Garcia answers On utility functions

Jon Garcia 10 Feb 2023 7:34 UTC
1 point
0
No, utility functions are not a property of computer programs in general. They are a property of (a certain class of) agents.

A utility function is just a way for an agent to evaluate states, where positive values are good (for states the agent wants to achieve), negative values are bad (for states the agent wants to avoid), and neutral values are neutral (for states the agent doesn’t care about one way or the other). This mapping from states to utilities can be anything in principle: a measure of how close to homeostasis the agent’s internal state is, a measure of how many smiles exist on human faces, a measure of the number of paperclips in the universe, etc. It all depends on how you program the agent (or how our genes and culture program us).

Utility functions drive decision-making. Behavioral policies and actions that tend to lead to states of high utility will get positively reinforced, such that the agent will learn to do those things more often. And policies/actions that tend to lead to states of low (or negative) utility will get negatively reinforced, such that the agent learns to do them less often. Eventually, the agent learns to steer the world toward states of maximum utility.

Depending on how aligned an AI’s utility function is with humanity’s, this could be good or bad. It turns out that for highly capable agents, this tends to be bad far more often than good (e.g., maximizing smiles or paperclips will lead to a universe devoid of value for humans).

Nondeterminism really has nothing to do with this. Agents that can modify their own code could in principle optimize for their utility functions even more strongly than if they were stuck at a certain level of capability, but a utility function still needs to be specified in some way regardless.