It may be better to ask “Is a utility function a useful abstraction to describe how X makes decisions?” (Does it allow you to compress your description of X’s decisions?) Recall that utility functions are just a representation derived from preferences that are structured in a particular way. But not all ways of deciding on a preferred outcome are structured in that way[1], and not all decision algorithms work by preferring outcomes, so thinking in terms of utility functions is not always helpful.
[1] See, for example:
Aumann, R. J. (1962). Utility theory without the completeness axiom. Econometrica, 30(3), 445-462.
Bewley, T. F. (2002). Knightian decision theory. Part I. Decisions in Economics and Finance, 25(2), 79-110.
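To make the point about structure concrete, here is a minimal sketch, assuming a finite set of outcomes and a weak preference relation given as a set of ordered pairs. The helper names (`is_complete`, `is_transitive`, `utility_from_preferences`) and the fruit example are hypothetical, chosen only for illustration. It builds a utility function when the relation is complete and transitive, and returns `None` otherwise, which is the situation with incomplete preferences.

```python
from itertools import product

def is_complete(outcomes, weakly_prefers):
    """True if every pair of outcomes is comparable in at least one direction."""
    return all((a, b) in weakly_prefers or (b, a) in weakly_prefers
               for a, b in product(outcomes, repeat=2))

def is_transitive(outcomes, weakly_prefers):
    """True if a >= b and b >= c always imply a >= c."""
    return all((a, c) in weakly_prefers
               for a, b, c in product(outcomes, repeat=3)
               if (a, b) in weakly_prefers and (b, c) in weakly_prefers)

def utility_from_preferences(outcomes, weakly_prefers):
    """Return a dict u with u[a] >= u[b] exactly when a is weakly preferred to b,
    or None if the relation is not complete and transitive."""
    if not (is_complete(outcomes, weakly_prefers)
            and is_transitive(outcomes, weakly_prefers)):
        return None
    # u(a) = number of outcomes that a is weakly preferred to (including itself).
    return {a: sum((a, b) in weakly_prefers for b in outcomes) for a in outcomes}

outcomes = ["apple", "banana", "cherry"]
reflexive = {(x, x) for x in outcomes}

# A complete, transitive ranking: apple >= banana >= cherry.
total = reflexive | {("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")}

# An incomplete relation: cherry cannot be compared with anything else.
partial = reflexive | {("apple", "banana")}

u_total = utility_from_preferences(outcomes, total)
u_partial = utility_from_preferences(outcomes, partial)
```

With `total` the sketch yields a ranking-preserving utility; with `partial` it yields `None`, because no single real-valued function can encode "cherry is neither better, worse, nor equal".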
Even if it’s a useful abstraction, it’s only an abstraction. You can’t make an AI safe by changing its UF unless its UF is a distinct component at the engineering level, not just an abstraction.
And you can’t determine whether it’s safe by examining or understanding its utility function if the abstraction is so loose as to not be alignable.
It’s not really an abstraction at all in this case; it literally has a utility function. What rates highest on its utility function is returning whatever token is ‘most likely’ given its training data.
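Read literally, that claim treats the model's score for each candidate token as its utility and the output as the argmax. A minimal sketch, assuming a toy, made-up next-token distribution (`greedy_token` and the example probabilities are hypothetical, and real systems often sample from the distribution rather than take the argmax):

```python
import math

def greedy_token(next_token_probs):
    """Pick the token with the highest log-probability, used here as its 'utility'."""
    # Skip zero-probability tokens, since log(0) is undefined.
    utilities = {tok: math.log(p) for tok, p in next_token_probs.items() if p > 0}
    return max(utilities, key=utilities.get)

# Toy distribution over three candidate next tokens (illustrative numbers only).
choice = greedy_token({"cat": 0.6, "dog": 0.3, "fish": 0.1})
```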