I found janus’s post Simulators to address this question very well. Much of AGI discussion revolves around agentic AIs (see the section Agentic GPT for discussion of this), but that framing does not describe large language models very well. janus suggests that one should instead think of LLMs such as GPT-3 as “simulators”. Simulators are not very agentic themselves, nor well described as having a utility function, though they may create simulacra that are agentic (e.g. GPT-3 writes a story where the main character is agentic).
A relevant passage from Simulators:
We can specify some types of outer objectives using a ground truth distribution that we cannot with a utility function. As in the case of GPT, there is no difficulty in incentivizing a model to predict actions that are corrigible, incoherent, stochastic, irrational, or otherwise anti-natural to expected utility maximization. All you need is evidence of a distribution exhibiting these properties.
For instance, during GPT’s training, sometimes predicting the next token coincides with predicting agentic behavior, but:
- The actions of agents described in the data are rarely optimal for their goals; humans, for instance, are computationally bounded, irrational, normative, habitual, fickle, hallucinatory, etc.
- Different prediction steps involve mutually incoherent goals, as human text records a wide range of differently-motivated agentic behavior.
- Many prediction steps don’t correspond to the action of any consequentialist agent but are better described as reporting on the structure of reality, e.g. the year in a timestamp. These transitions incentivize GPT to improve its model of the world, orthogonally to agentic objectives.
Everything can be trivially modeled as a utility maximizer, but for these reasons, a utility function is not a good explanation or compression of GPT’s training data, and its optimal predictor is not well-described as a utility maximizer. However, just because information isn’t compressed well by a utility function doesn’t mean it can’t be compressed another way. The Mandelbrot set is a complicated pattern compressed by a very simple generative algorithm which makes no reference to future consequences and doesn’t involve argmaxxing anything (except vacuously being the way it is). Likewise the set of all possible rollouts of Conway’s Game of Life – some automata may be well-described as agents, but they are a minority of possible patterns, and not all agentic automata will share a goal. Imagine trying to model Game of Life as an expected utility maximizer!
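To see the “simple generative rule, no argmax” point concretely, here is a minimal Python sketch of Conway’s Game of Life (my own illustration, not from the Simulators post): the update rule is purely local, never consults a goal or a future consequence, and yet some of the patterns it generates, such as gliders, look purposeful when observed from the outside.

```python
# Illustrative sketch: Conway's Game of Life as a "simple generative algorithm".
# The rule below is purely local. It never scores outcomes, never looks ahead,
# and contains nothing resembling a utility function -- yet some patterns it
# produces (e.g. gliders) look purposeful from the outside.

def step(live_cells):
    """Advance one generation; `live_cells` is a set of (x, y) coordinates."""
    neighbour_counts = {}
    for (x, y) in live_cells:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) == (0, 0):
                    continue
                cell = (x + dx, y + dy)
                neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1

    # The whole "physics": a cell is alive next step if it has exactly 3 live
    # neighbours, or 2 live neighbours and is currently alive. No argmax.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }


# A glider -- a pattern observers tend to describe as "moving with purpose".
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = step(glider)
print(sorted(glider))  # same shape, shifted one cell diagonally
```

The point the quote is making carries over directly: predicting rollouts of a rule like this is not well-compressed by positing a utility the system is maximizing, even though a few of its patterns behave agent-like.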
This makes the same point as cfoster0’s comment on this post: self-supervised learning is a method of AI specification that does not require “choosing a utility function”, even implicitly, since the resulting policy won’t necessarily be well-described as a utility maximizer at all.
I’m going to disagree here.
Its utility function is pretty simple and explicitly programmed. It wants to find the best token, where ‘best’ is mostly the same as ‘the most likely according to the data I’m trained on’, with a few other particulars (you can adjust how ‘creative’ vs. plagiarizer-y it should be).
That’s a utility function. GPT is what’s called a hill-climbing algorithm. It must have a simple, straightforward utility function hard-coded right in there for it to assess whether a given choice is ‘climbing’ or not.
That’s the training signal, not the utility function. Those are different things. (I believe this point was made in Reward is not the Optimization Target, though I could be wrong since I never actually read that post; corrections welcome.)
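To make the training-signal vs. utility-function distinction concrete, here is a minimal, PyTorch-style sketch (my own illustration; `model` stands in for a generic autoregressive language model, not any particular GPT codebase). The cross-entropy loss exists only in the training loop, where it shapes the weights; at inference time the model just outputs a distribution over next tokens and we sample from it, with temperature as the ‘creativity’ knob the comment above gestures at.

```python
# Illustrative sketch (assumed generic autoregressive LM, not a real GPT API):
# the loss is a training signal used to update weights; the deployed model
# just emits a probability distribution that we sample from.

import torch
import torch.nn.functional as F

def training_step(model, optimizer, tokens):
    """One self-supervised update: predict token t+1 from tokens up to t."""
    logits = model(tokens[:, :-1])                  # (batch, seq-1, vocab)
    loss = F.cross_entropy(                         # the *training signal*
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

@torch.no_grad()
def sample_next_token(model, context, temperature=1.0):
    """Inference: no loss, no objective -- just sample from the predicted
    distribution. `temperature` reshapes that distribution (the 'creativity'
    knob); it does not maximize any hard-coded utility."""
    logits = model(context)[:, -1, :]               # logits for the next token
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Nothing in `sample_next_token` evaluates or maximizes anything over outcomes; whatever objective existed lived in `training_step`, and even there it is a loss used to update parameters, not a goal the resulting policy necessarily pursues.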