AIXI isn’t a model of how an AGI might work inside; it’s a model of how an AGI might behave if it were acting optimally. A real AGI would not be expected to act like AIXI, but it would be expected to act somewhat more like AIXI the smarter it is, since not acting like that is figuratively leaving money on the table.
The point of the whole utility maximization framing isn’t that we necessarily expect AIs to have an explicitly represented utility function internally[1]. It’s that as the AI gets better at getting what it wants and working out the conflicts between its various desires, its behavior will be increasingly well predicted as optimizing some utility function.
If a utility function can’t accurately summarise your desires, that kind of means they’re mutually contradictory. Not in the sense of “I value X, but I also value Y”, but in the sense of “I sometimes act like I want X and don’t care about Y, other times like I want Y and don’t care about X.”
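One way to make this concrete, under the simplifying assumption that desires only show up as observed strict pairwise choices (ignoring indifference and uncertainty): a set of "A over B" choices over finitely many options can be summarised by a utility function exactly when the choices contain no cycle, and a topological sort then produces one. A minimal sketch using Python's standard `graphlib` (3.9+); `utility_from_choices` is a hypothetical helper, not an established API:

```python
from graphlib import CycleError, TopologicalSorter

def utility_from_choices(choices):
    """Return a utility dict consistent with every observed strict choice,
    or None if the choices contain a cycle (then no utility function fits).

    choices: iterable of (preferred, rejected) pairs.
    """
    # graphlib maps each node to its predecessors; making the rejected option
    # a predecessor of the preferred one places it earlier in the ordering.
    graph = {}
    for preferred, rejected in choices:
        graph.setdefault(preferred, set()).add(rejected)
        graph.setdefault(rejected, set())
    try:
        # static_order() raises CycleError if the choices loop back on themselves.
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
    # Least-preferred options come first, so the position works as a utility.
    return {option: rank for rank, option in enumerate(order)}

print(utility_from_choices([("X", "Y"), ("Y", "Z")]))  # {'Z': 0, 'Y': 1, 'X': 2}
print(utility_from_choices([("X", "Y"), ("Y", "X")]))  # None
```

The "sometimes X over Y, other times Y over X" pattern is the two-option cycle in the second call, which is why no utility function can summarise it.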
Having contradictory desires is kind of a problem if you want to Pareto-optimize for those desires well. You risk sabotaging your own plans and running around in circles. You’re better off if you sit down and commit to things like “I will act as if I valued both X and Y at all times.” If you’re smart, you do this a lot. The more contradictions you resolve like this, the more coherent your desires become, and the closer they’ll be to being well described by a utility function.
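The "running around in circles" failure is often illustrated with a money pump. The sketch below assumes a hypothetical agent that accepts any swap to something it strictly prefers over what it holds and pays a made-up fee of 1 unit per swap; the names `run_trades`, `cyclic`, and `coherent` are illustrative only. With cyclic preferences (X over Y, Y over Z, Z over X), a trader can walk the agent around the loop as many times as they like, while preferences derived from a fixed utility function can never be walked back to where they started.

```python
from typing import Callable, Iterable, Tuple

def run_trades(prefers: Callable[[str, str], bool], start: str,
               offers: Iterable[str], fee: int) -> Tuple[str, int]:
    """Accept every offered swap the agent strictly prefers to its current
    holding, paying `fee` per accepted swap. Returns (final holding, net wealth)."""
    holding, wealth = start, 0
    for offered in offers:
        if prefers(offered, holding):
            holding = offered
            wealth -= fee
    return holding, wealth

# Contradictory (intransitive) preferences: X over Y, Y over Z, Z over X.
CYCLE = {("X", "Y"), ("Y", "Z"), ("Z", "X")}
def cyclic(a: str, b: str) -> bool:
    return (a, b) in CYCLE

# Coherent preferences read off a fixed (hypothetical) utility function.
UTILITY = {"X": 3, "Y": 2, "Z": 1}
def coherent(a: str, b: str) -> bool:
    return UTILITY[a] > UTILITY[b]

offers = ["Z", "Y", "X"] * 3  # the trader leads the agent around the loop three times
print(run_trades(cyclic, "X", offers, fee=1))    # ('X', -9): back where it started, 9 units poorer
print(run_trades(coherent, "X", offers, fee=1))  # ('X', 0): never trades away its top option
```

Committing to a consistent trade-off between desires amounts to replacing something like `cyclic` with something like `coherent`.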
I think you can observe simple proto-versions of this in humans sometimes, where people move from optimizing for whatever desire feels salient in the moment as kids (hunger, anger, joy, etc.) to having some impulse control and sticking to a long-term plan, even if it doesn’t always feel good in the moment.
Human adults are still broadly not smart enough to be well described as general utility maximizers. Their desires are a lot more coherent than those of human kids or other animals, but still not that coherent in absolute terms. You’d roughly expect AIs to become better described as utility maximizers than humans are only once they’re broadly smarter than humans, specifically at long-term planning and optimization.
This is precisely what LLMs are still really bad at. Efforts to make them better at it are ongoing, though, and seem to be among the labs’ highest priorities, because long-term consequentialist thinking is so powerful and most of the really high-value economic activities require it.
[1] Though you could argue that at some superhuman level of capability, having an explicit-ish representation of the utility function stored somewhere in the system would be likely, even if that function isn’t actually used much in most minute-to-minute processing. Knowing what you really want seems handy, even if you rarely call it to mind during routine tasks.