I agree with this comment. I would add that there is an important sense in which the typical human is not a temporally unstable agent.
It will help to have an example: the typical 9-year-old boy is uninterested in how much the girls in his environment like him and doesn’t necessarily wish to spend time with girls (unless those girls are acting like boys). It is tempting to say that the boy will probably undergo a change in his utility function over the next 5 or so years, but if you want to use the concept of expected utility (defined as the sum of the utilities of the various outcomes weighted by their probabilities), then to keep the math simple you must assume that the boy’s utility function does not change with time, with the result that you must define the utility function to be not the boy’s current preferences, but rather his current preferences (conscious and unconscious) plus the process by which those preferences will change over time.
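For concreteness, here is the standard expected-utility definition I am appealing to (the notation is mine, not the OP’s):

$$\mathrm{EU}(a) = \sum_{o \in O} P(o \mid a)\, U(o)$$

where $a$ is an action, $O$ is the set of possible outcomes, and, crucially, $U$ carries no time index: the same $U$ evaluates the boy at 9 and at 14, which is why $U$ must be defined over outcomes rich enough to encode the preference-change process itself.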
Humans are even worse at perceiving the process that changes their preferences over time than they are at perceiving their current preferences. (The example of the 9-year-old boy is an exception to that general rule: even 9-year-old boys tend to know that their preferences around girls will probably change in not too many years.) The author of the OP seems to have conflated the goals that the human knows he has with the human’s utility function, whereas the two are quite different.
It might be that there is some subtle point the OP is making about temporally unstable agents that I have not addressed in my comment, but if he expects me to hear him out on it, he should write it up in such a way as to make it clear that he is not just confused about how the concept of the utility function applies to AGIs.
I haven’t yet explained how the assumption that the AGI’s utility function is constant over time simplifies the math (or, in an analysis like this one that never delves into actual math, simplifies the analysis). Briefly: if you want a model in which the utility function evolves over time, you have to specify how it evolves, and to keep the model accurate, you have to specify how evidence coming in from the AGI’s senses influences that evolution. But of course, sensory information is not the only thing influencing the evolution; we might call the other influence an “outer utility function”. But then why not keep the model simple and take (define) the goals that the human is aware of to be not terms in the utility function, but rather subgoals? Any intelligent agent will need some machinery to identify and track subgoals, and that machinery must modify the priorities of the subgoals in response to evidence coming in from the senses. Why not just require our model to include a model of this subgoal-updating machinery, then equate the things the human perceives as his current goals with subgoals?
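To make that concrete, here is a minimal toy sketch of the kind of model I mean: a fixed terminal utility function plus subgoal priorities that shift with evidence. All the names here (`Subgoal`, `terminal_utility`, `update_priorities`) and the numbers are my own illustrative inventions, not anything from the OP:

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    name: str
    priority: float  # how strongly the agent currently pursues this

def terminal_utility(outcome: str) -> float:
    """The fixed, time-invariant utility function of the model.

    This toy version just scores a few hard-coded outcomes; the point
    is only that it never changes during the agent's lifetime.
    """
    return {"thriving": 1.0, "surviving": 0.5, "neither": 0.0}.get(outcome, 0.0)

def update_priorities(subgoals: list[Subgoal], evidence: dict[str, float]) -> None:
    """The subgoal-updating machinery: evidence from the senses shifts
    subgoal priorities, while terminal_utility stays fixed."""
    for sg in subgoals:
        # evidence maps a subgoal's name to how useful pursuing it now
        # appears to be for the fixed terminal ends; default: no change
        sg.priority = 0.9 * sg.priority + 0.1 * evidence.get(sg.name, sg.priority)

# The 9-year-old's "goals he knows he has" live here, as subgoals,
# not as terms in the (fixed) utility function:
goals = [Subgoal("impress other boys", 0.8), Subgoal("spend time with girls", 0.1)]

# Five years of evidence later, the priorities have shifted even though
# terminal_utility has not changed at all:
update_priorities(goals, {"spend time with girls": 0.9})
print(goals)
```

The design point is that everything the agent would report as “my goals” lives in `goals`, which changes freely over time, while `terminal_utility` never changes at all.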
Here is another way of seeing it. Since a human being is “implemented” using only deterministic laws of physics, the “seed” of all of the human’s behaviors, choices, and actions over a lifetime is already present in the human being at birth! Actually, that is not quite true: maybe the human’s brain is hit by a cosmic ray when the human is 7 years old, with the result that the human grows up to like boys, whereas if it weren’t for the cosmic ray, he would have liked girls. (Humans have evolved to be resistant to such “random” influences, but such influences nevertheless do occasionally happen.) But it is true that the “seed” of all of the human’s behaviors, choices, and actions over a lifetime is already present at birth! (That sentence is just a copy of the previous one with the words “in the human being” omitted, to allow for the possibility that the “seed” includes a cosmic ray light-years away from Earth at the time of the person’s birth.) So assuming that the human’s utility function does not vary over time not only simplifies the math, but is also more physically realistic.
If you define the utility function of a human being the way I have recommended above, you must accept that there are many ways in which humans are unaware of or uncertain about their own utility function, and that the function is very complex (incorporating, for example, the processes that produce cosmic rays), though maybe all you need is an approximation. Still, that is better than defining your model such that the utility function varies over time.