It’s true that humans do not have utility functions
Do not have full conscious access to their utility function? Yes. Have an ugly, constantly changing utility function since we don’t guard our values against temporal variance? Yes. Whose values cannot with perfect fidelity be described by a utility function in a pragmatic sense, say with a group of humans attempting to do so? Yes.
Whose actual utility function cannot be approximately described, with some bounded error term epsilon? No. Whose goals cannot in principle be expressed by a utility function? No.
Please approximately describe a utility function of an addict who is calling his dealer for another dose, knowing full well that he is harming himself and that he will feel worse the next day, already feeling depressed because of that, yet still acting in a way that is guaranteed to reduce his happiness. The best I can do is “there are two different people, System 1 and System 2, with utility functions UF1 and UF2, where UF1 determines actions while UF2 determines happiness”.
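The two-system reading can be sketched in a few lines: one function picks the action, a different one scores wellbeing, and the two argmaxes disagree. Only the System 1 / System 2 split comes from the comment above; the option names and every number are invented for illustration, not a model of real addiction.

```python
# Hypothetical sketch of the "UF1 drives action, UF2 drives happiness" reading.
# Assumption: option names and scores are made up purely for illustration.
OPTIONS = ["call_dealer", "call_friend", "go_to_sleep"]

def uf1(option: str) -> float:
    """System 1: the function that actually determines what gets done."""
    return {"call_dealer": 0.9, "call_friend": 0.3, "go_to_sleep": 0.2}[option]

def uf2(option: str) -> float:
    """System 2: the function that tracks next-day happiness."""
    return {"call_dealer": -0.8, "call_friend": 0.4, "go_to_sleep": 0.5}[option]

def choose(options, utility):
    """Return the option that maximises the given utility function."""
    return max(options, key=utility)

action = choose(OPTIONS, uf1)              # what the person actually does
best_for_wellbeing = choose(OPTIONS, uf2)  # what would leave them happiest
print(action, best_for_wellbeing)
```

Under this framing neither function alone describes the person: UF1 predicts behaviour but not happiness, UF2 the reverse.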
The question does come down to definition. I do think most people here are on the same page concerning the subject matter, and only differ on what they’re calling a utility function. I’m of the Church-Turing thesis persuasion (the ‘iff’ goes both ways), and don’t see why the aspect of a human governing its behavior should be any different than the world at large.
Whether that’s useful is a different question. No doubt the human post-breakfast has a different utility function than pre-breakfast. Do we then say that the utility function takes a second parameter t, or do we insist that post-breakfast there exists a different agent (strictly speaking, since it has different values) who merely shares some continuity with its hungry predecessor, who sadly no longer exists (RIP)? If so, what would the granularity be, and how much fuzziness would still be allowed in our constantly changing utility function, which ebbs and flows with our cortisol levels and a myriad of other factors?
If a utility function, even if known, were applicable only at one instant and to one agent, would it even make sense to speak of a global function, if the domain consists of but one action?
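The “second parameter t” option can be made concrete with a toy u(outcome, t) whose ranking flips once breakfast is over. The cutoff hour, the outcomes, and the values are all hypothetical; the point is only that one time-indexed function can encode both the hungry and the fed preferences.

```python
# Hypothetical time-indexed utility u(outcome, t) for the breakfast example.
# Assumption: t is hours since midnight and breakfast ends at 8.0; all values
# are invented to show a preference reversing over time.
BREAKFAST_ENDS = 8.0
OUTCOMES = ["eat_pancakes", "read_paper"]

def utility(outcome: str, t: float) -> float:
    hungry = t < BREAKFAST_ENDS
    if outcome == "eat_pancakes":
        return 10.0 if hungry else -2.0  # appealing before breakfast, not after
    if outcome == "read_paper":
        return 1.0                       # time-invariant for this outcome
    raise ValueError(f"unknown outcome: {outcome}")

def best(outcomes, t: float) -> str:
    """Outcome with the highest utility at time t."""
    return max(outcomes, key=lambda o: utility(o, t))

print(best(OUTCOMES, 7.5))  # eat_pancakes
print(best(OUTCOMES, 9.0))  # read_paper
```

Treating t as a parameter keeps a single function, but it does not settle the granularity question: the function is only as informative as the rule deciding how finely t is sliced.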
In the VNM sense, it may well be that technically humans don’t have a (VNM!) utility function. But meh, unless there’s uncomputable magic in there somewhere, some kind of function mapping all possible stimuli to a human’s behavior should theoretically exist, and I’d call that a utility function.
Definitional stuff, which is just wiggly lines fighting each other: squibbles versus squobbles, dictionary fight to the death, for some not[at]ion of death!
ETA: It depends on what you call a utility function, and how ugly a utility function (including assigning different values to different actions each fraction of a second) you’re ready to accept. Is there “a function” assigning values to outcomes which would describe a human’s behavior over his/her lifetime? Yes, of course there is. (There is one describing the whole universe, so there had better be one for a paltry human’s behavior, even if it assigns different values at different times.) Is there a ‘simple’ function (e.g. time-invariant) which also satisfies the VNM criteria? Probably not.
Sorry, I don’t understand your point, beyond your apparently reversing your position and agreeing that humans don’t have a utility function, not even approximately.
In the VNM sense, it may well be that technically humans don’t have a (VNM!) utility function. But meh, unless there’s uncomputable magic in there somewhere, some kind of function mapping all possible stimuli to a human’s behavior should theoretically exist, and I’d call that a utility function.
Calling it a utility function does not make it a utility function. A utility function maps decisions to utilities, in an entity which decides among its available choices by evaluating that function for each one and making the decision that maximises the value. Or as Wikipedia puts it, in what seems a perfectly sensible summary definition covering all its more detailed uses, utility is “the (perceived) ability of something to satisfy needs or wants.” That is the definition of utility and utility functions; that is what everyone means by them. It makes no sense to call something completely different by the same name in order to preserve the truth of the sentence “humans have utility functions”. The sentence has remained the same but the proposition it expresses has been changed, and changed into an uninteresting tautology. The original proposition expressed by “humans have utility functions” is still false, or if one is going to argue that it is true, it must be done by showing that humans have utility functions in the generally understood meaning of the term.
some kind of function mapping all possible stimuli to a human’s behavior should theoretically exist
No, it should not; it cannot. Behaviour depends not only on current stimuli but the human’s entire past history, internal and external. Unless you are going to redefine “stimuli” to mean “entire past light-cone” (which of course the word does not mean) this does not work. Furthermore, that entire past history is also causally influenced by the human’s behaviour. Such cyclic patterns of interaction cannot be understood as functions from stimulus to response.
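The history-dependence objection can be shown with a toy agent that reacts differently to the same stimulus depending on what came before. The “habituation” rule and the stimulus names are hypothetical. This sketch also simplifies the argument: its history records only incoming stimuli and does not model the agent’s behaviour feeding back into which stimuli arrive.

```python
from typing import Dict, List, Tuple

def agent_response(history: List[str], stimulus: str) -> str:
    """Hypothetical habituation rule: answer the first two knocks, ignore later ones."""
    if stimulus == "knock":
        return "open_door" if history.count("knock") < 2 else "ignore"
    return "idle"

def run(stimuli: List[str]) -> List[Tuple[str, str]]:
    """Feed stimuli one at a time, recording (stimulus, response) pairs."""
    history: List[str] = []
    trace: List[Tuple[str, str]] = []
    for s in stimuli:
        trace.append((s, agent_response(history, s)))
        history.append(s)
    return trace

def consistent_with_stimulus_only(trace: List[Tuple[str, str]]) -> bool:
    """True iff one fixed mapping stimulus -> response reproduces the whole trace."""
    mapping: Dict[str, str] = {}
    for s, r in trace:
        if mapping.setdefault(s, r) != r:
            return False
    return True

trace = run(["knock", "knock", "knock"])
print(trace)
print(consistent_with_stimulus_only(trace))  # False
```

The same “knock” yields two different responses, so no function of the current stimulus alone fits; a function of the full history does, but only because the history was made part of its input.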
In order to arrive at this subjectively ineluctable (“meh, unless there’s uncomputable magic”) statement, you have redefined the key words to make them mean what no-one ever means by them. It’s the Texas Sharpshooter Utility Function fallacy yet again: look at what the organism does, then label that as having higher “utility” than the things it did not do.
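The Texas Sharpshooter construction can be written out directly: record what was chosen, then define a “utility” that scores the chosen option highest at each step. The helper names are hypothetical. Feeding it purely random choices shows why the result is a tautology rather than a finding.

```python
import random
from typing import Callable, List, Tuple

# One observed decision: (options that were available, option actually chosen).
Step = Tuple[List[str], str]

def sharpshooter_utility(observed: List[Step]) -> Callable[[int, str], float]:
    """Fit a 'utility' after the fact: the option chosen at step i scores 1.0,
    every other option scores 0.0. Indexing by step lets it fit any record."""
    chosen = {i: choice for i, (_, choice) in enumerate(observed)}

    def u(step: int, option: str) -> float:
        return 1.0 if chosen[step] == option else 0.0

    return u

def explains_all(observed: List[Step], u: Callable[[int, str], float]) -> bool:
    """True if maximising u at every step reproduces every observed choice."""
    return all(
        max(options, key=lambda o: u(i, o)) == choice
        for i, (options, choice) in enumerate(observed)
    )

# Behaviour generated by coin flips, with no preferences behind it at all.
rng = random.Random(0)
OPTIONS = ["a", "b", "c"]
observed = [(OPTIONS, rng.choice(OPTIONS)) for _ in range(20)]

u = sharpshooter_utility(observed)
print(explains_all(observed, u))  # True for any seed
```

The fitted function “explains” random behaviour exactly as well as deliberate choice, and it says nothing about step 21: `u(20, ...)` raises a `KeyError`, because it was drawn around shots already fired.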
I appreciate your point.
Mostly, I’m concerned that “strictly speaking, humans don’t have VNM-utility functions, so that’s that, full stop” can be interpreted like a stop sign, when in fact humans do have preferences (clearly) and do tend to choose actions to try to satisfice those preferences at least part of the time. To the extent that we’d deny that, we’d deny the existence of any kind of “agent” instantiated in the physical universe. There is predictable behavior for the most part, which can be modelled. And anything that can be computationally modelled can be described by a function. It may not have some of the nice VNM properties, but we take what we can get.
If there’s a more applicable term for the kind of model we need (rather than simply “utility function in a non-VNM sense”), by all means, but then again, “what’s in a name” …
The question is whether AIs can have a fixed UF; specifically, whether they can both self-modify and maintain their goals. If they can’t, there is no point in loading them with human values up front (as they won’t stick to them anyway), and the problem of corrigibility becomes one of getting them to go in the direction we want, not of getting them to budge at all.
Which is not to say that goal-unstable AIs will be safe, but they do present different problems and require different solutions, which could do with being looked at some time.
In the face of instability, you can rescue the idea of the utility function by feeding in an agent’s entire history, but rescuing the UF is not what is important; what matters is stability versus instability. I am still against the use of the phrase “utility function”, because when people read it, they think of a time-independent utility function, which is, I think, why there is so little consideration of unstable AI.
Humans do not come even close to VNM-rationality, and there’s no clear evidence of some underlying VNM preferences that they are deviating from.
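One necessary condition for any utility representation, VNM or otherwise, is that observed strict preferences contain no cycle. The check below tests only that condition, on deterministic pairwise choices. It says nothing about lotteries or the independence axiom. The option names are invented, and the code assumes Python 3.9+ for the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple

def ordinal_utility(prefs: List[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """prefs holds observed (preferred, rejected) pairs.
    Return a score per option such that every preferred option outscores the
    one it beat, or None if the pairs contain a cycle (then no such score exists)."""
    graph: Dict[str, Set[str]] = {}
    for better, worse in prefs:
        # TopologicalSorter emits a node's predecessors first, so listing `worse`
        # as a predecessor of `better` puts worse options earlier in the order.
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
    return {option: rank for rank, option in enumerate(order)}

print(ordinal_utility([("tea", "coffee"), ("coffee", "water")]))
print(ordinal_utility([("tea", "coffee"), ("coffee", "water"), ("water", "tea")]))  # None
```

Passing this check is cheap. Failing it, as with the tea/coffee/water cycle, rules out every utility function over those options at once.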