I think the simpler solution is just to use a bounded utility function. There are several things suggesting we do this, and I really don’t see any reason to not do so, instead of going through contortions to make unbounded utility work.
Using a bounded utility function is what you do if and only if your preferences happen to be bounded in that way. The utility function is not up for grabs. You don’t change the utility function because it makes decision making more convenient (well, unless you have done a lot of homework).
As it happens I don’t make (hypothetical) decisions as if I assign linear value to person-lives. That is because, as best I can tell, my actual preference really does assign less value to the 3^^^3rd person-life created than to the 5th person-life. However, someone who does care just as much about each additional person would be making an error if they acted as if they had a bounded utility function.
Your argument proves a bit too much, I think. I could equally well reply, “Using a utility function is what you do if and only if your preferences are described by a utility function. Terminal values are not up for grabs. You don’t reduce your terminal values to a utility function just because it makes decision making more convenient.”
The fact of the matter is that our preferences are not naturally described by a utility function; so if we’ve agreed that the AI should use a utility function, well, there must be some reason for that other than “it’s a correct description of our preferences”, i.e., we’ve agreed that such reasons are worth consideration. And I don’t see any such reason that doesn’t also immediately suggest we should use a bounded utility function (at least, if we want to be able to consider infinite gambles).
So I’m having trouble believing that your position is consistent. If you said we should do away with utility functions entirely to better model human terminal values, that would make sense. But why would you throw out the bounded part, and then keep the utility function part? I’m having trouble seeing any line of reasoning that would support both of those simultaneously. (Well, unless you want to throw out infinite gambles, which does seem like a consistent position. Note, though, that in that case we also don’t have to do contortions like in this post.)
Edit: Added notes about finite vs. infinite gambles.
Your argument proves a bit too much, I think. I could equally well reply, “Using a utility function is what you do if and only if your preferences are described by a utility function. Terminal values are not up for grabs. You don’t reduce your terminal values to a utility function just because it makes decision making more convenient.”
If you were to generalise it would have to be to something like “only if your preferences can be represented without loss as a utility function”. Even then there are exceptions. However, the intricacies of resolving complex and internally inconsistent agents seem rather orthogonal to the issue of how a given agent would behave in the counter-factual scenario presented.
So I’m having trouble believing that your position is consistent. If you said we should do away with utility functions entirely to better model human terminal values, that would make sense. But why would you throw out the bounded part, and then keep the utility function part? I’m having trouble seeing any line of reasoning that would support both of those simultaneously.
Meanwhile, I evaluate your solution to this problem (throw away the utility function and replace it with a different one) to be equivalent to, when encountering Newcomb’s Problem, choosing the response “Self modify into a paperclip maximiser, just for the hell of it, then choose whichever box choice maximises paperclips”. That it seems to be persuasive to readers makes this thread all too surreal for me. Tapping out before candidness causes difficulties.
If you were to generalise it would have to be to something like “only if your preferences can be represented without loss as a utility function”
It’s not clear to me what distinction you are attempting to draw between “Can be described by a utility function” and “can be represented without loss as a utility function”. I don’t think any such distinction can sensibly be drawn. They seem to simply say the same thing.
Even then there are exceptions.
I’d ask you to explain, but, well, I guess you’re not going to.
(throw away the utility function and replace it with a different one)
I’m not throwing out the utility function and replacing it with a different one, because there is no utility function. What there is is a bunch of preferences that don’t satisfy Savage’s axioms (or the VNM axioms or whichever formulation you prefer) and as such cannot actually be described by a utility function. Again—everything you’ve said works perfectly well as an argument against utility functions generally. (“You’re tossing out human preferences and using a utility function? So, what, when presented with Newcomb’s problem, you self-modify into a paperclipper and then pick the paperclip-maximizing box?”)
Perhaps I should explain in more detail how I’m thinking about this.
We want to implement an AI, and we want it to be rational in certain senses—i.e. obey certain axioms—while still implementing human values. Human preferences don’t satisfy these axioms. We could just give it human preferences and not worry about the intransitivity and the dynamic inconsistencies and such, or, we could force it a bit.
So we imagine that we have some (as yet unknown) procedure that takes a general set of preferences and converts it to one satisfying certain requirements (specific to the procedure). Obviously something is lost in the process. Are we OK with this? I don’t know. I’m not making a claim either way about this. But you are going to lose something if you apply this procedure.
OK, so we feed in a set of preferences and we get out one satisfying our requirements. What are our requirements? If they’re Savage’s axioms, we get out something that can be described by a utility function, and a bounded one at that. If they’re Savage’s axioms without axiom 7, or (if we take probability as a primitive) the VNM axioms, then we get out something that for finite gambles can be described by a utility function (not necessarily bounded), but which cannot necessarily be easily described for infinite gambles.
If I’m understanding you correctly, you’re reading me as suggesting a two-step process: First we take human values and force them into a utility function, then take that utility function and force it to be bounded. I am not suggesting that. Rather, I am saying, we take human values and force them to satisfy certain properties, and the result can then necessarily be described by a bounded utility function.
People on this site often seem to just assume that being rational means using a utility function, forgetting that a utility function is just how we describe sets of preferences satisfying certain axioms. What matters is not whether you use a utility function, but questions like: are your preferences transitive? Do they obey the sure-thing principle? And so forth. Now, sure, the only way to obey all those requirements is to use a utility function, but it’s important to keep the reason in mind.
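To make the "are your preferences transitive?" question concrete, here is a minimal sketch of a check for transitivity violations in a strict preference relation. The function name `intransitive_triples` and the fruit examples are made up for illustration; this only tests one axiom and says nothing about the sure-thing principle or the others.

```python
from itertools import permutations

def intransitive_triples(prefers):
    """Return triples (a, b, c) with a > b and b > c but not a > c.

    `prefers` is a set of (better, worse) pairs: a hypothetical encoding
    of a strict preference relation, chosen here for illustration.
    """
    items = {x for pair in prefers for x in pair}
    return [
        (a, b, c)
        for a, b, c in permutations(sorted(items), 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]

# A cyclic set of preferences: no utility function can represent it.
cyclic = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}
# A transitive set: representable by, e.g., u(apple) > u(banana) > u(cherry).
consistent = {("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")}

print(intransitive_triples(cyclic))
print(intransitive_triples(consistent))
```

The cyclic relation yields violations and the consistent one yields none; any procedure that forces preferences to satisfy the axioms would have to break a cycle like the first somewhere, which is where something gets lost.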
If we require the output of our procedure to obey Savage’s axioms, it can be described by a bounded utility function. That’s just a fact. If we leave out axiom 7 (or use the VNM axioms), then it can kind of be described by a utility function—for finite gambles it can be described by a utility function, and it’s not clear what happens for infinite gambles.
So do you include axiom 7 or no? (Well, OK, you might just use a different set of requirements entirely, but let’s assume it’s one of these two sets of requirements for now.) If yes, the output of your procedure will be a bounded utility function, and you don’t run into these problems with nonconvergence. If no, you also don’t run into these problems with nonconvergence (the procedure is required to output a coherent set of preferences, after all!), but for a different reason: the set of preferences it outputs can only be modeled by a utility function for finite gambles. So if you start taking infinite weighted sums of utilities, the result doesn’t necessarily tell you anything about which gamble to choose.
So at no point should you be taking infinite sums with an unbounded utility function, because there is no underlying reason to do so. The only reason to do so that I can see is that, for your requirements, you’ve simply declared, “We’re going to require that the output of the procedure can be described by a utility function (including for infinite gambles).” But that’s just a silly set of requirements. As I said above—it’s not failing to use a utility function we should be avoiding; it’s the actual problems this causes we should be avoiding. Declaring at the outset we’re going to use a utility function, instead of that we want to avoid particular problems, is silly. I don’t see why you’d want to run human values through such a poorly motivated procedure.
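As a rough illustration of the nonconvergence at issue, here is a small sketch using a St. Petersburg-style gamble. Everything in it is an assumption picked for the example, not anything from the post: outcome n has probability 2^-n and payoff 2^n, `linear` stands in for an unbounded utility function, and `bounded` (x / (1 + x)) is just one arbitrary bounded choice.

```python
from fractions import Fraction

def partial_expected_utility(utility, n_terms):
    """Sum P(n) * utility(payoff(n)) for n = 1..n_terms.

    Hypothetical gamble: outcome n has probability 2**-n and payoff 2**n.
    Exact rational arithmetic is used so rounding does not hide the trend.
    """
    total = Fraction(0)
    for n in range(1, n_terms + 1):
        prob = Fraction(1, 2**n)
        payoff = Fraction(2**n)
        total += prob * utility(payoff)
    return total

def linear(x):
    # Unbounded utility: every term of the sum contributes exactly 1.
    return x

def bounded(x):
    # One arbitrary bounded utility, with values in [0, 1).
    return x / (1 + x)

for n_terms in (10, 20, 40):
    print(
        n_terms,
        float(partial_expected_utility(linear, n_terms)),
        float(partial_expected_utility(bounded, n_terms)),
    )
```

With the linear utility the partial sum over the first N outcomes is exactly N, so the "expected utility" never settles; with the bounded one the partial sums stay below 1 and the tail shrinks geometrically. That is the sense in which the infinite sum only tells you something when the utility function is bounded.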
So again, I’m not claiming you want to run your values through the machine and force them into a bounded utility function; but rather just that, if you want to run them through this one machine, you will get a bounded utility function; and if instead you run them through this other machine, you will get a utility function, kind of, but it won’t necessarily be valid for infinite gambles. Eliezer seems to want to run human values through the machine. Which one will he disprefer less? Well, he always seems to assume that comparing the expected utilities of infinite gambles is a valid operation, so I’m inferring he’d prefer the first one, and that one only outputs bounded utility functions. Maybe I’m wrong. But in that case he should stop assuming that comparing the expected utilities of infinite gambles is a valid operation.
You still get a probability function without Savage’s P6 and P7, you just don’t get a utility function with codomain the reals, and you don’t get expectations over infinite outcome spaces. If we add real-valued probabilities, for example by assuming Savage’s P6′, you even get finite expectations, assuming I haven’t made an error.
You don’t change the utility function because it makes decision making more convenient [..] someone who does care just as much about each additional person would be making an error if they acted as if they had a bounded utility function.
True.
That said, given some statement P about my preferences, such as “I assign linear value to person-lives,” such that P being true makes decision-making inconvenient, if I currently have C confidence in P then depending on C it may be more worthwhile to devote my time to gathering additional evidence for and against P than to developing a decision procedure that works in the inconvenient case.
On the other hand, if I keep gathering evidence about P until I conclude that P is false and then stop, that also has an obvious associated failure mode.