But that’s a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It’s not obvious that we can do better.
“But we do OK if we use one sigmoid utility function and not if we use another!”
No, we do different things depending on our utility function. That isn’t a problem; it’s what utility functions are for. And what’s “OK” depends on what the probabilities are, what your resources are, and how much you value different amounts of chocolate. Which, again, is not a problem but exactly how things should be.
But that’s a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It’s not obvious that we can do better.
It could be a situation where we have very large resources and an exponentially large concept space.
No, we do different things depending on our utility function.
I think the justification for bounded utility functions might not be that they are the true utility functions, but that they avoid certain problems with utility functions that can tend to infinity. I could infinitely value infinite chocolate, and then the AI would assign infinite expected utility to every course of action with a nonzero probability of infinite chocolate (which is all of them) and could not make a choice.
In this case the goal is not to get the AI to follow our utility function, but to get it to follow one similar enough to lead to a positive outcome while avoiding problems to do with infinity.
Perhaps the problem here is that one is bounding utility but allowing an arbitrarily large concept space. If concept space is bounded by setting all probabilities < epsilon equal to zero for the purposes of the expected-utility calculation, then this problem would not arise, although I suspect that this approach may cause problems of its own.
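A minimal numerical sketch of the failure mode described in this comment; the particular numbers, the unbounded utility u(x) = x, and the bounded form u(x) = 1 - exp(-x) are illustrative assumptions, not anything specified in the thread. With an unbounded utility, a tiny probability of an astronomically large amount of chocolate dominates the comparison; bounding the utility, or zeroing out probabilities below epsilon, both tame it.

```python
# Illustrative only: compares an unbounded utility, a bounded utility, and an
# epsilon cutoff on probabilities. All numbers are made up for the example.
import math

def expected_utility(outcomes, utility, epsilon=0.0):
    """outcomes is a list of (probability, amount_of_chocolate) pairs;
    probabilities below epsilon are treated as zero."""
    return sum(p * utility(x) for p, x in outcomes if p >= epsilon)

def unbounded(x):
    return x                      # utility grows without limit

def bounded(x):
    return 1 - math.exp(-x)       # converges (exponentially fast) to its bound of 1

safe_bet = [(1.0, 10.0)]                          # certainly get 10 units of chocolate
long_shot = [(1e-12, 1e30), (1.0 - 1e-12, 0.0)]   # almost certainly nothing

# Unbounded utility: the long shot wins purely on the size of its payoff.
print(expected_utility(safe_bet, unbounded))    # 10.0
print(expected_utility(long_shot, unbounded))   # ~1e18

# Bounded utility: the long shot is now worth almost nothing.
print(expected_utility(safe_bet, bounded))      # ~0.99995
print(expected_utility(long_shot, bounded))     # ~1e-12

# Epsilon cutoff instead of (or as well as) bounding the utility:
print(expected_utility(long_shot, unbounded, epsilon=1e-9))   # 0.0
```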
It could be a situation where we have very large resources and an exponentially large concept space.
If we have enough uncertainty about what bit of concept space we’re looking for to make a power-law distribution appropriate, then “very large” can still be “severely limited” (and indeed must be to make the amount of resources going to each kind of maybe-chocolate be small).
true utility functions [...] problems with utility functions that can tend to infinity.
Yes. But I wouldn’t characterize this as giving the AI an approximation to our utility function that avoids problems to do with infinity, because I don’t think we have a utility function in a strong enough sense for this to be distinguishable from giving the AI our utility function. We have a vague, hazy idea of utility that we can (unreliably, with great effort) be a little bit quantitative about in “small” easy cases; we don’t truly either feel or behave according to any utility function; but we want to give the AI a utility function that will make it do things we approve of, even though its decisions may be influenced by looking at things far beyond our cognitive capacity.
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
I don’t think we have a utility function in a strong enough sense
All the underlying axioms of expected utility theory (EUT) seem self-evident to me. The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT. Having said that, even if mapping emotions onto utility values makes sense from some abstract theoretical point of view, it’s a lot harder in practice for reasons such as the complex fragility of human values, which have been thoroughly discussed already.
Of course, the degree to which the average LWer approximates EUT in their feelings and behaviour is probably far greater than that of the average person. At non-LW philosophy meetups I have been told I am ‘disturbingly analytical’ for advocating EUT.
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
Well, I suppose there is the option of ‘empathic AI’. Reverse engineering the brain and dialling compassion up to 11 is in many ways easier and more brute-force-able than creating de novo AI, and it avoids all these problems of defining a utility function, the Basilisk, and Löb’s theorem. The downsides, of course, include far greater unpredictability, the fact that the AI would definitely be sentient, and, some would argue, the possibility of catastrophic failure during self-modification.
The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT.
I didn’t say that we shouldn’t have a utility function, I said we don’t. Our actual preferences are incompletely defined, inconsistent, and generally a mess. I suspect this is true even for most LWers, and I’m pretty much certain it’s true for almost all people, and (in so far as it’s meaningful) for the human race as a whole.
Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don’t match what we’d actually want.
Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don’t know what kind of utility function is reasonable, and we’re showing evidence that some of them give optima that aren’t what we’d actually want if we were turning the world into chocolate/hedonium.
If it seems strange to you to consider representing what you want by a bounded utility function, a post about that will be forthcoming.
No, it doesn’t seem strange to me to consider representing what I want by a bounded utility function. It seems strange to consider representing what I want by a utility function that converges exponentially fast towards its bound.
I’ll repeat something I said in another comment:
You might say it’s a suboptimal outcome even though it’s a good one, but to make that claim it seems to me you have to do an actual expected-utility calculation. And we know what that expected-utility calculation says: it says that the resource allocation you’re objecting to is, in fact, the optimal one.
Or you might say it’s a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn’t the one the analysis assumes.
I have some sympathy with that last option. A utility function that not only is bounded but converges exponentially fast towards its bound feels pretty counterintuitive. It’s not a big surprise, surely, if such a counterintuitive choice of utility function yields wrong-looking resource allocations?
(Remark 1: the above is a comment that remarks that the optimum is the optimum, but it is visibly not missing the point by failing to appreciate that we might be constructing a utility function and trying to make it do good-looking things, rather than approximating a utility function we already have.)
(Remark 2: I think I can imagine situations in which we might consider making the relationship between chocolate and utility converge very fast—in fact, taking “chocolate” literally rather than metaphorically might yield such a situation. But in those situations, I also think the results you get from your exponentially-converging utility function aren’t obviously unreasonable.)
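For concreteness, here is a rough version of the expected-utility calculation the quoted comment appeals to, under assumptions added purely for illustration: the exponentially-converging utility u(x) = 1 - exp(-x) applied per candidate, a power-law prior over N kinds of maybe-chocolate, and a fixed resource budget R. The first-order condition p_i * exp(-x_i) = lambda gives x_i = max(0, ln(p_i / lambda)), so the optimum funds a long tail of low-probability candidates while giving even the most probable one only a small fraction of the budget (with these numbers, under one per cent).

```python
# Illustrative sketch only: the prior, N, R and ALPHA are assumptions made for
# this example, not anything taken from the thread.
import math

N, R, ALPHA = 10_000, 1_000.0, 1.1   # candidates, resource budget, power-law exponent

weights = [i ** -ALPHA for i in range(1, N + 1)]
Z = sum(weights)
p = [w / Z for w in weights]         # power-law prior over kinds of maybe-chocolate

def allocation(lam):
    # First-order condition for maximising sum_i p_i * (1 - exp(-x_i))
    # subject to sum_i x_i = R:  p_i * exp(-x_i) = lam  wherever x_i > 0,
    # i.e. x_i = max(0, ln(p_i / lam)).
    return [max(0.0, math.log(pi / lam)) for pi in p]

# The total amount allocated is decreasing in lam; bisect (geometrically,
# since lam spans many orders of magnitude) until the budget is exactly spent.
lo, hi = min(p) * 1e-12, max(p)
for _ in range(200):
    mid = math.sqrt(lo * hi)
    if sum(allocation(mid)) > R:
        lo = mid
    else:
        hi = mid

x = allocation(hi)
print(sum(1 for xi in x if xi > 0), "of", N, "candidates get a nonzero share")
print("largest single share:", round(max(x), 2), "out of a budget of", R)
```

Whether that allocation looks like a good outcome or a wrong-looking one is then precisely the question about the choice of utility function raised in the remarks above.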
Cool. Regarding bounded utility functions, I didn’t mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.
This is where I thought you were missing the point:
Or you might say it’s a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn’t the one the analysis assumes.
Sometimes we (seem to) have stronger intuitions about allocations than about the utility function itself, and parlaying that to identify what the utility function should be is what this post is about. This may seem like a non-step to you; in that case you’ve already got it. Cheers! I admit it’s not a difficult point. Or if you always have stronger intuitions about the utility function than about resource allocation, then maybe this is useless to you.
I agree with you that there are some situations where the sublinear allocation (and exponentially-converging utility function) seems wrong and some where it seems fine; perhaps the post should initially have said “person-enjoying-chocolate-tronium” rather than chocolate.
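A rough illustration of the “stronger intuitions about allocations” move, with the numbers and both candidate utility functions being assumptions of mine rather than anything from the post: split a fixed budget between one likely kind of person-enjoying-chocolate-tronium and one long shot, once under the exponentially-converging sigmoid and once under a bounded utility that approaches its bound only polynomially, and ask which split matches intuition.

```python
# Rough illustration only; probabilities, budget and both utility functions are
# assumptions for the example, not taken from the post.
import math

R = 20.0                           # total resource budget
p_likely, p_longshot = 0.9, 0.001  # one likely candidate, one long shot

def best_split(utility, steps=20_000):
    # Grid-search the split of R that maximises expected utility; returns the
    # amount given to the likely candidate.
    best = max(range(steps + 1),
               key=lambda k: p_likely * utility(R * k / steps)
                             + p_longshot * utility(R * (steps - k) / steps))
    return R * best / steps

def exp_sigmoid(x):
    return 1 - math.exp(-x)        # converges exponentially fast to its bound

def slow_sigmoid(x):
    return x / (1 + x)             # converges only polynomially to its bound

print(best_split(exp_sigmoid))     # ~13.4: a third of the budget goes to the 0.1% long shot
print(best_split(slow_sigmoid))    # 20.0: essentially everything goes to the likely candidate
```

If the second split is the one that looks right, that intuition counts as evidence against the exponentially-converging form, which I take to be the kind of inference described above; if the first looks fine (say, for literal chocolate), it doesn’t.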
The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don’t match what we’d actually want.
You still haven’t answered my question of why we don’t want those properties. To me, they don’t seem counter-intuitive at all.