But that’s a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It’s not obvious that we can do better.
It could be a situation where we have very large resources and an exponentially large concept space.
No, we do different things depending on our utility function.
I think the justification for bounded utility functions might be not that they are the true utility functions, but that they avoid certain problems with utility functions that can tend to infinity. I could infinitely value infinite chocolate, and then the AI assigns infinite expected utility to every course of action that has a nonzero probability of infinite chocolate (which is all of them), and cannot make a choice.
In this case the question is not to get the AI to follow our utility function, but to follow one similar enough to lead to a positive outcome, while avoiding problems to do with infinity.
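To make the infinity problem concrete, here is a minimal Python sketch. The two actions, their probabilities, and the cap of 100 are all made up for illustration; nothing here is a proposal for how a real AI should be built. It only shows that an unbounded expected-utility sum cannot rank the actions, while a capped one can.

```python
import math

def expected_utility(outcomes, utility=lambda u: u):
    """Sum p * utility(u) over (probability, raw utility) pairs."""
    return sum(p * utility(u) for p, u in outcomes if p > 0)

# Hypothetical actions: each has some tiny chance of "infinite chocolate".
careful = [(1e-12, math.inf), (1 - 1e-12, 10.0)]
reckless = [(1e-30, math.inf), (1 - 1e-30, -5.0)]

# Unbounded utility: both actions come out infinite, so they tie.
unbounded_careful = expected_utility(careful)
unbounded_reckless = expected_utility(reckless)

def cap(u, bound=100.0):
    """Assumed bounded utility: clip raw utility at an arbitrary bound."""
    return min(u, bound)

# Bounded utility: the comparison becomes meaningful again.
bounded_careful = expected_utility(careful, cap)
bounded_reckless = expected_utility(reckless, cap)

print(unbounded_careful, unbounded_reckless)
print(bounded_careful > bounded_reckless)
```

With the cap in place the careful action is preferred, which is the sense in which a bounded function can be "similar enough" without being the true one.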
Perhaps the problem here is that one is bounding utility but allowing an arbitrarily large concept space. If concept space is bounded by setting all probabilities < epsilon equal to zero for the purposes of utility functions, then this problem would not arise, although I suspect that this approach may cause problems of its own.
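A rough sketch of the epsilon cutoff, under my own assumptions: probabilities below epsilon are simply dropped without renormalizing, and the numbers are invented. The second case is one guess at the kind of problem such a cutoff might cause, not something established above: many outcomes that are individually below epsilon but together carry all the probability mass get ignored entirely.

```python
import math

def truncated_expected_utility(outcomes, epsilon):
    """Expected utility with every probability below epsilon treated as zero.

    Assumption: no renormalization of the remaining probabilities.
    """
    return sum(p * u for p, u in outcomes if p >= epsilon)

# A tiny chance of infinite payoff no longer swamps the calculation.
long_shot = [(1e-20, math.inf), (1 - 1e-20, 1.0)]
with_cutoff = truncated_expected_utility(long_shot, epsilon=1e-9)

# Possible failure mode: 2000 outcomes, each with probability 5e-4.
# Each one is below epsilon = 1e-3, so all of them are discarded.
fine_grained = [(5e-4, 10.0)] * 2000
ignored = truncated_expected_utility(fine_grained, epsilon=1e-3)
untruncated = truncated_expected_utility(fine_grained, epsilon=0.0)

print(with_cutoff, ignored, untruncated)
```

How finely the outcomes are carved up changes the answer here, which suggests the cutoff depends on the description of concept space and not only on the probabilities themselves.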
It could be a situation where we have very large resources and an exponentially large concept space.
If we have enough uncertainty about what bit of concept space we’re looking for to make a power-law distribution appropriate, then “very large” can still be “severely limited” (and indeed must be to make the amount of resources going to each kind of maybe-chocolate be small).
true utility functions [...] problems with utility functions that can tend to infinity.
Yes. But I wouldn’t characterize this as giving the AI an approximation to our utility function that avoids problems to do with infinity, because I don’t think we have a utility function in a strong enough sense for this to be distinguishable from giving the AI our utility function. We have a vague hazy idea of utility that we can (unreliably, with great effort) be a little bit quantitative about in “small” easy cases; we don’t truly either feel or behave according to any utility function; but we want to give the AI a utility function that will make it do things we approve of, even though its decisions may be influenced by looking at things far beyond our cognitive capacity.
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
I don’t think we have a utility function in a strong enough sense
All the underlying axioms of expected utility theory (EUT) seem self-evident to me. The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT. Having said that, even if mapping emotions onto utility values makes sense from some abstract theoretical point of view, it’s a lot harder in practice, for reasons such as the complex fragility of human values, which have been thoroughly discussed already.
Of course, the degree to which the average LWer approximates EUT in their feelings and behaviour is probably far greater than that of the average person. At non-LW philosophy meetups I have been told I am ‘disturbingly analytical’ for advocating EUT.
It’s not clear to me that that’s a sensible project at all, but it certainly isn’t anything so simple as taking something that Really Is our utility function but misbehaves “at infinity” and patching it to tame the misbehaviour :-).
Well, I suppose there is the option of ‘empathic AI’. Reverse engineering the brain and dialling compassion up to 11 is in many ways easier and more brute-force-able than creating de novo AI, and it avoids all these problems of defining a utility function, the Basilisk, and Löb’s theorem. The downsides, of course, include far greater unpredictability, the fact that the AI would definitely be sentient, and, some would argue, the possibility of catastrophic failure during self-modification.
The fact that most people don’t shut up and multiply is something I would regard as more of their problem than a problem with EUT.
I didn’t say that we shouldn’t have a utility function, I said we don’t. Our actual preferences are incompletely defined, inconsistent, and generally a mess. I suspect this is true even for most LWers, and I’m pretty much certain it’s true for almost all people, and (in so far as it’s meaningful) for the human race as a whole.