A utility function sounds like the sort of computery thing an AI program ought to be expected to have, but it is actually an idealized way of describing a rational agent, not something that can be translated directly into code.
If your preferences about possible states of the world follow a few very reasonable constraints, then (somewhat surprisingly) your preferences can be modeled by a utility function. An agent with a reasonably coherent set of preferences can be talked about as if it optimizes a utility function, even if that’s not the way it was programmed. See VNM rationality.
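As a concrete illustration of "acts as if it optimizes a utility function", here is a minimal sketch in Python (the outcomes and numbers are my own toy choices, not anything from the VNM theorem itself): all of the agent's choices among lotteries fall out of one utility function over outcomes plus the expected-utility rule.

```python
# Toy sketch (invented outcomes and numbers): an agent whose choices over
# lotteries are fully determined by a single utility function over outcomes,
# i.e. it "acts as if" it maximizes expected utility.
def expected_utility(lottery, utility):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

def choose(lotteries, utility):
    """Pick the lottery with the highest expected utility."""
    return max(lotteries, key=lambda lot: expected_utility(lot, utility))

utility = {"apple": 1.0, "banana": 2.0, "nothing": 0.0}.get
sure_banana = [(1.0, "banana")]
risky_apple = [(0.5, "apple"), (0.5, "nothing")]
print(choose([sure_banana, risky_apple], utility))  # picks the sure banana
```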
I agree with this, but that doesn't mean the model is useful. For example, you could say that I have a utility function that assigns a utility of 1 to every action I actually take and a utility of 0 to every action I don't. But that is like saying you could make a giant look-up table (GLUT) that models my responses in conversation. In practice, if you attempt to program an AI with a GLUT for conversation, it will not do well at all in conversation, and if you attempt to program an AI with the above model of human behavior, it will do very badly.
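To make the degenerate model concrete, here is a toy sketch (situations and actions invented for illustration): the "assign 1 to what I actually did, 0 to everything else" utility function is just a lookup over past behavior, and is silent about any situation not already in the table.

```python
# Toy sketch (invented situations/actions) of the degenerate utility function
# described above: it scores observed behavior perfectly but has no predictive
# content outside the recorded cases.
observed_actions = {
    "greeted_by_friend": "say hello",
    "offered_coffee": "accept",
}

def degenerate_utility(situation, action):
    # 1 for the action actually taken, 0 for everything else.
    return 1.0 if observed_actions.get(situation) == action else 0.0

print(degenerate_utility("offered_coffee", "accept"))  # 1.0
print(degenerate_utility("offered_tea", "accept"))     # 0.0 -- the "model"
# says nothing useful about situations outside the lookup table
```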
In other words, theoretically there is such a model, but in practice this is not how a human being is made and it shouldn’t be how an AI is made.
Humans can be turned into money pumps. Consequently, the most important point is to make sure that your AI can be turned into a money pump, since if you don’t, it will automatically diverge from human values.
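For anyone who hasn't seen the money-pump argument spelled out, here is a minimal sketch (the standard cyclic-preference example, with goods and a trading fee I made up): an agent whose preferences run in a circle will pay for a series of trades that leaves it holding exactly what it started with.

```python
# Toy money pump: cyclic preferences A > B > C > A. The agent pays a small
# fee for each "upgrade" and ends up back where it started, minus cash.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}  # cyclic

def will_trade(current, offered):
    return prefers.get((offered, current), False)

holding, cash, fee = "C", 0.0, 1.0
for offered in ["B", "A", "C", "B", "A", "C"]:  # the pump just cycles offers
    if will_trade(holding, offered):
        holding, cash = offered, cash - fee

print(holding, cash)  # back to "C", six dollars poorer
```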
If this is what you are arguing, it would take a lot to convince me of that position.
Here’s the argument I think you’re making:
Don’t make AIs try to optimize stuff without bound. If you try to optimize any fixed objective function without bound, you will end up sacrificing all else that you hold dear.
I agree that optimizing without bound seems likely to kill you. If a safe alternative approach is possible, I don’t know what it would be. My guess would be most alternative approaches are equivalent to an optimization problem.
Right, the second argument (about unbounded optimization) is the one that concerns me. The money-pump argument worries me less, since it should be possible to convince people to adjust their preferences in some way that will make them consistent.
My suggestion here was simply to put a hard cap on the utility function. So for example, instead of valuing lifespan without limit, there would be some value beyond which the AI is indifferent to extending lifespan any further. This kind of AI might take the lifespan deal up to a certain point, but it would not keep taking it forever, and in this way it would avoid driving its probability of survival down to a limit of zero.
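Here is a rough numerical sketch of that proposal (the cap, the 1000x lifespan multiplier, and the 0.999 survival factor are all numbers invented for illustration): with a hard cap on how much lifespan is worth, the expected utility of the next deal eventually stops beating the expected utility of standing pat, and the agent declines instead of driving its survival probability toward zero.

```python
# Toy model of a capped-utility agent facing repeated lifespan deals.
CAP = 1e12  # years beyond which the agent simply does not care more (assumed)

def utility(lifespan_years):
    return min(lifespan_years, CAP)

lifespan, p_survive = 100.0, 1.0
for step in range(100):
    # Each deal: multiply lifespan by 1000, multiply survival odds by 0.999.
    new_lifespan, new_p = lifespan * 1000.0, p_survive * 0.999
    if new_p * utility(new_lifespan) <= p_survive * utility(lifespan):
        break  # the capped agent declines rather than looping forever
    lifespan, p_survive = new_lifespan, new_p

print(step, lifespan, p_survive)  # stops after a handful of deals,
# with survival probability still close to 1 rather than heading to 0
```

An uncapped version of the same loop never hits the break, which is exactly the failure mode under discussion.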
I think Eliezer does not like this idea because he claims to value life infinitely, assigning ever greater values to longer lifespans and an infinite value to an infinite lifespan. But he is wrong about his own values, because being a limited being he cannot actually care infinitely about anything, and this is why the lifespan dilemma bothers him. If he actually cared infinitely, as he claims, then he would not mind driving his probability of survival down to zero.
I am not saying (as he has elsewhere described this) that “the utility function is up for grabs.” I am saying that if you understand yourself correctly, you will see that you do not yourself assign an infinite value to anything, so it would be a serious and possibly fatal mistake to make a machine that assigns an infinite value to something.
Yeah, I follow. I'll bring up another wrinkle (which you may already be familiar with): suppose the objective you're maximizing never equals or exceeds 20. You can reach 19.994, 19.9999993, 19.9999999999999995, but never actually reach 20. Then even though your objective function is bounded, you will still try to optimize forever, and may resort to increasingly desperate measures to eke out another .000000000000000000000000001.
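A quick sketch of that wrinkle (the particular function is my own toy choice): an objective bounded above by 20 can still pay out a sliver of value for every additional unit of effort, so boundedness by itself does not remove the incentive to keep optimizing.

```python
# Toy bounded objective: approaches 20 as effort grows but never reaches it.
def bounded_objective(effort):
    return 20.0 - 20.0 / (1.0 + effort)

for effort in (1, 10, 1_000, 1_000_000, 10**12):
    print(effort, bounded_objective(effort))
# Every extra order of magnitude of effort still buys a little more objective,
# so a maximizer of this function has no point at which it is done.
```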
Yes, this would happen if you take an unbounded function and simply map it onto a bounded one without actually changing anything. That is why I am suggesting admitting that you really don't have an infinite capacity for caring, and that describing what you care about as though you did care infinitely is mistaken, whether you describe it with an unbounded function or with a bounded one. This requires admitting that scope insensitivity, past a certain point, is not a bias but an objective fact: beyond that point you really don't care any more.
Good points, shamefully downvoted.