Is the marginal utility of resources something that you can input? It seems to me that since resources have instrumental value (that’s pretty much what a resource is, by definition), their value would have to be an output of the utility function rather than an input to it.
If you tried to input the value of resources, you’d run into difficulties with the meaning of “resources”. For example, would the AI distinguish “having resources” from “having access to resources” from “having access to the power of having access to resources”? Even if ‘having resources’ has negative utility for the AI, he might still enjoy controlling resources in all kinds of other ways, since that control gives him the power to satisfy his terminal values.
Even if you define power itself as a type of resource and give that negative utility, you will basically be telling the AI to enjoy not being able to satisfy his terminal values. (Yet, put that way, it does suggest some kind of friendly passive/pacifist philosophy.)
There is a difference between giving something negative utility and giving it decreasing marginal utility. It’s sufficient, for instance, to make the AI’s utility function a positive-weighted sum of power terms whose exponents are all strictly between zero and one. That would be effectively “inputting” the marginal utility of resources, given any current state of the world.
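A minimal sketch of what that could look like, with weights, exponents, and resource names made up purely for illustration; the point is just that the finite differences shrink as the stock grows, which is the decreasing-marginal-utility property:

```python
# Toy utility function: a positive-weighted sum of power terms with
# exponents strictly between 0 and 1 (weights and resources are made up).
def utility(paperclips: float, energy: float) -> float:
    return 2.0 * paperclips ** 0.5 + 1.0 * energy ** 0.3

def marginal_paperclip_utility(n: float) -> float:
    # Utility gained from one more paperclip, holding energy fixed.
    return utility(n + 1, 100.0) - utility(n, 100.0)

for n in (1, 10, 100, 1000):
    print(n, round(marginal_paperclip_utility(n), 4))
# Each additional paperclip is worth less than the one before:
# 1 0.8284, 10 0.3087, 100 0.0998, 1000 0.0316
```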
There is a difference between giving something negative utility and giving it decreasing marginal utility.
I was considering the least convenient argument, the one that I imagined would result in the least aggressive AI. (I should explain here that I assumed even a terminal utility of 0 for the resource itself would not result in 0 utility for that resource, because the resource would still have instrumental value for achieving things the AI does value.)
(Above edited because I don’t think I was understood.)
But I think the logical problem identified above with trying to input the value of something that is only instrumentally valuable remains either way.
You pretty much have to guess about the marginal value of resources. But suppose the AI’s utility function is “the 10^10th root of the number of paperclips in the universe.” Then it probably satisfies the criterion.
EDIT: even better would be U = 1 if the universe contains at least one paperclip, otherwise 0.
(Disclaimer: I don’t know anything about AI.)
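To make the two proposals above concrete, here is a toy sketch of the root utility and the threshold utility; the function names and numbers are illustrative only, not a real agent design:

```python
# Toy versions of the two utility functions proposed above (illustrative only).
def root_utility(paperclips: int) -> float:
    # 10^10th root of the number of paperclips in the universe: marginal
    # utility is already negligible after the first paperclip.
    return paperclips ** (1 / 10**10)

def threshold_utility(paperclips: int) -> float:
    # U = 1 if the universe contains at least one paperclip, otherwise 0.
    return 1.0 if paperclips >= 1 else 0.0

print(root_utility(1), root_utility(10**30))       # 1.0   ~1.0000000069
print(threshold_utility(0), threshold_utility(1))  # 0.0   1.0
```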