Try to build an AI that:
Implements a timeless decision theory.
Is able to value things that it does not directly perceive, and in particular cares about other universes.
Has a utility function such that additional resources have diminishing marginal returns.
Such an AI is more likely to participate in trades across universes, possibly with a friendly AI that requests our survival.
[EDIT]: It now occurs to me that an AI that participates in inter-universal trade would also participate in inter-universal terrorism, so I’m no longer confident that my suggestions above are good ones.
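For what it’s worth, here is a minimal numeric sketch of why the diminishing-marginal-returns criterion makes such trades cheap for the AI to honor. The exponents, the resource total, and the 1% concession below are arbitrary assumptions for illustration, not anything specified in the comment.

```python
# Rough illustration (all numbers are arbitrary assumptions): a concave
# utility function makes it cheap, in relative terms, for a resource-hungry
# AI to leave a small share of its universe alone as its end of a trade.

def utility(resources: float, exponent: float) -> float:
    # Exponents strictly between 0 and 1 give diminishing marginal returns.
    return resources ** exponent

total = 1_000_000.0        # resources the AI could grab (arbitrary units)
concession = 0.01 * total  # the share a trade would ask it to give up

for exponent in (1.0, 0.5, 0.1):
    loss = 1 - utility(total - concession, exponent) / utility(total, exponent)
    print(f"exponent {exponent}: relative utility cost of the concession = {loss:.4%}")

# Linear utility (exponent 1.0) loses the full 1%; exponent 0.1 loses only
# about 0.1%. The more sharply diminishing the returns, the less the AI
# gives up by honoring such a trade.
```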
(Disclaimer: I don’t know anything about AI.)
Is the marginal utility of resources something that you can input? It seems to me that since resources have instrumental value (pretty much, that’s what a resource is, by definition), their value would have to be an output of the utility function rather than something you specify directly.
If you tried to input the value of resources directly, you’d run into difficulties over what counts as a resource. For example, would the AI distinguish “having resources” from “having access to resources” from “having access to the power of having access to resources”? Even if “having resources” had negative utility for the AI, it might still value controlling resources in all sorts of indirect ways as a route to the power to satisfy its terminal values.
Even if you define power as a type of resource and give that negative utility too, you will basically be telling the AI to enjoy being unable to satisfy its terminal values. (Although, put that way, it does suggest some kind of friendly, passive/pacifist philosophy.)
There is a difference between giving something negative utility and giving it decreasing marginal utility. It’s sufficient to give the AI exponents strictly between zero and one for all terms in a positive polynomial utility function, for instance. That would be effectively “inputting” the marginal utility of resources, given any current state of the world.
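Here is a minimal sketch of that suggestion. The resource names, coefficients, and exponents are invented for illustration; the only feature that matters is that every exponent lies strictly between zero and one.

```python
# Toy "positive polynomial" utility whose terms all have exponents strictly
# between 0 and 1; the resource names, coefficients, and exponents are
# made-up examples.

COEFFS = {"energy": 2.0, "matter": 1.0, "compute": 0.5}
EXPS   = {"energy": 0.5, "matter": 0.3, "compute": 0.8}

def utility(resources: dict) -> float:
    # Sum of c_i * x_i ** e_i with every e_i in (0, 1), so each term is concave.
    return sum(COEFFS[k] * resources[k] ** EXPS[k] for k in resources)

def marginal_utility(resources: dict, key: str, delta: float = 1.0) -> float:
    # Finite-difference estimate of the utility of one more unit of `key`.
    more = {**resources, key: resources[key] + delta}
    return utility(more) - utility(resources)

poor = {"energy": 10.0, "matter": 10.0, "compute": 10.0}
rich = {"energy": 1e4, "matter": 1e4, "compute": 1e4}

# The marginal utility of each resource falls as the AI accumulates more of
# it, which is the sense in which it is fixed "given any current state".
print(marginal_utility(poor, "energy"))  # relatively large
print(marginal_utility(rich, "energy"))  # much smaller
```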
I was considering the least convenient argument, the one that I imagined would result in the least aggressive AI. (I should explain that I assumed even a terminal utility of zero for a resource would not make that resource worthless to the AI, because it would still have instrumental value for achieving the things the AI does value.)
(Above edited because I don’t think I was understood.)
But I think the logical problem I identified with trying to input the value of something that is only instrumentally valuable remains either way.
You pretty much have to guess about the marginal value of resources. But say the AI’s utility function is “the 10^10th root of the number of paperclips in the universe.” That probably satisfies the diminishing-returns criterion.
EDIT: Even better would be U = 1 if the universe contains at least one paperclip, and 0 otherwise.
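As a concrete check that both of those toy utility functions satisfy the criterion (the particular paperclip counts below are just examples):

```python
# The two toy paperclip utilities proposed above. Both give essentially no
# reward for piling up resources beyond the first paperclip.

def root_utility(paperclips: int) -> float:
    # "the 10^10th root of the number of paperclips in the universe"
    return paperclips ** (1.0 / 10**10)

def threshold_utility(paperclips: int) -> float:
    # U = 1 if the universe contains at least one paperclip, otherwise 0.
    return 1.0 if paperclips >= 1 else 0.0

# Even 10^30 paperclips barely beat a single one under the root utility,
# and don't beat it at all under the threshold utility.
print(root_utility(1), root_utility(10**30))            # 1.0 and ~1.000000007
print(threshold_utility(1), threshold_utility(10**30))  # 1.0 and 1.0
```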
Can you please elaborate on “trades across universes”? Do you mean something like quantum civilization suicide, as in Nick Bostrom’s paper on that topic?
Here’s Nesov’s elaboration of his trading across possible worlds idea.
Personally, I think it’s an interesting idea, but I’m skeptical that it can really work, except maybe in very limited circumstances such as when the trading partners are nearly identical.
Cool, thanks!