It’s difficult to answer the question of what our utility function is, but easier to answer the question of what it should be.
Suppose we have an AI which can duplicate itself at a small cost. Suppose the AI is about to witness an event which will probably make it happy. (Perhaps the AI was working to get a law passed, and the vote is due soon. Perhaps the AI is maximizing paperclips, and a new factory has opened. Perhaps the AI’s favorite author has just written a new book.)
Does it make sense that the AI would duplicate itself in order to witness this event in greater multiplicity? If not, we need to find a set of utility rules that cause the AI to behave properly.
(I’m not sure what the rule is here for replying to oneself. Apologies if this is considered rude; I’m trying to avoid dumping one giant wall of text into a single comment.)
Here is a set of utility rules that I think would cause an AI to behave properly. (Might I call this “Identical Copy Decision Theory”?)
Suppose that an entity E clones itself, becoming E1 and E2. (We’re being agnostic here about which of E1 and E2 is the “original”; if the clone operation is perfect, the distinction is meaningless.) Before performing the clone, E calculates its expected utility as U(E) = (U(E1) + U(E2))/2.
After the cloning operation, E1 and E2 have separate utility functions: E1 does not care about U(E2). “That guy thinks like me, but he isn’t me.”
Suppose that E1 and E2 have some experiences, and then they are merged back into one entity E’ (as described in http://lesswrong.com/lw/19d/the_anthropic_trilemma/ and elsewhere). Assuming this merge operation is possible (because the experiences of E1 and E2 were not too bizarrely disjoint), the utility of E’ is the average: U(E’) = (U(E1) + U(E2))/2.
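Here is a rough Python sketch of how I might encode these rules. The function names are just mine for illustration, and I’m assuming the two-copy averaging rule generalizes to N copies by taking the mean over all of them:

```python
def expected_utility_before_clone(u_e1, u_e2):
    """Rule 1: before cloning, E values the operation at the average
    of the utilities its two successors will realize."""
    return (u_e1 + u_e2) / 2


def merge_utility(copy_utilities):
    """Rule 3: when copies are merged back into one entity E', the
    merged entity's utility is the average over the copies
    (generalizing (U(E1) + U(E2))/2 to any number of copies)."""
    return sum(copy_utilities) / len(copy_utilities)


# Rule 2 is simply the absence of a cross-term: while E1 and E2 exist
# separately, E1's utility function takes no input from U(E2).
```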
I think I am happy with how these rules interact with the Anthropic Trilemma problem. But as a simpler test case, consider the following:
An AI walks into a movie theater. “In exchange for 10 utilons’ worth of cash,” says the owner, “I will show you a movie worth 100 utilons. But we have a special offer: for only 1000 utilons’ worth of cash, I will clone you ten thousand times, and every copy of you will see that same movie. At the end of the show, since every copy will have had the same experience, I’ll merge all the copies of you back into one.”
Note that, although AIs can be cloned, cash cannot be. ^_^;
I claim that a “sane” AI is one that declines the special offer.
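To spell out the arithmetic under the rules above (again assuming the averaging rule extends to the 10,000-copy merge), here is a quick sketch:

```python
# Regular offer: pay 10 utilons, one copy watches a 100-utilon movie.
regular_net = 100 - 10                            # +90 utilons

# Special offer: pay 1000 utilons, 10,000 copies each watch the same
# 100-utilon movie, then merge. Under the averaging rule the merged
# entity's utility is the mean over the copies, so the 10,000 identical
# experiences collapse back to just 100 utilons.
copies = [100] * 10_000
special_net = sum(copies) / len(copies) - 1000    # 100.0 - 1000 = -900.0

print(regular_net, special_net)                   # 90 -900.0
```

The multiplicity buys nothing, so the AI that declines the special offer comes out 990 utilons ahead.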