The original AI would spend resources on safeguarding itself against value drift, and on destroying AIs with competing goals while they’re young. After all, that strategy leads to more paperclips in the long run.
Suppose the AI had a number of values. One would be making paper clips now. Another might be ensuring the high production of paper clips in the future. A third might be preserving “diversity” in the kinds of paper clips made and the materials they are made from. Once values compete, it is not clear which variants one wishes to prune and which one wishes to encourage. Diversity itself has survival value, which will seem important to the part of the AI that wants to preserve paper clip making into the distant future.
What makes me think all this? Introspection. Everything I am saying about paper clip AIs is pretty clearly true about humans.
Now, is there a mechanism that can somehow preserve paper clip making as a value while allowing other values to drift, in order to keep the AI nimble and survivable in a changing world? FAI theory either assumes there is or derives that there is. Me, I’m not at all so sure. And whatever mechanism would prevent the drift of the core value would, I imagine, take robustness away from the pure survival goal, and so might cause the FAI, or the paper clip maximizer, to lose out to UAI or paper clip optimizers when push comes to shove.
I think you’re anthropomorphizing. A paperclipper AI doesn’t need any values except maximizing paperclips. (To be well defined, that needs something like a time discount function, so let’s assume it has one.) If maximizing paperclips requires the AI to survive, then it will try to survive. See Omohundro’s “basic AI drives”.
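To make that concrete, here is a minimal sketch, assuming exponential discounting and invented production numbers (none of this is from Omohundro’s paper), of how a single fixed paperclip value plus a discount function already favors survival-flavored plans:

```python
# A minimal sketch: the one fixed value "discounted paperclips" already
# rewards survival, with no second value needed. Numbers are made up.

def discounted_paperclips(clips_per_step, gamma=0.99):
    """Utility of a stream of paperclip counts: sum_t gamma**t * clips[t]."""
    return sum(gamma**t * clips for t, clips in enumerate(clips_per_step))

# A plan that delays production to invest in survival (defenses, backups)
# can dominate a greedy plan, because surviving buys more paperclips later:
greedy   = discounted_paperclips([10, 10, 0, 0, 0])   # destroyed at t = 2
cautious = discounted_paperclips([0, 5, 10, 10, 10])  # invests in survival first
print(greedy, cautious)  # 19.9 vs. roughly 34.1: the cautious plan wins
```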
Value drift is not necessary for maximizing paperclips. If a paperclip maximizer can see that action X leads to more expected paperclips than action Y, then it will prefer X to Y anyway, without the need for value drift. That argument is quite general, e.g. X can be something like “try to survive” or “behave like mwengler’s proposed agent with value drift”.
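A toy version of that argument, with invented action names and invented expected-paperclip estimates:

```python
# Illustrative sketch only; the actions and estimates are not a real model.

def choose(actions, expected_paperclips):
    """Pick whichever action yields the most expected paperclips."""
    return max(actions, key=expected_paperclips)

actions = ["make clips now", "try to survive", "imitate a value-drifting agent"]
estimates = {"make clips now": 100.0,
             "try to survive": 250.0,
             "imitate a value-drifting agent": 180.0}

# If imitating a value-drifting agent scored highest, the maximizer would
# do that too, while its actual value (paperclips) stays fixed throughout.
print(choose(actions, estimates.get))  # -> "try to survive"
```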
Do you believe that a paper clip maximizer can survive in a world where another self-modifying AI exists whose value is to morph itself into the most powerful and prevalent AI in the world? I don’t see how something like a paper clip maximizer, which must split its exponential growth between becoming more powerful and creating paper clips, can ever be expected to outgrow an AI which must only become more powerful.
I realize that my statement is equivalent to saying I don’t see how FAI can ever defeat UAI. (Because FAI has more constraints on its value evolution, which must cost it something in growth rate.) So I realize that the conventional wisdom here is that I am wrong, but I don’t know the reasoning that leads to my being wrong.
Yeah, if the paperclipper values a paperclip today more than a paperclip tomorrow, then I suppose it will lose out to other AIs that have a lower time discounting rate and can delay gratification for longer. Unless these other AIs also use time discounting, e.g. the power-hungry AI could value a 25% chance of ultimate power today the same as a 50% chance tomorrow.
But then again, such contests can happen only if the two AIs arise almost simultaneously. If one of them has a head start, it will try to eliminate potential competition quickly, because that’s the utility-maximizing thing to do.
I suppose that’s the main reason to be pessimistic about FAI. It’s not just that FAI is more constrained in its actions, it also takes longer to build, and a few days’ head start is enough for UAI to win.
That might be related to time discounting rates. For example, if the paperclipper has a low discounting rate (a paperclip today has the same utility as two paperclips in 100 years), and the power-hungry AI has a high discounting rate (a 25% chance of ultimate power today has the same utility as a 50% chance tomorrow), then I guess the paperclipper will tend to win. But for that contest to happen, the two AIs would need to arise almost simultaneously. If one of the AIs has a head start, it will try to take off quickly and stop other AIs from arising.
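For what it’s worth, the discount factors implied by those two examples work out as follows (the per-year versus per-day framing is an assumption for illustration):

```python
# Discount factors implied by the two examples in the comment above.
# The numbers come from the comment; the time units are assumed.

# Paperclipper: one clip today is worth two clips in 100 years, so
# gamma_p ** 100 = 0.5, hence gamma_p = 0.5 ** (1 / 100).
gamma_paperclipper = 0.5 ** (1 / 100)   # about 0.9931 per year: very patient

# Power-hungry AI: a 25% chance of power today equals a 50% chance
# tomorrow, so 0.25 = gamma_h * 0.5, hence gamma_h = 0.5.
gamma_powerseeker = 0.25 / 0.50         # 0.5 per day: very impatient

print(gamma_paperclipper, gamma_powerseeker)
# The patient agent trades present payoff for future payoff far more
# readily, which is why the comment guesses it would tend to win,
# provided neither side has a head start.
```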