I have a question, based on some tentative ideas I am considering.
If a boost to capability without friendliness is bad, then presumably a boost to capability with only a small amount of friendliness is also bad. But presumably a boost to capability combined with a large boost to friendliness is good. How would we define a large boost?
For example, if a slightly modified paperclipper verifiably precommits to give the single person who lets it out of the box their own personal simulated utopia, while it paperclips everything else, that's probably a friendlier paperclipper than one that won't give anyone a simulated utopia. But it's still not friendly, in any normal sense of the term, even if it offers to give a simulated utopia to a different person first (and keep both them and you intact) just so you can test that it isn't lying about being able to do it.
So what if an AI says, “Okay. I need code chunks to paperclip almost everything, and I can offer simulated utopias. I’m not sure how many code chunks I’ll need. Each one probably has about a 1% chance of letting me paperclip everything except for people in simulated utopias. How about I verifiably put 100 people in a simulated utopia for each code chunk you give me? The first 100 simulated utopias are free, because you need a way of testing the verifiability of my precommitment not to paperclip them.” 100 people sign up for the simulated utopias, and it IS verifiable. The paperclipper won’t paperclip them.
Well, that’s friendlier, but maybe not friendly enough. I mean, it might get to 10,000 people (or maybe 200, or maybe 43,700), but eventually it would paperclip everyone else. That seems too bad to accept.
Well, what if it’s a 0.00001% chance per code chunk and 1,000,000 simulated utopias (and yes, 1,000,000 free)? That might plausibly get a simulated utopia for everyone on Earth before the AI gets out and paperclips everything else. I imagine some people would at least consider running such an AI, although I doubt everyone would.
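As a rough sanity check on those numbers, here is a minimal sketch of the expected headcount, assuming each code chunk is an independent trial (so the number of chunks handed over before escape is roughly geometric with mean 1/p) and glossing over what happens on the chunk that actually lets the AI out; the function name and exact figures are just illustrative.

```python
def expected_utopias(p_escape_per_chunk, utopias_per_chunk, free_utopias=0):
    """Expected number of people in simulated utopias before the AI escapes,
    treating each code chunk as an independent trial (geometric model)."""
    expected_chunks = 1.0 / p_escape_per_chunk  # mean number of chunks until escape
    return free_utopias + expected_chunks * utopias_per_chunk

# 1% escape chance per chunk, 100 utopias per chunk, 100 free
print(expected_utopias(0.01, 100, free_utopias=100))              # ~10,100 people

# 0.00001% escape chance per chunk, 1,000,000 per chunk, 1,000,000 free
print(expected_utopias(1e-7, 1_000_000, free_utopias=1_000_000))  # ~1e13 people
```

Under those assumptions, the first deal tops out on the order of ten thousand people (consistent with the 10,000 guess above), while the second nominally covers far more than Earth's population, which is why it starts to look like a deal some people might actually take.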
How would one establish where the flip point is? Is that even a valid question to be asking? (Assume there are the standard looming existential concerns. So if you don’t give this AI code chunks, or you try to negotiate or wait on research for a better deal, maybe some other AI gets out and paperclips you both, or maybe some other existential risk occurs, or maybe nothing happens, or maybe an AI comes along that just wants to put everything in a simulated utopia.)
I wouldn’t call an AI like that friendly at all. It puts people in utopias only for instrumental reasons; it has no actual inherent goal to make people happy. None of these kinds of AIs are friendly; some are merely less dangerous than others.
I’m now curious how surface-friendly an AI can appear to be without giving it an inherent goal to make people happy, because I agree that there do seem to be friendlier AIs than the ones on the list above that still don’t care about people’s happiness.
Let’s take an AI that likes increasing the number of unique people who have voluntarily given it cookies. If any person voluntarily gives it a cookie, it will put that person in a verifiably protected simulated utopia forever. That is the best bribe it can think to offer, and it really wants to be given cookies by unique people, so it bribes them.
If a person wants to give the AI a cookie but can’t, the AI will give them a cookie from its stockpile just so that it can be given a cookie back. (It doesn’t care about its existing stockpile of cookies.)
You can’t accidentally give the AI a cookie, because the AI makes very sure that you REALLY ARE giving it a cookie, to avoid any uncertainty about its own utility accumulation.
This is slightly different from the first series of AIs: while this AI doesn’t care about your happiness, it does need everyone to do something for it, whereas the first AIs would be perfectly happy to turn you into paperclips regardless of your opinions if one particular person had helped them enough earlier.
Although, I have a feeling that continuing along this line of thinking may lead me to an AI similar to the one already described in http://tvtropes.org/pmwiki/pmwiki.php/Fanfic/FriendshipIsOptimal
The AI in that story actually seems to be surprisingly well done and does have an inherent goal to help humanity. Its primary goal is to ‘satisfy human values through friendship and ponies’. That’s almost perfect, since ‘satisfying human values’ there seems to be based on humanity’s CEV.
It’s just that the added ‘through friendship and ponies’ turns it from a nigh-perfect friendly AI into something really weird.
I agree with your overall point, though.