Note: The following post is a cross of humor and seriousness.
After reading another reference to an AI failure, it seems to me that almost every “The AI is an unfriendly failure” story begins with “The Humans are wasting too many resources, which I can more efficiently use for something else.”
I felt like I should also consider potential solutions and look at the next type of failure they produce. My initial reasoning: assuming a bunch of AI researchers are determined to avoid that particular failure mode, and only that one, they’re probably going to run into other failure modes as they attempt (and probably fail) to bypass it.
For instance: AI Researchers build an AI that gains utility roughly equal to Square Root(Median Human Profligacy) times Human Population times Time, is dumb about Metaphysics, and has a fixed utility function.
It’s not happier if the top Human doubles his energy consumption. (Note: Median Human Profligacy.)
It’s happier, but not twice as happy, when Humans are using twice as many petawatt-hours per year. (Note: Square Root. This also helps prevent “1 human kills all other humans from space and sets the earth on fire” from counting as a good use of energy: that skyrockets the Median, but it does not skyrocket the Square Root of the Median nearly as much.)
It’s five times as happy if there are five times as many Humans, and ten times as happy when Humans are using the same amount of energy per year for 10 years as opposed to just 1.
Dumb about metaphysics is a reference to the following type of AI failure: “I’m not CERTAIN that there are actually billions of Humans, we might be in the matrix, and if I don’t know that, I don’t know if I’m getting utility, so let me computronium up earth really quick just to run some calculations to be sure of what’s going on.” Assume the AI just disregards those kinds of skeptical hypotheses, because it’s dumb about metaphysics. Also assume it can’t change its utility function, because that’s just too easy to combust.
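To keep the later arithmetic honest, here is a minimal sketch of that utility function in Python, assuming energy consumption stands in for profligacy (the function and variable names are mine):

```python
from statistics import median

def ai_utility(energy_per_person, years=1):
    """Utility = sqrt(Median Human Profligacy) * Human Population * Time.

    energy_per_person: each living human's energy use per unit time
    years: how long that consumption pattern is sustained
    """
    if not energy_per_person:
        return 0.0
    return (median(energy_per_person) ** 0.5) * len(energy_per_person) * years
```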
As I stated, this AI has bunches of failure modes. My question is not “Does it fail?” but “Does it at least sound like it avoids having ‘eat the humans, make computronium’ be the most plausible failure? If so, what does a plausible failure sound like?”
Example Hypothetical Plausible Failure: The AI starts murdering environmentalists. It fears that environmentalists will cause an overall degradation in Median human energy use, which lowers its utility, and that environmentalists also encourage less population growth, which degrades its utility further. While the AI does value the environmentalists’ own energy consumption, which boosts utility, they’re environmentalists, so they have a small energy footprint, and it doesn’t value not murdering people in and of itself.
After considering that kind of solution, I went up and changed ‘my reasoning’ to ‘my initial reasoning’, because at some point I realized I was just having fun with this kind of AI failure analysis and had stopped actually trying to make a point. Also, as Failed Utopia #4-2 (http://lesswrong.com/lw/xu/failed_utopia_42/) shows, designing more interesting failures can be fun.
Edit for clarity: I AM NOT IMPLYING THE ABOVE AI IS OR WILL CAUSE A UTOPIA. I don’t think it could be read that way, but just in case there are inferential gaps, I should close them.
it seems to me that almost every “The AI is an unfriendly failure” story begins with “The Humans are wasting too many resources, which I can more efficiently use for something else.”
Really? I think the one I see most is “I am supposed to make humans happy, but they fight with each other and make themselves unhappy, so I must kill/enslave all of them”. At least in Hollywood. You may be looking in more interesting places.
Per your AI, does it have an obvious incentive to help people below the median energy level?
Really? I think the one I see most is “I am supposed to make humans happy, but they fight with each other and make themselves unhappy, so I must kill/enslave all of them”. At least in Hollywood. You may be looking in more interesting places.
To me, that seems like a very similar story; it’s just that they’re wasting their energy on fighting/unhappiness. I just thought I’d attempt to make an AI that thinks “Humans wasting energy? Under some caveats, I approve!”
Per your AI, does it have an obvious incentive to help people below the median energy level?
I made a quick sample population (8 people, using 100, 50, 25, 13, 6, 3, 2, 1 energy, assuming only one unit of time) and ran some numbers to consider the AI’s incentives.
The AI got around 5.8 extra utility from taking 50 energy from the top person, giving 10 energy each to the bottom 4, and assuming that the remaining 10 energy either went unused or was spent as a transaction cost. However, the AI also got about 0.58 extra utility from killing any one of the four bottom people (even assuming their energy vanished).
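For anyone who wants to check my arithmetic, here is that comparison using the ai_utility sketch above (the exact split is my reading of the plan: 10 energy each to the bottom four, 10 lost in transaction):

```python
baseline = [100, 50, 25, 13, 6, 3, 2, 1]     # median 9.5
base_u = ai_utility(baseline)                # sqrt(9.5) * 8, about 24.7

# "Tax the 100s": top person loses 50, bottom four gain 10 each, 10 is lost.
taxed = [50, 50, 25, 13, 16, 13, 12, 11]     # median rises to 14.5
print(ai_utility(taxed) - base_u)            # about 5.8 extra utility

# Kill the person using 1 energy; their energy simply vanishes.
culled = [100, 50, 25, 13, 6, 3, 2]          # median rises to 13, population drops to 7
print(ai_utility(culled) - base_u)           # about 0.58 extra utility
```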
Of note, roughly doubling the size of everyone’s energy pie does get a greater amount of utility than either of those two things (roughly 10.2), except that they aren’t exclusive: you can double the pie and also redistribute the pie (and also kill people who would eat the pie in a way that drags down the Median).
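That number falls out of the same sketch:

```python
doubled = [e * 2 for e in baseline]          # median doubles to 19
print(ai_utility(doubled) - base_u)          # about 10.2 extra utility
```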
Here’s an even more bizarre note: when I quadrupled the population (giving the same distribution of energy, so four people each at 100, 50, 25, 13, 6, 3, 2, and 1), the algorithm gained plenty of additional utility. However, the amount of utility the algorithm gained by murdering the bottom person skyrocketed (to around 13.1), because while the murder would still only move the Median from 9.5 to 13, the square root of that Median is now multiplied by a much larger population (31 people instead of 7). So, if for some reason the energy gap between the person right below the Median and the person right above the Median is large, the AI has a significant incentive to murder one person.
In fact, the way I set it up, the AI even has an incentive to murder the bottom 9 people to get the Median up to 25… but not much of one, and each person it murders before the Median actually shifts is a substantial disutility. The AI would have gained more utility by just implementing the “Tax the 100s” plan I gave earlier than by instituting either of those two murder plans, but again, they aren’t exclusive.
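A quick check of those claims with the same sketch (the quadrupled population is just four copies of the original eight):

```python
big = [100, 50, 25, 13, 6, 3, 2, 1] * 4      # 32 people, median still 9.5
big_u = ai_utility(big)                      # sqrt(9.5) * 32, about 98.6

# Kill one person using 1 energy: the median jumps from 9.5 to 13.
print(ai_utility(sorted(big)[1:]) - big_u)   # about 13.1 extra utility

# Kill the bottom nine: the median jumps to 25, but the marginal gain is modest.
print(ai_utility(sorted(big)[9:]) - big_u)   # about 16.4 extra utility
```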
I somehow got: murder can be justified, but only of people below the median, and only in those cases where it jukes the median upward sufficiently; in general, helping them by taking from people above the median is more effective, but you can do both.
Assuming a smoother distribution of energy expenditures in the population of 32 appeared to prevent this problem. Given smoother energy expenditures, the median does not jump by as much when a bottom person dies, and murdering bottom people goes back to causing disutility.
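For example, with a hypothetical smooth distribution of my own choosing (32 people using 1 through 32 energy, not the population I originally tested), killing the bottom person barely moves the median and costs utility:

```python
smooth = list(range(1, 33))                  # 32 people using 1..32 energy
smooth_u = ai_utility(smooth)                # sqrt(16.5) * 32, about 130
print(ai_utility(smooth[1:]) - smooth_u)     # median only moves 16.5 -> 17: about -2.2
```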
However, I have to admit that in terms of novel ways an algorithm could fail, I did not see the above coming: I knew it was going to fail, but I didn’t realize it might also fail in such an oddly esoteric manner in addition to the obvious failure I already mentioned.
Thank you for encouraging me to look at this in more detail!
Note that killing people is not the only way to raise the median. Another technique is taking resources and redistributing them. The optimal first-level strategy is to allow only the minimum necessary for survival to those below the median (which, depending on what it thinks “survival” means, might include just freezing them, or cutting off all unnecessary body parts and feeding them barely nutritious glop while storing them in the dark), and distribute everything else equally among the rest.
Also, given this strategy, the median of human consumption is roughly 2×R/(N+1), where R is the total amount of resources left over after survival minimums and N is the total number of humans: (N-1)/2 people sit at the minimum and the other (N+1)/2 split R equally. The utility function then becomes sqrt(2×R/(N+1)) × N × T, which for large N grows roughly like sqrt(2×R×N) × T, so for the same resources its utility is maximized if the maximum number of people use them. Thus, the AI will spend its time finding the smallest possible increment above “minimum necessary for survival” and maximizing the number of people it can sustain, keeping (N-1)/2 people at the minimum and (N-1)/2+1 just a tiny bit above it, and making sure it does this for the longest possible time.
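A quick illustration of that point with made-up numbers (R = 1000 units of resources and T = 1 are arbitrary choices of mine; only the head count varies):

```python
R, T = 1000.0, 1.0
for N in (10, 100, 1000, 10000):
    med = 2 * R / (N + 1)                    # the (N+1)/2 non-minimum people split R
    print(N, (med ** 0.5) * N * T)           # grows roughly like sqrt(2 * R * N)
```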
Well, even if it turned out to do exactly what its designers were thinking (which it won’t), it’d still be unfriendly for the simple fact that no remotely optimal future is likely to involve humans with big energy consumption. The FAI almost certainly should eat all humans for computronium too; the only difference is that the friendly one will scan their brains first and make emulations.
You get an accurate-prediction point for guessing that it wouldn’t do what its designers were thinking: even if the designers assumed it would kill environmentalists (and so assumed it was flawed), a more detailed look, as Martin-2 encouraged me to do, found that it also treats murder as a utility benefit in at least some other circumstances.