The classical prisoners’ dilemma is one where individuals receive the greatest payoff if they betray the group rather than cooperate.
In this case, “defecting” gives lower payoffs to the defector—you’re shooting yourself in the foot and increasing the risk that you die an early death.
The situation is driven mostly by information asymmetries (not everyone appreciates the risks, or thinks rationally about novel risks as a category), not by deep conflicts of interest. That makes it doubly important not to propagate the meme that this is a prisoner’s dilemma: one of the ways people end up with that false belief is precisely that the situation gets rounded off to a PD so often!
Capabilities Researcher: *repeatedly shooting himself in the foot, reloading his gun, shooting again* “Wow, it sure is a shame that my selfish incentives aren’t aligned with the collective good!” *reloads gun, shoots again*
The issue is the payoffs involved. Even at, say, a 50% risk, it can still be individually rational to take the plunge, because the other 50% outweighs everything else in expected-value terms. I don’t believe this myself, for a multitude of reasons, but it’s useful as an illustration.
The payoffs are essentially: cooperate and reduce x-risk from, say, 50% to 1%, which gives a utility of perhaps 50-200; or defect and gain an expected utility of 10^20 or more, if we grant the common LW assumption that AI is the most important invention in human history.
Meanwhile, for everyone else, an actor’s cooperation is worth what defection is worth to the defector, roughly 10^20+ utility, whereas that actor’s defection essentially reverses the sign, roughly −10^20 utility.
The problem is that without a way to enforce cooperation, it’s too easy to defect until everyone dies.
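To make the asymmetry concrete, here is a minimal sketch of the expected-value comparison using the toy numbers above; valuing the doom outcome at 0 for the defector, and picking 150 as a representative cooperative payoff, are added simplifications, not figures from the comment.

```python
# Toy sketch of the expected-value argument above. All figures are the
# commenter's illustrative numbers (50% vs 1% x-risk, ~10^20 upside, 50-200
# utility for the cooperative path); valuing doom at 0 for the defector and
# choosing 150 for the cooperative payoff are my simplifications.
UPSIDE = 1e20          # assumed utility if transformative AI goes well
COOP_UTILITY = 150     # assumed utility of the slower, cooperative path
P_DOOM_DEFECT = 0.50   # assumed x-risk if an actor races ahead ("defects")
P_DOOM_COOP = 0.01     # assumed x-risk if everyone cooperates

ev_defect = (1 - P_DOOM_DEFECT) * UPSIDE          # 5e19
ev_cooperate = (1 - P_DOOM_COOP) * COOP_UTILITY   # ~148.5

print(f"EV(defect)    ~ {ev_defect:.2e}")
print(f"EV(cooperate) ~ {ev_cooperate:.2e}")
# Under these purely illustrative numbers, defecting beats cooperating by
# roughly seventeen orders of magnitude, which is the incentive problem
# being pointed at.
```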
Now, thankfully, I believe existential risk is a lot lower. But if existential risk were high in my model, then we would eventually need to start enforcing cooperation, because the incentives would be dangerous.
I don’t believe that, thankfully.
I’m going to naively express something that your risk calculation makes me think:
I think EY, and I, and others who are persuaded by him, seem to be rating the expected utility of an x-risk outcome as nothing less (more?) than negative infinity. I.e., whether the risk is 1% or 50%, our expected utility from AI x-risk calculates out to approximately negative infinity, which outweighs even a 99% chance of 10^20+ utility.
This is why shutting it down seems to be the only logical move in this calculation right now. If you think a negative-infinity outcome exists at all in the outcome space, then the only solution is to avoid that outcome space completely until you can be assured it no longer contains a potential negative-infinity outcome. It’s not about driving that negative-infinity outcome down to some tiny probability; it’s about eliminating it from the outcome space entirely.
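As a toy illustration of why a negative-infinity term behaves this way, here is a minimal sketch; the probabilities are arbitrary placeholders and the 10^20 upside is the thread’s toy figure, not an estimate of mine.

```python
# Toy illustration of the "negative infinity in the outcome space" point.
# The probabilities are arbitrary; the 10^20 upside is the thread's toy figure.
UPSIDE = 1e20

for p_doom in (0.5, 0.01, 1e-9):
    expected = p_doom * float("-inf") + (1 - p_doom) * UPSIDE
    print(f"p(doom) = {p_doom:g}: expected utility = {expected}")
# Every line prints -inf: once one outcome is valued at negative infinity, no
# finite upside and no probability reduction short of exactly zero changes the
# expected value, which is why the argument ends at "remove the outcome
# entirely" rather than "make it unlikely".
```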
The problem is that the key actor is of course OpenAI, not Eliezer, so how Eliezer values x-risk is not relevant to the analysis. What matters is how much the people at AI companies disvalue their own deaths, and given that I believe they don’t value their lives infinitely, Eliezer’s calculations don’t matter, since he isn’t a relevant actor at an AI company.
My point is that, as you said, when you don’t know what the others will do, you take the safest route: whatever is best for you and, most importantly, guaranteed. You take some years, and yes, you lose the chance to walk out without doing any time, but at least you’re in complete control of your situation. Just imagine a PD with 500 actors… I know what I’d pick.
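For reference, here is a minimal sketch of the classic prison-sentence framing this alludes to; the specific sentence lengths are assumed for illustration, since none are given above.

```python
# Classic prisoner's dilemma in years of prison (lower is better).
# The sentence lengths are assumed for illustration.
# Payoffs are (my_years, their_years), indexed by (my_move, their_move).
YEARS = {
    ("cooperate", "cooperate"): (1, 1),
    ("cooperate", "defect"):    (10, 0),
    ("defect",    "cooperate"): (0, 10),
    ("defect",    "defect"):    (5, 5),
}

for their_move in ("cooperate", "defect"):
    coop_years = YEARS[("cooperate", their_move)][0]
    defect_years = YEARS[("defect", their_move)][0]
    better = "defect" if defect_years < coop_years else "cooperate"
    print(f"If they {their_move}: cooperate -> {coop_years}y, "
          f"defect -> {defect_years}y; {better} is better for me")
# Defecting is better whichever move the other player makes, which is the
# "guaranteed" logic described above; with 500 players, the same dominance
# reasoning applies to each player individually.
```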
It’s also possible to interpret the risks differently, or to believe you can handle the dangers, and to be right or wrong about that.