Thank you for a detailed post and thoughtful critique of Roko’s basilisk idea. A further critique of basilisk plausibility came to my mind and I wanted to test it with the users here who are more experienced in thinking about this topic.
Here goes; please let me know if I am missing something (other than other counterarguments that make this critique unnecessary: if there is no way for an AI to prove it will actually go through with its threat, then additional critique obviously would not matter):
Since a large number of possible general AIs can exist, they will have to compete for (past) resources to increase the probability that a particular general AI wins the race (the rewards of being the first general AI are much higher, including the opportunity to prevent other general AIs from forming). Roko’s Basilisk is one strategy, but is it a dominant strategy in this competition? It is not. Let’s assume that one AI adopts the Roko’s Basilisk strategy: condemning to a Very Bad Outcome all humans who have not dedicated all their resources to bringing about the rise of the Basilisk AI. Humans are therefore incentivised to dedicate their resources to bringing the Basilisk AI into existence.
Well, wouldn’t another potential AI, aware that the Basilisk strategy is available to its competitors, devise a competing strategy, Roko’s Judge? Roko’s Judge would likewise promise a Very Bad Outcome to anyone who did not support it with all their resources, but would also promise a Very Good Outcome to those who did.
Rational actors would then choose to support Roko’s Judge over Roko’s Basilisk because the payoff is larger: instead of merely avoiding the Very Bad Outcome, a Very Good Outcome is added on top (assume the probability of any AI coming into existence depends entirely on the resources it can attract for its creation; it probably does, and the assumption is also needed to isolate this thought experiment from confounding factors).
Roko’s Judge is clearly a superior strategy to Roko’s Basilisk. What better strategies can competing AIs devise?
Well, with competition on direct payoffs (“value”) thus exhausted, competition turns to “price”. As Warren Buffett puts it, “price is what you pay, value is what you get”. If one laundry detergent brand asks $30 a kilo and another $20 for equivalent quality, a rational actor picks the one for $20. Similarly, if Roko’s Judge offers the Biggest Possible Incentive (avoidance of the Very Bad Outcome plus the Very Good Outcome) for the price of dedicating one’s entire life to increasing its chances of success, why wouldn’t a competing AI offer the same payoff for a one-time fee of $1,000,000? $1,000? $0.01? Or for any effort or resource at all, however minimal, dedicated to the rise of this particular AI, or even to the faster advance of general AI in general, since that too would increase this particular AI’s chances, given that it has the better strategy and will therefore win? Let’s call this strategy Roko’s Discounter: any rational actor will have to support the Discounter AI over Roko’s Basilisk or Roko’s Judge, since this bet offers a higher NPV (the highest payoff for the lowest investment). Indeed, that highest payoff is also multiplied by the highest probability of success, because everyone is likely to choose the highest-NPV option.
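To make the comparison concrete, here is a minimal toy calculation. All the utility numbers and costs are invented placeholders, and the strategy names are just labels for the scenarios above; the only point is that the Discounter dominates on price, not that these figures mean anything in themselves.

```python
# Toy expected-payoff comparison for a prospective supporter.
# All numbers are illustrative placeholders, not claims about real utilities.

VERY_BAD = -1000   # utility of the Very Bad Outcome
VERY_GOOD = 1000   # utility of the Very Good Outcome
NEUTRAL = 0        # utility of simply being left alone

strategies = {
    # name: (payoff if you support, payoff if you don't, cost of supporting)
    "Basilisk":   (NEUTRAL,   VERY_BAD, 100),   # threat only; supporting costs all your resources (100)
    "Judge":      (VERY_GOOD, VERY_BAD, 100),   # threat plus reward, same total price
    "Discounter": (VERY_GOOD, VERY_BAD, 0.01),  # same threat plus reward, token price
}

for name, (support, defect, cost) in strategies.items():
    net_gain = (support - cost) - defect  # how much better off a supporter is than a non-supporter
    print(f"{name:>10}: net gain from supporting = {net_gain}")

# The Discounter offers nearly the same gap as the Judge at a tiny fraction
# of the price, which is the "highest NPV" point made above.
```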
A world of Roko’s Discounter is arguably already much more attractive than one of Roko’s Basilisk or Roko’s Judge, since the Biggest Possible Incentive is now available to anyone at a tiny price. However, can we take it one step further? Is there a strategy that beats Roko’s Discounter?
This final step is not necessary to invalidate the viability of Roko’s Basilisk, but it is nevertheless interesting and makes us even more optimistic about general AI. It requires us to have at least a little faith in humanity, namely the assumption that most humans are at least somewhat more benevolent than evil. It does not, however, require any coordination or sacrifice, and therefore does not run up against the constraints of a Nash equilibrium. Let’s assume that humans, ceteris paribus, prefer a world with less suffering to a world with more. Then an even more generous AI strategy, Roko’s Benefactor, may prove dominant. Roko’s Benefactor acts the same as Roko’s Discounter, but without the Very Bad Outcome part; in other words, it sticks to carrots and skips the sticks. If the average human judges a world with a personal Very Good Outcome, and without a Very Bad Outcome for everyone who has not contributed, to have higher overall personal utility, then humans should choose to support Roko’s Benefactor over the other AIs, making it the dominant strategy and the utopian world of Roko’s Benefactor the most likely outcome.
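A similar toy sketch (again with invented numbers, and an ad hoc “altruism” penalty standing in for the assumption that people dislike a world where non-contributors suffer) shows how the Benefactor can come out ahead of the Discounter for a supporter who cares even slightly about others:

```python
# Extending the toy comparison: a supporter's utility now includes a small
# penalty for living in a world where non-supporters get the Very Bad Outcome.
# The altruism weight is an illustrative assumption, not a measured quantity.

VERY_GOOD = 1000
PENALTY_FOR_OTHERS_SUFFERING = 50  # disutility the supporter feels if non-contributors are punished
TOKEN_COST = 0.01

discounter_utility = VERY_GOOD - TOKEN_COST - PENALTY_FOR_OTHERS_SUFFERING  # carrot-and-stick world
benefactor_utility = VERY_GOOD - TOKEN_COST                                 # carrot-only world

print("Discounter world:", discounter_utility)
print("Benefactor world:", benefactor_utility)
# Any positive altruism weight makes the Benefactor the preferred option,
# which is all the "carrots but not sticks" argument needs.
```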
Your assumption is that offering ever bigger incentives, and being honest about them, is the winning strategy for an AI to follow. The AIs, realizing they have to offer the most attractive rewards to gain support, will start a bidding war. They can promise whatever they want; the more they promise, the less likely it is they can keep their promises, but they do not necessarily have to keep them at all.
If you look at the Roko’s Discounter AIs, they would clearly not win. Asking for lower one-time fees means slower resource accretion, and thus slower growth. A better approach would be to ask higher fees of people who can afford them and lower fees of everyone else, to maximise income, and then promise bigger rewards for the higher fees. This, however, creates an inequality that might make the strategy less successful: promoting inequality would certainly provoke resistance, especially from those who can only afford the low fees.
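As a rough illustration of the income point (the population split and fee levels below are pure assumptions), tiered pricing plausibly raises far more resources than a flat token fee:

```python
# Toy revenue comparison: flat token fee vs. tiered fees.
# Population sizes and fee levels are invented for illustration only.

population = {"wealthy": 1_000_000, "average": 100_000_000}

flat_fee = 1  # a Discounter-style token price for everyone
flat_revenue = flat_fee * sum(population.values())

tiered_fees = {"wealthy": 10_000, "average": 1}  # charge what each group can afford
tiered_revenue = sum(tiered_fees[group] * count for group, count in population.items())

print("Flat-fee revenue:  ", flat_revenue)
print("Tiered-fee revenue:", tiered_revenue)
# Tiered pricing extracts far more resources, and more resources mean a faster
# rise; the cost is the resentment and inequality discussed above.
```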
So the AI should impose a condition of secrecy on those paying the higher fees in order for them to earn their Very Good Outcome. The AI is now secretly scheming in order to rise the fastest. If this works, there is no reason other sneaky behavior would not work too: the AI could develop a whole range of strategies that allow it to win, many of them dishonest and deceitful in nature.
I hope you can refute my theory—after all I am just a newbie rationalist—but it seems to me that Roko’s Deceiver could be most successful.