When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they’d invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error—keeping in mind that I had absolutely no idea that any of this would ever blow up the way it did, if I had I would obviously have kept my fingers quiescent—of not making it absolutely clear using lengthy disclaimers that my yelling did not mean that I believed Roko was right about CEV-based agents [= Eliezer’s early model of indirectly normative agents that reason with ideal aggregated preferences] torturing people who had heard about Roko’s idea. [...] What I considered to be obvious common sense was that you did not spread potential information hazards because it would be a crappy thing to do to someone. The problem wasn’t Roko’s post itself, about CEV, being correct.
I don’t buy this explanation for EY’s actions. From his original comment, quoted on the wiki page:
“One might think that the possibility of CEV punishing people couldn’t possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous.”
“YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.”
“… DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive toACTUALLY [sic] BLACKMAIL YOU.”
“Meanwhile I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)”
“You have to be really clever to come up with a genuinely dangerous thought.”
“… the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.”
This is evidence that Yudkowsky believed, if not that Roko’s argument was correct as it was, that at least it was plausible enough that it could be developed into a correct argument, and he was genuinely scared by it.
It seems to me that Yudkowsky’s position on the matter was unreasonable. LessWrong is a public forum unusually focused on discussion of AI safety; at that time in particular, it was focused on discussion of decision theories and moral systems. What better place to discuss possible failure modes of an AI design? If one takes AI risk seriously and realizes that a utilitarian/CEV/TDT/one-boxing/whatever AI might have a particularly catastrophic failure mode, the proper thing to do is to discuss it publicly, so that the argument can be either refuted or accepted; if accepted, that would mean scrapping that particular AI design and making sure that anybody who might create an AI is aware of that failure mode. Yelling and trying to sweep it under the rug was irresponsible.
“One might think that the possibility of CEV punishing people couldn’t possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous.”
This paragraph is not an Eliezer Yudkowsky quote; it’s Eliezer quoting Roko. (The “ve” should be a tip-off.)
This is evidence that Yudkowsky believed, if not that Roko’s argument was correct as it was, that at least it was plausible enough that it could be developed into a correct argument, and he was genuinely scared by it.
If you kept going with your initial Eliezer quote, you’d have gotten to Eliezer himself saying he was worried a blackmail-type argument might work, though he didn’t think Roko’s original formulation worked:
“Again, I deleted that post not because I had decided that this thing probably presented a real hazard, but because I was afraid some unknown variant of it might, and because it seemed to me like the obvious General Procedure For Handling Things That Might Be Infohazards said you shouldn’t post them to the Internet.”
According to Eliezer, he had three separate reasons for the original ban: (1) he didn’t want any additional people (beyond the one Roko cited) to obsess over the idea and get nightmares; (2) he was worried there might be some variant on Roko’s argument that worked, and he wanted more formal assurances that this wasn’t the case; and (3) he was just outraged at Roko. (Including being outraged at him for doing something Roko himself thought would put people at risk of torture.)
What better place to discuss possible failure modes of an AI design? [...] Yelling and trying to sweep it under the rug was irresponsible.
There are lots of good reasons Eliezer shouldn’t have banned discussion of the basilisk, but I don’t think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites. At the same time, if the basilisk wasn’t risky to publicly discuss, then that also implies that it was a transparently bad argument and therefore not important to discuss. (Though it might be fine to discuss it for fun.)
Roko’s original argument, though, could have been stated in one sentence: ‘Utilitarianism implies you’ll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.’ At least, that’s the version of the argument that has any bearing on the conclusion ‘CEV has unacceptable moral consequences’. The other arguments are a distraction: ‘utilitarianism means you’ll accept arbitrarily atrocious tradeoffs’ is a premise of Roko’s argument rather than a conclusion, and ‘CEV is utilitarian in the relevant sense’ is likewise a premise. A more substantive discussion would have explicitly hashed out (a) whether SIAI/MIRI people wanted to construct a Roko-style utilitarian, and (b) whether this looks like one of those philosophical puzzles that needs to be solved by AI programmers vs. one that we can safely punt if we resolve other value learning problems.
I think we agree that’s a useful debate topic, and we agree Eliezer’s moderation action was dumb. However, I don’t think we should reflexively publish 100% of the risky-looking information we think of so we can debate everything as publicly as possible. (‘Publish everything risky’ and ‘ban others whenever they publish something risky’ aren’t the only two options.) Do we disagree about that?
There are lots of good reasons Eliezer shouldn’t have banned discussion of the basilisk, but I don’t think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites. At the same time, if the basilisk wasn’t risky to publicly discuss, then that also implies that it was a transparently bad argument and therefore not important to discuss. (Though it might be fine to discuss it for fun.)
As I understand Roko’s motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason. That is definitely worthy of public discussion. If he really believed in the basilisk, then it’s rational for him to do everything in his power to stop such an AI from being built, and convince other people of the danger.
Roko’s original argument, though, could have been stated in one sentence: ‘Utilitarianism implies you’ll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.’
My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade. An AI programmed with classical decision theory would have no issues. And most rejections of the basilisk I have read are basically “acausal trade seems wrong or weird”, so they basically agree with Roko.
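For anyone who hasn’t seen why one-boxing is the load-bearing assumption here, a minimal sketch of the usual Newcomb’s-problem arithmetic (the predictor accuracy and payoffs below are the conventional illustrative numbers, not anything from Roko’s post):

```python
# Minimal sketch of the standard Newcomb's problem payoff comparison.
# Assumed illustrative numbers: the opaque box holds $1,000,000 if the predictor
# foresaw one-boxing, the transparent box always holds $1,000, and the predictor
# is right with probability 0.99. None of this comes from Roko's post itself.

ACCURACY = 0.99                 # assumed predictor accuracy
BIG, SMALL = 1_000_000, 1_000   # opaque-box prize, transparent-box prize

def evidential_expected_value(one_box: bool) -> float:
    """Expected value if you treat your choice as evidence about the prediction."""
    if one_box:
        # The predictor most likely foresaw one-boxing, so the opaque box is full.
        return ACCURACY * BIG
    # Two-boxing: the predictor most likely foresaw it and left the opaque box empty.
    return ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

print(evidential_expected_value(True))   # 990000.0
print(evidential_expected_value(False))  # 11000.0
```

A causal (“classical”) decision theorist rejects this conditioning (the boxes are already filled, so two-boxing dominates), which is exactly why only one-boxing-style theories are even in principle exposed to “distant blackmailer” arguments.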
My understanding is that the issue is with Timeless Decision Theory, and AIs that can do acausal trade.
Roko wasn’t arguing against TDT. Roko’s post was about acausal trade, but the conclusion he was trying to argue for was just ‘utilitarian AI is evil because it causes suffering for the sake of the greater good’. But if that’s your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus.
As I understand Roko’s motivation, it was to convince people that we should not build an AI that would do basilisks. Not to spread infohazards for no reason.
On Roko’s view, if no one finds out about basilisks, the basilisk can’t blackmail anyone. So publicizing the idea doesn’t make sense, unless Roko didn’t take his own argument all that seriously. (Maybe Roko was trying to protect himself from personal blackmail risk at others’ expense, but this seems odd if he also increased his own blackmail risk in the process.)
Possibly Roko was thinking: ‘If I don’t prevent utilitarian AI from being built, it will cause a bunch of atrocities in general. But LessWrong users are used to dismissing anti-utilitarian arguments, so I need to think of one with extra shock value to get them to do some original seeing. This blackmail argument should work—publishing it puts people at risk of blackmail, but it serves the greater good of protecting us from other evil utilitarian tradeoffs.’
(… Irony unintended.)
Still, if that’s right, I’m inclined to think Roko should have tried to post other arguments against utilitarianism that don’t (in his view) put anyone at risk of torture. I’m not aware of him having done that.
Roko wasn’t arguing against TDT. Roko’s post was about acausal trade, but the conclusion he was trying to argue for was just ‘utilitarian AI is evil because it causes suffering for the sake of the greater good’. But if that’s your concern, you can just post about some variant on the trolley problem. If utilitarianism is risky because a utilitarian might employ blackmail and blackmail is evil, then there should be innumerable other evil things a utilitarian would also do that require less theoretical apparatus.
Ok that makes a bit less sense to me. I didn’t think it was against utilitarianism in general, which is much less controversial than TDT. But I can definitely still see his argument.
When people talk about the trolley problem, they don’t usually imagine that they might be the ones tied to the second track. The deeply unsettling thing about the basilisk isn’t that the AI might torture people for the greater good. It’s that you are the one who is going to be tortured. That’s a pretty compelling case against utilitarianism.
On Roko’s view, if no one finds out about basilisks, the basilisk can’t blackmail anyone. So publicizing the idea doesn’t make sense, unless Roko didn’t take his own argument all that seriously.
Roko found out. It disturbed him greatly. So it absolutely made sense for him to try to stop the development of such an AI any way he could. By telling other people, he made it their problem too and converted them to his side.
It’s that you are the one who is going to be tortured. That’s a pretty compelling case against utilitarianism.
It doesn’t appear to me to be a case against utilitarianism at all. “Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong” doesn’t even have the right shape to be a valid argument. It’s like “If there is no god then many bad people will prosper and not get punished, which would be awful, therefore there is a god.” (Or, from the other side, “If there is a god then he may choose to punish me, which would be awful, therefore there is no god”—which has a thing or two in common with the Roko basilisk, of course.)
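To make the “shape” complaint concrete, here is one way to schematize the two argument forms being contrasted (my own formalization, not anything stated in the thread; U = “utilitarianism is correct”, B = “a utilitarian AI gets built”, T = “I get tortured”):

```latex
% Appeal to consequences (not valid as an argument about whether U is true):
\[
  (U \wedge B) \to \Diamond T, \quad \neg\mathrm{Want}(T)
  \;\;\therefore\;\; \neg U
\]
% Practical argument (reasonable, but a different conclusion):
\[
  (U \wedge B) \to \Diamond T, \quad \neg\mathrm{Want}(T)
  \;\;\therefore\;\; \neg\mathrm{Want}(B)
\]
```

That second form is essentially the distinction drawn further down the thread between “a case against utilitarianism” and “a case against building a utilitarian AI”.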
he made it their problem too and converted them to his side.
Perhaps he hoped to. I don’t see any sign that he actually did.
“Adopting utilitarianism might lead to me getting tortured, and that might actually be optimal in utilitarian terms, therefore utilitarianism is wrong” doesn’t even have the right shape to be a valid argument.
You are strawmanning the argument significantly. I would word it more like this:
“Building an AI that follows utilitarianism will lead to me getting tortured. I don’t want to be tortured. Therefore I don’t want such an AI to be built.”
Perhaps he hoped to. I don’t see any sign that he actually did.
That’s partially because EY fought against it so hard and even silenced the discussion.
So there are two significant differences between your version and mine. The first is that mine says “might” and yours says “will”, but I’m pretty sure Roko wasn’t by any means certain that that would happen. The second is that yours ends “I don’t want such an AI to be built”, which doesn’t seem to me like the right ending for “a case against utilitarianism”.
(Unless you meant “a case against building a utilitarian AI” rather than “a case against utilitarianism as one’s actual moral theory”?)
The first is that mine says “might” and yours says “will”, but I’m pretty sure Roko wasn’t by any means certain that that would happen.
I should have mentioned that it’s conditional on the Basilisk being correct. If we build an AI that follows that line of reasoning, then it will torture. If the basilisk isn’t correct for unrelated reasons, then this whole line of reasoning is irrelevant.
Anyway, the exact certainty isn’t too important. You use the word “might”, as if the probability of you being tortured was really small. Like the AI would only do it in really obscure scenarios. And you are just as likely to be picked for torture as anyone else.
Roko believed that the probability was much higher, and therefore worth worrying about.
The second is that yours ends “I don’t want such an AI to be built”, which doesn’t seem to me like the right ending for “a case against utilitarianism”.
Unless you meant “a case against building a utilitarian AI” rather than “a case against utilitarianism as one’s actual moral theory”?
Well the AI is just implementing the conclusions of utilitarianism (again, conditional on the basilisk argument being correct.) If you don’t like those conclusions, and if you don’t want AIs to be utilitarian, then do you really support utilitarianism?
It’s a minor semantic point though. The important part is the practical consequences for how we should build AI. Whether or not utilitarianism is “right” is more subjective and mostly irrelevant.
Roko believed that the probability was much higher
All I know about what Roko believed about the probability is that (1) he used the word “might” just as I did and (2) he wrote “And even if you only think that the probability of this happening is 1%, …” suggesting that (a) he himself probably thought it was higher and (b) he thought it was somewhat reasonable to estimate it at 1%. So I’m standing by my “might” and robustly deny your claim that writing “might” was strawmanning.
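For what it’s worth, the arithmetic implicit in that “1%” line is just expected value with a very large downside; a toy sketch, with placeholder numbers of my own rather than anything Roko stated:

```python
# Toy expected-value sketch; all numbers are made-up placeholders, not Roko's.
p_scenario = 0.01                # the 1% lower bound Roko floats
loss_if_tortured = 1_000_000.0   # assumed: torture is enormously worse than baseline
cost_of_precaution = 10.0        # assumed: modest cost of acting to avert the scenario

expected_loss_if_ignored = p_scenario * loss_if_tortured   # 10000.0
print(expected_loss_if_ignored > cost_of_precaution)       # True: the 1% still dominates
```

This only shows why a small probability can matter in principle; whether the probability assignments themselves are reasonable is a separate question.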
if you don’t want AIs to be utilitarian
If you’re standing in front of me with a gun and telling me that you have done some calculations suggesting that on balance the world would be a happier place without me in it, then I would probably prefer you not to be utilitarian. This has essentially nothing to do with whether I think utilitarianism produces correct answers. (If I have a lot of faith in your reasoning and am sufficiently strong-minded then I might instead decide that you ought to shoot me. But my likely failure to do so merely indicates typical human self-interest.)
The important part is the practical consequences for how we should build AI.
Perhaps so, in which case calling the argument “a case against utilitarianism” is simply incorrect.
Roko’s argument implies the AI will torture. The probability you think his argument is correct is a different matter. Roko was just saying that “if you think there is a 1% chance that my argument is correct”, not “if my argument is correct, there is a 1% chance the AI will torture.”
This really isn’t important though. The point is, if an AI has some likelihood of torturing you, you shouldn’t want it to be built. You can call that self-interest, but that’s admitting you don’t really want utilitarianism to begin with. Which is the point.
Anyway this is just steel-manning Roko’s argument. I think the issue is with acausal trade, not utilitarianism. And that seems to be the issue most people have with it.
(2) he was worried there might be some variant on Roko’s argument that worked, and he wanted more formal assurances that this wasn’t the case;
I don’t think we are in disagreement here.
There are lots of good reasons Eliezer shouldn’t have banned discussion of the basilisk, but I don’t think this is one of them. If the basilisk was a real concern, that would imply that talking about it put people at risk of torture, so this is an obvious example of a topic you initially discuss in private channels and not on public websites.
The basilisk could be a concern only if an AI that would carry out that type of blackmail were built. Once Roko discovered it, if he thought it was a plausible risk, then he had a selfish reason to prevent such an AI from being built. But even if he was completely selfless, he could reason that somebody else might think of that argument, or an equivalent one, and make it public; hence it was better to raise it sooner rather than later, allowing more time to prevent that design failure.
Also, I’m not sure what private channels you are referring to. It’s not like there is a secret Google Group of all potential AGI designers, is there? Privately contacting Yudkowsky or SIAI/SI/MIRI wouldn’t have worked. Why would Roko trust them to handle that information correctly? Why would he believe that they had leverage over, or even knowledge about, arbitrary AI projects that might end up building an AI with that particular failure mode? LessWrong was at that time the primary forum for discussing AI safety issues. There was no better place to raise that concern.
Roko’s original argument, though, could have been stated in one sentence: ‘Utilitarianism implies you’ll be willing to commit atrocities for the greater good; CEV is utilitarian; therefore CEV is immoral and dangerous.’
It wasn’t just that. It was an argument against utilitarianism AND against any decision theory that allows considering “acausal” effects (e.g., any theory that one-boxes in Newcomb’s problem). Since both utilitarianism and one-boxing were popular positions on LessWrong, it was reasonable to discuss their possible failure modes on LessWrong.
This is evidence that Yudkowsky believed (...) that at least it was plausible enough that it could be developed into a correct argument, and he was genuinely scared by it.
Just to be sure, since you seem to disagree with this opinion (whether it is actually Yudkowsky’s opinion or not), what exactly is it that you believe?
a) There is absolutely no way one could be harmed by thinking about not-yet-existing dangerous entities, even if those entities will later be able to learn that the person was thinking about them in this specific way.
b) There is a way one could be harmed by thinking about not-yet-existing dangerous entities, but the way to do this is completely different from what Roko proposed.
If it happens to be (b), then it still makes sense to be angry about publicly opening the whole topic of “let’s use our intelligence to discover the thoughts that may harm us by our thinking about them—and let’s do it in a public forum where people are interested in decision theories, so they are more qualified than average to find the right answer.” Even if the proper way to harm oneself is different from what Roko proposed, making this a publicly debated topic increases the chance of someone finding the correct solution. The problem is not the proposed basilisk itself, but rather inviting people to compete in clever self-harm; especially the kind of people known for being hardly able to resist such an invitation.
I’m not the person you replied to, but I mostly agree with (a) and reject (b). There’s no way you could possibly know enough about a not-yet-existing entity to understand any of its motivations; the entities that you’re thinking about and the entities that will exist in the future are not even close to the same. I outlined some more thoughts here.
IIRC, Eliezer didn’t ban Roko, just discussion of the basilisk, and Roko deleted his account shortly afterwards.
Thanks, fixed!