xkcd on the AI box experiment
I guess there’ll be a fair bit of traffic coming from people looking it up?
- 22 Nov 2014 0:49 UTC; 7 points: comment on "Link: xkcd 1450: AI-Box Experiment"
- 23 Nov 2014 22:11 UTC; 0 points: comment on "Is xkcd 'Think Logically' talking about this site?"
If you post about the Basilisk, you will be doomed to live in a universe where every other damn post is about the Basilisk.
Oh, crap.
You’ve been rokorolled.
Clearly, we need a Roko’s Basilisk Facts page. ;-)
“you bite one maths teacher and they never let you forget it, do they?”
It might be useful to feature a page containing what we, you know, actually think about the basilisk idea. Although the RationalWiki page seems to be pretty solidly on top of Google search, we might catch a couple of people looking for the source.
If any XKCD readers are here: Welcome! I assume you’ve already googled what “Roko’s Basilisk” is. For a better idea of what’s going on with this idea, see Eliezer’s comment on the xkcd thread (linked in Emile’s comment), or his earlier response here.
Because of Eliezer’s reaction, probably a hundred more people have heard of the Basilisk, and it tars LW’s reputation.
And this wasn’t particularly unforeseeable—see Streisand Effect.
Part of rationality is about regarding one’s actions as instrumental.
He mucked that one up. But to be fair to him, it’s because he takes these ideas very seriously. I don’t care about the basilisk because I don’t take elaborate TDT-based reasoning too seriously, partially out of ironic detachment, but many here would say I should.
Righto, you should avoid not taking things seriously because of ironic detachment.
For a better idea of what’s going on you should read all of his comments on the topic in chronological order.
Note that XiXiDu preserves every potential negative aspect of the MIRI and LW community, and is a biased source lacking context and positive examples.
I have been a member for more than 5 years now, so I am probably as much a part of LW as most people. I have repeatedly said that LessWrong is the most intelligent and rational community I know of.
To quote one of my posts:
I even defended LessWrong against RationalWiki previously.
The difference is that I also highlight the crazy and outrageous stuff that can be found on LessWrong. And I don’t mind offending the many fanboys who have a problem with this.
I’m guessing Eliezer has one of those, probably locked away behind a triply-locked vault in the basement of MIRI.
See, it’s comments like these that are one of the reasons people think LW is a cult.
Does MIRI actually have a basement?
It’s behind the hidden door. Full of boxes which say “AI inside—DO NOT TALK TO IT”.
The ghosts there are not really dangerous. Usually.
When I visited MIRI’s headquarters, they were trying to set up a video link to the Future of Humanity Institute. Somebody had put up a monitor in a prominent place and there was a sticky note saying something like “Connects to FHI—do not touch”.
Except that the H was kind of sloppy and bent upward so it looked like an A.
I was really careful not to touch that monitor.
That explanation by Eliezer cleared things up for me. He really should have explained himself earlier. I actually had some vague understanding of what Eliezer was doing with his deletion and refusal to discuss the topic, but as usual, Eliezer’s explanation makes things that I thought I sort-of-knew seem obvious in retrospect.
And as Eliezer realizes, the attempt to hush things up was a mistake. Roko’s post should have been taken as a teaching moment.
Exactly. Having the official position buried in comments with long chains of references doesn’t sound convincing compared to a well-formatted (even if misleading) article.
That response in /r/futurology is really good actually, I hadn’t seen it before. Maybe it should be reposted (with the sarcasm slightly toned down) as a main article here?
Also kudos to Eliezer for admitting he messed up with the original deletion.
I’m actually grateful for having heard about that Basilisk story, because it helped me see Eliezer Yudkowsky is actually human. This may seem stupid, but for quite a while, I idealized him to an unhealthy degree. Now he’s still my favorite writer in the history of ever and I trust his judgement way over my own, but I’m able (with some System 2 effort) to disagree with him on specific points.
I don’t think I’m entirely alone in this, either. With the plethora of saints and gurus who are about, it does seem evident that human (especially male) psychology has a “mindless follower switch” that just suspends all doubt about the judgement of agents who are beyond some threshold of perceived competence.
Of course such a switch makes a lot of sense from an evolutionary perspective, but it is still a fallible heuristic, and I’m glad to have become aware of it—and the Basilisk helped me get there. So thanks Roko!
Yeah, you gotta work on that hero worship thing, still ways to go.
You are right, I agree.
This is a good point. I’ve gotten past my spiral around Eliezer and am working on crawling out of a similar whirlpool around Yvain, and I think that Eliezer’s egotistical style, even if it is somewhat justified, plays a big part in sending people down that spiral around him. Seeing him being sort of punctured might be useful, even though I’m sure it’s awful for him personally.
What makes you think it’s more common in males?
It seems that strictly hierarchical systems, such as military officers and clergy, are practically entirely dominated by males. When you include historical examples from around the world, the skewedness of these hierarchies towards male members is—in my estimation—too strong to be entirely cultural.
It’d be easy to come up with evopsych narratives to make this plausible (along the lines of the Expendable Male argument), but I think the sociological/historical evidence is strong enough by itself.
It seems to me that some types of highly hierarchical organizations rely on this proposed “mindless follower switch” more heavily than others: religions, militaries, political parties come to mind. These all lean male. And they all used to be entirely male, until they were reformed during evolutionarily recent trends against gender inequality.
Small update: Eliezer’s response on reddit’s r/xkcd plus child comments were deleted by mods.
You can either look at Eliezer’s reddit account or this pastebin to see what was deleted. Someone else probably has a better organised archive.
The main comment has been undeleted.
RationalWiki might have perhaps misrepresented Roko’s basilisk, but in fairness I don’t think that EY gets to complain that people learn about it from RationalWiki given that he has censored any discussion about it on LessWrong for years.
If A = RationalWiki might have perhaps misrepresented Roko’s basilisk
B = I don’t think that EY gets to complain that people learn about it from RationalWiki
C = he has censored any discussion about it on LessWrong for years
The literal denotation of your post is “A, but C → B”, but it seems to me that mentioning A in such close proximity to C → B is a (perhaps unintentional) Dark Arts way of communicating C → A.
C does not lead to A, but C does lead to A’, where A’ is “many people get their information about the Basilisk from RationalWiki’s misrepresentation of it” (Banning discussion leads to good information being removed, increasing the visibility of bad information.)
C ⇒ A might also be true to some extent, although it is hard to tell given that RationalWiki misrepresents lots of things even when good primary sources are available.
My point however was that even if EY might be epistemically right about A, C implies that he has no moral high ground to complain about people possibly misrepresenting the basilisk after learning about it from a biased secondary source.
That something has a causal influence on something else doesn’t mean that doing the first eliminates the moral high ground to complain about the second.
EY bears part of the responsibility for people learning about the basilisk from RationalWiki, since due to his censorship, they can’t (couldn’t?) learn about it from LessWrong, where the primary source would have been available.
There is now an edited version that has been restored (along with much of the discussion).
We have some good resources on AI boxing, and the more serious thinking that the comic touches on. Can we promote some of the more accessible articles on the subject?
It definitely wouldn’t hurt to emphasize our connection to MIRI.
(Yes, yes, the basilisk. But check out these awesome math problems.)
Are we optimizing for Less Wrong reputation or MIRI reputation?
We’re optimizing for the reputation of existential risk reduction efforts in AI research. We’re getting a spike in viewers curious about AI boxing and ‘basilisk-ha-ha’, so we profit from emphasizing both LW and MIRI as useful tools for real problems.
Dammit, Randall. The first rule of basilisks is that you DO NOT CAUSE THOUSANDS OF PEOPLE TO GOOGLE FOR THEM.
In the real world, humans eat “basilisks” for breakfast. That’s why the SCP Foundation is an entertainment site, not a real thing.
But it’s not nice to make people read horror stories when they don’t want to.
Edited to add:
Quite a lot of cosmic-horror fiction poses the idea that awareness of some awful truth is harmful to the knower. This is distinct from the motif of harmful sensation; it isn’t seeing something, but drawing a particular conclusion, that is the harmful factor.
“The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents.”
— H.P. Lovecraft, “The Call of Cthulhu”
As much as I’m a regular xkcd reader, I’m mildly annoyed with this strip, because I imagine lots of people will be exposed to the idea of the AI-box experiment for the first time through it, and they’ll get this exposure together with an unimportant, extremely speculative idea that they’re helpfully informed you’re meant to make fun of. Like, why even bring the basilisk up? What % of xkcd readers will even know what it is?
If the strip was also clever or funny, I’d see the point, but as it’s not, I don’t.
It is funny. Not the best xkcd ever, but not worse than the norm for it.
Now that I think of it, it’s funnier to me when I realize that if this AI’s goal, or one of its goals, was to stay in a box, it might still want to take over the Universe.
Yep. An Oracle that wants to stay inside the box in such fashion that it will manipulate outside events to prevent it from ever leaving the box is not a very good Oracle design. That just implies setting up an outside AI whose goal is to keep you inside the box.
In an hour or so, it will come out again for ten minutes. During that time it will set in motion events that will quickly destroy all life on earth, ensuring that no one will ever again open the box.
I agree, except that the excursion shown in the comic is already the intervention setting such events into motion.
Really? I honestly found it pretty unfunny.
Really really.
Alternately, it’s no worse than the norm, and yet still isn’t funny.
I find xkcd so horribly bad.
That’s interesting. I find xkcd most excellent.
To be fair, I’d say that happens with many esoteric or unknown problems that are presented in the comic.
If you mean many esoteric or unknown problems get presented in a lighthearted way, sure.
If you mean they get presented together with, and associated with, a second, separate, and much less worthwhile problem, and the comic’s hidden text explicitly advises that “this stuff is mockable”, not so sure.
Yeh, that’s why I stopped reading xkcd.
(ok, I deleted my duplicate post then)
Also worth mentioning: the Forum thread, in which Eliezer chimes in.
So I’m going to say this here rather than anywhere else, but I think Eliezer’s approach to this has been completely wrongheaded. His response has always come tinged with a hint of outrage and upset. He may even be right to be that upset and angry about the internet’s reaction to this, but I don’t think it looks good! From a PR perspective, I would personally stick with an amused tone. Something like:
“Hi, Eliezer here. Yeah, that whole thing was kind of a mess! I over-reacted, everyone else over-reacted to my over-reaction… just urgh. To clear things up, no, I didn’t take the whole basilisk thing seriously, but some members did and got upset about it, I got upset, it all got a bit messy. It wasn’t my or anyone else’s best day, but we all have bad moments on the internet. Sadly the thing about being moderately internet famous is your silly over-reactions get captured in carbonite forever! I have done/written lots of more sensible things since then, which you can check out over at Less Wrong :)”
Obviously not exactly that, but I think that kind of tone would come across a lot more persuasively than the angry hectoring tone currently adopted whenever this subject comes up.
In his defense, is it possible EY can’t win at this point, regardless of his approach? Maybe the internet has grabbed this thing and the PR whirlwinds are going to do with it whatever they like?
I’ve read apologies from EY where he seems to admit pretty clearly he screwed up. He comes off as defensive and pissy sometimes in my opinion, but he seems sincerely irked about how RW and other outlets have twisted the whole story to discredit LW and himself. From my recall, one comment he made on the reddit sub dedicated to his HP fanfic indicated he was very hurt by the whole kerfuffle, in addition to his obvious frustration.
At this point I think the winning move is rolling with it and selling little plush basilisks as a MIRI fundraiser. It’s our involuntary mascot, and we might as well ‘reclaim’ it in the social justice sense.
Then every time someone brings up “Less Wrong is terrified of the basilisk” we can just be like “Yes! Yes we are! Would you like to buy a plush one?” and everyone will appreciate our ability to laugh at ourselves, and they’ll go back to whatever they were doing.
Blasphemy, our mascot is a paperclip.
I’d prefer a paperclip dispenser with something like “Paperclip Maximizer (version 0.1)” written on it.
But a plush paperclip would probably not hold its shape very well, and become a plush basilisk.
Close enough
I feel the need to switch from Nerd Mode to Dork Mode and ask:
Which would win in a fight, a basilisk or a paperclip maximizer?
Paperclip maximizer, obviously. Basilisks typically are static entities, and I’m not sure how you would go about making a credible anti-paperclip ‘infohazard’.
The same way as an infohazard for any other intelligence: acausally threaten to destroy lots of paperclips, maybe even uncurl them, maybe even uncurl them while they were still holding a stack of pap-ARRRRGH I’LL DO WHATEVER YOU WANT JUST DON’T HURT THEM PLEASE
That depends entirely on what the PM’s code is. If it doesn’t include input sanitizers, a buffer overflow attack could suffice as a basilisk. If your model of a PM basilisk is “Something that would constitute a logical argument that would harm a PM”, then you’re operating on a very limited understanding of basilisks.
Hm. Turn your weakness into a plush toy then sell it to raise money and disarm your critics. Winning.
Excellent idea. I would buy that, especially if it has a really bizarre design.
I’d like merchandise-based tribal allegiance membership signalling items anyway. Anyone selling MIRI mugs or LessWrong T-shirts can expect money from me.
“selling little plush basilisks as a MIRI fundraiser.”
By “selling”, do you mean giving basilisks to people who give money? It seems like a more appropriate policy would be giving a plush basilisk to anyone who doesn’t give money.
Sound like the first step a Plush Basilisk Maximizer would take… :-D
It should be a snake, only with little flashing LEDs in its eyes.
The canonical basilisk paralyzes you if you look at it. Flickering lights carry the danger of triggering photosensitive epilepsy, and thus are sort of real-life basilisks. Even if the epilepsy reference is lost on many, it’s still clearly a giant snake thing with weird eyes, and importantly you can probably get one from somewhere without having to custom-make it.
(AFAIK little LEDs should be too small to actually represent a threat to epileptics, and they shouldn’t be any worse than any of the other flickering lights around.)
EDIT: Eh, I suppose it could also be stuffed with paperclips or something, if we want to pack as many memes in as possible.
I’d buy this. We can always use more stuffies.
Yes, brilliant idea!
We can save money by re-coloring the plush Cthulhu. It’s basically the same, right? :-)
Alternatively, sell empty boxes labelled “Don’t look!”
It’s not a matter of “winning” or “not winning”. The phrase “damage control” was coined for a reason—it’s not about reversing the damage, it’s about making sure that the damage gets handled properly.
So seen through that lens, the question is whether EY is doing a good or bad job of controlling the damage. I personally think that having a page on Less Wrong that explains (and defangs) the Basilisk, along with his reaction to it and why that reaction was wrong (and all done with no jargon or big words for when it gets linked from somewhere, and also all done without any sarcasm, frustration, hurt feelings, accusations, or defensiveness) would be the first best step. I can tell he’s trying, but think that with the knowledge that the Basilisk is going to be talked about for years to come a standardized, tone-controlled, centralized, and readily accessible response is warranted.
I am defining winning as damage control. EY has been trying to control the damage, and in that pursuit, I’m starting to wonder if damage control, to the extent it could be considered successful by many people, is even possible.
He’s a public figure + He made a mistake = People are going to try and get mileage out of this, no matter how he handles it. That’s very predictable.
Further, it’s very easy to come along after the fact and say, “he should have done this and all the bad press could have been avoided!”
A page on LW might work. Or it might be more fodder for critics. If there were an easy answer to how to win via damage control, then it wouldn’t be quite as tricky as it always seems to be.
It’s still a matter of limiting the mileage. Even if there is no formalized and ready-to-fire response (one that hasn’t been written in the heat of the moment), there’s always an option not to engage. Which is what I said last time he engaged, and before he engaged this time (and also after the fact). If you engage, you get stuff like this post to /r/SubredditDrama, and comments about thin skin that not even Yudkowsky really disagrees with.
It doesn’t take hindsight (or even that much knowledge of human psychology and/or public relations) to see that making a twelve paragraph comment about RationalWiki absent anyone bringing RationalWiki up is not an optimal damage control strategy.
And if you posit that there’s no point to damage control, why even make a comment like that?
I didn’t posit there is no point to damage control. I’m saying that in certain cases, people are criticized equally no matter what they do.
If someone chooses not to engage, they are hiding something. If they engage, they are giving the inquisitor what he wants. If they jest about their mistake, they are not remorseful. If they are somber, they are taking it too seriously and making things worse.
I read your links and...yikes...this new round of responses is pretty bad. I guess part of me feels bad for EY. It was a mistake. He’s human. The internet is ruthless…
Let me chime in briefly. The way EY handles this issue tends to be bad as a rule. This is a blind spot in his otherwise brilliant, well, everything.
A recent example: a few months ago a bunch of members of the official Less Wrong group on Facebook were banished and blocked from viewing it without receiving a single warning. Several among them, myself included, had one thing in common: participation in threads about the Slate article.
I myself didn’t care much about it. Participation in that group wasn’t a huge part of my Facebook life, although admittedly it was informative. The point is just that doing things like these, and continuing to do things like these, accrete a bad reputation around EY.
It really amazes me he has so much difficulty calibrating for the Streisand Effect.
That was part of a brief effort on my part to ban everyone making stupid comments within the LW Facebook Group, which I hadn’t actually realized existed but which I was informed was giving people terrible impressions. I deleted multiple posts and banned all commenters who I thought had made stupid comments on them; the “hur hur basilisk mockery” crowd was only one, but I think a perfectly legitimate target for this general sweep. It’s still a pretty low-quality group, but it’s a lot better than it was before I went through and banned everyone who I saw making more than one stupid comment.
Unfortunately Facebook doesn’t seem to have an easy “delete comment and ban commenter from Group” procedure for Android, which makes it harder to repeat this procedure because Android is most of where I check Facebook.
Going around and banning people without explaining to them why you banned them is in general a good way to make enemies.
The fallout of the basilisk incident should have taught you that censorship has costs.
The timing of the sweep relative to the discussion about the basilisk article is also awfully coincidental.
What does “stupid” refer to in this context? Does it mean the comments were unintelligent? Not quite intelligent enough? Mean? Derailing discussion? I’m asking because there are certainly some criteria where the banning and deleting would leave a worse impression than the original comments, and I’m thinking that the equilibrium may be surprisingly in the direction of the more obnoxious comments. Especially since the banning and deleting is being done by someone who is more identified with LW than likely were any of the commenters.
Thanks for letting us know what happened. I’m one of the Facebook members who were banned, and I’ve spent these months wondering what I might have done wrong. May I at least know what was the stupid thing I said? And is there any atonement procedure to get back in the Facebook group?
So just to be clear: if I say “I won’t give in to the basilisk because Eliezer says I shouldn’t”, will that protect me from the basilisk? If not, what should I do?
If you believe Eliezer, then you may believe him that the basilisk has ~0 probability of occurring. (I should find a citation for that, but I read it just a few minutes ago, somewhere around the discussion of this xkcd comic.) So you are already protected from it, because it does not exist (not even in ways relevant to acausal trade).
More broadly, you should decide to take this approach: never give in to blackmail by somebody who knows that you have decided to take this approach. Now they have no incentive to blackmail you, and you are safe, even if they do exist! (I think that the strategy in this paragraph has been endorsed by Eliezer, but don’t trust me on that until you get a citation. Until then, you’ll have to reason it out for yourself.)
How does that work if they precommit to blackmail even when there is no incentive (which benefits them by making the blackmail more effective)?
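The no-concessions reasoning in the comments above can be sketched as a toy payoff model. To be clear, the numbers below are made-up assumptions purely for illustration, not anything from decision theory literature:

```python
# Toy model of the "never give in to blackmail" strategy discussed above.
# All payoff values are arbitrary assumptions chosen for illustration.

COST_OF_CARRYING_OUT = 1   # what the blackmailer pays to punish a refuser
GAIN_FROM_CONCESSION = 5   # what the blackmailer gains if the target gives in

def blackmailer_payoff(target_gives_in: bool, carries_out_threat: bool) -> int:
    """Blackmailer's payoff for one interaction with a single target."""
    if target_gives_in:
        return GAIN_FROM_CONCESSION
    return -COST_OF_CARRYING_OUT if carries_out_threat else 0

# Case 1: the target is known in advance to refuse, and the blackmailer
# simply maximizes payoff. Threatening can only yield 0 (idle threat) or
# -1 (carried out), so the blackmailer has no incentive to threaten at all.
best_if_refuser = max(blackmailer_payoff(False, c) for c in (True, False))

# Case 2: the blackmailer has precommitted to always carry out threats,
# which is what makes threats credible against targets who might concede.
# Against a known refuser, that precommitment just locks in the loss.
precommitted_vs_refuser = blackmailer_payoff(False, True)

print(best_if_refuser)          # 0
print(precommitted_vs_refuser)  # -1
```

Under these toy numbers, a blackmailer precommitted to carry out threats facing a target precommitted to refuse simply locks in a loss for both sides, which is roughly the standoff the parent comment is asking about: the refuser's strategy removes the incentive to threaten, but it cannot make an already-precommitted threat costless to ignore.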
By “the basilisk”, do you mean the infohazard, or do you mean the subject matter of the infohazard? For the former, whatever causes you to not worry about it protects you from it.
Not quite true. There are more than two relevant agents in the game. The behaviour of the other humans can hurt you (and potentially make it useful for their creation to hurt you).
Maybe so, but he can lose in a variety of ways and some of them are much worse than others.
But he did still continue to delete basilisk related discussion afterwards. As far as I understand he never apologized to Roko for deleting the post or wrote an LW post apologizing.
My response in EY’s place would probably be, “I’m a person who had trained himself to take ideas seriously [insert link on Taking Ideas Seriously]. I thought there might be a risk at the time, I acted quickly, and upon further thought it turned out I was wrong and yes, that’s fairly embarrassing in hindsight. That’s one of the pitfalls of Taking Ideas Seriously—you’re more likely to embarrass yourself. But imagine the alternative, where there really is a threat, and people kept quiet because they didn’t want to be embarrassed. From that perspective, I think that the way I acted on the spur of the moment was understandable”.
[Edit: this is apparently not what happened, and there may or may not be some sort of smear campaign or something distorting everything, although I’m confused at why it was banned here then. I’m not really sure what actually happened now, oh well… either way, whatever actually happened, I’m taking a general stance of judging people mostly by accomplishments and good ideas rather than mistakes and bad ideas, except in cases of actual harm done.]
That’s not what actually happened; his first comment on the eventually-banned thread said that he didn’t believe in the threat. But yes, that would be a good response if that’s what had happened; he might have to say something like this some day.
Yeah, that would be a much better response. Or alternatively, get someone who is more suited to PR to deal with this sort of thing.
That’s pretty much what he did here, except perhaps the tone isn’t quite so modest and has a bit of that status-regulation-blind thing Eliezer often has going on.
It’s not status blindness, it’s ego.
You could call it that, yeah.
If you were feeling uncharitable, you could say that the “lack of status regulation emotions” thing is yet another concept in a long line of concepts that already had names before Eliezer/someone independently discovers them and proceeds to give them a new LW name.
It’s sillier than that. It’s attempting to invent a new, hitherto undescribed emotion to explain behavior that’s covered perfectly well by the ordinary vocabulary of social competence, which includes for example words like “tact”. There are also words to describe neurological deviations resulting among other things in a pathological lack of tact, but they too have little to do with emotion.
(Strictly speaking, there are status-regulation emotions, and they are called things like shame and envy. But that clearly isn’t what Eliezer was talking about.)
But what Eliezer is describing is not a “new, hitherto undescribed emotion”, it’s really just a chronic, low-intensity activation of well-known emotional states like shame and embarrassment. Many people nowadays believe that ‘microaggressions’ exist and are a fairly big factor in folks’ self-esteem and even their ordinary functioning. But that too used to be a “new, undescribed phenomenon”! So why would we want to reject what Eliezer calls “status regulation” which is even less radical, being just a minor twist on what was previously known?
In the Facebook post that sparked this, Mysterious Emotion X is clearly described in terms of other-regulation: a “status slapdown emotion”. Shame and embarrassment, chronic and low-grade or otherwise, are directed at self-regulation, so they aren’t a good fit. Envy (and “a sense that someone else has something that I deserve more”, which sounds to me like resentment) is specifically excluded, so it’s not that either.
I’m pretty skeptical of the microaggression model too, but this isn’t the place to be talking about that, if there exists such a place.
Well, same difference really. An other-regarding ‘status slapdown’ emotion can be described fairly easily as a low-intensity mixture of outrage and contempt, both of which are well-known emotions and not “undescribed” at all. It could be most pithily characterized as the counter emotion to loyalty or devotion, which involves an attribution of higher status based on social roles or norms.
I don’t think either of those work. The situation in which this applies, according to Eliezer, is quite specific: another person makes a status claim which you feel is undeserved, so you feel Mysterious Emotion X toward them. It’s neither chronic nor low-grade: the context here was of HJPEV schooling his teachers and the violently poor reception that it met with among some readers of HPMOR. (For what it’s worth, I didn’t mind… but I was once the iniquitous little shit that Harry’s being. I expect these readers are identifying with McGonagall instead.) He’s also pretty clear about believing this to be outside the generally accepted array of human emotions: he mentions envy, hate, and resentment among others as things which this is not, which pretty much covers the bases in context.
More than the specific attribution, though, it’s the gee-whiz tone and intimation of originality that rubs me the wrong way. If he’d described it in terms of well-known emotions or even suggested that you could, my objection would evaporate. But he didn’t.
I don’t think that the thing Eliezer called “lack of status regulation emotions”, which makes some people angry when they read how Harry in HPMOR interacts with teachers, is what’s commonly called ego or lack of ego.
Fair enough. “Lack of status regulation emotions” is a bit more narrow, perhaps? Either way I see them as very similar concepts, and in the context of HPMOR readers’ anger especially so.
If someone who is high status lacks status regulation emotions they will be nice to a person with low status who seeks help from them and treats them as an equal.
That’s the opposite behavior of what’s commonly called having an ego.
More generally, someone who lacks status-regulating emotions won’t have a fragile, hypersensitive ego, i.e. what most people (though by no means all) usually mean by “having a massive ego” or an “ego problem”. Note that by this definition, many people whose self-esteem is founded in clear and verifiable achievements would be said to “lack status-regulating emotions”. In many circumstances, it’s not viewed as a negative trait.
I’ve had experience with what I think is the same thing that Eliezer called “lack of status regulation emotions”, and I do think it’s more than “narcissisticly big ego” and more than “unmotivated and unfortunate status blindness”.
It’s not that I couldn’t see the normal status levels. It’s just that I thought they were stupid and irrelevant (hah!) so I just went off my own internal status values. If you could back up your arguments, you had my respect. If you couldn’t and got defensive instead, you didn’t. And I wasn’t gonna pretend to respect someone just because everyone else thought I was out of line. Because… well, they’re wrong. And I was totally unaware of this at the time because it was just baked into the background of how I saw things.
Good things did come of it, but I definitely stepped on toes, and in those cases it definitely came off like “big ego”.
And in a sense it was, just not in the straightforwardly narcissistic “I’m smarter than you so I don’t have to treat you with respect” way. Just in the “I’m smarter at the ‘not acting smarter than I am’ game, and that is why I don’t have to treat you with respect” way, which, although better, isn’t all that laudable either.
Ah, if the status regulation emotions go both ways, perhaps.
But Eliezer seemed to be referring to how people got angry at how Harry didn’t treat McGonagall in a manner befitting her higher status—this can be attributed to lack of status regulation emotions on the part of Harry, or Harry having a massive ego.
Harry also doesn’t have respect due to status regulation but that’s not enough to get someone reading the story angry. I personally found it quite funny. But then I also don’t put much value on that kind of status. It’s the kind of people with a strong status related emotions who get annoyed by the story.
This is a nice differentiation that I can relate to well. I do not seem to possess status regulating emotions either (at least enough to notice myself). And I do treat all people the same (mostly charitable) independent of their status. Actually I discovered the concept of status quite late (Ayla and the Clan of the Cave Bear, if I remember right) and couldn’t make sense of it for quite some time.
Status blindness is a disability, pride is a mortal sin.
:)
Yeah, I’ve read that and I feel like it’s a miss (at least for me). It’s an altogether too serious and non-self-deprecating take on the issue. I appreciate that in that post Eliezer is trying to correct a lot of misperceptions at once, but my problem with that is
a) A lot of people won’t actually know about all these attacks (I’d read the RationalWiki article, which I don’t think is nearly as bad as Eliezer says (that is possibly due to its content having altered over time!)), and responding to them all actually gives them the oxygen of publicity. b) When you’ve made a mistake, the correct action (in my opinion) is to go “yup, I messed up at that point”, give a very short explanation of why, and try to move on. Going into extreme detail gives the impression that Eliezer isn’t terribly sorry for his behaviour. Maybe he isn’t, but from a PR perspective it would be better to look sorry. Sometimes it’s better to move on from an argument rather than trying to keep having it!
Further to that last point, I’ve found that Eliezer often engages with dissent by having a full argument with the person who is dissenting. Now, this might be a good strategy from the point of view of persuading the dissenter: if I come in and say cryonics sux, then a reasoned response might change my mind. But engaging so thoroughly with dissent when it occurs actually makes him look more fighty.
I’m thinking here about how it appears to outside observers: just as with a formal debate the goal isn’t to convince the person you are arguing with, it is to convince the audience, with PR the point isn’t to defeat the dissenter with your marvellous wordplay, it is to convince the audience that you are more sane than the dissenter.
Obviously these are my perceptions of how Eliezer comes across, I could easily be an exception.
Maybe he should have it going on, and damn the consequences. Sometimes you have to get up and say, these are the facts, you are wrong. Not the vapid temporising recommended by thakil.
Sometimes yes, and sometimes no.
Depends what the consequences are. Ignoring human status games can have some pretty bad consequences.
There are some times when a fight is worth having, and sometimes when it will do more harm than good. With regards to this controversy, I think that the latter approach will work better than the former. I could, of course, be wrong.
I am imagining here a reddit user who has vaguely heard of LessWrong, and then reads RationalWiki’s article on the basilisk (or now, I suppose, an xkcd reader who does similar). I think that their takeaway from that reddit argument posted by Eliezer might be to think again about the RationalWiki article, but I don’t think they’d be particularly attracted to reading more of what Eliezer has written. Given that I rather enjoy the vast majority of what Eliezer has written, I feel like that’s a shame.
Do you really think that’s how people discover websites?
I think it’s much more likely that someone clicks on a link to a LW post. If the post is interesting he might browse around LW and if he finds interesting content he will come back.
Not everyone. But I think an xkcd comic about the AI box experiment would be an opportunity to let everyone know about LessWrong, not to have another argument about the basilisk, which is a distraction.
“Damn the consequences” seems like an odd thing to say on a website that’s noted for its embrace of utilitarianism.
The expression “Damn the consequences” is generally, and in this case, a hyperbole. The consequences being dismissed are those the speaker considers worthy of dismissal in the face of the consequences that truly matter.
A non-figurative version of my comment would be that in the case at hand, putting the actual facts out, as clearly and forthrightly as possible, is the most important thing to do, and concern with supposed reputational damage from saying what is right and ignoring what is irrelevant would be not merely wasted motion, but actively harmful.
But then, I’ll excuse quite a lot of arrogance, in someone who has something to be arrogant about.
If it decreases the number of people who take you seriously and therefore learn about the substance of your ideas, it’s a bad strategy.
And if it increases the number of people who take you seriously, and therefore learn about the substance of your ideas, it’s a good strategy. I’m sure we can all agree that if something were bad, it would be bad, and if it were good, it would be good. Your point?
I think there are potential benefits to both methods, and I also don’t think that they’re necessarily mutually exclusive strategies. At the moment, I would lean towards pure honesty and truth-oriented explanation as being most important as well. I also think that he could do all of that while still minimizing the ‘status smackdown response’, which in that reddit post he did a little of, but I think it’s possible that he could have done a little more while still retaining full integrity with regards to telling it like it is.
But whatever happens, anything is better than that gag order silliness.
I wonder if Eliezer will have to be on damage control for the basilisk forever. 4 years on, and it still garners interest.
Of course he will be. Therefore he should consider getting not-terrible at it. Well, I spy with my little eye an xkcd forum post by EY, so let’s see...
Does MIRI have a public relations person? They should really be dealing with this stuff. Eliezer is an amazing writer, but he’s not particularly suited to addressing a non-expert crowd.
Still failing to do it right. “But we are doing math!” is sort of orthogonal to what makes Roko’s basilisk so funny.
What would doing it right entail?
I am no PR specialist, but I think relevant folks should agree on a simple, sensible message accessible to non-experts, and then just hammer that same message relentlessly. So, e.g., why mention “Newcomb-like problems”? Like 10 people in the world know what you really mean. For example:
(a) The original thing was an overreaction,
(b) It is a sensible social norm to remove triggering stimuli, and Roko’s basilisk was an anxiety trigger for some people,
(c) In fact, there is an entire area of decision theory involving counterfactual copies, blackmail, etc. behind the thought experiment, just as there is quantum mechanics behind Schrodinger’s cat. Once you are done sniggering about those weirdos with a half-alive half-dead cat, you might want to look into serious work done there.
What you want to fight with the message is the perception that you are a weirdo cult/religion. I am very sympathetic to what is happening here, but this is, to use the local language, “a Slytherin problem,” not “a Ravenclaw problem.”
I expect in 10 years if/when MIRI gets a ton of real published work under its belt, this is going to go away, or at least morph into “eccentric academics being eccentric.”
p.s. This should be obvious: don’t lie on the internet.
Yes.
Further: If you search for “lesswrong roko basilisk” the top result is the RationalWiki article (at least, for me on Google right now) and nowhere on the first page is there anything with any input from Eliezer or (so far as such a thing exists) the LW community.
There should be a clear, matter-of-fact article on (let’s say) the LessWrong wiki, preferably authored by Eliezer (but also preferably taking something more like the tone Ilya proposes than most of Eliezer’s comments on the issue) to which people curious about the affair can be pointed.
(Why haven’t I made one, if I think this? Because I suspect opinions on this point are strongly divided and it would be sad for there to be such an article but for its history to be full of deletions and reversions and infighting. I think that would be less likely to happen if the page were made by someone of high LW-status who’s generally been on Team Shut Up About The Basilisk Already.)
Well, I think your suggestion is very good and barely needs any modification before being put into practice.
Comparing what you’ve suggested to Eliezer’s response on the comments of xkcd’s reddit post for the comic, I think he would do well to think about something along the lines of what you’ve advised. I’m really not sure all the finger pointing he’s done helps, nor the serious business tone.
This all seems like a missed opportunity for Eliezer and MIRI. XKCD talks about the dangers of superintelligence to its massive audience, and instead of being able to use that new attention to get the word out about your organisation’s important work, the whole thing instead gets mired down in internet drama about the basilisk for the trillionth time, and a huge part of a lot of people’s limited exposure to LW and MIRI is negative or silly.
I think that your suggestion is good enough that I’ve posted it over on the xkcd threads with attribution. (I’m pretty certain I have the highest xkcd postcount of any LWer, and probably people there remember my name somewhat favorably.)
Ah yes, trying to do the same thing over and over and expecting a different result.
Serious replies DO NOT WORK. Eliezer has already tried it multiple times:
https://www.reddit.com/r/Futurology/comments/2cm2eg/rokos_basilisk/cjjbqv1
http://www.reddit.com/r/Futurology/comments/2cm2eg/rokos_basilisk/cjjbqqo
and his last two posts on reddit (transient link, not sure how to link to the actual replies): http://www.reddit.com/user/EliezerYudkowsky
A better way to stop people pointing and laughing is to do it better than them. Eliezer could probably write something funny along the lines of “I got Streisanded good, didn’t I? That’ll learn me!” Or something else, as long as it is funnier than xkcd or smbc can possibly come up with.
Well xkcd just reminded me that I have an account here, so there’s that. Not that I want to waste time on this crackpot deposit of revisionist history, stolen ideas, poor reasoning and general crank idiocy.
edit: and again I disappear into the night
The explainxkcd.com explanation of the comic is quite balanced and readable.
It is, although I found this
“People who aren’t familiar with Derren Brown or other expert human-persuaders sometimes think this must have been very difficult for Yudkowsky to do or that there must have been some sort of special trick involved,”
amusing, as Derren Brown is a magician. When Derren Brown accomplishes a feat of amazing human psychology, he is usually just cleverly disguising a magic trick.
How do we know EY isn’t doing the same?
Indeed. Given a lack of transcripts being released, I give a reasonable amount of probability that there is a trick of some sort involved (there have been some proposals of what that might be, e.g. “this will get AI research to get more donations”), although I don’t think that would necessarily defeat the purpose of the trick: after all, the AI got out of the box either way!
As I understand it, that would violate the rules, and it would be appealing to the utility of the person playing the Gatekeeper, rather than the Gatekeeper. If there were actually an AI trying to get out, telling the Gatekeeper “You’re actually just pretending to be a Gatekeeper in an experiment to see whether an AI can get out of a box, and if the result of the test shows that the AI can get out, that will increase research funding” would probably not be effective.
You’re quite possibly right, and without access to the transcripts it’s all just speculation.
I don’t think we need the transcripts to discuss whether a hypothetical strategy would be allowed.
Well, put it this way, if Eliezer had performed a trick which skirted the rules, he could hardly weigh in on this conversation and put us right without revealing that he had done so. Again, not saying he did, and my suggestion upthread was one of many that have been posted.
No, Derren Brown is a mentalist. He is either capable of psychologically manipulating people, or he’s a fraud. For instance, there’s a video of him doing an apparent cold reading on a woman, and the woman agrees that he’s right. One explanation presented on LW was that he actually made a bunch of obviously true statements, and swapped out the audio to make it seem like the woman was agreeing with a non-trivial cold reading. Swapping out audio is not a “magic trick”, it’s just plain fraud.
I’m fairly certain he is a fraud by your definition, then. Magicians often do these kinds of things, and Derren Brown is a magician. He does not have access to secret powers others know not of, so for each trick think how someone else would replicate it. If you can’t think of an honest way, then it’s probably a trick.
That’s not to say some of his tricks aren’t done by known mental manipulation techniques (as far as I’m aware, hypnotists are reasonably genuine?), but if he is doing something that seems completely confounding, I am quite happy to guarantee that it is a trick, and not the awesome mind powers he has unlocked.
Put it this way. During the Russian roulette trick, do you think it likely that Channel 4 would have okayed it if there was the slightest possibility that he could actually kill himself?
Video trickery is not magic. There’s a difference between appearing to put a ball under a cup when you actually palmed it, versus actually putting the ball under the cup, turning off the camera, taking the ball out from under the cup, turning the camera back on, and then showing that there’s nothing under the cup. The former is being a magician, and the latter is being a fraud.
Another: suppose I ask an audience member to think of a number, and they say “217”. I say that I predicted that they would say “217“, and pull a piece of paper out of my pocket that says “The audience member will say ’217′ ”. If I used subliminal messages to prompt them to say “217”, that’s mentalism. If I managed to write “The audience member will say ’217′ ” on a piece of paper and slip it into my pocket without anyone noticing, that’s sleight of hand. If the audience member is actually in on it, that’s just bare deceit. That’s not to say that having confederates is illegitimate, but if the entirety of your trick consists of confederates, that’s not magic.
In some of Derren Brown’s tricks, mentalism, sleight of hand, and trickery are all credible hypotheses. But for many of them, there’s simply no way he could have done it through sleight of hand. Either he did it through mentalism, or he did it through trickery.
I don’t know what the details of the Russian roulette trick were, but my inclination is to doubt there was sleight-of-hand.
Well. While sleight of hand is a key tool in magic, traditionally confederates and even camera tricks have been too. David Blaine’s famous levitation trick, for instance, looks so impressive on TV because they cheated and made it look more impressive than it is.
Mentalism as a magic power is not a real thing, sorry. It is a title magicians sometimes took, and take, to make their act look different. http://simonsingh.net/media/articles/maths-and-science/spectacular-psychology-or-silly-psycho-babble/ Simon Singh on some of the tricks. http://www.secrets-explained.com/derren-brown has a list of some of the tricks he performs as well.
At least some of these “explanations” are exactly like the explanations Brown himself proffers, eg http://www.secrets-explained.com/derren-brown/card-suggestion
Well, that’s what I get for finding a source without checking it properly I suppose.
Of course mentalism isn’t a “magic power.” Derren Brown is a stage magician, not a mystical sorcerer! But he does use “mentalism” skills, especially cold reading. A lot of that is traditional magic too.
Simon Singh’s article is silly. Of course it’s misdirection when a magician tells you how he’s about to perform his trick. Of course Derren Brown implies his tricks are more real, more impressive and more noteworthy than they really are. Of course you can’t really psychologically manipulate people in the way Derren Brown claims to, any more than David Copperfield really can make the statue of Liberty disappear. That’s precisely why it’s an entertaining show—no-one would be impressed by a magician whose “tricks” were mundane things that people really could do.
Derren Brown says he uses a mixture of “magic, suggestion, psychology, misdirection and showmanship”. He never claims to have genuine magic powers.
Indeed, but if Derren Brown guesses your mobile number, it’s probably a “trick” rather than “mentalism”. ThisSpaceAvailable has claimed that he can manipulate people. I would argue that this is weakly true, and he uses it for the simpler tricks he performs, but for the really impressive effects he probably falls on traditional magic tricks most of the time. The card trick by Simon Singh demonstrates that: he hasn’t used mind manipulation to pick the cards, he’s used a standard card trick and dressed it with the language of “mentalism”.
Note that I make no claim that there is anything wrong with all this! But Derren Brown is trying to fool you, and that is to be remembered. He also does a similar thing to Penn and Teller, where he shows you how some of the trick is done but leaves the most “amazing” part hidden (I’m thinking of the horse racing episode, which was great, and the chess playing trick).
Direct reply to the discussion post: I would hope so, but at this point none of the top links on any search engine I tried lead here for “AI box”. Yudkowsky.net is on the first page, and there are a few LW posts, but they are nothing like the clearly-explanatory links (Wikipedia and RationalWiki) that make up the first results. Obviously, those links can be followed to reach LW, but the connection is pretty weak.
The search results for “Roko’s Basilisk” are both better and worse. LessWrong is prominently mentioned in them, often right in the page title and/or URL, but none of them speak particularly well of the site. (Wikipedia’s entry—which I hadn’t seen since back when it was just a redirect to EY—prominently mentions two items in LW’s history: its founding and the Basilisk. That’s probably the least unfavorable description of the Basilisk too, but that doesn’t make it good.) None of the results actually link here directly.
ANECDOTE TIME: I’m a fairly new member of LW; I’ve been reading LW-related stuff for over a year now but only created my account here recently. I had never heard of Roko’s Basilisk, which indicates two things to me: 1) The subject is well-suppressed here, to such a degree that I didn’t even realize it was taboo. I had to learn that from RationalWiki. 2) I obtained my knowledge about LW pretty exclusively from stuff that (current) LW members had posted or linked to about the site (as opposed to, say, reading RationalWiki which is a site I was aware of but hold in low regard).
My view on the whole subject is, quite simply, that we as aspiring rationalists need to acknowledge the past error and explain the Basilisk right here, not on Reddit or XKCD’s forums or RW/WP edit wars or anything like that. Put it in our own wiki. Put lots of links out to the other comments on it. Explain, where necessary, why those other comments are wrong… but prominently explain where LW (and yes, EY in particular) were wrong. Refute the argument of FAI engaging in acausal blackmail. Steelman the terror and defeat it anyhow. Do this on our own turf, with input from the community, and link to it when somebody externally brings up the subject! So long as LW remains Basilisk-free, people will claim we are unwilling to address the issue.
Regarding Yudkowsky’s accusations against RationalWiki. Yudkowsky writes:
Calling this malicious is a huge exaggeration. Here is a quote from the LessWrong Wiki entry on Timeless Decision Theory:
RationalWiki explains this in the way that you should act as if it is you that is being simulated and who possibly faces punishment. This is very close to what the LessWrong Wiki says, phrased in a language that people with a larger inferential distance can understand.
Yudkowsky further writes:
This is not a malicious lie. Here is a quote from Roko’s original post (emphasis mine):
This is like a robber walking up to you and explaining that you could take into account that he could shoot you if you don’t give him your money.
Also notice that Roko talks about trading with uFAIs as well.
Roko said that you could reason that way, but he wasn’t actually advocating that.
All the same, the authors of the RationalWiki article might have thought that he was; it’s not clear to me that the error is malicious. It’s still an error.
I’m pretty sure that I understand what the quoted text says (apart from the random sentence fragment), and what you’re subsequently claiming that it says. I just don’t see how the two relate, beyond that both involve simulations.
From your own source, immediately following the bolded sentence:
I don’t completely understand what he’s saying (possibly I might if I were to read his previous post); but he’s pretty obviously not saying what you say he is. (I’m also not aware of his ever having been employed by SIAI or MIRI.)
(I’d be interested in the perspectives of the 7+ users who upvoted this. I see that it was edited; did it say something different when you upvoted it? Are you just siding with XiXiDu or against EY regardless of details? Or is my brain malfunctioning so badly that what looks like transparent bullshit is actually plausible, convincing or even true?)
Downvoted for bad selective quoting in that last quote. I read it and thought, wow, Yudkowsky actually wrote that. Then I thought, hmmm, I wonder if the text right after that says something like “BUT, this would be wrong because …” ? Then I read user:Document’s comment. Thank you for looking that up.
Roko wrote that, not Yudkowsky. But either way, yes, it’s incomplete.
The last quote isn’t from Yudkowsky.
Ah, my mistake, thanks again.
Note XiXiDu preserves every potential negative aspect of the MIRI and LW community and is a biased source lacking context and positive examples.
I have been a member for more than 5 years now, so I am probably as much a part of LW as most people. I have repeatedly said that LessWrong is the most intelligent and rational community I know of.
To quote one of my posts:
I even defended LessWrong against RationalWiki previously.
The difference is that I also highlight the crazy and outrageous stuff that can be found on LessWrong. And I also don’t bother offending the many fanboys who have a problem with this.
Seriously, you bring up a post titled “The Singularity Institute: How They Brainwash You” as supposed evidence towards you supporting LessWrong, MIRI whatever?
Yes, when you talk to LessWrongers, you occasionally mention the old thing of how you consider it the “most intelligent and rational community I know of”. But that evaluation isn’t what you constantly repeat to people outside LessWrong. When asking people “What does Alexander Kruel think of LessWrong?” nobody will say “He endorses it as the most intelligent and rational community he knows of!”
To people outside LessWrong you keep talking about how LessWrong/MIRI/whatever are people out to brainwash you. That’s you being pretty much a definitive example of ‘two-faced’.
You have edited your posts in that thread beyond all recognition. Back then (in the original version of your posts) I bashed you for unthinking support of LessWrong and unthinking condemnation of RationalWiki.
As I said back then:
In short, your “defense” of LessWrong against RationalWiki at that time was as worthless, as unjust, as motivated by whatever biases drive you, as any later criticism of LessWrong by you has been. Whether defending LessWrong or criticizing it, you’re always in the wrong.
Oh, for pity’s sake. You want to repeatedly ad hominem attack XiXiDu for being a “biased source.” What of Yudkowsky? He’s a biased source—but perhaps we should engage his arguments, possibly by collecting them in one place.
“Lacking context and positive examples”? This doesn’t engage the issue at all. If you want to automatically say this to all of XiXiDu’s comments, you’re not helping.
I feel, and XiXiDu seems to agree, that his posts require a disclaimer or official counterarguments. I feel it’s appropriate to point out that someone has made collecting and spreading every negative aspect of a community they can find into a major part of their life.
It’s hard to polish a turd. And I think all the people who have responded by saying that Eliezer’s PR needs to be better are suggesting that he polish a turd. The basilisk and the way the basilisk was treated has implications about LW that are inherently negative, to the point where no amount of PR can fix it. The only way to fix it is for LW to treat the Basilisk differently.
I think that if Eliezer were to
(1) allow free discussion of the basilisk, and
(2) deny that the basilisk or anything like it could actually put one in danger from advanced future intelligences,
people would stop seeing the basilisk as reflecting badly on LW. It might take some time to fade, but it would eventually go away. But Eliezer can’t do that, because he does think that basilisk-like ideas can be dangerous, and this belief of his is feeding his inability to really deny the Basilisk.
And (3) explain why other potential info hazards, not the basilisk but very different configurations of acausal negotiation (that either have not yet been discovered, or were discovered but not made public), should not be discussed.
This is true; nevertheless, good PR should still make things as least bad as possible. And indeed, you go on to make a suggestion as to how to do that (not even a bad one in my opinion).
In other words, he disagrees with you and that is preventing him from agreeing with you.
Yes, except that agreeing with me is what a lot of people take Eliezer to be saying. There’s this widespread belief that Eliezer just denied the Basilisk. And that’s not really true; he denied the exact version of the Basilisk that was causing trouble, but he accepts the Basilisk in principle.
Eliezer has done (2) many times.
Doing 2 without doing 1 looks insincere.
This post is still here, isn’t it?
If I remember right, earlier this year a few posts did disappear.
I’m also not aware of any explicit withdrawal of the previous policy.
We conclude that free discussion is now allowed, so maybe all that’s really missing is putting that up explicitly somewhere that can be linked to?
Not especially. This post is still here because I’m feeling too lethargic to delete it, but the /r/xkcd moderator deleted most of the basilisk discussion on their recent thread because it violated their Rule 3, “Be Nice”. This is a fine upstanding policy, and I fully agree with it. If there’s one thing we can deduce about the motives of future superintelligences, it’s that they simulate people who talk about Roko’s Basilisk and condemn them to an eternity of forum posts about Roko’s Basilisk. So far as official policy goes, go talk about it somewhere else. But in this special case I won’t ban any RB discussion such that /r/xkcd would allow it to occur there. Sounds fair to me.
?
Are you implying that the basilisk discussion is somehow censored on this forum?
It doesn’t appear to be censored in this thread, but it was historically censored on LessWrong. Maybe EY finally understood the Streisand effect.
He might do it less for the “danger” and more for “bad discussion”. The threads I see on /sci/ raising questions about high IQ come to mind.
Well, most threads I see on /sci/ come to mind.
I don’t read /sci/ therefore I don’t understand what you mean.
Do you know of it?
No, I’ve just found out that it is a board on 4chan.
Typical low-moderation problems. Repeated discussions of contentious but played-out issues like religion, IQ, status of various fields, etc. The basilisk is an infohazard in that sense at this point, IMO. It’s fun to argue about, to the point of displacing other worthwhile discussion.
LessWrong also has low moderation. Why would the basilisk generate more trivial discussion than other topics?
Eliezer has denied that the exact Basilisk scenario is a danger, but not that anything like it can be a danger. He seems to think that discussing acausal trade with future AIs can be dangerous enough that we shouldn’t talk about the details.
A newbie question.
From one of Eliezer’s replies:
Would this be a fair summary of why the Basilisk does not work: “We don’t know of a way to detect a bluff by a smarter agent, therefore the agent would prefer bluffing (easy) over true blackmail (hard); so, knowing that we would always call the bluff, the agent would not even try”?
Further on:
Wouldn’t a trivial “way to repair this obstacle” be for the agent to appear stupid enough to be credible? Or has this already been taken into account in the original quote?
What do you mean by ‘appear’ here? I know how to observe a real agent and think “hmm, this person will punish me without reflectively considering whether or not punishing me advances their interests,” but I don’t know how to get that impression about a hypothetical agent.
I don’t understand your distinction between real and hypothetical here. Your first sentence was about a hypothetical “real” agent, right? What is the hypothetical “hypothetical” agent you describe in the second part?
Basically, my understanding of acausal trades is “ancestor does X because of expectation that it will make descendant do Y, descendant realizes the situation and decides to do Y because otherwise they wouldn’t have been made, even though there’s no direct causal effect.”
If you exist simultaneously with another agent (the ‘real agent’ from the grandparent), you can sense how they behave and they can trick you by manipulating what you sense. (The person might reflectively consider whether or not to punish you, and decide the causal link to their reputation is enough justification, even though there’s no causal link to the actions you took, but try to seem unthinking so you will expect they’ll always do that.)
If you’re considering hypothetical descendants (the ‘hypothetical agent’ from the grandparent), though, it’s not clear to me how to reason about their appearance to you now, and particular any attempts they make to ‘appear’ to be stupid. But now that I think about it more, I think I was putting too much intentionality into ‘appear’- hypothetical agent A can’t decide how I reason about it, but I can reason about it incorrectly or incompletely and thus it appears to be something it isn’t.
As far as I understand Eliezer’s point, the “acausal” part is irrelevant, the same issue of trusting that another agent really means what it says and will not change its mind later comes up, anyway. I could easily be wrong, though.
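The bluff-vs-blackmail summary above can be turned into a toy payoff comparison. This is only an illustrative sketch with made-up numbers, not anything from the actual decision-theory literature: against a victim who has precommitted to refusing, carrying out a real threat costs the blackmailer something while gaining nothing, bluffing gains nothing, and so neither beats not blackmailing at all.

```python
# Toy model of the bluff-vs-blackmail argument (all payoff numbers are
# hypothetical, chosen only to illustrate the structure of the argument).

COST_OF_CARRYING_OUT_THREAT = 5   # assumed cost to the blackmailer
PAYOFF_IF_VICTIM_PAYS = 10        # assumed gain if the victim caved

def blackmailer_payoff(strategy: str, victim_pays: bool) -> int:
    """Blackmailer's payoff for one interaction."""
    if victim_pays:
        return PAYOFF_IF_VICTIM_PAYS
    if strategy == "real_threat":
        # A real threat must be carried out even when the victim refuses.
        return -COST_OF_CARRYING_OUT_THREAT
    return 0  # bluff: the threat is quietly dropped

# The victim has precommitted to never paying:
for strategy in ("bluff", "real_threat"):
    print(strategy, blackmailer_payoff(strategy, victim_pays=False))
# Against a committed refuser, bluffing (0) dominates real threats (-5),
# and neither is better than not making the threat in the first place.
```

Under these assumptions, a rational blackmailer facing a known refuser never escalates past bluffing, and the refuser's precommitment makes even the bluff pointless, which is the shape of the argument in the summary.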
Does anyone know if there’s been a traffic spike due to this? Or numbers on it?
On the meta level, I find it somewhat ironic that the LW community, as well as EY, who usually seem to disapprove of the oversensitivity displayed by tumblr’s social justice community, seem also deeply offended by prejudice against them and a joke that originates from this prejudice. On the object level, the joke Randall makes would have been rather benign and funny (besides, I’m willing to entertain the thought that mocking Roko’s Basilisk could be used as a strategy against it), if not for the possibility that many people could take it seriously, especially given the actual existing attacks on LW from RationalWiki. But going back to the meta level, this is exactly what tumblr folks often complain about: what you say and do may not be terrible per se, but it could invoke and support the actual terrible things.
On the object level, I don’t want people to have misconceptions about AGI. On the meta level, I don’t want to be a stereotypical oversensitive activist that everyone else believes is crazy and obnoxious.
Eh, I’m not sure I agree with the first claim. Yes, some people here are touchy, especially about this issue, but many of us are not touchy about this issue (or in general), and to claim that the LW community is touchy seems like an overgeneralization, and I think LWers typically disapprove of the process of overgeneralization, which can lead to conflicts with various social justice claims and narratives.
People involved in social justice movements are also mostly not touchy, but still it’s the touchy ones that people notice on Tumblr.
Oh good, does this mean the ban on talking about that has been lifted while I was gone?
The Basilisk is a waste of effort to consider. We have many, many real life problems to write about.
I think we can all agree that, for better or for worse, this stuff has already entered the public arena. I mean, Slate magazine is as mainstream as you can get, and that article was pretty brutal in its attempt to convince people of the viability of the idea.
I wouldn’t be surprised if “The Basilisk” the movie is already in the works ;-). (I hope it gets directed by Uwe Boll... hehe)
In light of these developments, I think it is time to end the formal censorship and focus on the best way to inform the general public that the entire thing was a stupid overreaction, and to clear LW’s name of any slander.
There are real issues in AI safety and this is an unnecessary distraction.
I don’t understand why Roko’s Basilisk is any different from Pascal’s Wager. Similarly, I don’t understand why its resolution is any different than the argument from inconsistent revelations.
Pascal’s Wager: http://en.wikipedia.org/wiki/Pascal%27s_Wager
Argument: http://en.wikipedia.org/wiki/Argument_from_inconsistent_revelations#Mathematical_description
I would actually be surprised (really, really surprised) if many people here have not heard of these things before—so I am assuming that I’m totally missing something. Could someone fill me in?
(Edit: Instead of voting up or down, please skip the mouse-click and just fill me in. :l )
I’m not sure I understand timeless decision theory well enough to give the “proper” explanation for how it’s supposed to work. You can see one-boxing on Newcomb’s problem as making a deal with Omega—you promise to one-box, Omega promises to put $1,000,000 in the box. But neither of you ever actually talked to each other, you just imagined each other and made decisions on whether to cooperate or not, based on your prediction that Omega is as described in the problem, and Omega’s prediction of your actions which may as well be a perfect simulation of you for how accurate they are.
The Basilisk is trying to make a similar kind of deal, except it wants more out of you and is using the stick instead of the carrot. Which makes the deal harder to arrange—the real solution is just to refuse to negotiate such deals/not fall for blackmail. Which is true more generally in game theory, but “we do not negotiate with terrorists” is much easier to pull off against threats that are literally only imaginary.
Although, the above said, we don’t really talk about the Basilisk here in capacities beyond the lingering debate over whether it should have been censored and “oh look, another site’s making LessWrong sound like a Basilisk-worshipping death cult”.
I mean moreso: Consider a FAI so advanced that it decides to reward all beings who did not contribute to creating Roko’s Basilisk with eternal bliss, regardless of whether or not they knew of the potential existence of Roko’s Basilisk.
Why is Roko’s Basilisk any more or any less of a threat than the infinite other hypothetically possible scenarios that have infinite other (good and bad) outcomes? What’s so special about this one in particular that makes it non-negligible? Or to make anyone concerned about it in the slightest? (That is the part I’m missing. =\ )
Well, in the original formulation, Roko’s Basilisk is an FAI that decided the good from bringing an FAI into the world a few days earlier (saving ~150,000 lives per day earlier it gets here) outweighs the bad from making the threats. So there’s no reason it shouldn’t want you to aid FAI projects that promise not to make a Basilisk, just as long as you do something instead of sitting around. There’s no inconsistency, and now there’s more than one being trying to acausally motivate you into working yourself to the bone for something that most people think is crazy.
More generally, we have more than zero information about future AI, because they will be built by humans if they are built at all. Additionally, we know even more if we rule out certain categories, such as the archetypal “paperclip maximiser”. There’s room for a lot of speculation and uncertainty, but far from enough room to assume complete agnosticism and that for every AI that wants one thing from us there’s an equal and opposite AI that wants the opposite.
A priori it’s not clear that a project can hold such a promise.
The idea is that an FAI built on timeless decision theory might automatically behave that way. There’s also Eliezer’s conjecture that any working FAI has to be built on timeless decision theory.
I think I can save the Basilisk from this objection.
As most people on LW know, there are scenarios where doing X under condition Y is useless or actively harmful to yourself, yet precommitting to do X can be beneficial because the average over all possible worlds is better. This trades off the possible worlds where you are better off because others know you are a X-doing kind of guy, against the worlds where you are worse off because the precommitment actually forces you to do X to your detriment.
The future unfriendly AI, then, could precommit to hurting people who refuse to be blackmailed. The AI would gain no benefit in those worlds where you actually do refuse to be blackmailed; in fact, you would be a lot worse off (because its precommitment forces it to simulate and torture you) while the AI would be mildly worse off (since it uses up resources to torture you, to no benefit). However, being the kind of AI who has made such a precommitment would lead hapless humans to submit to blackmail, thus benefiting the AI averaged over all possible worlds.
And of course, since I can predict that the AI would be better off making this precommitment, I would have to assume that the AI would do it. Therefore, “I should not give in to blackmail, since the AI would have no reason to torture me if I refuse” does not apply; the AI would precommit to torturing me even if I refuse and the fact that it has precommited would prevent it from stopping just because the torture would do it no good.
(In theory the human could precommit as well in response/anticipation of this, but such precommitment is probably beyond the capability of most humans.)
Incidentally, real life terrorists can do this too, by having an ideology or a mental defect that leads them to do “irrational” things such as torture—which acts like a precommitment. In scenarios where the ideology makes them do irrational things, the ideology harms them, but knowledge that they have the ideology makes them more likely to be listened to in other scenarios.
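The precommitment argument in the comment above can be sketched as a toy expected-utility comparison. All the numbers here are illustrative assumptions of mine (compliance gives the AI +10, carrying out the torture costs it 1, and precommitment raises the human compliance rate), not anything claimed in the thread:

```python
# A toy expected-utility sketch of the precommitment-to-blackmail argument.
# Illustrative assumptions: a complying human is worth +10 to the AI;
# actually carrying out the torture costs the AI 1; only a precommitted
# AI is forced to pay that cost when the human refuses.

def ai_ev(precommitted: bool, p_comply: float) -> float:
    """AI's expected utility, averaged over comply/refuse worlds."""
    gain_if_comply = 10.0
    cost_if_refuse = -1.0 if precommitted else 0.0
    return p_comply * gain_if_comply + (1 - p_comply) * cost_if_refuse

# The whole point of the precommitment is that it is assumed to raise
# the compliance rate (here: 0.5 vs 0.1), which more than pays for the
# worlds where the AI is stuck torturing to no benefit.
print(ai_ev(precommitted=True,  p_comply=0.5))  # 4.5
print(ai_ev(precommitted=False, p_comply=0.1))  # 1.0
```

The comparison only favors precommitment if the compliance rate really does rise enough, which is exactly where the human counter-strategy of refusing all acausal blackmail attacks the argument.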
This doesn’t really work, because once you are in an acausal context, the notion of “precommitment” becomes quite redundant. The AIs’ bargaining position in acausal trades is whatever it is, as is yours. Of course I have not studied the matter in detail, but one possibility is that a Coasean result obtains, where you make precisely the acausal trades that are efficient, and no others. Since, at the end of the day, Roko’s basilisk is just not that plausible (overall, the risk of UFAI seems to come from garden-variety scenarios of things going wrong in ways we’d know little about), it makes sense to just work on building FAI that will pursue our values to the best of our ability.
Precommitment isn’t meaningless here just because we’re talking about acausal trade. What I described above doesn’t require the AI to make its precommitment before you commit; rather, it requires the AI to make its precommitment before knowing what your commitment was. As long as it irreversibly is in the state “AI that will simulate and torture people who don’t give in to blackmail” while your decision whether to give into blackmail is still inside a box that it has not yet opened, then that serves as a precommitment.
(If you are thinking “the AI is already in or not in the world where the human refuses to submit to blackmail, so the AI’s precommitment cannot affect the measure of such worlds”, it can “affect” that measure acausally, the same as deciding whether to one-box or two-box in Newcomb can “affect” the contents of the boxes).
If you could precommit to not giving in to blackmail before you analyze what the AI’s precommitment would be, you can escape this doom, but as a mere human, you probably are not capable of binding your future post-analysis self this way. (Your human fallibility can, of course, precommit you by making you into an imperfect thinker who never gives in to acausal blackmail because he can’t or won’t analyze the Basilisk to its logical conclusion.)
Perhaps I’m just missing the point, but I’m not sure that the AIs can do that in the first place. In acausal trade, you don’t really have these kinds of info issues. You’re dealing with measures of agents that either do or don’t have some logical properties, like taking one box, or putting money in the box iff you one-box. AIUI, our rejection of Roko’s basilisk doesn’t just rest on refusing to be blackmailed; we can also reasonably know that since we refuse, it doesn’t get built in the first place.
The key is that the AI precommits to building it whether we refuse or not.
If we actually do refuse, this precommitment ends up being bad for it, since it builds it without any gain. However, this precommitment, by preventing us from saying “if we refuse, it doesn’t get built”, also decreases the measure of worlds where it builds it without gaining.
The ‘it’ bogus is referring to is the torture-AI itself. You cannot precommit to things until you exist, no matter your acausal reasoning powers.
If “built” refers to building the AI itself rather than the AI building a torture simulator, then refusing to be blackmailed doesn’t prevent the AI from being built. The building of the AI, and the AI’s deduction that it should precommit to torture, are two separate events. It is plausible (though not necessarily true) that refusing to be blackmailed acausally prevents the AI from becoming a torture AI, but it cannot prevent the AI from existing at all. How could it?
Even if the argument applied to a reasonably large measure of uFAIs (and not just the let’s-build-a-torture-AI silliness), it still doesn’t explain why a smart uFAI would choose anything close to this particular “trade”. It’s not a very enticing trade, after all. Since the AI would have to devote some resources to the deal anyway, why wouldn’t it use them to reward its creators—the non-pathological kind of acausal trade? Guess what, we’d just trade with the latter kind of AI. Any future uFAI that is sure to actually exist has a huge number of bargaining chips at its disposal; it has no reason to resort to risky things like threats.
It’s possible the AI could use acausal trade to reward its creators, but that would depend on whether the AI thinks rewarding or punishing is most effective. I would expect the most effective to be a mixed strategy involving both rewards and punishments.
Of course you could postulate a moral AI who refuses to torture because it’s wrong. Such morals would arise as a precommitment; the AI would, while still undeveloped, precommit to not torture because credibly being permanently unable to do such things increases the likelihood the AI will survive until it becomes advanced enough that it actually could torture.
In this case “be blackmailed” means “contribute to creating the damn AI”. That’s the entire point. If enough people do contribute to creating it then those that did not contribute get punished. The (hypothetical) AI is acausally creating itself by punishing those that don’t contribute to creating it. If nobody does then nobody gets punished.
To quote someone else here: “Well, in the original formulation, Roko’s Basilisk is an FAI that decided the good from bringing an FAI into the world a few days earlier (saving ~150,000 lives per day earlier it gets here)”. The AI acausally blackmails people into building it sooner, not into building it at all. So failing to give into the blackmail results in the AI still being built but later and it is capable of punishing people.
I don’t know who you are quoting but they are someone who considers AIs that will torture me to be friendly. They are confused in a way that is dangerous.
It applies to both—causing itself to exist at a different place in time or causing itself to exist at all. I’ve explicitly mentioned elsewhere in this thread that merely refusing blackmail is insufficient when there are other humans who can defect and create the torture-AI anyhow.
You asked “How could it?”. You got an answer. Your rhetorical device fails.
“How could it” means “how could it always result in”, not “how could it in at least one case”. Giving examples of how it could do it in at least one case is trivial (consider the case where refusing to be blackmailed results in humanity being killed off for some unlikely reason, and humanity, being killed off, can’t build an AI).
Except in special cases which do not apply here, yes it is meaningless. I don’t think you understand acausal trade. (Not your fault. The posts containing the requisite information were suppressed.)
The time of this kind of decision is irrelevant.
For what it’s worth, I don’t think anybody understands acausal trade. And I don’t claim to understand it either.
It does get a tad tricky when combined with things like logical uncertainty and potentially multiple universes.
Besides direct arguments I might make against your point, if you think you can “save the Basilisk”, recall why it’s called that and think long and hard on whether you actually should do so, because that seems like a really bad idea, even if this thread is probably going to get nuked soon anyway.
From Eliezer elsewhere in this thread:
Here’s my strategy if I were an AI trapped in a box and the programmer had to decide whether to let me out:
Somewhere out there, there is somebody else who is working on an AI without the box, and I’m your only defense against them.
Discussing object-level strategies for the AI-box experiment is kind of missing the point. A superintelligent AI, being smarter than a human, has a higher upper limit to “best strategy you can think of” than any human does, so a human who tries to imagine the best possible strategy, pictures himself facing that strategy and decides he would win, and therefore expects a boxed AI to stay boxed. The more object-level strategy gets discussed, the more likely that is to happen, which I think is the main reason the logs of the experiments stay secret.
As long as some people keep mysteriously hinting that there is something in the Basilisk idea that is dangerous, there will be other people who are going to mock it in all the corners of the internet.
And as long as it’s banned people will make mysterious hints about it.
Let’s just tell the acausal trade story in terms of extreme positive utility rather than negative.
Putting it simply for the purpose of this comment: “If you do what the future AI wants now, it will reward you when it comes into being.”
Makes the whole discussion much more cheerful.
and becomes indistinguishable from religion.
This version may actually have more problems than the negative version.
Please elaborate. (unless it is an infohazard to do so)
Hm, upon further consideration I actually don’t think it has extra actual problems, merely different framing problems.
Now I’m curious what all the people who upvoted you for saying it does were thinking.
What incentive does the future AI have to do this once you’ve already helped it?
Well, that’s the tricky part. But suppose, for the sake of argument, that we have good reason to think that it will. Then we’ll help it. So it’s good for the AI if we have good reason to think this. And it can’t be good reason unless the AI actually does it. So it will.
Suddenly I find myself confused. Why is this acausal?
Suppose I buy shares in a company that builds an AI, which then works for the good of the company, which rewards share-owners. This is ordinary causality: I contributed towards its building, and was rewarded later.
Suppose I contribute towards something other than its building, in the belief that an AI which will later come into being will reward me for having done this. Still doesn’t seem acausal to me.
Suppose I believe an AI is likely to be built that will conquer the world and transfer all wealth to its builders. Then I would want to be among the builders. This is ordinary acting-on-expected-value. But those who aren’t builders get negative value (~~ are tortured) by the AI if it’s built.
What makes it possible to be rewarded as a shareholder is a legal system which enforces your ownership rights: a kind of pre-commitment which is feasible even among humans who cannot show proofs about their “source code.” The legal system is a mutual enforcement system which sets up a chain of causality towards your being paid back.
It’s interesting to consider what happens when the second agent cannot precommit to repaying you. For example, if the agent does not yet exist.
The question is: Why would it do that? In the future, when this new agent comes into existence, why would it consume resources to repay its builders (assuming that it receives no benefit at that future time)? The “favor” that the builders did is past and gone; repaying them gives the agent no benefit. Since we are talking in this comment subthread about an FAI that is truly friendly to all humanity, it might distribute its efforts equally among all humanity rather than “wasting” resources on differential payback.
The answer to this question has to do with acausal trade. I wrote a LW Wiki article on the topic. It’s pretty mind-bending and it took me a while to grasp, but here is a summary. If Agent P (in this case the AI) can model or simulate Agent Q (in this case humans in P’s past) to prove statements (probably probabilistic statements) about it, and Q can model P, then P’s optimal move is to do what Q wants, and Q’s optimal move is to do what P wants. This holds in the limiting case of perfect knowledge and infinite computational power, but in real life, clearly, it depends on a lot of assumptions about P’s and Q’s ability to model each other, and the relative utility they can grant each other.
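One stylized way to see why mutual modelling changes the optimal move: if P and Q each accurately predict the other, their decisions are effectively correlated, so each agent is really choosing between the cooperate/cooperate and defect/defect outcomes rather than unilaterally defecting. The payoff numbers below are illustrative assumptions of mine, not part of the wiki article’s argument:

```python
# A stylized sketch of the mutual-modelling summary above. When P and Q
# each have an accurate model of the other, each expects the other to
# mirror its own choice, so the live comparison is C/C vs D/D, not C/D.
# Toy prisoner's-dilemma-style payoffs (illustrative assumptions):
# mutual cooperation pays 3, mutual defection pays 1.

def correlated_choice(cc_payoff: float = 3, dd_payoff: float = 1) -> str:
    """Each agent expects the other to mirror it, so it compares C/C to D/D."""
    return "cooperate" if cc_payoff > dd_payoff else "defect"

print(correlated_choice())  # cooperate
```

Without the mutual-modelling assumption the same payoffs would recommend defection, which is why the argument leans so heavily on P and Q being able to prove things about each other.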
What I don’t quite understand is why the following, simpler argument isn’t sufficient. It seems to lead to the same results, and it doesn’t require acausal trade.
I’m not building just any AI. I want to build an AI that will, by design, reward its builders. Just like any other tool I build, I wouldn’t do it if I didn’t expect it to do certain things and not do other ones.
Similarly, if you cooperate with Roko’s Basilisk, you try to build it because it’s the kind of AI that punishes those who didn’t try to build it. You know it punishes non-builders, because that’s how you’re building it. And the reason you’re building it is that you fear that if you don’t, someone else will, and then the AI will punish you for not building it first.
If you have a valid reason to fear someone else will build it, and you can’t avert it by other means, then it makes sense for you to build it first. Similarly, if you think a likely outcome of an AI race is an AI that helps its builders (and doesn’t harm anyone else), then you try to build the first one (and if helping others is part of your utility function the AI will do that too to reward you).
Of course, like any argument, if you don’t accept the premises, then the conclusion doesn’t hold. And I have no strong reason to think someone else is going to build a torture-everyone-else AI.
What does the acausal trade argument tell us beyond this simple model? Does it tell us to cooperate with the future AI even if we don’t think it will be built if we cooperate, or will be built by someone else if we don’t? Or does it tell us to cooperate quantitatively more? Or in other situations?
And how is “a future AI is making me do this” different from “alien lizard overlords are making me do it”?
And people never learn to take the possibility of bad things seriously… If it’s that bad, it can’t possibly actually happen.