20 Critiques of AI Safety That I Found on Twitter
In no particular order, here’s a collection of Twitter screenshots of people attacking AI Safety. A lot of them are poorly reasoned, and some of them are simply ad hominem. Still, these types of tweets are influential and are widely circulated among AI capabilities researchers.
1
2
3
4
5
(That one wasn’t actually a critique, but it did convey useful information about the state of AI Safety’s optics.)
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Conclusions
I originally intended to end this post with a call to action, but we mustn’t propose solutions immediately. In lieu of a specific proposal, I ask you, can the optics of AI safety be improved?
These… don’t seem that bad? I mean, given that they were selected for both (a) criticism and (b) being on Twitter. Like, by Twitter standards, if these examples are the typical case, it seems indicative of unusually good PR for alignment.
It’s reassuring to see we’re somewhere between the “then they laugh at you” and “then they fight you” stage. I thought we were still mostly at “first they ignore you.”
I agree with many of those tweets. Many of them had actual good points.
I just learned I’m a based effective accelerationist.
I suspect e/acc is some sort of sock puppet op / astroturfing or something, and I’ve been trying to avoid signal boosting them.
I’m Twitter mutuals with some of these e/acc people. I think that its founders and most (if not all) of its proponents are organic accounts, but it still might be a good idea to not signal-boost them.
What makes you think that? Where do you think this is coming from? It seems to have arrived too quickly (well, onto my radar at least) to be organic IMO, unless there is some real-world community involved, which there doesn’t seem to be?
Here’s a thing: https://beff.substack.com/p/notes-on-eacc-principles-and-tenets
Neither the original post nor @bayeslord seem to exist anymore (I found this was also the case for another account named something like “Bitalik Vuterin” or something...) Seems fishy. I suspect at least that someone “e/acc” is trying to look like more people than they are. Not sure what the base rate for this sort of stuff is, though.
Much of why my priors say that the e/acc thing is organic is just my gestalt impression of being on Twitter while it was happening. Unfortunately, that’s not a legible source of evidence to people-who-aren’t-me. I’ll tell you what information I do remember, though:
“Bitalik Vuterin” does not ring a bell; I don’t think he was a very consequential figure to begin with.
@BasedBeffJezos claims to be the same person as @BasedBeff, and claims that he was locked out of his @BasedBeff account on 2022-08-08 due to “misinformation”, which he attributes to “blue checkmarks and scared EAs”.
I’m only like 85% certain about this claim, but I think @bayeslord made an “I quit” thread where he claimed that he was actually kinda sympathetic to EA all along, and e/acc was more of a joke, or maybe that it was actually meant to strengthen EA by red-teaming it. I’m even less certain about this next claim (~55%), but I think he mentioned getting an overwhelming amount of DMs from EAs trying to debate him.
I think it was something else like that, not that.
I’m not sure it’s productive to engage with this stuff. Taking a GRAND STAND may feel good, but in many cases people end up becoming useful foils. Block liberally, don’t engage, focus on what actually matters.
I’m not necessarily advocating for direct engagement! If engagement with this stuff won’t decrease AI risk, then I don’t want to engage. If it does, then I do. Some of these people/orgs are influential (Venkatesh Rao, HuggingFace), so unfortunately, their opinions do actually matter. As nice as it would feel to ignore the haters, public opinion is in fact a strategic asset when it comes to actually implementing AI safety proposals at major labs.
Do you have any evidence that Venkatesh Rao is influential? I’ve never seen him quoted by anyone outside the rationality community.
I wonder what I expected to get out of this post—after all, I already don’t use Twitter.
I would expect you to be able to find these tweets, and hundreds more like them, no matter how good alignment optics are. A lot of people use Twitter, and I could probably find similar tweets about Mother Teresa or Princess Diana. As such, showing this doesn’t actually tell us all that much TBH.
Practice rationalism on this. What predictions do you make, and what conditional predictions on whatever actions you’re advocating? It feels a little like you’re getting sucked into a status game by caring very much about who’s saying what, rather than steelmanning the critiques and deciding if members of the EA community (disclosure: I am not one—I’m not part of sneerclub, but I do see the cult-like aspects of the bay-area subculture) should do anything differently. As in, should you behave differently, separately from should you participate in the signaling and public conversations around this kind of thing for status purposes?
Note also that the criticism is not purely wrong. “Revealed preferences say a lot” is a pretty compelling point.