20 Critiques of AI Safety That I Found on Twitter

dkirmani23 Jun 2022 19:23 UTC

21 points

In no particular order, here’s a collection of Twitter screenshots of people attacking AI Safety. A lot of them are poorly reasoned, and some of them are simply ad-hominem. Still, these types of tweets are influential, and are widely circulated among AI capabilities researchers.

1

2

3

4

5

(That one wasn’t actually a critique, but it did convey useful information about the state of AI Safety’s optics.)

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20 Conclusions

I originally intended to end this post with a call to action, but we mustn’t propose solutions immediately. In lieu of a specific proposal, I ask you, can the optics of AI safety be improved?

dkirmani23 Jun 2022 19:23 UTC

21 points

16 comments1 min readLW link

johnswentworth 23 Jun 2022 18:26 UTC
22 points
These… don’t seem that bad? I mean, given that they were selected for both (a) criticism and (b) being on twitter. Like, by twitter standards, if these examples are the typical case, it seems indicative of very unusually good PR for alignment.
DirectedEvolution 23 Jun 2022 16:27 UTC
18 points
It’s reassuring to see we’re somewhere between the “then they laugh at you” and “then they fight you” stage. I thought we were still mostly at “first they ignore you.”
mukashi 23 Jun 2022 21:37 UTC
16 points
I agree with many of those tweets. Many of them had actual good points.
- Lone Pine 24 Jun 2022 10:07 UTC
  10 points
  Parent
  I just learned I’m a based effective accelerationist.
David Scott Krueger (formerly: capybaralet) 6 Sep 2022 20:22 UTC
9 points
I suspect e/acc is some sort of sock puppet op / astroturfing or something, and I’ve been trying to avoid signal boosting them.
- dkirmani 7 Sep 2022 6:13 UTC
  1 point
  Parent
  I’m Twitter mutuals with some of these e/acc people. I think that its founders and most (if not all) of its proponents are organic accounts, but it still might be a good idea to not signal-boost them.
  - David Scott Krueger (formerly: capybaralet) 7 Sep 2022 12:10 UTC
    2 points
    Parent
    What makes you think that? Where do you think this is coming from? It seems to have arrived too quickly (well on to my radar at least) to be organic IMO, unless there is some real-world community involved, which there doesn’t seem to be?
    - David Scott Krueger (formerly: capybaralet) 7 Sep 2022 12:18 UTC
      4 points
      Parent
      Here’s a thing: https://beff.substack.com/p/notes-on-eacc-principles-and-tenets
      If you are interested in more content on e/acc, check out recent posts by Bayeslord and the original post by Swarthy, or follow any of us on Twitter as we often host Twitter spaces discussing these ideas.
      Neither the original post nor @bayeslord seem to exist anymore (I found this was also the case for another account named something like “Bitalik Vuterin” or something...) Seems fishy. I suspect at least that someone “e/acc” is trying to look like more people than they are. Not sure what the base rate for this sort of stuff is, though.
      - dkirmani 7 Sep 2022 16:17 UTC
        1 point
        Parent
        Much of why my priors say that the e/acc thing is organic is just my gestalt impression of being on Twitter while it was happening. Unfortunately, that’s not a legible source of evidence to people-who-aren’t-me. I’ll tell you what information I do remember, though:
        
        “Bitalik Vuterin” does not ring a bell, I don’t think he was a very consequential figure to begin with.
        @BasedBeffJezos claims to be the same person as @BasedBeff, and claims that he was locked out of his @BasedBeff account on 2022-08-08 due to “misinformation”, which he attributes to “blue checkmarks and scared EAs”.
        I’m only like 85% certain about this claim, but I think @bayeslord made an “I quit” thread where he claimed that he was actually kinda sympathetic to EA all along, and e/acc was more of a joke, or maybe that it was actually meant to strengthen EA by red-teaming it. I’m even less certain about this next claim (~55%), but I think he mentioned getting an overwhelming amount of DMs from EAs trying to debate him.
        David Scott Krueger (formerly: capybaralet) 7 Sep 2022 21:42 UTC
        2 points
        Parent
        “Bitalik Vuterin” does not ring a bell, I don’t think he was a very consequential figure to begin with.
        I think it was something else like that, not that.
Tomás B. 23 Jun 2022 15:39 UTC
9 points
I’m not sure it’s productive to engage with this stuff. Taking a GRAND STAND may feel good, but in many cases people end up becoming useful foils. Block liberally, don’t engage, focus on what actually matters.
- dkirmani 23 Jun 2022 16:11 UTC
  6 points
  Parent
  I’m not necessarily advocating for direct engagement! If engagement with this stuff won’t decrease AI risk, then I don’t want to engage. If it does, then I do. Some of these people/orgs are influential (Venkatesh Rao, HuggingFace), so unfortunately, their opinions do actually matter. As nice as it would feel to ignore the haters, public opinion is in fact a strategic asset when it comes to actually implementing AI safety proposals at major labs.
  - quanticle 25 Jun 2022 21:58 UTC
    2 points
    Parent
    
    Some of these people/orgs are influential (Venkatesh Rao, HuggingFace), so unfortunately, their opinions do actually matter.
    
    Do you have any evidence that Venkatesh Rao is influential? I’ve never seen him quoted by anyone outside the rationality community.
Charlie Steiner 23 Jun 2022 19:15 UTC
8 points
I wonder what I expected to get out of this post—after all, I already don’t use Twitter.
Yair Halberstadt 23 Jun 2022 17:22 UTC
6 points
I would expect you to be be able to find these tweets, and hundreds more like them no matter how good alignment optics is. A lot of people use Twitter, and I could probably find similar tweets about Mother Theresa or Princess Diana. As such showing this doesn’t actually tell us all that much TBH.
Dagon 23 Jun 2022 16:31 UTC
6 points
Practice rationalism on this. What predictions do you make, and what conditional predictions on whatever actions you’re advocating? It feels a little like you’re getting sucked into a status game by caring very much about who’s saying what, rather than steelmanning the critiques and deciding if members of the EA community (disclosure: I am not one—I’m not part of sneerclub, but I do see the cult-like aspects of the bay-area subculture) should do anything differently. As in, should you behave differently, separately from should you participate in the signaling and public conversations around this kind of thing for status purposes?

Note also that the criticism is not purely wrong. “Revealed preferences say a lot” is a pretty compelling point.