What probability do you put on AI safety being attacked or destroyed by 2033?

Considering the rate at which the world has been changing, I’d say that the distance between 2023 and 2033 is more like the distance between 2023 and 2003, and the whole point of this post is taking a step back and looking at the situation, which is actually pretty bad. So I’d say ~30%, because a lot of things will happen just in general, and under 20% would be naive.
Under 20% would require less than a 2% chance per year, and FTX alone blows that out of the water, let alone OpenAI. I think I made a very solid case that there are some really nasty people out there who already have the entire AI safety community by the balls, and if an AI pause is the minimum ask for humanity to survive, then you have to start conflict over it.
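Spelling out the arithmetic behind that threshold (a rough sketch, assuming a constant, independent per-year risk $p$ over the decade):

$$1 - (1 - p)^{10} < 0.2 \;\Longleftrightarrow\; p < 1 - 0.8^{1/10} \approx 2.2\%,$$

so a sub-20% decade estimate does correspond to roughly a 2% annual chance.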
I can’t tell whether that’s your estimate for AI safety being attacked or for AI safety being destroyed by 2033. If it’s the former, what would count as an attack? If it’s the latter, what would you count as strong evidence that your estimate was wrong? (Assuming you agree that AI safety still existing in 2033 counts as weak evidence, given your ~30% prior.)
You’re right, I had kind of muddled thinking on that particular point. My thinking was that they would try to destroy or damage AI safety, and the usual tactics would not work because AI safety is too weird, motivated, and rational (although they probably would not have a hard time measuring motivation well enough to detect that it is much higher than in normal interest groups). I tend to think of MIRI as an org they can’t pull the rug out from under because it’s hardened, e.g. it will survive and function in some form even if everyone else in the AI safety community is manipulated by gradient descent into hating MIRI, but realistically Openphil is probably much more hardened. It’s also hard to resolve because this tech is ubiquitous, so maybe millions of people get messed with somehow (e.g. deliberately hooked on social media for 3 hours per day). What AI safety looks like after being decimated would probably be very hard to picture; for steelmanning purposes I will say that the 30% would apply to being heavily and repeatedly attacked and significantly damaged, well beyond the FTX crisis and the EA Forum shitposts.
Frankly, I think I should lose a lot of Bayes points if this technology is still 10 years away. I know what I said in the epistemic status section, but I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID. If this tech didn’t become a feature of slow takeoff then I would lose even more Bayes points.
You’re right, I had kind of muddled thinking on that particular point.
That was not my thought (I consider interactive clarification one of our most powerful tools, and pressure to produce perfect texts counterproductive), but…
I should lose a lot of Bayes points if this technology is still 10 years away (…) If this tech didn’t become a feature of slow takeoff then I would lose even more Bayes points.
…I appreciate the concrete predictions very much, thanks!
I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID.
That sounds close to when MIRI decided to keep everything secret. Maybe that was after a few clowns were advocating for transparency. 🤔