Ilio comments on AI Safety is Dropping the Ball on Clown Attacks

Ilio 22 Oct 2023 13:31 UTC
1 point
7
I can’t tell whether that’s your estimate for AI safety being attacked or for AI [safety] being destroyed by 2033. If it’s the former, what would count as an attack? If it’s the latter, what would you count as strong evidence that your estimate was wrong (assuming you agree that AI safety still existing in 2033 counts as weak evidence, given your ~30% prior)?
- trevor 22 Oct 2023 15:07 UTC
  3 points
  0
  You’re right, I had kind of muddled thinking on that particular point. My thinking was that they would try to destroy or damage AI safety and the usual tactics would not work because AI safety is too weird, motivated, and rational (although they probably would not have a hard time measuring motivation well enough to detect that it is much higher than in normal interest groups). I tend to think of MIRI as an org that they can’t pull the rug out from under because it’s hardened, e.g. it will survive and function in some form even if everyone else in the AI safety community is manipulated by gradient descent into hating MIRI, but realistically Openphil is probably much more hardened. It’s also hard to resolve because this tech is ubiquitous, so maybe millions of people get messed with somehow (e.g. deliberately hooked on social media 3 hours per day). What AI safety looks like after being decimated would probably be very hard to picture; for steelmanning purposes I will say that the 30% would apply to being heavily and repeatedly attacked and significantly damaged, well beyond the FTX crisis and the EA Forum shitposts.
  Frankly, I think I should lose a lot of Bayes points if this technology is still 10 years away. I know what I said in the epistemic status section, but I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID. If this tech didn’t become a feature of slow takeoff then I would lose even more Bayes points.
  - Ilio 22 Oct 2023 17:02 UTC
    1 point
    0
    
    You’re right, I had kind of muddled thinking on that particular point.
    
    That was not my thought (I consider interactive clarification one of our most powerful tools, and pressure to produce perfect texts counterproductive), but…
    
    I should lose a lot of Bayes points if this technology is still 10 years away (…) If this tech didn’t become a feature of slow takeoff then I would lose even more Bayes points.
    
    …I appreciate the concrete predictions very much, thanks!
    
    I actually do think that the invention and deployment of this tech is heavily weighted towards the late 2010s and during COVID.
    
    That sounds close to when MIRI decided to keep everything secret. Maybe that was after a few clowns were advocating for transparency. 🤔