“It also seems to encourage #3 (and again the vague admonishment to “not do that” doesn’t seem that reassuring to me.)”
I just pointed to Eliezer’s warning, which I thought was sufficient. I could write more about why I think it’s not a good idea, but I currently think a bigger portion of the problem is people not trying to come up with good plans rather than people coming up with dangerous plans, which is why my emphasis is where it is.
Eliezer is great at red-teaming people’s plans. This is great for finding ways plans don’t work, and I think it’s very important that he keep doing this. It’s not great for motivating people to come up with good plans, though. And I think that shortage of motivation is a real threat to our chances of mitigating AI existential risk. I was talking to a leading alignment researcher yesterday who said their motivation had taken a hit from Eliezer’s constant “all your plans will fail” talk, so I’m pretty sure this is a real thing, even though I’m unsure of the magnitude.
I largely agree with that, but I think there’s an important asymmetry here: it’s much easier to come up with a plan that will ‘successfully’ do huge damage than to come up with a plan that will successfully solve the problem.
So to have positive expected impact you need a high ratio of [people persuaded to come up with good plans] to [people persuaded that crazy dangerous plans are necessary].
I’d expect your post to push a large majority of readers in a positive direction (I think it does for me—particularly combined with Eliezer’s take). My worry isn’t that many go the other way, but that it doesn’t take many.
I think that’s a legit concern. One mitigating factor is that the people who seem inclined toward rash, destructive plans tend to be pretty bad at execution, e.g. Aum Shinrikyo.