“It also seems to encourage #3 (and again the vague admonishment to “not do that” doesn’t seem that reassuring to me.)”
I just pointed to Eliezer’s warning, which I thought was sufficient. I could write more about why I think it’s not a good idea, but I currently think a bigger portion of the problem is people not trying to come up with good plans rather than people coming up with dangerous plans, which is why my emphasis is where it is.
Eliezer is great at red-teaming people’s plans. This is great for finding ways plans don’t work, and I think it’s very important that he keep doing this. It’s not great for motivating people to come up with good plans, though. And I think that shortage of motivation is a real threat to our chances of mitigating AI existential risk. I was talking to a leading alignment researcher yesterday who said their motivation had taken a hit from Eliezer’s constant “all your plans will fail” talk, so I’m pretty sure this is a real thing, even though I’m unsure of the magnitude.
I largely agree with that, but I think there’s an important asymmetry here: it’s much easier to come up with a plan that will ‘successfully’ do huge damage than to come up with a plan that will successfully solve the problem.
So to have positive expected impact you need a high ratio of [people persuaded to come up with good plans] to [people persuaded that crazy dangerous plans are necessary].
I’d expect your post to push a large majority of readers in a positive direction (I think it does for me—particularly combined with Eliezer’s take). My worry isn’t that many go the other way, but that it doesn’t take many.
I think that’s a legit concern. One mitigating factor is that the people who seem inclined toward rash, destructive plans tend to be pretty bad at execution, e.g. Aum Shinrikyo.