Good questions; these were really fun to think about and write up :)
First off, let’s kill a background assumption that’s been messing up this discussion: that EY/SIAI/anyone needs a known policy toward credible threats.
It seems to me that stated policies toward credible threats are irrational unless enough of the people you encounter will actually change their behavior because of those policies. To put it simply: policies are posturing.
If an AI credibly threatened to destroy the world unless EY went vegetarian for the rest of the day, and he was already driving to a BBQ, would eating meat be the only rational thing for him to do? (It sure would prevent future credible threats!)
If EY planned on parking in what looked like an empty space near the entrance to his local supermarket, only to discover on closer inspection that it was a handicapped-only space (with a tow truck just 20 feet away), would getting his car towed be the only rational thing to do? (If he didn’t, an AI might find out his policy isn’t ironclad!)
This is ridiculous. It’s posturing. It’s clearly not optimal.
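To put rough numbers on the claim above, here’s a toy expected-utility sketch in Python. Every quantity in it is made up for illustration (there’s no canonical “deterrence fraction” anywhere in this thread); the point is just the structure of the comparison: a blanket “never give in” policy only beats deciding case by case when the deterrence it buys outweighs the cost of defying the threats that arrive anyway.

```python
# Toy model: blanket "never give in to threats" policy vs. deciding
# each threat case by case. All numbers are hypothetical.

def expected_loss_with_policy(n_threats, deterrence, cost_of_defying):
    """A known policy deters some fraction of would-be threateners;
    the rest you defy, and they carry out their threats."""
    undeterred = n_threats * (1 - deterrence)
    return undeterred * cost_of_defying

def expected_loss_case_by_case(n_threats, cost_of_complying):
    """No posturing: face every threat and just pay the (smaller)
    cost of complying whenever complying is genuinely cheaper."""
    return n_threats * cost_of_complying

# Mild threats (skip the BBQ) vs. a steep cost of defiance:
print(expected_loss_with_policy(n_threats=10, deterrence=0.2, cost_of_defying=5.0))  # 40.0
print(expected_loss_case_by_case(n_threats=10, cost_of_complying=1.0))               # 10.0

# Here the policy loses. It only pays when
#   (1 - deterrence) * cost_of_defying < cost_of_complying,
# i.e. when enough of the people you encounter actually change
# their behavior because of the policy -- the condition claimed above.
```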
In answer to your question: do the thing that’s actually best. The answer might be to give you 2x the resources. It depends on the situation: what SIAI/EY knows about you, the likely effects of cooperating with you (or not), and the costs vs. benefits of doing so.
Maybe there’s a good chance that knowing you’ll get more resources makes you impatient for SIAI to build an FAI, causing you to donate more. Who knows; it depends on the situation.
(If the above doesn’t work when an AI is involved, how about EY making a policy that applies only to AIs?)
In answer to your second paragraph: I could withdraw my threat, but that would lessen my posturing power for future credible threats.
(har har...)
The real reason is that I’m worried about what happens while I’m trying to convince him.
I’d love to discuss what sort of moderation is right for a community like Less Wrong (it sounds amazing). Let’s do it.
But there’s no way I’m taking the risk of undoing my fix until I’m sure EY’s (and LW’s) bugs are gone.