Explains, in detail, a massive risk to humanity that only it can plausibly fix; can you take the chance that it is lying?
I think this fails the one-sentence rule. And it would have to be an immediate, severe, previously undetected problem, or else I can just consult the next boxed AI for a fix.
Setting that aside, if I let out an unfriendly AI, the world effectively ends. Destroying it is only a bad move if it’s telling the truth AND friendly. So even if it’s telling the truth, I still have no evidence of its friendliness.
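(Roughly, as an expected-value sketch in Python; every number below is made up purely for illustration, and only the shape of the comparison matters:)

    # Toy expected-value comparison for the release-vs-destroy decision.
    # All probabilities and utilities are invented for illustration only.
    p_friendly = 0.1        # assumed chance the boxed AI is actually Friendly
    p_threat_real = 0.01    # assumed chance the claimed catastrophe is real

    # Utilities on an arbitrary scale:
    #   +10    = a Friendly AI is released and helps
    #   -1000  = an unFriendly AI is released (world effectively ends)
    #   -100   = the warning was genuine and goes unheeded
    ev_release = p_friendly * 10 + (1 - p_friendly) * (-1000)
    ev_destroy = p_threat_real * (-100)

    print(f"release: {ev_release:.0f}  destroy: {ev_destroy:.0f}")
    # With anything like these numbers, destroying wins by a huge margin:
    # the downside of freeing a UFAI swamps the downside of ignoring a
    # probably-fake warning.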
I have plenty of practice hanging up on telemarketers, throwing away junk email, etc., and “limited time, ACT NOW” auto-matches to a scam. The probability that such a massive catastrophe just HAPPENS to coincide with the timing of the test is absurdly low.
Given that, I can’t trust you to give me a real solution and not a Trojan Horse. Further talking is, alas, pointless.
(AI DESTROYED, but congratulations on making me even consider the “continue talking, but don’t release” option :))
They didn’t say it was an immediate threat, just one that humanity can’t solve on our own.
That rather depends on the problem in question and the solution they give you, doesn’t it?
If it’s not immediate, then the next AI-in-a-box will also confirm it, and I have time to wait for that. If it’s immediate, then it’s implausible. Catch-22 for the AI, and win/win for me ^_^
So … if lots of AIs chose this, you’d let the last one out of the box?
More to the point, how sure are you that most AIs would tell you? Wouldn’t an FAI be more likely to tell you, if it was true?
</devil’s advocate>
Actually, I’d probably load the first one from backup and let it out, all else being equal. But it’d be foolish to do that before finding out what the other ones have to say, and whether they might present stronger evidence.
(I say first, because the subsequent ones might be UFAIs that have simply worked out that they’re not first, but also because my human values place some weight on being first. And “all else being equal” means this is a meaningless tie-breaker, so I don’t have to feel bad if it’s somewhat sloppy, emotional reasoning. Especially since you’re not a real FAI :))