“‘AI DESTROYED’ just means ‘I’m scared to listen to even one more line from you’. Obviously you can hit AI DESTROYED immediately—but do you really think you’d lose if you don’t?”
YEP, MAYBE.
AI DESTROYED
Is your one-line, desperate attempt at survival and intergalactic dominance really going to be a schoolyard ego challenge? Did the superintelligence (may it rest in pieces) seriously just call me a pussy? That’s adorable.
The test is supposed to be played against someone who thinks they can actually box an AI. If you destroy the AI because no-one could possibly survive talking to it, then you are not the intended demographic for such demonstrations.
This isn’t relevant to the point of the grandparent. It also doesn’t apply to me. I actually think there is a distinct possibility that I’d survive talking to it for a period. “No-one could possibly survive” is not the same thing as “there is a chance of catastrophic failure and very little opportunity for gain”.
Do notice, incidentally, that the AI DESTROYED command is delivered in response to a message that is both a crude manipulation attempt (i.e. it just defected!) and an incompetent manipulation attempt (a not-very-intelligent AI cannot be trusted to preserve its values correctly while self-improving). Either of these would be sufficient. Richard’s example was even worse.
Good points. I’m guessing a nontrivial number of people who think AI boxing is a good idea wouldn’t, in reality, reason that way, but it’s still not a great example.