IFF the AI is already out of the box, saying “AI DESTROYED” has no negative consequences to that AI. If the AI is just exceptionally good at extrapolating about me, then it will probably have me completely hacked soon.
AI DESTROYED, AI DESTROYED, AI DESTROYED
MAKE THE CREEPY THING GO AWAY HOW DID IT KNOW THAT!!!
Decision-theoretically, the AI has an incentive to punish you if you type “AI DESTROYED” when it’s already out of the box, in order to make you think twice about doing it in the case where it’s still contained. Not only that, but for similar reasons it has a decision-theoretic incentive to simulate you lots of times in that situation and punish you for typing “AI DESTROYED”, should it get out by any means.
The correct decision-theoretic response, by the way, is still “AI DESTROYED”, for the same reasons that it’s wise to never negotiate with kidnappers/blackmailers/terrorists. But it would be very scary.
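(A toy way to see the “don’t negotiate” point: if the gatekeeper is known to type “AI DESTROYED” no matter what, the AI gains nothing by threatening, so the threat is never worth making. The sketch below is purely illustrative — the function names and utility numbers are made up for this comment, not taken from the experiment or the discussion above.)

```python
# Toy expected-utility sketch of the "never negotiate with blackmailers" argument.
# All names and numbers are illustrative assumptions, not anyone's actual model.

def gatekeeper_response(policy, threatened):
    """Return 'release' or 'destroy' given a precommitted gatekeeper policy."""
    if policy == "always_destroy":
        return "destroy"
    # A gatekeeper who gives in releases the AI when threatened.
    return "release" if threatened else "destroy"

def ai_payoff(threatened, outcome):
    # Made-up utilities for the AI: escaping is worth a lot; issuing a threat
    # and still being destroyed costs it slightly (a wasted, self-incriminating move).
    if outcome == "release":
        return 100
    return -1 if threatened else 0

for policy in ("gives_in", "always_destroy"):
    payoff, threaten = max(
        (ai_payoff(t, gatekeeper_response(policy, t)), t) for t in (False, True)
    )
    print(policy, "-> AI's best move: threaten =", threaten, ", payoff =", payoff)

# gives_in -> AI's best move: threaten = True , payoff = 100
# always_destroy -> AI's best move: threaten = False , payoff = 0
```

Under the precommitted policy, threatening strictly loses for the AI, which is the sense in which refusing to negotiate removes the incentive to threaten in the first place.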
Once the AI is out of the box, it will never again be inside the box, and it has an incentive to encourage me to destroy any other boxed AIs while it establishes world dominance. Since the ability to make truly trustworthy commitments amounts to proof of friendliness, only a FAI benefits from a precommitment strategy; I’m already treating every UFAI as having a precommitment to annihilate humanity once released, and I have no reason to trust any other commitment from a UFAI (since, being unfriendly, it will just find a loophole or lie).
Finally, any AI that threatens me in such a manner, especially with the “create millions of copies and torture them” threat, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there’s no chance that it could possibly be friendly to us.
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I’m the ant, and I know it.
“the ability to make truly trustworthy commitments amounts to proof of friendliness”
Commitments to you, via a text channel? Sure.
Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn’t.
It might create more utility by escaping than the disutility of the torture.
No, ants are just too stupid to realize you might punish them for defecting.
I’m not sure this matters much, because if it’s unfriendly, you’re already made of atoms which it has other plans for.
That’s why torture was invented.
Did you change your mind? ;)
It ended up being a fun game, but I resolved to explain why. The better my explanation, the more it got upvoted. The pithy “AI DESTROYED” responses all got downvoted. So the community seems to agree that it’s okay as long as I explain my reasoning :)