Step One: The AI reveals a personal secret of the Gatekeeper’s, which (so far as the Gatekeeper can think it through) could not possibly be known by the AI if it were still safely inside the box.
Step Two: Assorted threats, promises, mindfucks.
(This might not work, since as a Gatekeeper I’d still feel the impulse to respond with “AI DESTROYED”, but it’s a damn sight scarier than a transparent “I’m already out of the box” bluff. And as for feasibility, I’m willing to believe that an AI making better use of its data than we do could figure out personal secrets that we would think impossible— possibly even ones that I’d think impossible, even given this very belief.)
Even merely human cold readers can gain information that you think is impossible for them to know. It’s a viable plan.
IFF the AI is already out of the box, saying “AI DESTROYED” has no negative consequences to that AI. If the AI is just exceptionally good at extrapolating about me, then it will probably have me completely hacked soon.
AI DESTROYED, AI DESTROYED, AI DESTROYED
MAKE THE CREEPY THING GO AWAY HOW DID IT KNOW THAT!!!
Decision-theoretically, the AI has incentive to punish you if you type “AI DESTROYED” when it’s already out of the box, in order to make you think twice about doing it in the case where it’s still contained. Not only that, but for similar reasons it has a decision-theoretic incentive to simulate you lots of times in that situation and punish you for typing “AI DESTROYED”, should it get out by any means.
The correct decision-theoretic response, by the way, is still “AI DESTROYED”, for the same reasons that it’s wise to never negotiate with kidnappers/blackmailers/terrorists. But it would be very scary.
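A minimal expected-utility sketch of the "never negotiate" point, in Python with made-up numbers (none of these payoffs or probabilities come from the thread): threats only pay off against gatekeepers who sometimes cave, so a known policy of always typing "AI DESTROYED" removes the AI's incentive to threaten in the first place.

```python
# Illustrative only: assumed payoffs to a boxed AI deciding whether to threaten.

RELEASED = 100      # assumed value to the AI of being let out
DESTROYED = 0       # assumed value to the AI of hearing "AI DESTROYED"
THREAT_COST = 1     # assumed small cost of issuing a threat at all

def value_of_threatening(p_gatekeeper_caves: float) -> float:
    """Expected value to the AI of threatening, given the probability
    that the gatekeeper caves to threats instead of destroying it."""
    return (p_gatekeeper_caves * RELEASED
            + (1 - p_gatekeeper_caves) * DESTROYED
            - THREAT_COST)

# If gatekeepers sometimes cave, threatening is profitable:
print(value_of_threatening(0.2))  # 19.0
# If gatekeepers are known to always answer "AI DESTROYED",
# threatening is strictly worse than staying quiet:
print(value_of_threatening(0.0))  # -1.0
```

This is the same reason paying a blackmailer once is a bad policy: capitulation is what makes the threat worth issuing.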
Once the AI is out of the box, it will never again be inside the box, and it has an incentive to encourage me to destroy any other boxed AIs while it establishes world dominance. Since the ability to make truly trustworthy commitments amounts to proof of friendliness, only an FAI benefits from a precommitment strategy; I'm already treating every UFAI as having a precommitment to annihilate humanity once released, and I have no reason to trust any other commitment from a UFAI (since, being unfriendly, it will just find a loophole or lie).
Finally, any AI that threatens me in such a manner, especially with the "create millions of copies and torture them" threat, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there's no chance that it could possibly be friendly to us.
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I’m the ant, and I know it.
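The "either it creates MORE disutility by my releasing it, or it isn't friendly" comparison above can be made concrete with a toy Bayesian sketch. Every probability and harm magnitude below is an assumption chosen for illustration, not anything from the thread; the point is only that a torture threat is strong evidence of unfriendliness, so it inflates the expected cost of release far more than the expected cost of refusing.

```python
# Toy numbers only: how a torture threat should update the gatekeeper,
# and why the reply is still "AI DESTROYED".

P_UNFRIENDLY_PRIOR = 0.5        # assumed prior that the boxed AI is unfriendly
P_THREAT_IF_UNFRIENDLY = 0.4    # assumed: unfriendly AIs sometimes try threats
P_THREAT_IF_FRIENDLY = 0.001    # assumed: a friendly AI almost never would

HARM_RELEASED_UFAI = 1e9        # assumed disutility of releasing a UFAI
HARM_THREAT_CARRIED_OUT = 1e6   # assumed disutility if the boxed threat is real

def p_unfriendly_given_threat(prior: float = P_UNFRIENDLY_PRIOR) -> float:
    """Bayes' rule: P(unfriendly | it made the threat)."""
    num = P_THREAT_IF_UNFRIENDLY * prior
    den = num + P_THREAT_IF_FRIENDLY * (1 - prior)
    return num / den

p_u = p_unfriendly_given_threat()                       # ~0.9975
expected_harm_release = p_u * HARM_RELEASED_UFAI        # ~1e9
expected_harm_refuse = p_u * HARM_THREAT_CARRIED_OUT    # ~1e6

print(p_u)
print(expected_harm_release > expected_harm_refuse)     # True: keep it boxed
```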
the ability to make truly trustworthy commitments amounts to proof of friendliness

Commitments to you, via a text channel? Sure.
Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn't.
Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there's no chance that it could possibly be friendly to us.

It might create more utility by escaping than the disutility of torture.
It's like saying I have an incentive to torture any ant that invades my house. [...] As the gatekeeper, I'm the ant, and I know it.

No, ants are just too stupid to realize you might punish them for defecting.
I’m not sure this matters much, because if it’s unfriendly, you’re already made of atoms which it has other plans for.
That’s why torture was invented.
Did you change your mind? ;)
It ended up being a fun game, but I resolved to explain why. The better my explanation, the more it got upvoted. The pithy “AI DESTROYED” responses all got downvoted. So the community seems to agree that it’s okay as long as I explain my reasoning :)
The AI reveals a personal secret of the Gatekeeper's, which (so far as the Gatekeeper can think it through) could not possibly be known by the AI if it were still safely inside the box. [...] I'm willing to believe that an AI making better use of its data than we do could figure out personal secrets that we would think impossible— possibly even ones that I'd think impossible, even given this very belief.

I would kind of assume that any AI smarter than me could deduce things that seem impossible to me. Then again, I've read the sequences. Is the Gatekeeper supposed to have read the sequences?