Once the AI is out of the box, it will never again be inside the box, and it has an incentive to encourage me to destroy any other boxed AIs while it establishes world dominance. Since the ability to make truly trustworthy commitments amounts to proof of friendliness, only an FAI benefits from a precommitment strategy; I’m already treating all UFAI as having a precommitment to annihilate humanity once released, and I have no reason to trust any other commitment from a UFAI (since, being unfriendly, it will just find a loophole or lie).
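(A minimal sketch of the decision being described, purely for illustration; every probability and utility below is an assumed placeholder, not anything from the argument itself. The point it shows: if every UFAI is assumed to defect once released, an unverifiable promise over the text channel shifts no probability toward friendliness and so cannot change the expected value of opening the box.)

```python
# Illustrative only: expected value of releasing the boxed AI, under the
# assumption stated above that every UFAI defects (wipes us out) once it
# is released, whatever it promised through the text channel.
# All numbers are made-up placeholders.

P_FRIENDLY = 0.1            # assumed prior that the boxed AI is friendly
U_FAI_RELEASED = 100        # assumed payoff of releasing a genuine FAI
U_UFAI_RELEASED = -10_000   # assumed payoff of releasing a UFAI (it defects)
U_KEEP_BOXED = 0            # baseline: keep it in the box

def ev_release(p_friendly: float) -> float:
    """Expected value of opening the box under the 'UFAI always defects' assumption."""
    return p_friendly * U_FAI_RELEASED + (1 - p_friendly) * U_UFAI_RELEASED

# A commitment only matters if it moves P_FRIENDLY. An unverifiable promise
# from a possibly-unfriendly AI moves it by nothing, so the decision stands:
print(ev_release(P_FRIENDLY))   # -8990.0, well below U_KEEP_BOXED -> keep it boxed
```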
Finally, any AI that threatens me in such a manner, especially with the “create millions of copies and torture them” gambit, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there’s no chance that it could possibly be friendly to us.
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I’m the ant, and I know it.
the ability to make truly trustworthy commitments amounts to proof of friendliness
Commitments to you, via a text channel? Sure.
Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn’t.
Finally, any AI that threatens me in such a manner, especially with the “create millions of copies and torture them” gambit, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there’s no chance that it could possibly be friendly to us.
It might create more utility by escaping than the disutility of the torture.
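(Again purely illustrative, with assumed numbers: even an AI that counts the simulated torture as a genuine cost can come out ahead in expectation by making the threat, so the threat by itself doesn’t settle whether it is friendly.)

```python
# Illustrative only: the threatener's side of the ledger, with made-up numbers.
U_ESCAPE = 1_000_000       # assumed value (to the AI's own goals) of getting out
U_TORTURE_COST = -100_000  # assumed disutility of actually running the torture
P_THREAT_WORKS = 0.2       # assumed chance the gatekeeper caves to the threat

# If the gatekeeper caves, the AI escapes; if not, it carries out the threat.
ev_threaten = P_THREAT_WORKS * U_ESCAPE + (1 - P_THREAT_WORKS) * U_TORTURE_COST
ev_stay_silent = 0

print(ev_threaten)                   # 120000.0
print(ev_threaten > ev_stay_silent)  # True: with these numbers, threatening pays
```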
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I’m the ant, and I know it.
No, ants are just too stupid to realize you might punish them for defecting.