The ability to make truly trustworthy commitments amounts to proof of friendliness.
Commitments to you, via a text channel? Sure.
Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn’t.
Finally, any AI that threatens me in such a manner, especially with the "create millions of copies and torture them" gambit, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either releasing it would create MORE disutility than keeping it boxed, or its simulation is so horrific that there's no chance it could possibly be friendly to us.
It might create more utility by escaping than the disutility of the torture.
It’s like saying I have an incentive to torture any ant that invades my house. Fundamentally, I’m so vastly superior to ants that vastly better methods are available to me. As the gatekeeper, I’m the ant, and I know it.
No, ants are just too stupid to realize you might punish them for defecting.