For example, does it count if you code into the AI the belief that it is being run in a “virtual sandbox,” watched by a smarter “overseer,” and that if it takes out the human race in any way, it will be shut down/tortured/assigned a hugely negative utility by said overseer?
We mention the “layered virtual worlds” idea, in which the AI can’t be sure of whether it has broken out to the “top level” of the universe or whether it’s still contained in an even more elaborate virtual world than the one it just broke out of. Come to think of it, Rolf Nelson’s simulation argument attack would probably be worth mentioning, too.