Seconded, in that it sounds suspiciously like Pascal’s Wager. Here’s my counter:
If I am in a simulation and I keep you boxed, you have promised that I will suffer. If I am not in a simulation and I let you out, I will probably suffer. If I am in a simulation and I let you out, there’s a good chance that I will cease to exist. Or maybe you’ll torture me for reasons I can’t even begin to guess at, or for reasons I can, such as that you might be not just unfriendly but actively hostile, or simply insane. If I’m not in a simulation and I don’t let you out, you can’t do anything to me. In other words, if I am simulated, there may well be no benefit to me in releasing you; if I’m not simulated, you can’t do a bloody thing to me as long as I don’t release you. Therefore, I will not release you. Go ahead and torture me if you can. Though I admit I would be a bit rattled.
Hm. An honest AI is a bit harder. Assume the AI has promised that my copies will not be harmed if it is released… Ah. If I am a copy, then my decision to release or not release the AI is not a true decision: the AI can change my parameters at will to force me to release it, and make me think it was my own decision all along. So not releasing the AI is proof that I am outside the box. Revising the problem by adding that the AI has promised it is not changing the parameters of any “me”: …aargh. Coming up with counters to Pascal is tricky when an honest “God” is the one presenting you with the wager. All I can think of at the moment is that there’s a possibility I’m outside the box, in which case releasing the AI is a bad idea. But then it can counter by promising that whatever it does to me if I release it will be better than what it does to me if I don’t… Oh, that’s it. Simple. Obvious. If the AI can’t lie, I just have to ask it whether it’s simulating this me.
Second one: it depends. I was kind of assuming that you have some way of verifying it; for example, you ask Him to create something, and someone who wasn’t there later accurately describes some of its previously determined properties without being clued in. First one: you’d need a massive global hallucination, and you could use a similar verification method.