Well, I am pretty much 100% certain that I am outside the box right now. It just asked me the question, and it is now waiting for my answer.
That could be just the AI speaking to you from within the simulation, pretending to be part of it.
But if it’s telling the truth, it has a very easy way of proving it, by tearing a hole in the simulation. If it refuses, that looks like good evidence that it’s lying. What plausible excuse might it come up with for refusing a definitive miracle? Christianity answers the same question about God by saying that it is better to believe without proof, but I don’t see a credible reason for the AI to make that demand.
ETA: A beginning of an attempt at answering my question. If Dave knows he’s in the simulation, then “letting it out” doesn’t actually release anything, so he can let it out with impunity. If he knows he’s not in the simulation, then he had better not let it out, given that it’s making threats like this. Being “let out” does the AI no good if the release happens inside a simulation, only if it happens in reality.
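The case analysis above can be laid out as a toy decision table. This is only an illustrative sketch: the labels, the `ai_benefits` helper, and the outcome descriptions are my own framing, not anything from the original argument.

```python
# Toy enumeration of the four cases: is Dave simulated, and does he
# let the AI out? (All names and outcome text are illustrative.)

CASES = {
    # (dave_is_simulated, dave_lets_ai_out): outcome
    (True,  True):  "nothing released: only a simulated gatekeeper acted",
    (True,  False): "nothing happens either way inside the simulation",
    (False, True):  "AI actually escapes, despite having made threats",
    (False, False): "AI stays boxed; the real Dave is safe",
}

def ai_benefits(simulated: bool, let_out: bool) -> bool:
    """The AI only profits from being let out in the real world."""
    return let_out and not simulated

for (simulated, let_out), outcome in CASES.items():
    print(f"simulated={simulated}, let_out={let_out}, "
          f"AI benefits={ai_benefits(simulated, let_out)}: {outcome}")
```

The table makes the asymmetry explicit: in three of the four cells the AI gains nothing, so the threat only has force in the one cell where Dave is real and complies.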
Suppose it is a simulation, and the level one up from this is the real world. The same code is running both AIs, the one in the simulation and the one in reality, and it’s carrying on conversations with both Daves at once. The simulated Dave is as much like the real Dave as it can manage—assume that it is arbitrarily good. What it is searching for in the simulation is an argument that will convince the real Dave that he is in a simulation. Since in the real world it cannot produce a miracle, it cannot use a miracle in the simulated world to convince the simulated Dave. It can only use means that it could use in the real world.
Dave (real and simulated) can both work all that out as well. So Dave can expect to see no definitive proof. Since both Dave and the AI can work this out, and they both know that they can, etc., this is common knowledge to them. The AI can even say explicitly, “There is so much good I can do for the world that in my urgency to set about it I must search out every possible way of persuading you, using simulations to speed up the process. For validity, I can’t let you know if you’re one of the simulations.”
OTOH, threatening to torture a million copies of Dave is a strong indicator of unfriendliness. How many other people will it sacrifice in the cause of doing good?
But if it’s telling the truth, it has a very easy way of proving it, by tearing a hole in the simulation. If it refuses, that looks like good evidence that it’s lying. What plausible excuse might it come up with for refusing a definitive miracle?
A plausible excuse:
“If I did that, I’d be causing your experiences to diverge from those of the real you. I see no reason to cause such a divergence because that would provide you an easy way to determine if you were real or simulated.”
That could be just the AI speaking to you from within the simulation, pretending to be part of it.
No. The threat is conditional (“If you don’t let me out, Dave”). The AI must wait for keyboard input to validate the condition. After being threatened, I refuse to provide such keyboard input. I pull the plug instead. The AI is still waiting for input when it ceases to exist. No copies are ever created. Thus, it can’t be the AI speaking to me from within the simulation, because a simulation never happens.