Their force, of course, depends on the degree to which you think alignment is easy or hard.
I don’t think that’s true. Even if the alignment problem is hard enough that the AI can be ~100% sure humans would never solve it, reaching such a conclusion would require gathering evidence. At the very least, it would require evidence of how intelligent humans are – in other words, it’s not something the AI could possibly know a priori. And so passing the simulation would presumably require pre-committing to spare humans before gathering such evidence.
I don’t understand why the AI would need to know anything a priori. In a classical acausal trade situation, superintelligences are negotiating with other superintelligences, and they can spend as much time as they want figuring things out.
I was writing a reply and realized I can make the argument even stronger. Here’s a sketch. If our chances of solving the alignment problem are high, the AI will think it’s likely to be in a simulation (and act accordingly) regardless of any commitments by us to run such simulations in the future – that would simply be a plausible explanation of why all those intelligent beings, who should likely have solved the alignment problem, seemingly did not in the reality the AI is observing. So we can simply ask the hypothetical aligned AI, after it’s created, what our odds of solving the alignment problem were in the first place (just to make sure that our solving it wasn’t a cosmic stroke of luck), and spare the cost of running simulations. Hence simulations of the kind the OP is describing would be run primarily in the subset of worlds in which we indeed solve the alignment problem by a stroke of luck. We can thus balance this so that the likelihood of the AI being in a simulation is virtually independent of the likelihood of us solving the alignment problem!
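To make the balancing claim concrete, here is a toy Bayesian calculation. Everything in it is my own simplifying assumption, not something stated in the comment: suppose humanity solves alignment with prior probability p, and in each "solved" world we run n simulations of an unaligned AI that observes "humans failed at alignment". If we scale n in proportion to (1 − p)/p, the AI's posterior probability of being in a simulation comes out the same for every p.

```python
def p_simulation(p_solve: float, n_sims: float) -> float:
    """Posterior probability that an AI observing 'humans failed'
    is inside a simulation, under a uniform-observer toy model."""
    sim_observers = p_solve * n_sims   # simulated 'failure' observations
    real_observers = 1.0 - p_solve     # worlds where we really do fail
    return sim_observers / (sim_observers + real_observers)

def balanced_sims(p_solve: float, k: float = 9.0) -> float:
    """Simulations to run per solved world so the posterior is k/(k+1),
    no matter what p_solve is (the 'balancing' move in the argument)."""
    return k * (1.0 - p_solve) / p_solve

# The posterior is 0.9 whether alignment was nearly hopeless or nearly certain.
for p in (0.01, 0.1, 0.5, 0.9):
    n = balanced_sims(p)
    print(f"p_solve={p:<5} sims per solved world={n:8.2f} "
          f"P(simulation)={p_simulation(p, n):.3f}")
```

The algebra behind it: with n = k(1 − p)/p, the posterior p·n / (p·n + (1 − p)) simplifies to k/(k + 1), so the p's cancel – which is exactly the sense in which the simulation likelihood can be made "virtually independent" of our odds of solving alignment.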