I don’t understand why the AI would need to know anything a priori. In a classical acausal trade situation, superintelligences are negotiating with other superintelligences, and they can spend as much time as they want figuring things out.
I was writing a reply and realized I can make the argument even stronger. Here’s a sketch. If our chances of solving the alignment problem are high, the AI will think it’s likely to be in a simulation (and act accordingly) regardless of any commitments by us to run such simulations in the future – that would simply be a plausible explanation of why all those intelligent beings, who should likely have solved the alignment problem, seemingly did not in the reality the AI observes. So we can simply ask the hypothetical aligned AI, after it’s created, what our odds of solving the alignment problem were in the first place (just to make sure that our solving it wasn’t a cosmic stroke of luck), and spare the cost of running simulations. Hence simulations of the kind the OP describes would be run primarily in the subset of worlds in which we did solve the alignment problem by a stroke of luck. We can thus balance this in such a way that the likelihood of the AI being in a simulation is virtually independent of the likelihood of us solving the alignment problem!
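The balancing claim in that last sentence can be made concrete with a toy Bayesian model. This is my own illustrative sketch, not part of the original argument: the self-location prior, the function, and the specific numbers are all assumptions.

```python
# Toy model (illustrative assumption, not from the original comment):
# p       : prior probability that humanity solves alignment
# n_sims  : number of simulations (each presenting "alignment seemingly
#           unsolved") run per world in which alignment WAS solved
# The AI observes a world where alignment looks unsolved and asks:
# am I in base reality, or in one of the simulations?

def p_in_simulation(p: float, n_sims: float) -> float:
    """Posterior probability of being simulated, given the observation
    'alignment appears unsolved', under a naive self-location prior."""
    simulated_observers = p * n_sims     # solved worlds run the simulations
    base_reality_observers = 1.0 - p     # worlds that genuinely failed
    return simulated_observers / (simulated_observers + base_reality_observers)

# If we scale n_sims inversely with p (run more simulations the less
# likely success was), the posterior stays flat across different p:
for p in (0.9, 0.5, 0.1):
    n_sims = 9.0 * (1.0 - p) / p  # chosen so the posterior is always 0.9
    print(round(p_in_simulation(p, n_sims), 3))  # prints 0.9 each time
```

Under these assumptions, committing to run simulations mostly in the "lucky" worlds is exactly what makes the AI’s posterior probability of being simulated insensitive to how likely we were to solve alignment in the first place.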